User:Krexer/sandbox

OLD Version (FULL set of references)

 * 2011 Survey: 52-item survey; 1,319 participants from over 60 countries. Citations include
 * 2010 Survey: 50-item survey; 735 participants from 60 countries. Citations include
 * 2009 Survey: 40-item survey; 710 participants from 58 countries. Citations include
 * 2008 Survey: 34-item survey; 348 participants from 44 countries. Citations include
 * 2007 Survey: 27-item survey; 314 participants from 35 countries. Citations include

NEW Version (REDUCED set of references)

 * 2011 Survey: 52-item survey; 1,319 participants from over 60 countries. Citations include
 * 2010 Survey: 50-item survey; 735 participants from 60 countries. Citations include
 * 2009 Survey: 40-item survey; 710 participants from 58 countries. Citations include
 * 2008 Survey: 34-item survey; 348 participants from 44 countries. Citations include
 * 2007 Survey: 27-item survey; 314 participants from 35 countries.

Recent survey results (previously posted to the Data Miner Survey page, but now removed)
Results from the most recent survey were unveiled at the October 2011 Predictive Analytics World (PAW) conference held in New York City. Survey participants included data miners working in corporations, consulting firms, tool vendors, academia, and governmental and non-governmental organizations.


 * Fields & Goals: Data miners work in a diverse set of fields. CRM/Marketing has been the most common field in each of the past five years. Fittingly, “improving the understanding of customers”, “retaining customers”, and other CRM goals continue to be the goals most frequently identified by data miners. This is consistent with independent polls of data miners conducted by KDnuggets over the years.
 * Algorithms: In every survey year, decision trees, regression, and cluster analysis have formed the core triad of algorithms used by most data miners. However, data miners also use a wide variety of other algorithms. This, too, is consistent with independent polls of data miners conducted by KDnuggets over the years.
 * Text Mining: A third of data miners currently use text mining, and another third plan to do so in the future. Text mining is most often used to analyze customer surveys and blogs/social media.
 * Data Mining Tools: Data miners report using an average of four software tools to conduct their analyses. Over the survey years, R has risen in popularity: in 2010 it overtook SPSS Statistics and SAS to become the tool used by the most data miners, and the 2011 survey showed that close to half of all data miners (47%) now use R. STATISTICA has also grown in popularity. From 2007 to 2009, more data miners named SPSS Clementine (now IBM SPSS Modeler) as their primary data mining tool than any other tool; in 2010 and 2011, however, STATISTICA was the tool most frequently cited as primary. In recent surveys, STATISTICA, IBM SPSS Modeler, R, KNIME, RapidMiner, and Salford Systems have received the strongest satisfaction ratings from data miners. The growing popularity of R is consistent with independent polls of data miners conducted by KDnuggets, but the KDnuggets polls paint a different picture of the popularity of commercial data mining software. Robert Muenchen has taken a multi-faceted approach to assessing the popularity of data analysis software, drawing on blog post counts, Google Scholar data, listserv subscribers, use in competitions, book publications, Google PageRank, and more. His analyses agree with the Rexer Analytics Surveys and KDnuggets on the growth of R, but he shows that software popularity is more nuanced: conclusions differ depending on which measure of popularity is used. The Rexer Analytics survey summary reports include analyses of data miners' satisfaction with 20 dimensions of their software. Haughton et al. and Nisbet have also produced reviews of data mining software.
 * Analytic Capabilities & Success: Only 12% of corporate respondents rate their company as having very high analytic sophistication. However, data miners at companies with better analytic capabilities report that their companies are outperforming their peers. Participants in the 2011 survey shared best practices for measuring analytic success.
 * Challenges: Consistently across the survey years, dirty data, explaining data mining to others, and difficulty accessing data have been the top challenges data miners face. Participants in the 2010 survey shared best practices for overcoming these challenges.
 * Future Trends: Data miners are optimistic about continued growth in data mining adoption and the positive impact data mining will have.