User talk:Krexer/2012 Archive

Nov-2012: Rexer's Annual Data Miner Survey
Welcome to Wikipedia. Everyone is welcome to contribute constructively to the encyclopedia. However, please do not add promotional material to articles or other Wikipedia pages. Advertising and using Wikipedia as a "soapbox" are against Wikipedia policy and not permitted. Take a look at the welcome page to learn more about Wikipedia. Thank you.  MrOllie (talk) 16:37, 3 November 2012 (UTC)


 * MrOllie -- Thank you for your comment. I am an infrequent Wikipedia contributor, but I am hoping to contribute more in the coming months.  I found the Rexer's Annual Data Miner Survey wikipedia entry.  Yes, I am president of Rexer Analytics, but please note that I did not create this wikipedia entry.  It looks like someone else created it about 18 months ago.  I thought it would be a good idea to update the page with more recent information.  I also thought it would be good to add links to a number of other wikipedia pages and to add several references.  So I made these additions.


 * My goal in adding this information to this wikipedia entry and to other wikipedia entries is to help expand people's understanding of data mining and to provide useful information to the data mining community. This is the same goal we have had since 2007 in conducting the Data Miner Surveys.  We do the surveys as a way of giving back to the academic and data mining community.  We do not sell the survey reports (e.g., the 2011 report is a 37 page PDF) - we make them freely available to everyone who requests them (several thousand people requested the 2011 report).


 * I feel that the annual data miner surveys provide a useful overview of the tool & algorithm preferences, challenges, concerns, and views of data mining professionals. As such, I think it is appropriate to have a wikipedia page that helps to share this information.


 * I seek your guidance as to what type of information can be added to this wikipedia entry to improve it and make it a suitable wikipedia entry.


 * I also noticed that you have removed references that I added to other wikipedia entries: e.g., R (programming language), where I added a sentence that indicated that in addition to R's use among statisticians, R is used by many data miners (and I provided references to several KDnuggets polls, the Rexer surveys, and also Robert Muenchen's analyses of PageRank, competition usage, and many other factors).  This seems to me to be a useful addition to the R (programming language) entry.  I will go to the Talk page for the R page, and express my POV there.  But, to help me understand the standards being used, can you please explain in more detail why you removed my additions to the R page.  Please note that in the R page's "Statistical features" section there previously was a reference to the 2010 Rexer Survey showing that 43% of data miners used R.  I removed this sentence earlier today when I was trying to improve this page.  My reasoning was that the sentence did not describe a statistical feature, and therefore did not belong in that section of the R wikipedia page.


 * --Karl (talk) 18:37, 3 November 2012 (UTC)


 * To improve/keep the page, we need independently written sources about the Rexer survey. You can read more about the relevant guideline here. Right now all the sources cited are either written by you, or are citations that aren't actually about the survey - because they are citing other things, that is OK. But we do need some independent secondary sources about the survey. - MrOllie (talk) 19:40, 3 November 2012 (UTC)


 * Because you have disclosed that you are affiliated with the subject and therefore have a conflict of interest, you may also benefit from reading Wikipedia's plain and simple conflict of interest guide. --Drm310 (talk) 06:06, 4 November 2012 (UTC)


 * Mr. Ollie & Drm310 -- Thank you for your guidance. Yes, I am an inexperienced wikipedia contributor, and I'm trying to start with the topic that I know best:  Data Mining.  I want to be totally transparent about who I am:  I am the president of Rexer Analytics (we are a very small consulting firm - we have 3 full-time staff).  I have reviewed the conflict of interest and the plain and simple conflict of interest guide.  I appreciate you working with me to help modify the Rexer's Annual Data Miner Survey wikipedia entry.  As I tried to state in my previous TALK entry, I am not trying to promote or sell anything.  Please help me to modify anything on the page that appears that way.  I did not create the original wikipedia entry on this topic.  But I (obviously) know a lot about the surveys -- we are the people who conduct this survey research each year.  My motivation is to provide information that helps inform wikipedia readers.


 * When I edited Rexer's Annual Data Miner Survey a few days ago, I did not add citations to independently written sources about the Rexer Data Miner survey. I initially did not add citations to independent sources solely because I had wanted to keep the entry short, and I did not want to appear like I was trying to draw attention to myself or my company.  However, I now see the wisdom of Mr. Ollie's point:  I used to be a professor, and as a former academic, I can see that one really does need to cite external sources.  So, I'm now adding a bunch of independently written references to Rexer's Annual Data Miner Survey.  But I again seek your guidance.  I added a bunch of references, but to me the number of references really seems excessive -- I think the wikipedia entry would be better with a somewhat shorter reference list.  Can you please help pare the list down.  Another idea I have is that if you feel it would be better, I can ask someone in my academic network (whom I am NOT doing business with) to edit Rexer's Annual Data Miner Survey.


 * Thank you for your patience with me as a new Wikipedia author. We don't sell our Data Miner survey reports - we give them away free to everyone who asks for them.  I also spend substantial time mentoring young data miners, and corresponding with data miners.  My goal is to give back, and provide good information.  That is my goal in editing the Rexer's Annual Data Miner Survey page.  More importantly, if I can help to establish the Rexer's Annual Data Miner Survey page as a page in good standing with you more experienced wikipedians, I can then proceed to go add value to a variety of data mining algorithm pages.  E.g., many data mining college students ask me what are the core data mining algorithms that most data miners use.  This is exactly the type of question that our data miner survey has been answering since 2007 (in the 2011 survey over 1300 data miners participated), and the answer has been clear and unchanging from 2007-2011: regression, decision trees and cluster analysis are the three core algorithms that most data miners rely on.  This can be very useful info for inexperienced data miners to hear.  I would like to be able to add this useful information to the Regression analysis, Cluster analysis, and Decision tree learning pages.  I tried to add this info to those pages, and I understand why Mr. Ollie removed my edits to those pages.  Yes, first I need to establish the validity of the Rexer's Annual Data Miner Survey page with external citations to show that it is being taken seriously by both the academic and business worlds.  Once it is complete, it is my hope that my other 1-sentence edits to those other pages can be reinstated.


 * Thank you.Karl (talk) 07:12, 5 November 2012 (UTC)


 * minor edit - I'm reading a lot, and learning wikipedia formatting and conventions, so I added indenting to the Nov-2012 discussion to make it easier to read.Karl (talk) 22:14, 5 November 2012 (UTC) Some parts of this discussion are now also at Talk:Rexer's Annual Data Miner Survey.Karl (talk) 21:07, 8 November 2012 (UTC)


 * minor edit: learning more, so I created an archive for old 2009 Talk items.Karl (talk) 21:07, 8 November 2012 (UTC)

Nov-2012: R (programming language)
Hi, Karl. I noticed your edits on the R page. If you have questions you might want to drop by the teahouse. That's a forum designed for new (and not quite so new) editors. I've found it really helpful there, esp. for basic questions. One other point: the more of an expert you are, the more difficult it can be to become an effective editor in your area of expertise. I found it much easier to start with a reliable source in hand (usually some kind of single-volume encyclopedia) and contribute to topics where I knew relatively little, even if that just meant copyediting or adding a citation. If someone reverted my efforts then I didn't mind too much, and there were no issues with COI or unreliable sources. When you start getting the hang of things then you can go back and try to integrate your own expertise into articles.

Anyway, I hope you decide to become a regular contributor. The few mistakes you're making have been done in good faith and you're ahead of the curve in being willing to discuss them on the talk pages. Stick with it a bit longer and I think you'll be fine. If you have any questions that you don't think are appropriate for the teahouse, don't be shy about posting them on my talk page. I don't have all the answers, but I'm gradually compiling a list of people who do who I can ask.

Best,

Garamond Lethe 08:20, 10 November 2012 (UTC)


 * Thanks Garamond. I appreciate your welcoming attitude.  I've been doing little things.  This is working out better.  And I'm using (perhaps OVER-using) the talk pages, to explain my logic if any small change might be questioned by anyone.  Some of my edits get changed, but overall, I see the pages evolving, and that's satisfying.  E.g., the first paragraph of the SPSS and SPSS Modeler pages.  I'm mostly doing edits on pages that are in my area of expertise (data mining) and interest, but that are not controversial or are on the periphery -- all things that I have no direct relationships with:  e.g., R page, other software tools, SIGKDD organization, etc.  Plus fixing links in other data mining and statistics pages (e.g., algorithms).  After I get more experience, maybe I'll create a new page or join a wikiproject.  I'll see how things evolve.  Thanks again.  Karl (talk) 00:30, 13 November 2012 (UTC)

Nov-2012: Data mining
Hi, I saw your web link additions there. I figure when the list of conferences was created (actually it is a copy from List of computer science conferences), it was intentional that these are not web links, but point to "red link" Wikipedia pages. All these conferences are notable, and thus deserve their own Wikipedia page, or at least a major paragraph in an appropriate other page (e.g. SIGKDD, which covers both the interest group and the conference). VLDB is an example of a page about only a conference. Maybe instead of adding web links, you could start articles about these conferences - a stub will do. But it should include links to the body (e.g. SIAM) and maybe the latest and the upcoming conference web pages. The link you added for SDM for example is actually obsolete already: is the upcoming SDM13. There also is an activity group in SIAM by now:  SIAG/DMA. The conference proceedings are online, but often a bit tricky to find, so it might be good to have links to these as well. You see, it makes a lot of sense to have dedicated articles for the conferences. I'd appreciate it if you could contribute by starting these articles. They don't need to be exhaustive, just a start. But it helps more than just adding web links to 4 out of 14 conferences. Thank you. --Chire (talk) 07:23, 13 November 2012 (UTC)


 * Great idea. I'm a novice wikipedia author, so I've been hesitant to add new pages.  I was simply trying to clean up links on this page that didn't point anywhere.  I'll be a bit slow at doing this, but I will review a couple other organization and conference pages, use them as templates, and try to make a few of the new articles (pages) that you suggest.  Do you think the SIGKDD page is a good page to use as a template, or are you aware of another SIG, or organization, or conference page that would be a better template?  Thank you.  Since you are both knowledgeable about these organizations and a more experienced wikipedia editor, can I ask for your perspective about the edits I tried making to the SIGKDD page?  See Talk:SIGKDD for my discussion.  I tried adding a list of the old KDD conferences, but my edits were quickly undone.  The undoing happened in the same timeframe in which I had made some COI edits to other pages before I understood the COI guidelines.  But I think the other editor was over-zealous about removing all my edits everywhere, even when I was simply making grammatical edits or adding wiki-links.  If you think the list of KDD conferences is OK, can you please look at the history of the page, and re-instate the list?  Thanks.  Also, if you have time, can you help me to better understand when external links are OK?  My list of old KDD conferences contained external links to the old conference websites, because I thought that would be helpful.  However, please let me know if I shouldn't put links on those, or if I should put the links somewhere else (e.g., at the bottom of the page in a separate external link section; but this seems repetitive and cumbersome).  Thanks again for your help and coaching.  Karl (talk) 15:41, 13 November 2012 (UTC)


 * VLDB might be a more canonical example. I haven't looked much at these wikipedia pages, so I don't know much about them. Maybe just click through some of them at List of computer science conferences that aren't red to get an impression of what a good wikipedia page about a conference covers. From what I've seen it is common to have a table listing the previous conferences with date, location and link, for example. Related bodies and topics should also be linked. Steering committee and high profile persons are good crosslinks, too. Not every red link is bad. Gregory Piatetsky-Shapiro and Usama Fayyad are notable for Wikipedia, and at some point, someone might write a page about them. --Chire (talk) 15:36, 14 November 2012 (UTC)


 * WRT coaching. Sorry, I'm incredibly busy these days. And I don't know that much about these organizations. I have not yet been at an ACM KDD conference, for example. I know probably not even half as much as you imagine me to know. A lot of the time, I just use the search function, for example: Redlinks within reason, Bluelinks within context. If you use wikipedia search using e.g. "WP:redlink", these pages are usually quite easy to find. So when I'm uncertain about how to do things, I usually just try searching if maybe anyone has documented that yet. And in Wikipedia, a major part of the communication happens via Wikipedia pages. --Chire (talk) 16:00, 14 November 2012 (UTC)


 * Thanks for the additional info. That's all I was looking for when I said "coaching" - just a few pointers, not a bigger investment of time.  I appreciate what you've done so far.  It's very helpful.  In the coming days/weeks I'll review VLDB and other entries, and then create some new pages for a few of the data mining conferences.  It will be a slow process at first, but something fun for me to tackle. It's funny that you mention Gregory and Usama.  Last week I made a short list on my User page of possible new pages to add.  Both of them are on my list.  Bharat Rao (winner of 2 KDD best paper awards, KDD Service Award, and holder of 40+ patents) is also on my list.  Karl (talk) 16:17, 14 November 2012 (UTC)


 * When starting new Wikipedia articles, always consider the reachability. If you start an article that is not connected it is as good as dead. "Fixing" red links by replacing them with an article is good, because it means these articles can be reached from Wikipedia. Usama Fayyad is mentioned in Wikipedia a couple of times - he is easy to connect (e.g. from the Data Mining article). Bharat Rao can be mentioned from the SIGKDD page, which already is one of the more obscure pages in the quite obscure (by Wikipedia standards) domain of data mining. But essentially: nobody will reach the page. I strongly advise focusing on pages that have a lot of incoming connections. I have seen a lot of pages get deleted because the topic was not considered to be relevant by Wikipedia standards, or because it was too standalone (called "orphaned" in Wikipedia). Before creating an orphaned page, try to grow Wikipedia contents in the direction of this page. So before adding Usama Fayyad, I would first write on SIGKDD, Data mining and Yahoo! Research. Given that there are three good articles to strongly connect to him, one can start writing his article. Maybe make that a rule of thumb: before starting a page, find three articles that are more than just a stub and that strongly relate to the proposed topic. Otherwise, improve these articles first. :-) --Chire (talk) 17:23, 14 November 2012 (UTC)

Nov-2012: Conflict of interest
It might be prudent to review the Conflict of interest guidelines. You have declared a connection with IBM/SPSS on your user page. Deltahedron (talk) 19:54, 22 November 2012 (UTC)


 * Thank you, Deltahedron, for pointing out this potential conflict. However, I do not feel that I have a conflict of interest with IBM/SPSS.  It has been several years since I was on the SPSS Customer Advisor Board. Yes, I professionally know several people who work there - but I also know people who work for at least 6 of their competitors.  I do use SPSS software, but I also use statistics software produced by several competitors.  I have never been paid by IBM/SPSS, nor have they ever provided referrals/leads to my company.
 * Additionally, if you look at the contributions I have made to the SPSS Modeler page and Talk:SPSS Modeler, I think you will see that I have largely been trying to encourage others to submit material. I have also been making edits to remove the "promotional" tone in the material that others have added, and telling others that they need to provide citations to independent sources.  Most of my remaining edits have been to do little things like alphabetize a list, add wiki-links, add a competitor to the list, and correct wiki-links.
 * Please let me know if I am missing anything in regards to the potential COI you're pointing out. I am a novice wikipedia author, but I am trying to be very open about who I am, and what my areas of expertise are. All my edits are in good faith, but I welcome coaching from more experienced wikipedians. Karl (talk) 05:15, 26 November 2012 (UTC)


 * Thanks for that full and frank response. As far as I can tell, that doesn't add up to a formal COI and other editors will be able to read your disclosure.  Deltahedron (talk) 07:28, 26 November 2012 (UTC)


 * Thanks, Deltahedron. You appear knowledgeable in this domain, so let me ask for your POV on a somewhat related page.  Please see Talk:Anomaly detection.  Similar to the SPSS Modeler page, I feel that citations of independent sources are needed in paragraph 2 of the Anomaly detection page.  However, the citation needed tags I inserted yesterday were removed.  I'm new, and don't want to be pushy, so I made a mention of my perceived need on the talk page.  What is your POV?  Is the talk page a good way to proceed?  Do you agree that citations are needed?  Karl (talk) 13:54, 26 November 2012 (UTC)

I continue to be concerned about SPSS Modeler and conflicts of interest. In this comment, User:Bradhill14 suggests a change to SPSS Modeler which you made here. In the opposite direction, and less than 10 minutes previously, Bradhill14 had made this edit which introduced a link in Data mining to Rexer's Annual Data Miner Survey, a commercial product in which you have declared an interest. It seems quite possible that Bradhill14 is an employee of IBM. Do you know User:Bradhill14 in real life? Were these edits mutually promoting the commercial products in which you and he might have an interest, the result of collusion? Deltahedron (talk) 17:53, 17 December 2012 (UTC)
 * You are right. I just looked at his user page (User:Bradhill14), and I see that Bradhill14 states there that he is an IBM employee.  I do not know him in real life. I'm guessing that the timing of the edits may be due to this user and me having similar pages on our watchlists. However, I do not know a way to check on someone else's watchlist (or even if this is possible).  But I guess I'm not fully understanding the nature of your question.  Do you think there is anything wrong with the edits I made to SPSS Modeler?  When I saw this user's suggestion on Talk:SPSS Modeler, I thought it was a good idea, so I made the edit. I'm guessing that he did not make the edit himself due to COI, but it seems to me that it's a good use of the Talk page for him to post his ideas for page improvements there.
 * BTW, Within the last few days, I've also made edits to STATISTICA (to remove promotional-sounding language and update it to list the newest version number), Data mining (correcting heading formatting), StatSoft (adding a citation), SAS (fixing formatting), a data mining journal page, and other pages. These pages and others are on my watch list.  I am trying to build community service into my weekly schedule, and wikipedia editing is one thing I'm trying to do.  I do not think I have a COI with the SPSS Modeler page, and I do not focus my editing work there.  I am new at wikipedia editing, so if I'm missing something, and not understanding the nature of your question, please help me better understand what you're inquiring about.  Thanks.  Karl (talk) 18:34, 17 December 2012 (UTC)
 * Very simply put, then. Did you and Bradhill14 arrange to make edits related to each other's commercial interests in order to avoid COI guidelines or the scrutiny of the community?  Deltahedron (talk) 18:54, 17 December 2012 (UTC)
 * Short answer: No.
 * Longer answer to be transparent and help keep potential COI info public: About 5 weeks ago, I posted material about wikipedia to 3 of the 50 LinkedIn groups that I belong to. I was trying to encourage more people to contribute to wikipedia.  In one LinkedIn group I said, "Wow, I noticed that the wikipedia pages for SPSS have really old outdated information. I've started updating these. But the pages could really benefit from several of us contributing material. Even if you've never edited anything in wikipedia, try it - it's really easy!"  I've made similar statements to users of other commercial and open-source tools, as well as to people I know who I think would be knowledgeable about other wikipedia page topics -- I've tried to encourage people to get involved in updating any wikipedia pages that they know about.  I don't personally feel that my encouraging people to contribute to wikipedia is a COI.  Looking at the LinkedIn discussion pages, I see that Brad Hill was one of the 5 people to contribute to discussions about wikipedia that took place after my posting.  His only comment was in late November, and it was to say that he thought recent changes to SPSS wikipedia pages looked good.  BTW, It is my personal POV that bradhill14's posting his idea to the SPSS Modeler TALK page was a great and totally appropriate way for him to put forth an idea about a way to potentially improve a page for which he has a COI.  He put the idea out there, and let others who don't have a COI decide whether to make the edit or not.  Karl (talk) 01:20, 18 December 2012 (UTC)
 * I am glad to hear the short answer. You will appreciate that the question is inevitable when two editors each write about the other's commercial interests within a time span of ten minutes.  The longer answer is still somewhat problematic.  Inviting people to contribute to Wikipedia in general is good.  Encouraging employees to write about their employers' products is not good.  Please make that clear when soliciting input from your professional networks.  Suggesting material on the talk page is appropriate for cases of potential conflict of interest.  It will be easier for you to avoid even the appearance of collusion if you edit some of the millions of articles not directly related to these commercial products.  Deltahedron (talk) 07:42, 18 December 2012 (UTC)

Can you please try to see the glass being half full instead of half empty. I'm certainly not trying to do anything that you or others would see as "problematic". Five weeks ago I was getting involved in wikipedia and I was feeling enthusiastic, so I posted something to some user groups (reaching 10,000+ members), encouraging them to contribute to wikipedia. I thought, at the time, that long-time wikipedia people would be more welcoming of new people, and that wikipedians would be happy that I encouraged others to participate. I was not trying to target and encourage employees of a commercial product to contribute. The groups I posted to were user groups, not employee groups. But yes, I guess that some employees belong to these groups as well. And my comment about Hill using the talk page was simply to comment that my personal POV is that he found a good way for someone with COI to contribute.

I feel that the negative and accusing tone of your last sentence is unwarranted. It is un-welcoming comments like that that drive away new wikipedia contributors. I see my wikipedia contributions as a way of giving unselfishly to the world, trying to spread knowledge and improve understanding. I think that even if I only have time to make a small number of contributions, I feel I am helping, and that this should be the spirit of wikipedia. So seeing your criticism that I am not making enough edits to non-commercial products is just frustrating and disheartening to me. I hate feeling that I have to defend my reputation here, and I don't agree with you that editing non-commercial pages should improve anyone's appearance of trustworthiness. However, your claim that I am only editing the wikipedia entries of "these commercial products" is also just incorrect. I don't know if there is a way to view the wikipedia contributions of others, but if there is, you would see that I've been making edits to many non-commercial pages, including CRISP-DM, Data mining, the journal Data Mining and Knowledge Discovery, Predictive analytics, Statistics, Customer retention, Predictive Model Markup Language, Anomaly detection, SEMMA, Machine learning, Jacques Cousteau, SIGKDD, Decision tree learning. Sometimes I'm making simple formatting edits, and sometimes I'm spending hours trying to track down the right references to put on the CRISP-DM page. Yes, my edits are mostly in the general area of data mining, because that is where I have the greatest interest and knowledge. Similarly, when I guest lecture at local universities, I lecture to these types of classes, not due to any collusion, but because that's what I know.

Thanks for listening to my mini-rant. I know I'm taking this more personally than you probably meant it. But it bothers me when someone suggests that I am not working hard enough or questions my ethics, and that's what your comments feel like. All I'm asking is that long-established wikipedia people be more welcoming and encouraging, and give new people the benefit of the doubt that they are acting in good faith trying to make meaningful contributions (however small). New people grow and adjust - but it takes a while to learn the conventions of the wikipedia community. Please be patient and encouraging to new people.

I'm going to leave this discussion now. If needed, I will return to it after the holiday season. But I hope there is no need to. Karl (talk) 15:03, 18 December 2012 (UTC)

Dec-2012: Disambiguation link notification for December 8
Hi. Thank you for your recent edits. Wikipedia appreciates your help. We noticed though that when you edited Predictive analytics, you added a link pointing to the disambiguation page Orange (check to confirm | fix with Dab solver). Such links are almost always unintended, since a disambiguation page is merely a list of "Did you mean..." article titles. Read the FAQ • Join us at the DPL WikiProject.

It's OK to remove this message. Also, to stop receiving these messages, follow these opt-out instructions. Thanks, DPL bot (talk) 11:25, 8 December 2012 (UTC)


 * Great that this bot caught my error. I've now corrected the wiki-link.Karl (talk) 15:42, 8 December 2012 (UTC)