Talk:Text mining

Identifying link spam
What are the relevant guidelines for deciding when something is link spam? It looks like many of the external links were posted by people with a vested interest. Some of the sites are truly helpful in understanding what text mining is, whereas others are really about general NLP or information extraction, and some look suspiciously like SEO or ad nests. -Serapio 05:23, 6 June 2006 (UTC)
 * Have a look at WP:SPAM and Wikiproject no ads. The user with the following contributions has now been warned: Special:Contributions/24.6.82.35. He/she added unspecific outbound external links; since this is far from neutral, I have already deleted some of them. Even for more general topics like data mining and machine learning, the links are questionable.
 * JKW 18:22, 6 June 2006 (UTC)

Applications
Could someone with more expertise in the subject than I add to the Security and Commercial applications subsections, and possibly write a better introduction to the applications section in general? Thank you! -Hobbes 10:10, 13 Jun 06

Commercial Software
Can we add a list of commercial vendors that provide text mining software to the bottom of this page? I don't see why we shouldn't provide pointers to solution providers in this space. There are lots of toolkits and academic (free) packages listed, but many readers of this page may be looking for something more packaged. We've added some links to vendors before, only to have them aggressively removed by other users. Many pages similar to this one list companies that provide products in their area. Can we agree to add a list of commercial links at the bottom of this page, for people who are looking for commercial text mining packages, without having it continually deleted by other users?

There is still a spam problem in the external links. Some links have been removed, but we still see other commercial links. According to Wikipedia policies and guidelines, we must either remove all commercial software vendors here (for example Topicalizer and Textalyser) or include more vendors in the list (for example Clearforest, Semantic-knowledge, SPSS). What's up? —Preceding unsigned comment added by 81.66.159.36 (talk • contribs) 18:43, 22 November 2006


 * That seems like a sensible idea. Consistency should prevail. —Preceding unsigned comment added by 81.247.173.243 (talk • contribs) 12:59, 26 April 2007
 * All of the links have been removed. Is there still a problem? --Ronz 14:36, 26 April 2007 (UTC)
 * I think it is fine now. What we need now is more information on general tests for text mining and object extraction (there are some free documents available), and perhaps more on the standard methodologies used. Periergeia

This list continues to attract spam and non-notable entries. To manage this, I think we need to follow WP:LIST by having inclusion criteria. Until we come up with something better, we should only list entries that have their own articles. --Ronz (talk) 03:37, 8 December 2007 (UTC)
 * I've removed the entries that don't have their own articles. --Ronz (talk) 00:28, 20 February 2008 (UTC)

Suggestion for additional software
Open source software
 * NLTK (Python) has classifiers in it. See http://nltk.googlecode.com/svn/trunk/doc/book/ch06.html
 * Lingpipe (Java): http://alias-i.com/lingpipe/
 * Mallet (Java): http://mallet.cs.umass.edu/

Proprietary software
 * Today, IBM is introducing a new social media monitoring tool. See http://mashable.com/2010/05/11/ibm-social-media-analytics-tool/ —Preceding unsigned comment added by Phauly (talk • contribs) 10:37, 11 May 2010 (UTC)

Need an Overhaul
This entry has some good material, but it really needs someone to clean it up overall. It's probably a good idea to remove all references to commercial products and place them in a separate entry, following the model of Enterprise search and List of enterprise search vendors. Dtunkelang (talk) 03:56, 8 December 2009 (UTC)

Learning by reading
The article doesn't mention "learning by reading", which, according to a recent workshop proceedings, is the capture of knowledge from naturally occurring texts. Learning by reading is a two-part process: the first part is information extraction from texts, and the second part is the incorporation of this knowledge into the knowledge base. (First International Workshop on Formalisms and Methodology for Learning by Reading, 2010) http://portal.acm.org/citation.cfm?id=1866775 pgr94 (talk) 17:58, 19 December 2010 (UTC)

Feels a bit niche for now. Text mining is a broad topic. Dtunkelang (talk) 05:33, 22 December 2010 (UTC)

text mining vs. t. analytics: merge proposal
Although BI people might tend to distinguish themselves from the rest of the world by introducing new terms for old concepts, WP should endeavour to emphasise the pronounced technical similarities here. My suggestion is to make "text mining" the main lemma and "text analytics" a redirect (after transferring non-redundant content from the latter to the former, of course). -- Kku 09:51, 1 June 2012 (UTC)

I agree. Dtunkelang (talk) 19:20, 9 June 2012 (UTC)
 * Done. The overlap was remarkable. Kku 11:37, 4 July 2012 (UTC)

Broken link for Feedback Ferret
I notice that the Feedback Ferret link on this page doesn't lead anywhere as there is no Feedback Ferret page. All the other organisations listed seem to have their own page.

Is there any reason why there couldn't also be a page for Feedback Ferret? I do work for the company so obviously have an interest in this and I'm not quite clear if I would be allowed to write any of the content due to potential conflict of interest.

Friskyferret (talk) 16:11, 13 September 2013 (UTC)
 * See the discussion above, Talk:Text_mining. I've removed the link along with a couple others.
 * I'm not sure how much scrutiny you'll get if you try to create an article yourself. If you can find clearly independent and reliable sources that demonstrate Feedback Ferret is notable, go ahead and start an article on it. Please closely follow our conflict of interest policy and don't add anything that could be considered promotional in nature. --Ronz (talk) 16:52, 13 September 2013 (UTC)

Lists of software and applications
Would anyone object to turning the 'Software and applications' section into a separate list article, and linking to that instead? I think that's the most common way of dealing with long lists like this within articles. - Lawsonstu (talk) 14:10, 12 February 2014 (UTC)
 * I have now done this. The article can be found at List of text mining software. Watchers of this page may like to watch that one too. - Lawsonstu (talk) 08:35, 20 February 2014 (UTC)

External links modified
Hello fellow Wikipedians,

I have just added archive links to 2 external links on Text mining. Please take a moment to review my edit. If necessary, add after the link to keep me from modifying it. Alternatively, you can add to keep me off the page altogether. I made the following changes:
 * Added archive https://web.archive.org/20091129171151/http://intelligent-enterprise.informationweek.com:80/blog/archives/2007/02/defining_text_a.html to http://intelligent-enterprise.informationweek.com/blog/archives/2007/02/defining_text_a.html
 * Added archive https://web.archive.org/20140609020315/http://www.out-law.com:80/en/articles/2014/june/researchers-given-data-mining-right-under-new-uk-copyright-laws/ to http://www.out-law.com/en/articles/2014/june/researchers-given-data-mining-right-under-new-uk-copyright-laws/

When you have finished reviewing my changes, please set the checked parameter below to true to let others know.

Cheers. —cyberbot II  Talk to my owner :Online 06:16, 18 October 2015 (UTC)

External links modified
Hello fellow Wikipedians,

I have just added archive links to 2 external links on Text mining. Please take a moment to review my edit. If necessary, add after the link to keep me from modifying it. Alternatively, you can add to keep me off the page altogether. I made the following changes:
 * Added archive https://web.archive.org/20120303042253/http://www.ir.iit.edu/cikm2004/tutorials.html to http://www.ir.iit.edu/cikm2004/tutorials.html#T2
 * Added archive https://web.archive.org/20131004224652/http://yatsko.zohosites.com/texor-a-chat-mining-program.html to http://yatsko.zohosites.com/texor-a-chat-mining-program.html

When you have finished reviewing my changes, please set the checked parameter below to true to let others know.

Cheers.—cyberbot II  Talk to my owner :Online 21:00, 5 January 2016 (UTC)

Aika
Would it be possible to add the Aika project to the list of open source text mining software? Lukas.Molzberger (talk) 17:29, 13 July 2016 (UTC)