Talk:Biomedical text mining

Untitled
Greetings, fellow Wikipedians! I noticed that this stub didn't have a discussion page, even though several people have contributed to the article. I'd love to get to know whoever else is interested in the subject. Even though the references so far are all centered around Hoffman/Valencia et al., I'm surprised nobody has brought up iHOP yet, so I added it as an example in a new section. Since I am currently writing a thesis on the subject of biomedical text mining, I expect to be able to give a much more complete view of the subject, and eventually lift the stub status of this article. My edits so far have been only a warm-up. Ste1n 19:05, 17 April 2006 (UTC)

Examples, please?
Should this article not have one or two specific examples where text mining advanced research, helped with drug discovery, or established the etiology of a disease? Are there any such examples in medicine / health? I doubt it (I have looked, e.g., on PubMed). The use of word clouds has been questioned; text mining produces nice stats and graphs, but does it tell us anything new? BTW I am not talking about plagiarism detection... Sleuth21 (talk) 07:30, 30 May 2011 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Biomedical text mining. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20060901073846/https://lists.ccs.neu.edu/pipermail/bionlp/ to https://lists.ccs.neu.edu/pipermail/bionlp/

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know.

Cheers.— InternetArchiveBot  (Report bug) 23:40, 2 November 2016 (UTC)

Addition of references for BioNLP Shared Tasks
Information to be added or removed: In the section "Availability of annotated text data", I would like to add a mention of the BioNLP shared tasks following the mention of the Informatics for Integrating Biology and the Bedside (i2b2) challenges. The first paragraph of this section would then be the following:

Large annotated corpora used in the development and training of general purpose text mining methods (e.g., sets of movie dialogue, product reviews, or Wikipedia article text) are not specific for biomedical language. While they may provide evidence of general text properties such as parts of speech, they rarely contain concepts of interest to biologists or clinicians. Development of new methods to identify features specific to biomedical documents therefore requires assembly of specialized corpora. Resources designed to aid in building new biomedical text mining methods have been developed through the Informatics for Integrating Biology and the Bedside (i2b2) challenges, BioNLP shared tasks, and biomedical informatics researchers. Text mining researchers frequently combine these corpora with the controlled vocabularies and ontologies available through the National Library of Medicine's Unified Medical Language System (UMLS) and Medical Subject Headings (MeSH).

Explanation of issue: The BioNLP shared tasks (and the corpora created as part of them) represent important community efforts and resources for the biomedical text mining community. The tasks and resources were created by various members of the community, including my own group. I tried to add this directly, but it was removed as an "Apparent COI cite". However, this represents not only the work of my group, but also the work of others. Apologies if I have done something incorrectly; I do not have a great deal of experience in editing Wikipedia pages.

References supporting change: Supporting references included in the changes shown above

Daisylagata (talk) 14:38, 27 August 2019 (UTC)

Reply 27-AUG-2019
Regards, Spintendo  15:16, 27 August 2019 (UTC)
 * 1) Of the provided sources, 50% of them contain page parameters covering 4 or more cited pages of text. It is highly unlikely that the information contained in five sentences results from all 96 pages of this cited text. Thus, the request should specify which particular page the information is contained upon in sources containing multiple cited pages.
 * 2) The grouping of eight separate references to source only three words suggests WP:TOOMANYREFS.
 * 3) The COI editor is invited to redraft their proposal incorporating exact page numbers, and is asked to make use of only the minimum references needed.

Reply 28-AUG-2019
I have reduced the number of references to four. The BioNLP shared tasks have been held four times, in different years. There is now a link to an overview paper, or to the conference proceedings, for each of these tasks. I hope that this is more acceptable. Please see below. Daisylagata (talk) 10:35, 28 August 2019 (UTC)

Large annotated corpora used in the development and training of general purpose text mining methods (e.g., sets of movie dialogue, product reviews, or Wikipedia article text) are not specific for biomedical language. While they may provide evidence of general text properties such as parts of speech, they rarely contain concepts of interest to biologists or clinicians. Development of new methods to identify features specific to biomedical documents therefore requires assembly of specialized corpora. Resources designed to aid in building new biomedical text mining methods have been developed through the Informatics for Integrating Biology and the Bedside (i2b2) challenges, BioNLP shared tasks, and biomedical informatics researchers. Text mining researchers frequently combine these corpora with the controlled vocabularies and ontologies available through the National Library of Medicine's Unified Medical Language System (UMLS) and Medical Subject Headings (MeSH).