Talk:Natural language processing

Wiki Education Foundation-supported course assignment
This article was the subject of a Wiki Education Foundation-supported course assignment, between 26 August 2019 and 11 December 2019. Further details are available on the course page. Student editor(s): Wendell guan.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 05:00, 17 January 2022 (UTC)

(Computational Linguistics) Merge
PRO  I think that this article should probably be merged with Computational linguistics, but I'm fairly new to the Wikipedia, so I'm not sure.
 * Lambda 22:55, 22 Feb 2004 (UTC)

CON  While they're related, they're not really the same thing. Computational linguistics tries to use computer techniques to better understand linguistics as a discipline, while NLP tries to build ways for a computer to understand language. Obviously many things overlap, but they have much different focus: NLP doesn't explicitly care if it's making new contributions to linguistics, and computational linguistics doesn't explicitly care if it's making it easier for computers to understand natural languages. --Delirium 22:58, Feb 22, 2004 (UTC)

Unclear My take on this (I'm a grad student studying NLP/CL) is that CL and NLP are the endpoints on a continuum, and so a lot of work in the middle is hard to classify as one or the other. They don't have separate conferences - the Association for Computational Linguistics (annual) and Computational Linguistics (biannual) are the main conferences for both NLP and CL research. 24.59.194.44 13:26, 23 June 2006 (UTC)


 * PRO I agree -- we should merge. Whether you call it NLP or CL is mostly a question of what aspect you stress. In addition, my impression is that the NLP tendency is currently stronger than the CL tendency in the field. Articles in the Computational Linguistics journal, and at the Coling and ACL conferences, are judged on whether they are useful rather than on whether they give any insight on how humans process language. Kallerdis (talk) 19:35, 29 February 2008 (UTC)

CON  There's a fine distinction between NLP and Computational Linguistics that has to do primarily with the distinction between computing and linguistics. Historically, NLP is associated with computing and CL with linguistics. I would be opposed to the merge for that reason. Investigations into the nature of language are misplaced in applied computing, and practical aspects of parsing for, say, commercial applications are misplaced in Linguistics. 74.78.162.229 (talk) 21:30, 10 July 2008 (UTC)


 * PRO/Rebuttal Both NLP and CL have the same objectives, and this "fine distinction" is irrelevant when both CL and NLP involve computing and linguistics (who cares about the mixture proportions?). Dustin


 * CON I agree. -- AKA MBG (talk) 09:58, 11 July 2008 (UTC)

PRO CL and NLP should be merged. There are other fields (which I call "Natural Language Understanding" or "Machine Reading") that have more ambitious goals: get a computer to "understand" some natural language. NLP and CL have made more progress, but are application driven -- the technology behind them is often just Perl scripts computing statistics from NL corpora. In any case, certainly NLP should merge with NLU or CL, but definitely not both. Dustin


 * PRO My understanding has always been that CL is the term used by people with linguistics backgrounds, while NLP is more often used by computer scientists. At worst, I'd call CL a core subfield of NLP. &mdash;/M endaliv /2¢/Δ's/ 23:31, 17 October 2008 (UTC)


 * CON CL is not a subfield of NLP, unless you're an NLP researcher ;) I've studied at both NLP-oriented departments and more CL-oriented ones. There's a lot of overlap, sure, but the approaches differ a lot (do we build corpora to study eg. the limits of case alignment in natural language, or to get testbeds for parsers?). Also, there is a difference in methods used, eg. in both fields there are those who swear to statistical methods, but NLP (eg. for parsing, MT) puts a lot of credit in Bayesian methods, while CL (eg. corpus linguists) uses more standard hypothesis tests. Kiwibird (talk) 11:57, 16 November 2008 (UTC)


 * PRO Excuse me for perhaps erroneously correcting an expert, but aren't you comparing NLP/CL to Corpus Linguistics rather than NLP to CL? You basically state as much.

CON -- see my suggestions under. --Thüringer ☼ (talk) 08:47, 15 January 2009 (UTC)

PRO  I have worked in CL/NLP for two decades, and as far as I am aware, there is no clear distinction in practice between CL and NLP: both have the same conferences, the same publications, and the same research communities. In my opinion, it would be better to have one merged article, with mention of the different subfields within CL/NLP. Gor (talk) 06:45, 27 March 2009 (UTC)

PRO I work as a researcher in CL/NLP/Text Analytics/AI/Machine Learning/etc. I think CL and NLP should be merged; in the grand scheme of things, there is not much difference (if any). Either way, as I said under the CL article: It seems to me that the state of things is that the boundary between NLP and CL is unclear. I think the goal of any related Wikipedia articles should be to represent the state of things as accurately as possible, NOT to solve the clarity problem. Thus, both articles should clearly :) state the various opinions about these fields. Indquimal (talk) 23:15, 20 June 2009 (UTC)

PRO There might be a fine difference between NLP and CL, but the difference is tiny and unclear. As some have mentioned, the term NLP is used more by people with Computer Science backgrounds and the term CL is used more by people with Linguistics backgrounds. I also believe that CL is somewhat more the theoretical side and NLP the practical side. However, you cannot do one without the other: all NLP applications are based on CL theory, and all CL research is based on experimenting with NLP applications.

CON It is important to keep them apart even if today they are both dominated by the computing-oriented approaches. NLP people generally don't have any formal background in Linguistics, and don't really know much about language (and virtually nothing about Linguistics), and the people tend to sit in Computer Science Departments. CL people have the formal background in Linguistics, and CL is often taught in Linguistics Departments along with the necessary computing skills. The aims of CL are to understand more about language, whilst the aims of NLP are to achieve specific performance goals in a computational context - e.g. specific computer applications, or as an abstract problem in machine learning. Both are valid approaches from their own disciplinary perspectives, but the current dominance of NLP tends to stifle CL. dP (ML/NL/AI/CogSci) (talk) 03:51, 25 August 2012 (UTC)

PRO It is misleading to have two pages for the same thing. At least at Paris III University, Master's degrees in NLP/CL accept people with backgrounds in linguistics and math/CS. I think we should keep Computational Linguistics to avoid confusion with the other NLP. i⋅am⋅amz3 (talk) 23:18, 17 March 2018 (UTC)

PRO I think they should be merged. Not that there are no differences, but at least these differences (and overlaps) could then be made transparent. The Computational Linguistics page is comparatively weak, and merging both would lead to a better article. Likewise, Language technology should be merged as well, for the same reasons. Either way, both terms should be defined in their respective subsections after the merge. Chiarcos (talk) 21:47, 19 January 2021 (UTC)

Content from The Natural Language Processing
I would like to mention my company, Creative Virtual, because we have over 10 years experience working with virtual assistant natural language web applications, and link to the automated online assistant page. — Preceding unsigned comment added by 75.99.227.213 (talk) 20:46, 9 November 2011 (UTC)

I append the content from that page, in case anyone wants to merge it in here.

Charles Matthews 09:35, 6 May 2004 (UTC)

 The Natural Language Processing 

Natural Language Processing (NLP) is a subfield of artificial intelligence and linguistics. It deals with the problems inherent in the processing and manipulation of natural language.

Some examples of the major tasks in Natural Language Processing are:


 * Text to speech
 * Speech recognition
 * Natural language generation
 * Machine translation
 * Question answering
 * Information retrieval
 * Information extraction
 * Text-proofing

Some of the difficult problems in NLP are:

 Word boundary detection 

In spoken language, there are usually no gaps between words; where to place a word boundary often depends on which choice makes the most sense grammatically and in context.
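
As a rough illustration of why boundary placement is ambiguous, here is a minimal sketch that lists every way an unspaced string can be split into known words. The function name `segmentations` and the toy `LEXICON` are hypothetical and chosen only for this example; a real system would also need grammatical and contextual scoring to pick among the candidates.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into a sequence of lexicon words."""
    if not text:
        # One way to segment the empty string: the empty sequence.
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            # Keep this word and segment whatever follows it.
            for rest in segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results


# Hypothetical toy lexicon, for illustration only.
LEXICON = {"no", "now", "where", "here", "nowhere"}

for words in segmentations("nowhere", LEXICON):
    print(" ".join(words))
```

All three candidates are valid word sequences, so the choice between them has to come from grammar and context, as the paragraph above says.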

 Word sense disambiguation 

Many words have several different meanings, so we have to select the meaning that makes the most sense in context.

 Syntactic ambiguity 

The grammar for natural languages is ambiguous. Selecting the most appropriate grammatical element requires semantic and contextual information.



Speech acts and plans 

Sometimes what we write doesn't literally mean what is written. For instance, a good response to "Can you give the pencil?" is to give the pencil; in most contexts "Yes" is not the best answer, and when you literally mean "No" it is better to say "I'm afraid that I can't see it".

Question edited into the article by User:129.27.236.115:
 * The Morphix-NLP link is not valid anymore. Does anybody know where to get Morphix-NLP?

Cadr

It is now. Yaron 22:40, May 17, 2004 (UTC)

Remove external link
Removed a spam link (several times) to a website called ivrdictionary. This is a thinly veiled attempt to put advertising on Wikipedia. Links were added by several anonymous users within a tight IP range. Website purports to list ivr terminology, but in reality it prominently displays an advertisement to Angel dot com, which is a commercial company that sells IVR related products. The same links were added to other articles that are related to IVR technology. Calltech 16:59, 17 November 2006 (UTC)

Incorporate stemming?
I suggest adding a link to stemming in the see also or subtasks or challenges. I am not sure who is responsible for editing this article though, and I don't want to edit it myself without asking. Is stemming too detailed, or a subtask of another subtask only like IR? Not sure. I thought it was a pretty popular problem. Josh Froelich 19:46, 13 December 2006 (UTC)

"I am not sure who is responsible for editing this article though" You are; feel free to edit any Wikipedia page. Yes, it feels very wrong the first few times, but you're fine to do so. Someone will fix it if you're wrong anyhow. Scott A Herbert (talk) 13:56, 24 February 2011 (UTC)

I would disagree with the removal of links to software
I expected to find the word "software" used more than once on a topic like this. Software is rather important in this field, and a page that lists extant software (regardless of license) with a meaningful comparison of the various options (e.g. key features, license, programming language, APIs) would be useful.

Maximum entropy methods
My vague understanding is that maximum entropy methods represent the state of the art in NLP these days; yet this article seems to fail to mention them. Could an expert clarify/elucidate? linas 13:17, 13 June 2007 (UTC)


 * If an article is lacking a notable subject, it's usually the case that nobody got around to adding it. Please be bold and add a review of maxent NLP stuff to the article as you see fit, remembering to cite your sources.  –jonsafari 20:47, 14 June 2007 (UTC)


 * In most subareas of current NLP, machine learning is at the core of most implementations. It's true that Maxent (or logistic regression, as it's also known) and its generalizations (e.g. Conditional random fields) usually perform well for these tasks, but they are not the only method. I'd say that margin-based methods such as Support Vector Machines are at least as popular. Anyway, it's more important to expand the section about machine learning/statistical modeling rather than just adding a section about Maxent. Kallerdis (talk) 19:43, 29 February 2008 (UTC)
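
To make the "Maxent (or logistic regression)" equivalence above concrete, here is a minimal sketch assuming a two-label model with one weight vector per label. The function names, the weights, and the feature vector are hypothetical and chosen only for illustration. With two labels, the maxent (softmax) probability should coincide with a logistic sigmoid applied to the difference of the two weight vectors.

```python
import math


def maxent_prob(weights, features, label):
    """P(label | features) under a maximum entropy (softmax) model."""
    scores = {
        y: sum(w * f for w, f in zip(ws, features))
        for y, ws in weights.items()
    }
    normalizer = sum(math.exp(s) for s in scores.values())
    return math.exp(scores[label]) / normalizer


def logistic_prob(weight_diff, features):
    """P(positive | features) under binary logistic regression."""
    score = sum(w * f for w, f in zip(weight_diff, features))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical toy weights and feature vector, for illustration only.
WEIGHTS = {"pos": [0.5, -1.0, 2.0], "neg": [0.1, 0.3, -0.4]}
FEATURES = [1.0, 2.0, 0.5]
# Logistic regression needs only the difference between the two weight vectors.
DIFF = [p - n for p, n in zip(WEIGHTS["pos"], WEIGHTS["neg"])]

print(maxent_prob(WEIGHTS, FEATURES, "pos"))
print(logistic_prob(DIFF, FEATURES))
```

The two printed values agree up to floating-point rounding, because exp(a) / (exp(a) + exp(b)) equals 1 / (1 + exp(-(a - b))).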

Human Language Technology
Does anyone feel it necessary to distinguish between NLP and HLT? If so, please visit that article—it desperately needs work. On the other hand, perhaps it should simply redirect here to the NLP article. —johndburger 02:47, 22 June 2007 (UTC)

Papers
The following were added to the External links section. Perhaps one or more might be used as a reference someday? --Ronz 17:36, 14 November 2007 (UTC)
 * Goutam Kumar Saha, English to Bangla Translator: The BANGANUBAD, International Journal -CPOL, Vol.18(4), pp.281-290, December 2005, WSPC, USA.
 * Goutam Kumar Saha, Parsing Bengali Text - an Intelligent Approach, ACM Ubiquity, Vol. 7 Issue 13, April,  2006. ACM Press, USA.
 * Goutam Kumar Saha, The EB-ANUBAD Translator: A Hybrid Scheme, International Journal ZUS, Vol. 6A(10), ZUS Press, 2005.
 * Goutam Kumar Saha, A Novel 3-Tier XML Schematic Approach for Web Page Translation, ACM Ubiquity, Vol. 6(43), ACM Press, 2005, USA.

Add confusion about accenting words?
I was going to add this in, but I thought it might not be a good idea. If you guys can incorporate it well and fit it in, please do: (I was going to put it after the 'I never said she stole my money' part.) Accenting words can be very helpful in giving meaning to a sentence that contains negatives, because the speaker is saying that a specific fact is not true, and usually something else without one expressed specific is. Sometimes accenting words in a sentence can still lead to confusion, like in "Go over there", because "over" is being used to describe the relative position of the destination, but when taken by itself, "over" means on top of something. The accent in this case implies a literal meaning of the word...

24.250.97.223 (talk) 04:56, 14 December 2007 (UTC)

(NLU) Merge
PRO As stated on my talk page. Not much there but don't see anything here either so maybe better to do a little something here. Perhaps a § (NLU, Semantics, Discourse, Top Level Protocols, etc.) to which the NLU article can redirect. 74.78.162.229 (talk) 21:38, 10 July 2008 (UTC)

PRO Similarly to Computer linguistics, I think NLU should be merged into CL because all three of them deal with natural language comprehension by computers. i⋅am⋅amz3 (talk) 01:36, 18 March 2018 (UTC)

Rating and Importance
Set these to values that seemed reasonable to me and manually created the Comments page. 74.78.162.229 (talk) 22:01, 10 July 2008 (UTC)

Clean-up/Major edit
As noted in the article header, this article needs major rewriting, restructuring and clean-up. Would anyone like to team up with me to get it done? I'm a wiki-novice but know a fair amount about NLP (and have plenty of references that I can consult). Sunfishy (talk) 17:39, 5 November 2008 (UTC)sunfishy


 * Yes, I can see the necessity, and I am willing to help. Let's perhaps start with a non-controversial, easy restructuring: The section Major tasks in NLP is quite a random list of NLP-related articles at the moment. I think it would be wise to differentiate between (1) NLP tasks in the sense of NLP modules a comprehensive NLP system can have (speech recognition, morphological analysis, NLU, word sense disambiguation, semantic role labeling, semantic interpretation, perhaps NLG), and (2) NLP applications such as those currently listed under this heading.
 * More generally, the relationship to Computational linguistics should finally be clarified. It is not a rare thing that industry uses terms different from academia, and in this case, I can see that it makes sense to have two articles. They should simply be linked to each other in a reasonable way, and then the big warning signs will no longer be necessary. This article could describe the applied side while Computational linguistics could focus on the theoretical underpinnings (which is already the case, by and large). --Thüringer ☼ (talk) 08:43, 15 January 2009 (UTC)

Subproblems
A significant subproblem not mentioned (directly) is that the great majority of people use words and grammar incorrectly. For example, one of the most frequently seen errors in written text is using "loose" for "lose", as in "Did anyone loose this book?". A typical grammatical error is a golf analyst talking about something being "between he and the hole" instead of "between the hole and him". In fact, if you listen to sportscasters on TV, hardly five minutes will go by without some kind of gross grammatical error or misuse of words. Tens of millions of people are often subjected to this for hours at a time, week after week, possibly having a negative effect on the way they speak.

Ironically, even the article is guilty of speech misuse under the "Subproblems: Speech acts and plans" heading where it says: ''"Can you pass the salt?" is requesting a physical action to be performed.'' Actually, the verb "can" means "able to" and as such, DOES request a yes or no answer rather than requesting a physical action. The correct, unambiguous wording is: "Please pass the salt." or at the very least: "Would you pass the salt, please." The question mark is intentionally not used because we are not really asking a question. Also notice that adding "please", like your mother surely told you, instantly clarifies that a physical action is being requested.

Speech is only half of communication; the other half is the cooperation of the listener in trying to understand what the speaker means regardless of errors in speech. So any computerized natural language processor must be programmed not only with proper grammar and word meanings, but also with the ability to recognize and correct for IMPROPER speech. Any NLP program which requires perfect word usage, spelling, and grammar is not going to work very well. 71.154.253.96 (talk) 14:02, 8 October 2009 (UTC)

Forgotten Merge? Better forgotten
I do not see a discussion of the July 2008 merge suggestion. Natural language understanding is a field unto itself, and I am going to rewrite that article 99.99999% and put a "main link" so there is really no need for a merge. This article is not in good shape either, but it is a much larger field and will need much more attention. It does have several good points in it, but overall a new computer science student would be well advised not to read it until it has been cleaned up. Unless there are objections I will remove the merge flag later. Cheers. History2007 (talk) 21:12, 18 February 2010 (UTC)

Section 'Concrete problems'
The second bullet point in the section 'Concrete problems' is copied verbatim from its source, http://www.kurzweilai.net/articles/art0311.html?printable=1. Is there permission? —Preceding unsigned comment added by Jann.poppinga (talk • contribs) 14:17, 3 May 2010 (UTC)


 * Good observation; thank you for pointing this out. I am working on this section and the problem should naturally drop out as the restructure progresses. TehMorp (talk) 14:55, 23 June 2010 (UTC)

Sections 'Concrete Problems' and 'Major tasks'
When I began, concrete problems was essentially a list of largely unelucidated examples; it seems better to work the examples in with some level of explanation (or work some level of explanation in with the examples). I began to do that, and now I'm wondering whether ultimately it wouldn't be better to end up combining this section with the Major tasks section. What that would entail would be including examples along with appropriate tasks to illustrate why that particular task isn't yet solved, or what's difficult about the task. There's one fairly rich example, the "time flies like an arrow" example, subparts of which could be used under several different problems, so perhaps this example would be set up at the beginning of the list and then different aspects of it referred to appropriately.

Alternately, it could be interesting to use the examples before the task list as sort of a teaser, a "this is what we have to deal with", followed by a sort of "because of that, these are tasks that must be handled" type thematic progression.

Opinions? TehMorp (talk) 15:04, 23 June 2010 (UTC)

I think that the 'Concrete Problems' section should be dropped. The "problems" all boil down to the same issue: not being able to determine the intended meanings of words outside of their context.

The letter "A" can have many different meanings: the first letter of the English alphabet, a musical note, a grade, etc., just as the phrase "pretty little girls' school" (or any of the other phrases given) can have any of the meanings shown in the section. In each case, the meaning should be determinable by the surrounding context. It is ridiculous to say that understanding such phrases is a problem any more than is understanding which meaning of "A" is intended when no context is given for either.

Determining the intended meanings of words based on their context is not a "problem" so much as it is the essential goal of NLP. This is not to say that there cannot be ambiguities resulting from poorly worded text, but when an NLP program detects ambiguities which cannot be resolved given the surrounding context, the simple solution is to request clarification from the source of the text. 75.46.215.114 (talk) 12:10, 11 August 2010 (UTC)

I did drop this section. It was repetitive and didn't seem especially useful. The section on tasks gives a fair amount of explanation of what the issues are for the individual tasks. For more examples, refer to the articles on specific tasks. Benwing (talk) 22:18, 3 October 2010 (UTC)

Parsing weird
"And ALL fruit flies in the same manner - like bananas do;"

I don't think any program would parse "Time flies like an arrow" this way, given that neither "fruit" nor "bananas" appears in the source sentence. I suspect this was copied incorrectly, but the original link is now dead.

Should it read "And ALL time flies in the same manner - like an arrow does"? That's a pretty big change for a typo. —Preceding unsigned comment added by 216.163.72.2 (talk) 00:45, 1 October 2010 (UTC)

Assessment comment
Substituted at 00:57, 30 April 2016 (UTC)

COI tag
What is the reason for this cleanup tag that you added to this article? Jarble (talk) 00:47, 8 January 2018 (UTC)
 * Well, I needed to inform the other editors somehow that NLPGuy is editing an article in violation of our COI policy. His COI edit was reverted as a result. Since the editor has stopped his COI editing, I removed the tag, but thank you for notifying. :)--Biografer (talk) 01:26, 8 January 2018 (UTC)
 * On the other hand, I do see a lot of original research in this article, considering that some sections are not referenced, but I might be wrong.--Biografer (talk) 01:58, 8 January 2018 (UTC)

Major evaluations and tasks
I'm not happy with the mismatch between the Major_evaluations_and_tasks section (and subsections) and Category:Tasks_of_natural_language_processing. mendicott.com (talk) 19:10, 21 March 2018 (UTC)

Tried to systematize the Major_evaluations_and_tasks section a bit. Did not address mismatch with Category:Tasks_of_natural_language_processing. IMHO, this cannot really be resolved because the pages in the category have no consistent level of granularity. Chiarcos (talk) 20:41, 17 August 2020 (UTC)

Hyphen
Shouldn’t “natural-language processing” be written with a hyphen, as it means “processing of natural language”, not “natural processing of language”? palpalpalpal (talk) 19:20, 29 September 2019 (UTC)


 * By typical English grammar rules, yes. By typical NLP research parlance, no. I don't think it hurts to include the hyphen (I read an article recently where I saw it) but I also made the judgment call to leave the hyphen out on my résumé/CV as a signal, if that tells you anything (I do ML/NLP research). Dem1995 (talk) 20:06, 7 December 2022 (UTC)

No, conventional spelling is Natural Language Processing (with or without capitalization). Chiarcos (talk) 20:42, 17 August 2020 (UTC)

The current infobox image reinforces sexist stereotypes
The current image in the infobox in the top right shows an automated online assistant built (presumably) using NLP technologies. The problem is that it shows a cartoon woman as the assistant. Do we really want to reinforce the stereotype of women assistants by showing it as the first (and only) image for the NLP page on Wikipedia? NLP has a gender bias problem and this image only magnifies it (not to mention alienating women who might be interested in the field). I don't think this particular image accurately reflects an application of NLP today in any case.

I don't have any suggestions for alternative images at the moment, but I feel that an infobox linking NLP to other research areas (Machine Learning, Computational Linguistics, etc) would be more appropriate. For example, look at the infobox for the Machine Learning page. Surely the NLP page can be part of some portal/series? — Preceding unsigned comment added by Venkatasg (talk • contribs) 20:07, 4 July 2020 (UTC)

Cognition
While the overlap between cognitive science and NLP (or CL) is important, indeed, this passage does not describe an NLP task and simply doesn't fit the overall text. Either revise and move to an independent section or remove it. I'm inclined to the latter because I see no way to repair that easily. Chiarcos (talk) 20:45, 17 August 2020 (UTC)


 * If this section on "Cognition" does not get deleted, at least one para needs editing badly. About the word 'big', it says "When used as a Stative verb, as in ”Tomorrow is a big day”."  But 'big' is not a verb at all, much less a stative verb. ('verb' is a grammatical category, not a semantic one, and 'big' doesn't take any tense/aspect/person marking like genuine English verbs do.  The recent coining 'embiggen' does take such verbal inflection, but it's not stative, and that's not the word in question here anyways.)  I'm on the verge of just deleting that para entirely, since its description of 'big' in the sentence "That is a big tree" as a comparative is also misleading, if not as plain wrong as calling it a verb in the other sentence.  (It is certainly not grammatically a comparative, and it's not clear in what other sense it might be a comparative.)  Nor is the content of the para attributed to any source, i.e. it seems like personal research.  But simply deleting the para would leave the surrounding paras without obvious rhetorical connection, so I'm reluctant to do so.Mcswell (talk) 16:25, 16 October 2020 (UTC)


 * I worked that comment in. I also tried to make some more sense of it, straightened the language and linked the Lakoff stuff with more recent developments. Originally, this was basically a Lakoff excerpt, but that wasn't obvious from the layout. I worked out that paragraph into a section on developmental trajectories (where it fits nicely, even though the Lakoff ideas are absolutely *not* what is driving that). In any case, I no longer consider this a candidate for deletion, but one may consider renaming that and describing other current tendencies in the field (other than methods, that is quite clear now from the revised history section). Chiarcos (talk) 20:38, 11 January 2021 (UTC)

Humanity and Sustainability
An experience towards humanity and all act that promotes human sustainability as ways of transforming human right act to reality including underprivileged communities and to have the greatest purpose by fighting against obstacles and challenges 41.223.132.196 (talk) 17:53, 25 May 2023 (UTC)

Disambiguation
For the disambiguation note at the top, there should be a link to the 'NLP' page. 124.150.139.62 (talk) 00:21, 23 July 2023 (UTC)