Talk:Julie Beth Lovins

Wiki Education Foundation-supported course assignment
This article was the subject of a Wiki Education Foundation-supported course assignment, between 20 September 2018 and 21 December 2018. Further details are available on the course page. Student editor(s): Liangdanica. Peer reviewers: Taylorkeefer, BaileyArthur475, Jahimbol.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 23:31, 17 January 2022 (UTC)

Peer Review
Hi! The piece is small, but there's definitely a lot to work with. Even with so little information, there's a good number of references, so keep that up. I can see on your talk page that you already have lots of places to source from. I would say the next step is creating a section for her academic and scientific career, and maybe looking up whether she has received any awards or acknowledgements for her work! — Preceding unsigned comment added by Taylorkeefer (talk • contribs) 21:21, 8 November 2018 (UTC)

Intro
I corrected the error in the first paragraph and added her own original paper as the reference.

The original intro was " --- who wrote the first stemming algorithm for word matching.[1]".

Changed to "-- who first published a stemming algorithm in 1968".

Her own published paper refers to three prior stemming algorithms that she was aware of, as follows:

(1) p24 "The algorithm developed by Professor John W. Tukey of Princeton University (personal communication) associates a lower limit with each ending. "

(2) p25 "By contrast, the algorithm developed at Harvard University by Michael Lesk, under the direction of Professor Gerard Salton [10], is based on an iterated search for a longest-match ending. "

(3) p25 "A third algorithm has been developed by James L. Dolby of R and D Consultants, Los Altos, California (personal communication). "

Ray3055 (talk) 17:25, 12 January 2019 (UTC)

Removed geeks for geeks blog ref
This blog cites as its reference an academic paper that already appears in the ref list. The site itself contains glaring errors, such as Potter's instead of Porter's and a claim that choco gets 'reduced' to the root chocolate, and it misquotes sentences from the paper. Ray3055 (talk) 12:24, 24 February 2019 (UTC)

The Lovins Stemming Algorithm
I have removed from the last para: "However, one disadvantage is that its running time is long and it consumes a lot of data." I have also edited "Furthermore, it is ineffective at forming words from the stems and matching stems that are similar in meaning.[23]" to read - "Disadvantages are many suffixes are not available in the table of endings. It is sometimes highly unreliable and frequently fails to form words from the stems or to match the stems of like-meaning words. The reason being the technical vocabulary being used by the author."

Ref [23] actually states: "The advantages of this algorithm is it is very fast and can handle removal of double letters in words like ‘getting’ being transformed to ‘get’ and also handles many irregular plurals like – mouse and mice, index and indices etc. Drawbacks of the Lovins approach are that it is time and data consuming. Furthermore, many suffixes are not available in the table of endings. It is sometimes highly unreliable and frequently fails to form words from the stems or to match the stems of like-meaning words. The reason being the technical vocabulary being used by the author." (From IJCTA | NOV-DEC 2011 Anjali Ganesh (The Maharaja Sayajirao University of Baroda))

Here is another quote from a different Indian University paper in 2016: "The advantages of this algorithm is, it is very fast and can handle removal of double letters in words like ‘getting’ being transformed to ‘get’ and also handles many irregular plurals like – mouse and mice, index and indices etc. Drawbacks of the Lovins approach are that it is time and data consuming. Furthermore, many suffixes are not available in the table of endings. It is sometimes highly unreliable and frequently fails to form words from the stems or to match the stems of like-meaning words. The reason being the technical vocabulary being used by the author" (From IJARCSSE Volume 6, Issue 2, February 2016. Applications of Stemming Algorithms in Information Retrieval- A Review. Rakesh Kumar, Vibhakar Mansotra (Department of Computer Science & IT, University of Jammu, India))

Yes, the wording in both papers is identical. The second paper actually bothers to give a citation for this information: [9] J. B. Lovins, “Development of a stemming algorithm,” Mechanical Translation and Computer Linguistic., vol.11, no.1/2, pp. 22-31, 1968.

However, in the original Lovins paper it simply states: "The obvious disadvantage to this method is that it requires generating all possible combinations of affixes. A second disadvantage is the amount of storage space the endings require."

Since the paper(s) quoted above and others agree that the method is very fast, the 'time consuming', 'time to generate all possible combinations', and 'its running time is long' criticisms seem to be nonsense. Also, although the storage space needed to hold such a small table might have been an issue in 1968, it is not an issue in 2019, if indeed that is what the 'data consuming' criticism was referring to.
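
For illustration only, here is a minimal sketch of the general technique being discussed: a single longest-match lookup against a table of endings, followed by removal of a doubled final letter. The table `ENDINGS`, the `MIN_STEM` limit, the set of undoubled letters, and all function names are hypothetical choices made for this example; they are not taken from the Lovins paper, whose actual endings list and rules are much larger.

```python
# Toy longest-match suffix stripper. NOT the Lovins algorithm itself:
# the endings table and limits below are assumptions for illustration.
ENDINGS = ("ational", "ations", "ation", "ings", "ing", "es", "s")
MIN_STEM = 2  # assumed minimum number of letters left after stripping


def strip_longest_ending(word):
    """Remove the longest listed ending that leaves at least MIN_STEM letters."""
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if word.endswith(ending) and len(word) - len(ending) >= MIN_STEM:
            return word[:-len(ending)]
    return word


def undouble(stem):
    """Drop one letter of a doubled final consonant (assumed letter set)."""
    if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] in "bdglmnprst":
        return stem[:-1]
    return stem


def toy_stem(word):
    """One pass: strip the longest matching ending, then undouble."""
    return undouble(strip_longest_ending(word.lower()))


print(toy_stem("getting"))    # get
print(toy_stem("relations"))  # rel
```

Each word needs only one scan of a small table, which is consistent with the 'very fast' description in the quoted papers. The "rel" result also shows how a limited table of endings can produce stems that are not words.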

The paper from IJCTA | NOV-DEC 2011 Anjali Ganesh makes no reference to "it is ineffective at forming words from the stems and matching stems that are similar in meaning", which is why I have removed it. It doesn't appear to give a citation for the "many suffixes are not available in the table of endings. It is sometimes highly unreliable and frequently fails to form words from the stems or to match the stems of like-meaning words. The reason being the technical vocabulary being used by the author" passage; however, I believe this is based on earlier papers by others* that the author has simply neglected to cite. For now, at least this Wikipedia para has an academic reference, albeit a very poor quality one.


 * For example at it states: "The design of the algorithm was much influenced by the technical vocabulary with which Lovins found herself working" and "The subject term list may also have been slightly limiting in that certain common endings are not represented..." Ray3055 (talk) 22:58, 24 February 2019 (UTC)