Talk:Stylometry

Wiki Education Foundation-supported course assignment
This article was the subject of a Wiki Education Foundation-supported course assignment, between 21 January 2020 and 4 May 2020. Further details are available on the course page. Student editor(s): Fishnchips100, Kelly Matthews Language and Law 2020.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 10:21, 17 January 2022 (UTC)

Anybody out there who actually does stylometrics?
I'm interested in quantifying the style differences between paid and non-paid editors. My thesis is that paid editors have a particular writing style, which we can call the "PR style", whereas non-paid editors have a different style, which we can call the "encyclopedic style". We might test a third set of data, actual press releases, to see whether "PR style" is closer to "press releases" than to "encyclopedic style". Data from press releases and from non-paid editors will be easy to find. There is also some data from editors who have been kicked out for paid editing, or who have declared their paid editing. The "declared" group might be slightly different (they aren't hiding in the shadows). Any help appreciated. Smallbones( smalltalk ) 20:13, 21 March 2015 (UTC)
 * Very interesting project! For good analysis a good corpus is essential, and your suggestions are good. I would suggest taking a particular look at company articles (as I did with sentiment analysis). I believe that COI editors might have a narrow focus, writing only about a particular company, while the typical prolific editor edits a broader range of articles. So one possibility would be to automatically label the editors of a specific article on a scale from the occasional, narrowly focused editor to the prolific, broad-range editor. COI edits would probably be more positive than negative, so using an editor's sentiment score from sentiment analysis as a label could perhaps also be fruitful for identifying stylistic features. fnielsen (talk) 21:43, 21 March 2015 (UTC)
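The narrow-focus versus broad-range labeling suggested above could be sketched roughly as follows. This is a minimal Python illustration, not an existing tool: it assumes the edit history is available as (editor, article title) pairs, and the function name focus_labels and the 0.5 threshold are hypothetical choices.

```python
from collections import Counter, defaultdict


def focus_labels(edits, article, focus_threshold=0.5):
    """Label each editor of `article` as "narrow" or "broad".

    edits: iterable of (editor, article_title) pairs, one pair per edit.
    An editor is "narrow" if at least `focus_threshold` of all their edits
    went to `article`, otherwise "broad". The threshold is a hypothetical
    default, not a validated value.
    """
    # Count, per editor, how many edits went to each article.
    per_editor = defaultdict(Counter)
    for editor, title in edits:
        per_editor[editor][title] += 1

    labels = {}
    for editor, counts in per_editor.items():
        if counts[article] == 0:
            continue  # this editor never edited the article in question
        share = counts[article] / sum(counts.values())
        labels[editor] = "narrow" if share >= focus_threshold else "broad"
    return labels


edits = [
    ("A", "Acme Corp"), ("A", "Acme Corp"), ("A", "Acme Corp CEO"),
    ("B", "Acme Corp"), ("B", "Paris"), ("B", "Physics"), ("B", "Rome"),
    ("C", "Paris"),
]
print(focus_labels(edits, "Acme Corp"))  # {'A': 'narrow', 'B': 'broad'}
```

Editor C is left out because it never edited "Acme Corp". A real study would probably also take total edit count into account, since a new account with a single edit is trivially "narrow".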


 * Yes, agreed: a very interesting project. I will raise it with my colleagues, as we've done work on this kind of thing before (with Shakespeare and his contemporaries). There is no shortage of material of both types; would anyone be able to direct us to plenty of "training data"? It might be possible to automate the process via a bot that would at least flag cases for closer attention by human editors. Robma (talk) 10:04, 22 March 2015 (UTC)


 * Thanks for the positive responses. It looks like both of you have experience in stylometry, which I don't.  I'd love to be able to help gather the data. Just tell me what you want and how much.  Possible problems I see:


 * non-paid or non-COI editors: I suppose these should be matched to the COI editors in some way, e.g. by experience. Otherwise I might just randomly choose an article from about the same date as the COI editor's work.
 * declared or banned paid editors: I'm aware of only about 5 of these, so this data may be somewhat limited. Maybe instead:
 * Editors reported (and more or less confirmed) at the WP:Conflict of interest noticeboard: there should be tons of these
 * Press releases: I'd probably just go to PR Newswire or the like, maybe select random dates over several months, and perhaps eliminate some topics such as staff promotions.


 * Another possible topic is sockpuppets reported at WP:SPI: each report names a sockmaster (usually with lots of edits) and then a series of purported socks (usually with fewer edits).


 * Just let me know how I can help. Send me an email via my user page if you have detailed requests, discussion, etc.  Smallbones( smalltalk ) 01:50, 23 March 2015 (UTC)
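The three-way comparison proposed at the top of this thread (is "PR style" closer to press releases than to "encyclopedic style"?) could be prototyped along these lines. This is a minimal Python sketch, not an established method: it assumes each corpus is a plain-text string, uses a small hypothetical list of function words as features, and measures distance as the mean absolute difference of their relative frequencies (a simplified, unstandardized relative of Burrows' Delta). The names profile, distance and closer_to are illustrative only.

```python
import re
from collections import Counter

# Hypothetical feature set: a few common English function words.
FUNCTION_WORDS = ["the", "of", "and", "a", "to", "in", "is", "was", "we", "our"]


def profile(text, words=FUNCTION_WORDS):
    """Relative frequency of each feature word among all word tokens in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in words]


def distance(p, q):
    """Mean absolute difference between two profiles; smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(p, q)) / len(p)


def closer_to(sample, ref_a, ref_b):
    """Return "a" or "b": whichever reference text the sample's profile is nearer to."""
    p = profile(sample)
    da = distance(p, profile(ref_a))
    db = distance(p, profile(ref_b))
    return "a" if da <= db else "b"


pr_style = "We are proud to announce our new product and we thank our partners."
press_release = "We are pleased to announce our latest offering and our team."
encyclopedic = "The company was founded in the city of London in the year of the war."
print(closer_to(pr_style, press_release, encyclopedic))  # a
```

The toy strings only show the mechanics. With real data each argument would be a large combined corpus (e.g. all text added by declared paid editors), and a much longer word list would be needed before the distances mean anything.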

External links modified
Hello fellow Wikipedians,

I have just modified 1 external link on Stylometry. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
 * Corrected formatting/usage for //chronicle.com/temp/reprint.php?id=4fvlt82gn640d1rp48srbpjsvlzhmyrs

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at ).

Cheers.—cyberbot II  Talk to my owner :Online 11:17, 4 April 2016 (UTC)

The section 'Case studies of interest'
In recent years this section has become more of a place where stylometricians highlight their own work than a place where truly important cases are discussed. One of the most famous stylometric use cases still needs to be added, but I don't have the time to do it at the moment: the Federalist Papers. These were first studied with stylometric tools (as far as I know) by Frederick Mosteller and David L. Wallace (Reading, Addison-Wesley, 1964), and repeatedly since then. Also still missing are a section on criticism of stylometry and, among the case studies, famous cases where early proponents of stylometry were wrong, for example Andrew Morton, whose method, qsum, was debunked in later years but had previously been used in some court cases. FJannidis (talk) 12:44, 2 January 2021 (UTC)

Source about delta measures
This may be interesting to someone:

I haven't finished reading it. WhatamIdoing (talk) 01:52, 18 January 2021 (UTC)