Wikipedia talk:Alleviate negative unsourced statements

How is this list compiled?
A Python script scans the pages (non-redirects) in Category:Living people for certain keywords like "slut," "faggot," etc.
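The scanning step might look roughly like the sketch below. It is not the actual script: the function name `find_matches`, the short phrase list, and the choice of case-insensitive whole-word matching are all assumptions made for illustration, and fetching the category members and skipping redirects are left out.

```python
import re

# Hypothetical phrase list for illustration only; the real list is longer.
PHRASES = ["plagiarism", "dropped out", "allegedly"]

def find_matches(title, text, phrases=PHRASES):
    """Return one (title, phrase, snippet) entry per phrase occurrence."""
    entries = []
    for phrase in phrases:
        # Assumption: match whole words, ignoring case.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            # Keep a little surrounding context for the human reviewer.
            start = max(m.start() - 30, 0)
            end = min(m.end() + 30, len(text))
            entries.append((title, phrase, text[start:end]))
    return entries

sample = "He allegedly dropped out of college after a plagiarism scandal."
for entry in find_matches("Example Person", sample):
    print(entry)
```

Each occurrence becomes its own entry, so a single article can produce several entries for review.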

It puts the data into a MySQL database with a Django interface (pictured below).



For each entry, there are four statuses that can be set: unreviewed (the default), triage, deferred, and false positive.

The entries marked as triage are the ones I copy to this page. The review work and list compiling are all done manually.

The database stores and remembers past entries that have been reviewed, with the ultimate goal of (hopefully) reducing duplicate efforts, especially with regard to false positives.
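The status tracking described above can be sketched with a small in-memory stand-in for the database. The real setup uses MySQL with a Django interface; the names `Status` and `ReviewStore` here are hypothetical, and keying entries by (article, phrase) is an assumption.

```python
from enum import Enum

class Status(Enum):
    UNREVIEWED = "unreviewed"        # the default
    TRIAGE = "triage"                # copied to the on-wiki list
    DEFERRED = "deferred"
    FALSE_POSITIVE = "false positive"

class ReviewStore:
    """In-memory stand-in for the database of (article, phrase) entries."""

    def __init__(self):
        self._entries = {}

    def add(self, article, phrase):
        """Record a match; keep the existing status if it was seen before."""
        key = (article, phrase)
        self._entries.setdefault(key, Status.UNREVIEWED)
        return self._entries[key]

    def set_status(self, article, phrase, status):
        self._entries[(article, phrase)] = status

    def with_status(self, status):
        """Return the sorted (article, phrase) keys that have this status."""
        return sorted(k for k, s in self._entries.items() if s is status)

store = ReviewStore()
store.add("Example Person", "allegedly")
store.set_status("Example Person", "allegedly", Status.FALSE_POSITIVE)
# A later scan finds the same match again; the earlier review is preserved.
print(store.add("Example Person", "allegedly"))
```

Because `add` never overwrites a status that is already set, a repeat scan does not put reviewed false positives back into the unreviewed queue.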

Why not fix this dreck yourself?
There are two reasons:
 * 1) It's already a pain in the ass to review these articles and make the list; I can only do so much.
 * 2) BLPs (and content writing in general) have never been my strongest area. Could I simply remove all of this unsourced crap from the biographies? Sure. But some of this stuff is poorly sourced yet entirely true. To blindly remove it all would damage the articles. Instead, it needs to be researched. If reliable sources can be found, add them. If not, then the content can be removed.

Which phrases are searched for?
The phrases that were searched for in the first scan were:

If there are phrases you would like to be searched for in the next round (if there ever is one), please list them below:
 * plagiarism
 * dropped out
 * drop out
 * drop-out
 * dropout
 * failure
 * DUI ("charged" probably gets most of these, though)
 * wikipedia --?
 * sexual abuse
 * scam
 * scammed
 * terrorism
 * stabbing
 * knifing
 * beating
 * allegedly
 * antisemite
 * anti-semite

What is negative?
This is probably the most difficult part of doing BLP review work: determining what is negative (and therefore needs immediate attention) versus what is simply unsourced, but not negative (thus requiring tagging or later fixing). Does saying someone died of cancer constitute a negative statement? What if instead of cancer, it's AIDS? If their brother was killed in action, is that negative? What if their mother fled Germany during the war? Or if they had a gay uncle? A few comments are necessary here:
 * 1) First, I apologize in advance for listing anything that you disagree with being negative. It is not my intention to dramatize the extent of the BLP problem (trust me, it contains enough drama on its own).
 * 2) Currently, the list is a binary affair. Something is either negative and unsourced (and listed here), or it is negative and sourced or not negative at all (and not listed here). There's no spectrum currently used to distinguish between saying someone is a pedophile versus saying they are gay. And given the amount of extra work needed to implement a spectrum, it will likely never happen. (But it is something to consider for future projects.)
 * 3) Help me improve. If there are particular listings that you disagree with, make a section on this talk page and we can discuss them.

Questions
Thank you for your explanation above. A couple of questions:
 * 1) Do you remove or revert those results that are readily identifiable as vandalism or inappropriate article content?
 * 2) Do you return at a later time to the entries you mark as "deferred"?
 * 3) Are there other potential locations for the output of your listing, other than a page on Wikipedia that is accessible to any member of the general public, and whose existence could be publicized at any time? As I have mentioned off-wiki, I find the focus on cleaning up BLP violations to be laudable, but I am concerned about having an on-wiki page that sets forth at length any given day's collection of vandalisms, libels, privacy-invasions, and unsourced negative statements against potentially dozens of people.
 * 4) If you have been keeping track or have a general sense, what percentage of the entries on the list have resulted in (1) sourcing of the statement in question, (2) removal of the statement from the article, or (3) no change?

Thanks for any information you can provide on these points. Regards, Newyorkbrad (talk) 23:12, 25 February 2009 (UTC)
 * I don't remove vandalism, especially longstanding vandalism. For example, I added an entry to the page today that was quickly cleaned up by another user. I did look up how long it had been in the page though. October 25, 2008. Ouch. Also, occasionally I'll come across an entry that is vandalism that has been reverted already by another user. Those I mark as false positives.
 * At the moment, the things being marked as deferred are generally things that could simply use better sourcing (or sourcing at all, really), but aren't negative. Or sometimes I will put things I'm skeptical about into the deferred status. For example, Thomas Reilly has a sentence about him coming "under fire from gun rights advocates for allegedly abusing his regulatory authority" with a reference next to it. That's the type of thing that probably needs closer examination, but doesn't really need immediate attention (in my mind, at least). Also, it's important to realize that the initial list has about 160,000 entries to be checked. I've reviewed about 1,000 entries. So there hasn't been too much time spent on a second pass while the first pass is still in its infancy.
 * Currently this interface is set up on a specific port of the domain pruebita.org (http://www.pruebita.org:1337/review/ to be specific). The entire pruebita.org domain is excluded from search engines using a robots.txt file. However, as we discussed, the bigger concern is mirrors and other sites copying the content, like this. And no amount of de-indexing on Wikipedia will do anything about that, unfortunately.
 * Statistics are a bit difficult to come by right now. One of the most difficult aspects is that due to the nature of some of these phrases, like "dick," the false positive ratio can be dramatically skewed (Dick Cheney, Dick Clark, Moby Dick, etc.). Looking at the current numbers as a very rough estimate, I've reviewed 992 entries and listed 83 entries on the page here. That's almost exactly 1 out of 12. (An entry in this context is a specific match against one of the phrases. For example, an article on a serial killer or a terrorist suspect likely has seven entries or more for things like "allegedly," "killed," etc.) Answering your specific question (which I just realized I hadn't done), you can see the list and which entries are marked fixed. The "Facepalm" section resulted in five out of five deletions. The main list has quite a few marked fixed. A few were properly sourced. Others were trimmed down or clarified.
 * I should also mention that Nixeagle informed me he'll be outputting a similar list rather shortly (by Friday, perhaps) and he'll be using a different methodology, looking only at statements that aren't close to a <ref> tag, I believe. His list will likely have many, many fewer false positives, but will only catch the most egregious examples, from my understanding. --MZMcBride (talk) 01:33, 26 February 2009 (UTC)
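As a rough illustration of the "not close to a ref tag" idea mentioned in the last reply, a check along these lines is one possibility. This is only a guess at the general approach, not Nixeagle's actual method: the function name `lacks_nearby_ref` and the 200-character window are assumptions.

```python
import re

# Matches the start of <ref>, <ref name=...> or <ref/>, but not <references/>.
REF_TAG = re.compile(r"<ref[\s>/]", re.IGNORECASE)

def lacks_nearby_ref(text, start, end, window=200):
    """True if no ref tag opens within `window` characters of text[start:end].

    The window size is an arbitrary assumption for this sketch.
    """
    nearby = text[max(start - window, 0):end + window]
    return REF_TAG.search(nearby) is None

sourced = "He was allegedly fired.<ref>Some citation</ref>"
unsourced = "He was allegedly fired.\n<references/>"

for text in (sourced, unsourced):
    i = text.index("allegedly")
    print(lacks_nearby_ref(text, i, i + len("allegedly")))
```

A filter like this would drop matches that sit next to a citation, which fits the expectation of fewer false positives at the cost of missing poorly sourced statements that do carry a reference.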

5. Did you intentionally name this page such that the acronym would be "ANUS"? Gurch (talk) 13:05, 18 April 2009 (UTC)
 * Seems to have just worked out that way. --MZMcBride (talk) 06:20, 24 April 2009 (UTC)

Bill Mullins-Johnson
''Moved from the subject-space page. --MZMcBride (talk) 18:07, 26 February 2009 (UTC)''

Yes: major news story in Canada for over a year, up to his acquittal, and a major national TV documentary segment on his story. There have been a series of similar wrongful convictions in Canada (see the info box at the bottom of the article), and they remain a current topic. Follow some of the sources at the end for more info. David (talk) 01:24, 26 February 2009 (UTC)

William French Anderson
I found 4 sources listed in the article. 2 linked to newspaper websites, and 1 linked to a science magazine. I deemed the sources reliable. Feel free to delete this message once it has been read. Griffinofwales (talk) 19:15, 12 March 2009 (UTC)
 * I think inline citations help a great deal in cases like this. I've added one to that paragraph. --MZMcBride (talk) 20:14, 12 March 2009 (UTC)

ok Griffinofwales (talk) 20:50, 12 March 2009 (UTC)

Quick update
As some may have noticed, the updates to this page have stopped. When this list was originally compiled, I was operating under the assumption that Category:Living people was properly filled. It turns out that at least approximately 20,000 biographies of living people are not in Category:Living people. So at the moment I've diverted my attention from finding unsourced negative statements to filling this category properly. I've posted at AN about this; hopefully I will find some people willing to share some of the workload. --MZMcBride (talk) 04:11, 24 March 2009 (UTC)
 * Hmmm... let's see: a tough, time-consuming, thankless task (what I'll now call TTTTs). I'm sure you'll have a ton of takers. :-/ Good luck. --Ali'i 16:15, 24 March 2009 (UTC)

Fricke
Took it to AFD, but I didn't make a strong argument. It was kept. I just sent two to AFD because of no refs, but refs were added. If anyone wants to, they can make a note that it was kept at AFD. Alio The Fool 00:37, 5 April 2009 (UTC)

BLPs that require constant attention
I suggest we keep a list of BLPs that have recurrent problems with having poorly sourced negative information added, and are not watched by many editors. I've started a list at the bottom of the page; if project members could add these to their watchlists and expand the list, it will help keep these BLPs clean. Thanks. -- JN 466  19:07, 28 July 2009 (UTC)

Bio Reviewer
I like the searches you detailed above. I would help sort through articles that show up on it. Ping me if you do run the script again. Alio The Fool 20:28, 19 August 2009 (UTC)