User talk:Tony Sidaway/Living people/tranches

A suggestion
Let me start by saying that I think this is a terrific idea. However, I would like to offer a possible tweak that could make it more efficient. Is there a way to limit the pages to low-traffic articles that are on few or no watchlists already? Within each list of 5,000 there are bound to be hundreds that are already closely monitored, so clearing them would remove a lot of clutter from each sweep. J04n(talk page) 14:45, 4 November 2010 (UTC)
 * Yes, it can be tweaked. If somebody has a source of unwatched articles or those on few watchlists, I can restrict the tranches to just those articles. Meanwhile, if you find that some articles are hogging the related changes of your adopted lists and are obviously being watched already, just remove them from the page you adopted. If (when, really) I refresh the tranches, such edits will be scrutinized and checked, and the articles legitimately removed will be omitted from the new lists.


 * Also be aware that there is a possible tweak for your preferences. In the "Recent changes" tab you can check or uncheck the option "Enhanced recent changes (requires JavaScript)". Enhanced recent changes aggregate all changes to a page in any one day, which reduces the amount of work you have to do. --TS 14:55, 4 November 2010 (UTC)
 * Excellent, I was going to come here and complain about old edits not being rolled up; Enhanced recent changes was exactly what I needed. I'll add that to the main page. Gigs (talk) 16:19, 4 November 2010 (UTC)
 * For obvious reasons, WP is reluctant to hand out a list of pages with few watchers. While I have no doubt they would trust TS, if that list were used to generate tranches, it would be a problem.
 * Good point on removing high traffic sites. The tranche I picked includes Bill Clinton, with 834 watching; I don't think my eyes are going to contribute much. I'll note how I plan to handle this, partly to ask if my approach makes sense. If it does, others may want to adopt it. I copied the list to a user subpage, User:Sphilbrick/tranche_017. Then I'll edit that one to remove high traffic pages. This way, if I'm away and not monitoring my pages, and someone else wants to take over, they can work from the original and make their own decisions about which ones to remove.--  SPhilbrick  T  16:32, 4 November 2010 (UTC)
 * How do you know the Bill Clinton article has 834 watching? That's the kind of information that I would find useful. --TS 16:49, 4 November 2010 (UTC)
 * I have replied to this here on Tony's talkpage. Off2riorob (talk) 17:03, 4 November 2010 (UTC)
 * Sphilbrick, there is a difference between "publishing unwatched pages" and "removing the most watched pages".  We just need the latter here, not the more sensitive former.   Gigs (talk) 17:06, 4 November 2010 (UTC)
 * I understand. But if TS removes all pages with any valid entry in the field, the resulting list might not be exactly what vandals want, but it would be a great start. If TS wants to remove all pages with over n watchers, where n is a few dozen or so, that might be OK.-- SPhilbrick  T  17:10, 4 November 2010 (UTC)
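
The threshold approach suggested above (drop only pages with more than n watchers, and keep everything else, including pages with no known count) could be sketched roughly as follows. This is only an illustration: the function name, the default n = 30 and the "Example Person" titles are made up, and the only real figure is the 834 watchers quoted for Bill Clinton.

```python
def cull_well_watched(titles, watcher_counts, n=30):
    """Keep a title unless its watcher count is known and greater than n.

    Titles with no known count are kept, so the output never singles out
    unwatched pages. The default n=30 is an arbitrary "few dozen".
    """
    kept = []
    for title in titles:
        count = watcher_counts.get(title)  # None when no count is available
        if count is not None and count > n:
            continue  # well watched already, so drop it from the tranche
        kept.append(title)
    return kept


# Hypothetical tranche: only the Bill Clinton figure comes from the discussion.
tranche = ["Bill Clinton", "Example Person A", "Example Person B"]
counts = {"Bill Clinton": 834, "Example Person A": 12}

print(cull_well_watched(tranche, counts))
```

Because pages without a count are left in, the culled list mixes lightly watched and unwatched pages rather than publishing the unwatched ones on their own.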

In a few days I'll probably refresh these tranches incorporating various ideas such as omitting pages with many watchers (thanks for the information, off2riorob).

A few things to watch out for when I refresh:
 * The population of the tranches will change a little; at the edges, articles will shift up or down a tranche. If an article matters to you, you can put it in your watchlist, so if it disappears from your adopted tranche this shouldn't be an issue.
 * The ordering will also change, following the ordering of the category. So if somebody tweaks "Adam Zalik" to sort by the surname instead of the first name, the article will skip to a completely different tranche when the tranches are refreshed.
 * If we keep using this system for long, the number of tranches will change over time as the Living people category changes in population.
 * My bot will monitor removals from the tranches and I'll check each one manually and, unless it looks like a mistake, omit the removed items from the refreshed data. For the most part I'll assume you guys know what you're doing.

One side project I intend to get going is to accumulate a list of names that can be used to search for biographies that may be candidates for membership of the category "Living people". If you know somebody else who is doing this, please let me know.

And thanks all who have adopted a tranche. It's great to see people already volunteering to try this system. Be sure to tell your friends! --TS 17:41, 4 November 2010 (UTC)

Gray tranche: articles that haven't recently been edited
I've done a preliminary pass through the half-a-million BLPs in the "Living people" category and I found that the overwhelming majority have been edited in the past two days and all but 3-4000 have been edited in the past month.

I intend to create a tranche of older articles, say those that haven't been edited since September 30th at the latest. I'll call it the gray tranche.

The traffic on the gray tranche isn't likely to be high, but it would probably represent articles more vulnerable to undetected vandalism, so it is worth checking occasionally and, if you have the time, reading through one or two of those articles in depth for BLP issues. If somebody does the latter, the article can be removed from the gray tranche page when they're done. As the articles are already members of the other tranches they'll still be covered; this is just to provide extra prominence to vulnerable, seldom-checked articles. --TS 19:46, 5 November 2010 (UTC)
 * You might want to double check that calculation. Last I heard we were running on circa a quarter of a million edits a day total, so for the overwhelming majority of BLPs to have been edited in the last two days the overwhelming majority of edits would have been to BLPs, and only one per BLP, whilst cluebot et al tend to edit articles that someone else has just "edited".  Ϣere Spiel  Chequers  21:13, 5 November 2010 (UTC)
 * Yes, I suspect that was one of my occasional caffeine free comments. Thanks for this. Please do force me to check my sums because otherwise this kind of estimate can acquire false authority.


 * Let's see if I can tease out what I intended to say. Rerunning my quick-and-dirty analysis, I see that 493597 of the 499210 known BLPs have been edited most recently in October or November, meaning the others haven't been edited since September 30.


 * If I'm doing something wrong with the figures please let me know. The "last edited" date is derived from the API, in a batch job I ran at 1500 GMT, using the "touched" property for each article in Category:Living people.


 * I've double checked by sorting the results by timestamp then discarding all articles with a "touched" timestamp not greater than November 3 (Wednesday this week) and dumping the remainder into a file whose records I then counted ("wc -l"). It numbers 245385 out of 499210 known BLPs.


 * So it seems as if I was trying to say the following:
 * All but a handful (5613 articles) in the Living people category were edited since September 30.
 * A vast number of articles were edited very recently: my estimate is 245385 between midnight on the morning of November 4 and 1500 on the afternoon of November 5, all times in GMT which is the standard clock for English Wikipedia. But I was absolutely wrong when I said "the overwhelming majority have been edited in the past two days". Well spotted!
 * 986 were last edited in 2009
 * 3088 were last edited between January and August 2010
 * 1467 were last edited in September


 * Adding together 986+3088+1467+493597=499138


 * That's close enough to 499210 for Friday night. Have a good weekend! --TS 22:32, 5 November 2010 (UTC)
 * 245,385 in 15 hours is approximately twice as many edits as we normally get to the whole project every 15 hours, and that includes ANI, WT:RFA and the whole of Pokemon. Since the last touched property includes a field called views, is it possible you are measuring when these articles were last viewed rather than last edited? That would kind of fit in with the relatively small number of edits on the tranches, (I looked at a couple at random but there were dozens of edited articles not thousands). Also I have several thousand BLPs on my watchlist, and while my watchlist is ridiculous, only a tiny minority of the BLPs on it get edited in any one day. I think what we are seeing is that almost all the articles are being viewed regularly, editing is something different. My experience of editing BLPs is that I'm frequently dealing with articles that haven't been edited for months.  Ϣere Spiel  Chequers  01:04, 6 November 2010 (UTC)
 * Yeah I think something isn't adding up. My tranche of 5000 is only getting like 100 edits a day.  That's good though, it means this tranche idea has some legs, since 100 people could pretty easily cover every BLP on here. Gigs (talk) 02:22, 6 November 2010 (UTC)
 * I hope you are right and my day was anomalous. My first day wasn't very heavy, but I had 221 edits yesterday; not sure I'm up to keeping on top of that every day.-- SPhilbrick  T  02:39, 6 November 2010 (UTC)
 * Well, once the well-watched articles are culled, it should go down even more. Also, as the note on the main page now says, it's OK if you don't patrol all of them every day.  Make sure you turn on the advanced watchlist feature so it rolls up the edits; that helps a lot. Gigs (talk) 03:41, 6 November 2010 (UTC)
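
For anyone who wants to recheck the tallies in the 22:32, 5 November comment above, the arithmetic can be reproduced in a few lines. All the numbers are the ones quoted in that comment; only the variable names are invented here.

```python
# Figures quoted in the discussion above.
total_known = 499210        # known BLPs in the category
edited_oct_nov = 493597     # last "touched" in October or November 2010
older = {
    "2009": 986,
    "January to August 2010": 3088,
    "September 2010": 1467,
}

older_total = sum(older.values())
accounted = older_total + edited_oct_nov

print(total_known - edited_oct_nov)  # 5613, the "handful" not touched since September 30
print(older_total)                   # 5541, the sum of the three older buckets
print(accounted)                     # 499138, as stated in the comment
print(total_known - accounted)       # 72 articles not covered by any bucket
```

So the three older buckets fall 72 short of the 5613 figure, which is the same gap as between 499138 and 499210.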

I'll have to get to the bottom of this somehow. I'll rerun my gray tranche job but looking specifically at the timestamp of the latest revision instead of the "touched" timestamp. This will give the results I intended. Tasty monster (=TS ) 09:58, 6 November 2010 (UTC)
 * Great, let's hope that works, and as a bonus if we can confirm that those figures were views not edits we've learned something about the uBLPs.  Ϣere Spiel  Chequers  13:10, 6 November 2010 (UTC)
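
A rough sketch of the rerun described above, reading the timestamp of the latest revision rather than "touched", might look like this. It is not TS's actual batch job. It assumes the standard MediaWiki API query with prop=info|revisions and rvprop=timestamp, and the response shown is a hand-made sample with hypothetical titles, so nothing here touches the network. As far as I understand it, "touched" can also change when a page's cache is invalidated without an edit, which would be consistent with the inflated counts.

```python
from datetime import datetime, timezone


def build_params(titles):
    """Query parameters requesting 'touched' plus the latest revision timestamp."""
    return {
        "action": "query",
        "format": "json",
        "prop": "info|revisions",
        "rvprop": "timestamp",
        "titles": "|".join(titles),
    }


def parse_ts(value):
    """Parse an API timestamp such as 2010-11-05T15:00:00Z as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def gray_candidates(response, cutoff):
    """Titles whose latest revision is strictly older than the cutoff."""
    found = []
    for page in response["query"]["pages"].values():
        revisions = page.get("revisions")
        if revisions and parse_ts(revisions[0]["timestamp"]) < cutoff:
            found.append(page["title"])
    return sorted(found)


# Hand-made sample in the shape of a JSON API response; titles are hypothetical.
sample = {"query": {"pages": {
    "1": {"title": "Example Person A", "touched": "2010-11-05T14:00:00Z",
          "revisions": [{"timestamp": "2010-08-12T09:30:00Z"}]},
    "2": {"title": "Example Person B", "touched": "2010-11-05T13:00:00Z",
          "revisions": [{"timestamp": "2010-11-04T18:45:00Z"}]},
}}}

cutoff = datetime(2010, 10, 1, tzinfo=timezone.utc)  # i.e. not edited since September 30
print(gray_candidates(sample, cutoff))
```

In the sample both pages were "touched" on November 5, but only one was actually edited after September 30, so only the other lands in the gray tranche.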

Rollbacks
One way to make this more manageable would be to filter out rollbacks. If there have been multiple edits but no net change, do we need to have them on this list?  Ϣere Spiel  Chequers  13:10, 6 November 2010 (UTC)


 * In related changes, when you're using the enhanced option as discussed above, you'll see a number in parentheses representing the net change. Zero doesn't infallibly represent a null change but it may be useful in helping you to prioritise sequences of edits that do produce a net change. Note that you also have some control over what is displayed, just as when using the watchlist or recent changes. Check the controls at the top of the page. Tasty monster (=TS ) 16:10, 6 November 2010 (UTC)
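
To illustrate why a zero in parentheses doesn't infallibly mean a null change, here is a small sketch comparing the net size difference with an actual text comparison. The helper names and the sample sentences are invented for the example, and the size difference is computed on UTF-8 bytes on the assumption that the displayed figure is a byte count.

```python
def net_byte_change(old_text, new_text):
    """Size difference in UTF-8 bytes between two versions of a page."""
    return len(new_text.encode("utf-8")) - len(old_text.encode("utf-8"))


def is_null_change(old_text, new_text):
    """True only when the text is identical before and after."""
    return old_text == new_text


before = "Born in 1950 in Leeds."
altered = "Born in 1960 in Leeds."    # one digit changed, same length
restored = "Born in 1950 in Leeds."   # as it would be after a rollback

print(net_byte_change(before, altered), is_null_change(before, altered))
print(net_byte_change(before, restored), is_null_change(before, restored))
```

Both pairs show a net change of zero, but only the second is a genuine null change, so a zero is a hint for prioritising rather than proof that nothing changed.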