Wikipedia talk:Snuggle/Archives/2013/January

Filtering and ordering the newcomers for display in Snuggle
This is the biggie. There are many different ways you could allow Snuggle users to filter and order Snuggle's database of newcomers.

There are two big options:


 * 1. Query language – This would be the most open approach. Let people filter and order the options using some sort of code like SQL. You wouldn't want to allow the full SQL functionality, but you could allow people to enter code like

WHERE N_RVTS > 2 WHERE FAITH = 'good' ORDER BY DATE_LAST_RVT;


 * 2. Widgets – As you currently have, this involves having a number of GUI widgets to select the categories and the ordering that you want from a predefined set. Filters and orderings can be combined to get to the data the user wants.

Of course, you can combine these approaches in a number of ways. You could offer the user either method. You could give the user the option to create buttons that run their own query code. You could make it so that the widgets fill in the query code whenever they are used, so that it is relatively easy for the user to tweak the code to get the effect they are after.

The next issue is what variables Snuggle users might want access to. Here is a list of things I think would be useful:
 * Number of user-talk-page messages
 * Highest level of warning
 * Number of warnings
 * Number of negative messages (includes warnings but also notifications of deletion etc.)
 * Number of edits
 * Number of reverted edits
 * Number of reverts (not the same as number of reverted edits, because one revert can revert many edits)
 * Highest STiki vandalism probability
 * Mean STiki vandalism probability
 * Whether currently blocked
 * Whether ever blocked

In each case (except for the last two), it would be useful to have two versions of the variable:
 * ever, e.g. how many edits has the newcomer ever made?
 * since viewed, e.g. how many edits has the newcomer made since a Snuggle user last looked at him/her?

Some of these variables would be useful combined together. For example, a Snuggle user may find these helpful:
 * Number of non-reverted edits
 * Number of warnings + highest level of warning + number of reverts.
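
As a rough illustration of the combined variables above, here is a minimal Python sketch. The class name, field names and the `trouble_score` label are all hypothetical and are not taken from Snuggle's code; the sketch only shows that both combinations can be derived from the basic counters.

```python
from dataclasses import dataclass


@dataclass
class NewcomerStats:
    """Hypothetical per-newcomer counters; the field names are illustrative only."""
    edits: int
    reverted_edits: int
    reverts: int
    warnings: int
    highest_warning_level: int

    @property
    def non_reverted_edits(self) -> int:
        # Edits that have not (yet) been reverted.
        return self.edits - self.reverted_edits

    @property
    def trouble_score(self) -> int:
        # Number of warnings + highest level of warning + number of reverts.
        return self.warnings + self.highest_warning_level + self.reverts


stats = NewcomerStats(edits=12, reverted_edits=5, reverts=3,
                      warnings=2, highest_warning_level=3)
print(stats.non_reverted_edits, stats.trouble_score)
```

Keeping the combinations as derived properties means only the basic counters would need to be stored, in both their "ever" and "since viewed" forms.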

Here are some examples of when a Snuggle user would want some of the above variables. I am using NV to mean the number since last seen by a Snuggle user.

Looking for someone who needs a hand: WHERE FAITH = 'good' ORDER BY NV_NEGMSGS + NV_RVTS; This would also enable users to find people who have been misclassified as good faith.

At the other end of the spectrum: WHERE FAITH = 'bad' WHERE CUR_BLOCKED = 0 ORDER BY NV_EDITS - NV_RVTDEDITS; This would allow Snuggle users to find both good-faith editors who had been misclassified and bad-faith edits that need to be reverted.

In the middle we have: WHERE FAITH = 'ambiguous' WHERE NV_EDITS > 2 ORDER BY NV_EDITS; This would allow Snuggle users to have another look at ambiguous editors, with a view to classifying them as good-faith or bad-faith.
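
To make the example queries concrete, here is a minimal Python sketch of how such a restricted language might be evaluated. It assumes each newcomer is a plain dict keyed by the variable names used above; `run_query` and the sample data are hypothetical, only the repeated-WHERE plus ORDER BY form shown in the examples is handled, and results are sorted ascending because the examples do not say which direction is intended.

```python
import operator
import re

# Comparison operators this sketch accepts; anything else is simply not matched.
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

CLAUSE = re.compile(r"WHERE\s+(\w+)\s*(>|<|=)\s*('[^']*'|\d+)")
ORDER = re.compile(r"ORDER BY\s+(\w+(?:\s*[+-]\s*\w+)*)\s*;?\s*$")


def run_query(query, newcomers):
    """Filter and order a list of dicts using the WHERE ... ORDER BY ... form."""
    rows = list(newcomers)
    for name, op, raw in CLAUSE.findall(query):
        # Quoted values are strings, bare digits are integers.
        value = raw.strip("'") if raw.startswith("'") else int(raw)
        rows = [row for row in rows if OPS[op](row[name], value)]
    match = ORDER.search(query)
    if match:
        # Split "A + B - C" into signed terms and sum them for each row.
        terms = re.findall(r"([+-]?)\s*(\w+)", match.group(1))

        def key(row):
            return sum((-1 if sign == "-" else 1) * row[name]
                       for sign, name in terms)

        rows.sort(key=key)  # ascending order
    return rows


newcomers = [
    {"NAME": "A", "FAITH": "good", "NV_NEGMSGS": 2, "NV_RVTS": 1},
    {"NAME": "B", "FAITH": "good", "NV_NEGMSGS": 0, "NV_RVTS": 1},
    {"NAME": "C", "FAITH": "bad", "NV_NEGMSGS": 5, "NV_RVTS": 4},
]
result = run_query("WHERE FAITH = 'good' ORDER BY NV_NEGMSGS + NV_RVTS;", newcomers)
print([row["NAME"] for row in result])
```

Because the query text is never passed to a database, a parser of this kind cannot modify any stored data, which is one way of allowing less than the full SQL functionality.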

Yaris678 (talk) 21:46, 23 December 2012 (UTC)


 * This is a big request. Building up a query language is a little bit outside of the scope of this project. However, I agree that more power in searching through newcomers is necessary. One bit that I'm trying to figure out how to implement is something that highlights a newcomer's actions since the snuggler's last visit or since they were last viewed by a snuggler. I think that, in order for this to be useful, I'll first need to allow snugglers to "follow" newcomers that they are interested in. I imagine another tab on the top (unsorted, good-faith, bad-faith, following) that would allow a snuggler to build a cohort and watch their activities. -- EpochFail (talk) 21:55, 2 January 2013 (UTC)


 * I forgot to add that I'll also eventually be publishing an API for looking up newcomers in my dataset. This web API could support 3rd party (or is it 4th party now?) tools and notification systems. Your writeup above is a good starting point for specifying that. -- EpochFail (talk) 21:58, 2 January 2013 (UTC)


 * I can see that a query language could be a big ask. I thought it was worth suggesting in case you could think of an easy way to do it. Here's a suggestion that may be impractical but it sounds doable to me in my naivety... Say you are storing the data in a MySQL database. You could allow the user to input MySQL queries in a box in the Snuggle UI and pass them to the database and then display the output in the Snuggle UI. The big thing is that you would need some kind of filter on the query that is input, to prevent people editing the database.
 * Is providing an API any easier? Obviously an API may be very useful, but a box for a query could probably be used by more people.
 * Either way, I think expanding the filtering and sorting widgets would still be helpful. At the moment Snuggle users have the following options:
 ** filter by
 *** uncategorized/good/ambiguous/bad faith
 *** Number of edits (refinable by namespace)
 ** sort by
 *** last activity
 *** reverted edits
 *** total edits
 * I think it would help if the "sorted by" dropdown box also had the following options:
 ** 1. Number of non-reverted edits
 ** 2. Number of messages on talk page
 ** 3. Number of negative messages on the talk page
 * I appreciate that 3 might be harder than 1 and 2... and just having 1 and/or 2 would be helpful.
 * I think it would also help if there were extra radio buttons to allow a Snuggle user to choose between:
 ** Ever
 ** Since a date and time (input by the Snuggle user)
 ** Since last seen by any Snuggle user
 * Yaris678 (talk) 13:32, 11 January 2013 (UTC)
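
The "filter on the query that is input" mentioned in the comment above could start with something like the following Python sketch. The function name and keyword list are assumptions made for illustration, and a keyword check of this kind is easy to get wrong; a database account that only has SELECT privileges would be a sturdier safeguard, with a check like this as a first line of defence at most.

```python
import re

# Keywords that suggest the statement writes to or restructures the database.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|REPLACE|TRUNCATE|GRANT|INTO)\b",
    re.IGNORECASE,
)


def looks_read_only(query: str) -> bool:
    """Conservative check: accept only a single SELECT with no write keywords."""
    stripped = query.strip().rstrip(";").strip()
    if ";" in stripped:
        # More than one statement was supplied.
        return False
    if not stripped.upper().startswith("SELECT"):
        return False
    return FORBIDDEN.search(stripped) is None


print(looks_read_only("SELECT name FROM newcomers WHERE reverts > 2;"))
print(looks_read_only("SELECT 1; DELETE FROM newcomers"))
```

The check errs on the side of rejection: a harmless query that merely mentions one of the keywords inside a string literal would also be refused.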

Other suggestions
Yaris678 (talk) 21:46, 23 December 2012 (UTC)
 * Identify self-reverts and treat these as distinct from general reverts.
 * ✅ - self-reverted edits are now treated as normal, unreverted edits
 * Rename the "unsorted" category/tab to "uncategorised" or "unclassified", i.e. use the word "sort" to mean order rather than categorise. That is what most software does (and this is what you do when you say edits are "sorted by").
 * ✅ - "unsorted" tab now called "uncategorized"
 * A link to the newcomer's talk page would be handy
 * ✅ - a link to the newcomer's talk page is now accessible from the icon next to their name
 * Each item in the summary of threads on the newcomer's talk page could be a link to the actual thread.
 * Get vandalism probabilities from ClueBot NG, as well as STiki.
 * Signify if an edit is "top", i.e. the latest edit to the article.
 * Signify if an edit has already been looked at by a Snuggle user.
 * Provide some more basic stats on the user in the "meta data" area of the user interface. Perhaps the most useful would be two numbers: number of edits and number of edits reverted. If these can be supplied in both "ever" and "since last viewed" form that would be excellent.
 * ✅ - numbers for "Revisions" and "Reverted" are now given.
 * When looking at a diff in Snuggle, you can currently go to that diff in the wiki interface by clicking on the date. For edits that haven't been reverted, it would be very helpful if there was a link/button that opened a combined diff of all subsequent edits, so the user can work out if the edit was eventually removed.
 * Not sure how practical this is, but it would be nice if there was an option to combine all adjacent edits, i.e. if a newcomer has made several edits to an article without another user editing the article, these are shown as one diff. Many newcomers make many edits one after the other (in some cases this is because they haven't found the preview button).
 * Maybe this one is too meta, but I think it would help to be able to see how the categorisation of a newcomer (as good-faith etc.) changes over time. This could be done as a boring list, but I think it would be cool to have different colour blobs on the graph of edits over time. If a user clicked on the blob it would say, for example, "21:25, 23 Dec 2012, Yaris678 classified Celine03 as good-faith".
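
On the suggestion above about combining adjacent edits: a minimal Python sketch of the grouping step, assuming the page history is available as a chronological list of (revision id, user name) pairs. The function name and sample data are hypothetical.

```python
from itertools import groupby


def combine_adjacent(revisions):
    """Group consecutive revisions of one page that were made by the same user."""
    return [
        (user, [rev_id for rev_id, _ in run])
        for user, run in groupby(revisions, key=lambda rev: rev[1])
    ]


history = [(101, "Alice"), (102, "Newbie"), (103, "Newbie"),
           (104, "Newbie"), (105, "Bob")]
groups = combine_adjacent(history)
print(groups)
```

Each group could then be shown as a single diff running from the revision just before the first id in the group to the last id in the group.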

Bugs
Yaris678 (talk) 21:46, 23 December 2012 (UTC)
 * In the help menu, I clicked on "Full documentation" and it took me to WP:SN which redirects to Survey notification. I assume it's supposed to link to WP:Snuggle.

Edit persistence
I assume that the current test for an edit being reverted is based on edit summaries. Is this checked by comparing the diffs? Or checking for a net null diff in the case of an immediate revert?

I've been thinking about other ways of testing for reverts.
 * 1) Expand the edit summary regex. E.g. this is a revert of this
 * 2) Have a more general regex still, which is bound to give lots of false positives, but then filter these by comparing diffs.
 * 3) Test each edit to see if it is an immediate revert. i.e. check for a null diff of the edit when combined with the previous edit and when combined with the previous chain of edits by a single user. Obviously, these checks would only need to be made if the previous editor of the article was a newcomer.
 * 4) Test each edit to see if it undoes any of the previous n edits or chains of edits by a single user.

The above seem doable to program but some of them may challenge the servers I suppose.
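
Options 3 and 4 amount to asking whether the page text after an edit is identical to some earlier version. A minimal Python sketch of that check follows, assuming the full text of each revision is available in chronological order; the function name and the `window` parameter (the "previous n edits") are hypothetical, and this is not a description of how Snuggle currently detects reverts.

```python
import hashlib


def find_reverts(texts, window=5):
    """Map each reverting revision index to the indices of the revisions it undid."""
    digests = [hashlib.sha1(text.encode("utf-8")).hexdigest() for text in texts]
    reverts = {}
    for i in range(2, len(digests)):
        # Look back at most `window` revisions, nearest first. Revision i-1 is
        # skipped because matching it would be a null edit, not a revert.
        for j in range(i - 2, max(i - 2 - window, -1), -1):
            if digests[j] == digests[i]:
                reverts[i] = list(range(j + 1, i))
                break
    return reverts


texts = ["a", "a b", "a b c", "a"]
found = find_reverts(texts)
print(found)
```

Comparing digests rather than running diffs keeps the cost per edit low, which may ease the load on the servers, though it only catches reverts that restore an earlier version exactly.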

Beyond that, we are into a more general measure of edit persistence. I think that would be useful but it would obviously be more difficult to program so feel free to ignore this until you have everything else sorted out. Possibly the best approach is to have a measure of the magnitude of a diff, e.g. num characters added + num characters deleted + 0.1 * num characters moved, and compare the version before and after an edit to the latest version and see which is more similar. If the diffs have the same magnitude, the edit persistence is 50%. I think you'd want a formula like:

Persistence = 0.5 + 0.5 * (mag(diff(before,latest)) - mag(diff(after,latest))) / mag(diff(before,after))

N.B. My formula for the magnitude of a diff includes a term for the number of characters moved but the standard diff doesn't look for moved text. However, moved text is identified by the two algorithms above (User:Cacycle/wikEdDiff and De:Benutzer:Schnark/js/diff/core). A measure of persistence can be calculated without looking for moved text, but obviously the results go a bit wonky if text is then moved.
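
Here is a minimal Python sketch of the persistence measure, under the simplifying assumption that moved text is ignored. It uses the standard library's `difflib` as a stand-in for the diff algorithms mentioned above, so the magnitude is just characters deleted plus characters added; the function names are hypothetical.

```python
import difflib


def mag(old: str, new: str) -> int:
    """Characters deleted plus characters added; moved text is not detected."""
    total = 0
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            total += (i2 - i1) + (j2 - j1)
    return total


def persistence(before: str, after: str, latest: str):
    """0.5 + 0.5 * (mag(before, latest) - mag(after, latest)) / mag(before, after)."""
    size = mag(before, after)
    if size == 0:
        return None  # a null edit has nothing to persist
    return 0.5 + 0.5 * (mag(before, latest) - mag(after, latest)) / size


kept = persistence("abc", "abcdef", "abcdef")   # the edit survives untouched
undone = persistence("abc", "abcdef", "abc")    # the edit was fully reverted
print(kept, undone)
```

An edit that survives untouched scores 1.0 and one that is fully reverted scores 0.0, which matches the 50% midpoint described above when the two diffs have the same magnitude.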

Yaris678 (talk) 21:46, 23 December 2012 (UTC)

Brain dump
So I have a few ideas about uses of Snuggle that I have to get written down.

Snuggle has the potential to produce some work-lists for Wikipedian mentors (e.g. WP:Teahouse hosts).
 * Good-faith users with Warning templates
 * This could be incorporated into Snuggle. It's hard to keep adding functionality to the same UI, but this might be useful enough.

Snuggle could also send out notifications to mentors.
 * A newcomer you're following has just been sent a warning
 * This could be a bot posting on a talk page.

In-wiki Snuggle summaries of users... I could write a wiki gadget that would add snuggle to talk and history pages to highlight newcomers and provide a visual summary of their activities.

That's all for now. -- EpochFail (talk) 21:36, 13 January 2013 (UTC)


 * These all sound great. Yaris678 (talk) 18:12, 23 January 2013 (UTC)

Number of reverts reported
The number of reverts reported in the meta-data area doesn't seem to agree with the number of dots on the edit-history plot. Is this because the number of reverts includes self-reverts but these don't get a dot now? Yaris678 (talk) 18:15, 23 January 2013 (UTC)


 * That is correct. I'm planning to make some updates so that the numbers coincide, but I haven't gotten to that one yet. Thanks for the note about it though. Your reminder helps me raise the priority. :) -- EpochFail (talk • work) 20:34, 27 January 2013 (UTC)