User:FT2/Shadow

Aim:

A software program that allows quick offline reviewing of a number of articles and editors' contributions, to gain a sense of how matters have developed in an editing dispute, and to quickly scan for "who did what when" and its companion question, "where else was it done, by whom".

Also useful for quickly getting diffs of these edits for use in cases.

Terms I'm using
In case others use different terms :)


 * Edit ID -- the "id" used to identify a specific permanent version, or the two IDs (or one ID + prev/next) used for a DIFF.


 * DIFF HTML - the actual HTML render of a DIFF, the green, grey and yellow bit at the top of a DIFF page :)

Overview of data
A lot of data, sadly. And because it's a big job I don't honestly mind if it runs in the background grabbing data at a civilized rate for a few hours or overnight, that's fine by me. Wouldn't want to overload the server.
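One way to keep the background grab at a civilized rate is a small throttle that enforces a minimum gap between requests. This is only a sketch: the class name `Throttle` and the idea of a fixed interval are assumptions, and no particular rate is specified anywhere in these notes.

```python
import time


class Throttle:
    """Enforce a minimum number of seconds between successive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock      # injectable so the logic can be tested without waiting
        self.sleep = sleep
        self._last = None       # time of the previous request, None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

The downloader would call `wait()` before each page or DIFF request, so an overnight run never hits the server faster than the chosen interval.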

The rough outline is that it pulls down all relevant histories and contrib lists, and (initially) the most recent of these DIFFs. There could easily be 5k or 15k DIFFs (users may have 5k edits, talk pages could easily have a thousand or so), so that's why it grabs the edit information for every DIFF, but (initially) only a selection of the actual DIFFs and page contents.

It then pulls down other diffs on demand ("click to get the DIFF on this edit"), and saves these in its DB (if not already held). The list of diffs and the actual DIFF can then be (fairly simplistically) displayed and scanned visually, or sorted and filtered.

As an additional function it also allows quick selection and display of "diff between two points", so that one can select any two edits and it'll grab the diff between them into its database too and display it. This shows what effect a bunch of edits have had combined.

DB Engine
MySQL or MSAccess. Access is actually pretty good for me, but if others use it MySQL may be more sensible. Try Access 1st to test the usefulness :)

Overview of DB tables

 * A table that holds a list of edit IDs it knows about, sucked from various user contribs, article histories, whatever. Some of these it'll have pulled page content for; for others it won't have pulled more than header info yet, and page content will only be pulled on demand.
 * FIELDS:  edit ID (primary key), user, datestamp, article, namespace, edit summary.


 * A table that caches all wiki-markup and rendered HTML that's been grabbed from the server.
 * FIELDS:  edit ID (primary key), wiki-markup, rendered HTML.


 * A table that caches all DIFFs pulled from the server to date. These are mostly just normal diffs between two successive edits on the same page, but any diff between two edit IDs could be cached in this table, if the user asks for the diff between two non-consecutive versions to be pulled. So once a DIFF is pulled, it stays available for future use.
 * FIELDS: editID, oldeditID, diffHTML
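The three tables above could be sketched roughly as follows. SQLite (via Python's built-in `sqlite3` module) stands in here for MySQL or Access purely so the example is self-contained; the table names, column names and column types are all assumptions, and only the field lists come from the notes above.

```python
import sqlite3

# Hypothetical table and column names; only the field lists are from the notes.
SCHEMA = """
CREATE TABLE IF NOT EXISTS edits (
    edit_id      INTEGER PRIMARY KEY,
    user         TEXT NOT NULL,
    datestamp    TEXT NOT NULL,   -- ISO 8601 text, so string order matches time order
    article      TEXT NOT NULL,
    namespace    TEXT NOT NULL,
    edit_summary TEXT
);
CREATE TABLE IF NOT EXISTS page_cache (
    edit_id       INTEGER PRIMARY KEY,
    wiki_markup   TEXT,
    rendered_html TEXT
);
CREATE TABLE IF NOT EXISTS diff_cache (
    edit_id     INTEGER NOT NULL,
    old_edit_id INTEGER NOT NULL,
    diff_html   TEXT NOT NULL,
    PRIMARY KEY (edit_id, old_edit_id)   -- one cached DIFF per pair of versions
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure all three tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Keying the DIFF cache on the pair (new ID, old ID) is a design choice in this sketch: it lets ordinary consecutive diffs and arbitrary two-point diffs share one table.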

Also some kind of information on viewing history, allowing one to skip back to different views or filters to double-check things.

Initial input
The program accepts:
 * A list of articles and users
 * A date or edit count entry (to limit what diffs are initially pulled)

For the articles and named editors -- the program first grabs their entire edit history or contribs records. Not all pages or diffs will be initially pulled down, but even for those not pulled, the edit info for all DIFFs is grabbed (even if the diff itself isn't pulled from the server).

For each of these that's within the date or edit count range, it also grabs the markup for the revision, and the rendered DIFF HTML and rendered article, from the DIFF page, and populates the two caches with these for all DIFFS and edit IDs that it has pulled.
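Choosing which edits get their content pulled up front might look something like this. It is a sketch under assumptions: the function name `select_initial`, the dict layout of an edit record, and applying the limit to one history or contribs list at a time are all made up for illustration.

```python
from datetime import datetime


def select_initial(edits, since=None, max_count=None):
    """Return the edit IDs whose content should be pulled in the initial load.

    edits     -- list of dicts with 'edit_id' and 'datestamp' (a datetime)
    since     -- optional cutoff; only edits on or after this date are kept
    max_count -- optional cap; only the most recent max_count edits are kept
    """
    newest_first = sorted(edits, key=lambda e: e["datestamp"], reverse=True)
    if since is not None:
        newest_first = [e for e in newest_first if e["datestamp"] >= since]
    if max_count is not None:
        newest_first = newest_first[:max_count]
    return [e["edit_id"] for e in newest_first]
```

Edits outside the selection still get a row of header info; they are simply left with empty cache entries until someone asks for them.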

It also grabs the logs for the named editors. (User uploads, user page moves, admin page protects, admin page deletes, admin user blocks, and block logs)

Basic overview
Upon completing the above load, the program "knows" about all relevant edits. For many of them it also has the wiki-markup, the rendered HTML, and the formatted DIFF from the previous version. Those it doesn't have, it will load on demand when the DB entry is found to be empty.
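The "load on demand if the DB entry is empty" behaviour is a plain look-aside cache. A minimal sketch, assuming a `diff_cache` table keyed on the pair of edit IDs and a caller-supplied `fetch` function that does the actual download (both names are hypothetical, and SQLite is used only to keep the example self-contained):

```python
import sqlite3


def open_cache(path=":memory:"):
    """Create the (assumed) DIFF cache table if it is not there yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS diff_cache ("
        " edit_id INTEGER NOT NULL,"
        " old_edit_id INTEGER NOT NULL,"
        " diff_html TEXT NOT NULL,"
        " PRIMARY KEY (edit_id, old_edit_id))"
    )
    return conn


def get_diff(conn, edit_id, old_edit_id, fetch):
    """Return the DIFF HTML for a pair of versions, downloading it only once."""
    row = conn.execute(
        "SELECT diff_html FROM diff_cache WHERE edit_id = ? AND old_edit_id = ?",
        (edit_id, old_edit_id),
    ).fetchone()
    if row is not None:
        return row[0]                       # already held, no server request
    diff_html = fetch(edit_id, old_edit_id)  # caller-supplied download function
    conn.execute(
        "INSERT INTO diff_cache (edit_id, old_edit_id, diff_html) VALUES (?, ?, ?)",
        (edit_id, old_edit_id, diff_html),
    )
    conn.commit()
    return diff_html
```

Because the key is just a pair of edit IDs, the same function would also cover a diff between any two selected versions, not only consecutive ones.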

It also has somewhere a copy of the header and footer of a typical current DIFF page, so that the HTML chunks can be re-rendered at will.

Filter/sort
Filter/sort by editor, date, and text search, either via a manually entered SQL "WHERE" expression or via a completely manual SQL WHERE clause. Past filters and sorts are remembered and listed for quick recall.
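A rough sketch of the remembered-filter idea follows. The class name `FilterHistory`, the `edits` table and its column names are assumptions. The WHERE clause is pasted into the query as typed, which is only reasonable because this is a single-user offline tool run against one's own database.

```python
import sqlite3


class FilterHistory:
    """Run manual WHERE/ORDER BY filters and remember the ones that worked."""

    def __init__(self, conn):
        self.conn = conn
        self.past = []   # list of (where, order_by) pairs, oldest first

    def run(self, where="1=1", order_by="datestamp"):
        # The clause is inserted verbatim: this is the "completely manual" mode.
        sql = (
            "SELECT edit_id, user, datestamp, article "
            f"FROM edits WHERE {where} ORDER BY {order_by}"
        )
        rows = self.conn.execute(sql).fetchall()
        # Only reached if the SQL was valid, so broken filters are not remembered.
        if (where, order_by) not in self.past:
            self.past.append((where, order_by))
        return rows
```

For example, `FilterHistory(conn).run("user = 'SomeEditor'", "datestamp DESC")` would list that (hypothetical) editor's edits newest first, and the pair stays in `past` for quick recall.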

History/contribs list view
Split screen with a listbox of editIDs at the top, the (selectable) diff / wikimarkup / rendered HTML in the bottom half, and the edit ID info (URL, author, date, id# etc) in a line at the bottom for easy copying.

Purpose - allows scrolling through selected diffs quickly, with display in any of (markup/html/DIFF) in the bottom panel. If these aren't cached they are grabbed as needed at the time.

The DIFF list also allows multiple DIFF selection - clicking a button if 2 DIFFs are selected grabs the DIFF between those 2 versions (if not already cached) and displays that, until the selection is changed.

Stacked view
For a set of selected editIDs in the listbox, create a single view showing DIFFs stacked one after the other with a heavy line in between, to allow single page review (and normal text search) of all the selected DIFFs on one page. Clicking on a diff pulls up the markup or rendered text for the editID concerned.
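The stacked page could be assembled from cached DIFF HTML chunks along these lines. This is only a sketch: the function name `stacked_view`, the default header and footer strings, and the `edit-<id>` anchors are assumptions. In practice the header and footer would be the saved copy of a real DIFF page's header and footer.

```python
def stacked_view(diffs, header="<html><body>", footer="</body></html>"):
    """Build one HTML page from a list of (edit_id, diff_html) pairs.

    The DIFFs appear in the order given, separated by a heavy rule.
    """
    parts = [header]
    for index, (edit_id, diff_html) in enumerate(diffs):
        if index > 0:
            # Heavy line between consecutive DIFFs.
            parts.append('<hr style="border: 4px solid black">')
        # The id lets the UI work out which edit a click landed on.
        parts.append(f'<div id="edit-{edit_id}">{diff_html}</div>')
    parts.append(footer)
    return "\n".join(parts)
```

Since the result is one ordinary HTML page, the normal in-page text search works across all the selected DIFFs at once.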