Wikipedia talk:WikiProject Deletion sorting/Archive 2

This project is still alive
Most categories haven't been updated in a while, but the Australia one seems to be the most active. I periodically update various categories, and I see a few others do so as well. So I hope it carries on, and believe it will. This project is mentioned on the page giving AFD listing directions, and hopefully that will encourage more use. So I consider it active (though needing a major boost). Hence, I removed the historical tag. --Rob 05:39, 26 January 2006 (UTC)


 * Is this project still active? --Ugur Basak 20:20, 23 March 2006 (UTC)
 * Some of the pages are still active (NZ and Australia are, at least...), but some haven't been cleared out since they were created, and probably should be. Ziggurat 03:46, 3 May 2006 (UTC)

This project sounds good, and should really have more people (and maybe a bot or two someday) to bring it back to life. I've just listed a recent AfD on WikiProject Deletion sorting/Singapore, and hopefully, when I have the time, I'll retroactively scan WP:AFD/Yesterday and tag each AfD accordingly. Kimchi.sg 17:07, 15 June 2006 (UTC)

It is de facto dead. If it were alive it would have more than 2 edits per month. Yeago 22:56, 15 June 2006 (UTC)

dynamic tool
Although I like the general idea, I think there is something wrong with it. It's maintenance havoc (if you want to keep a bit of a global overview). Besides, we seem to forget that the articles are usually already categorized... Perhaps a tool like the one used for stubs would be more appropriate in this case? See http://tools.wikimedia.de/~interiot/cgi-bin/queries/stub_sense?category=dutch+television&num_pages=4000&dbname=enwiki

I have no idea how to build something like that, but I suspect it would make the deletion pages easier to use (for users and WikiProject admins) and to maintain (for the VfD admins) than the currently proposed sorting. - The DJ 17:09, 22 May 2006 (UTC)


 * That would be good, except the toolserver isn't working correctly for the English Wikipedia. When it is, that's certainly something to think about. Meanwhile I'm going to try and hack up some JavaScript tools to make sorting easier. the wub "?!"  09:24, 10 August 2006 (UTC)


 * Hmm... I like this idea. I've roughed up some PHP code that would take this part of the way.  See http://ytrewq.net/deletion, currently showing the results from a scan of the August 9 AfD log.  To generate this list, I created a text file with the URLs of all that day's AfD candidates, and used Wget to harvest those pages onto my server.  The PHP generates the data from those pages (avoiding the need to repeatedly query the Wikimedia server).
 * The sorting usually only needs to be done once for each day, so on first blush I think something server-side makes more sense than a client-side solution. Reduces server load, anyway.
 * The ideal tool would use the database dump to generate a category map, and use that to identify which (if any) of the existing Deletion Sorting pages are appropriate. That information could then be passed to a bot which would update the DS pages automatically.  If we could set something like that up, this project might actually be workable (although it would then rely on the category tags already placed on articles).
 * Anyway, thanks for bringing up this idea. Cheers, -- Visviva 13:35, 10 August 2006 (UTC)


 * That's pretty awesome, and way more advanced than what I was thinking of. However it stumbles when it comes to the large number of articles that come to AFD without any categories on at all. Definitely something to pursue though. the wub "?!"  20:17, 10 August 2006 (UTC)

Progress report
1. Automatic category matching (Status:  Poor).  OK, back when the above was written, I had the notion that (of course!) one of the things in the database dumps would be a dump of category relationships. As it turns out, that's not the case; if you want to create such a file, you have to download the entire current text of Wikipedia and run your own script on it. I have done so, and am now the proud owner of a 272,000-line SQL table containing all category relationships as of August 10 2006. Unfortunately, that doesn't get us very far, since:
 * a. It's (inevitably) out of date.
 * b. Wikipedia category structure is full of horrible ontological flaws, such that "Category:Turkey" becomes a 4th-generation subcat of "Category:Bulgaria."  Even if we clean these up in the live wiki, they're still in the database dump and I don't know of any reliable way to exclude them.
 * c. As the wub noted above, many deletion candidates are uncategorized, and almost none are fully categorized; thus any category-based approach will only be able to do a fraction of the sorting.
 * d. The script I'm currently using to map the binary relationships into a tree format for sorting is painfully slow and wasteful of system resources.  This could be fixed, I'm sure, but in view of the above I don't think it's worth it.
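The category problems in (b) and (d) can be illustrated with a minimal Python sketch. It is not the script described above (which was PHP over an SQL table); the sample pairs, the function names and the `max_depth` cutoff are all assumptions made for illustration. The idea is to walk upward from an article's category through the parent links, but only for a limited number of generations, so that distant and dubious relationships are not followed.

```python
from collections import deque

# Hypothetical (subcategory, parent) pairs standing in for rows of a
# category-relationship table; the names are invented for illustration.
CATEGORY_LINKS = [
    ("Schools in India", "Education in India"),
    ("Education in India", "India"),
    ("Turkey", "Balkans"),
    ("Balkans", "Southeastern Europe"),
    ("Southeastern Europe", "Bulgarian geography"),
    ("Bulgarian geography", "Bulgaria"),
]


def build_parent_map(links):
    """Index the pairs so each category maps to the set of its direct parents."""
    parents = {}
    for child, parent in links:
        parents.setdefault(child, set()).add(parent)
    return parents


def ancestors_within(category, parents, max_depth):
    """Breadth-first walk upward; returns {ancestor: generations away}.

    The `seen` set guards against category loops, and `max_depth` stops
    the walk before it reaches far-fetched ancestors.
    """
    found = {}
    seen = {category}
    queue = deque([(category, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                found[parent] = depth + 1
                queue.append((parent, depth + 1))
    return found


PARENTS = build_parent_map(CATEGORY_LINKS)
```

With a cutoff of 2, `ancestors_within("Schools in India", PARENTS, 2)` still reaches "India", while "Turkey" no longer reaches "Bulgaria". A depth limit only reduces the ontological noise; it does nothing for uncategorized articles, as noted in (c).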

2. Keyword matching (Status:  Promising).  I've incorporated this into my code, see my website for a current demo. At present, this only searches for the category name (for example, "India") in the article text. Obviously, this could be improved by defining a set of keywords for each category. However, I'd have to say the initial results are quite promising. (I was very pleased to see a fresh catch for Zambia).

3. General issues (Status:  Messy).
 * a. Parentheses.  Currently this approach does not identify second, third, etc. nominations, thus the suggestion erroneously points toward the first AfD subpage for a given article.  Articles identified parenthetically (X (music)) are also misfiled.  Should be fairly easy to fix.
 * b. Page overlaps are not properly accounted for.  For instance, a deletion listed under "Schools" should not also be listed under "Education."  Again, this should be fixable.  However, we might consider whether it doesn't make more sense to replace the current tree structure with a truly flat one (no transclusion between pages).
 * c. No distinction is made between closed (speedied) discussions and others.  The person using the data has to sort this out independently.  This may be fixable, tho' only at the cost of doing live queries of en.wikipedia.org, always an iffy business.
 * d. Even with the above issues solved, the results are unreliable and have to be checked by a human being.  I don't think this is likely to change.
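The parentheses problem in (a) might be handled along the following lines. This is a rough Python sketch, not the code actually in use; it assumes repeat nominations carry a trailing suffix of the form "(2nd nomination)", and the function and constant names are hypothetical. Any other parenthetical, such as "(music)", is treated as part of the article title.

```python
import re

AFD_PREFIX = "Wikipedia:Articles for deletion/"

# Assumed suffix convention for repeat nominations, e.g. "(2nd nomination)".
NOMINATION_RE = re.compile(
    r"\s*\((\d+)(?:st|nd|rd|th) nomination\)\s*$", re.IGNORECASE
)


def split_afd_title(subpage_title):
    """Return (article_title, nomination_number) for an AfD subpage title."""
    name = subpage_title
    if name.startswith(AFD_PREFIX):
        name = name[len(AFD_PREFIX):]
    match = NOMINATION_RE.search(name)
    if match:
        # Strip only the nomination suffix; keep disambiguators like "(music)".
        return name[:match.start()], int(match.group(1))
    return name, 1
```

Keeping the nomination number separate from the title lets the suggestion point at the correct AfD subpage while the article itself is still looked up under its full, disambiguated name.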

This approach could be fairly easily scaled to deal with IfD, MfD, CfD, etc. (maybe Prod?) as well. However, since most of those don't use subpages, such an expansion would require some adjustments to our page format. Might be something to consider during the general overhaul suggested below.

Thanks to TheDJ and the wub for stirring things up around here. Cheers, -- Visviva 09:11, 15 August 2006 (UTC)


 * Update: Dumped sorted output for Aug. 14 at WikiProject Deletion sorting/Ready ... there's a lot of wonkiness in there still, but it gives us something to start with.  -- Visviva 11:41, 15 August 2006 (UTC)
 * Update again: Tweaked it to rank keyword results based on whether the keyword occurs in the first sentence or first paragraph.  Surprisingly effective; the "best" results are almost always on the money.  Check it out.  (Bear in mind that at any given time I may be fooling with something.)  Using these results it took me less than 2 hours to sort a full day's AfD's, although I skipped some of the general sortpages like "People" (which we might want to ditch anyway). -- Visviva 20:58, 15 August 2006 (UTC)

Reform and Revival
OK, I'm extremely keen to try and revive this (did it ever really get started?). However I think we need to make some changes:
 * I have to say I don't agree with some of the goals on the project page. In my opinion this system should run concurrently to the current day logs of AFD, not try to replace them. Not only is this less ridiculously ambitious, it should help the project attract more support. From comments at AfD reform it appears a large number of people are in favour of some form of AFD categorisation, including Jimbo. There have already been a number of people pointing to this project.
 * The categorisation should be easy and voluntary. The AFD process is confusing enough to newbies already, and I still have to check what I'm doing or use a script when I list something at AFD. I think the current method of this project, that anyone can help by adding the delsort template and adding the debate to the relevant list at any time during the debate is ideal. If we make it easy enough, and useful, then people will hopefully start doing it themselves.
 * Related to making sorting easier, I've worked up a little script User:The wub/deletionsorting.js. It adds a series of links in the toolbox when editing an AFD which add the appropriate tag to the AFD, and also adds the debate to the correct list. At the moment it is still a work in progress, comments and improvements are very welcome. The main problem is that few of the categories are added. This is because...
 * The current categorisation is a bit of a mess. We could really do with sorting it out before doing much else. There is useful (if slightly old) information at the beta test page, and more discussion of categories at AfD reform. WikiProject Stub Sorting might also be useful for advice.
 * Do we really want to keep closed debates (e.g. at WikiProject Deletion sorting/UK/Closed)? Personally I think that these may have been appropriate when the project was proposed, but with the broader categories that I think we should move towards, not to mention the increase in AFD throughput, they will probably become unmanageable. If we do keep them then they should almost certainly be converted to plain links to the debates rather than transclusions.
 * The listing pages need to be maintained. A bot would be perfect for this, and I have asked at Bot requests. Such a bot could easily remove closed debates or move them to a closed page as desired.
 * Once all this is sorted, we need to publicise. We should notify other WikiProjects if we have a sorting category relevant to them, after all there's not much point sorting these things if no one sees our work.

Phew, I think that's the longest comment I've ever written on Wikipedia. Let's hope someone agrees. the wub "?!"  21:10, 10 August 2006 (UTC)


 * I agree. ;-)  ... In more detail:
 * Yeah, the goals are kind of out of date... the near-term goal of this project was always to do as you say; the longer-term goal rose out of some rather forgettable events of a year ago, and is probably best forgotten.
 * Yep. But experience has shown that without some automation, this cannot work on a global scale.
 * Cool.
 * Yep. In general, I think the key thing is that deletion-sorting pages need to correspond to an existing community of interest (ideally an active noticeboard or WikiProject).
 * In most cases, probably not, because they aren't useful. However, they might be useful for specific article types (webcomics &c.) where such a body of precedent can provide a basis for bottom-up policy revision.
 * Yep.
 * Yep. Did some work in that direction when this project was active, but that was, um, about a year ago.
 * Thanks for taking an interest here. Hope you stick around and we can finally get this to work. -- Visviva 15:27, 11 August 2006 (UTC)

Goals
I commented out the entire "Goals" section on the project page, since it doesn't really correspond to anything we're working on now. New versions welcome. -- Visviva 09:36, 15 August 2006 (UTC)

Sorting
I've deposited the August 15th suggestions at ./Ready. I won't sort them myself for a few hours yet, in case someone else would like to try out this new arrangement. There's still a lot of wonkiness in the system, but I think you'll be favorably impressed (especially if you've ever tried to sort a day's deletions by hand). -- Visviva 00:46, 16 August 2006 (UTC)
 * I've had a go at some of them, doesn't seem to have done too bad a job. the wub "?!"  10:05, 16 August 2006 (UTC)

Update
See User:Visviva/Test for a revised sort of the August 15th AfDs. Here I've added keywords for sortpages A-C, and combined category and keyword matches into a ranking system. Articles get 4 points for having a keyword in the first sentence, 2 for having it in the first paragraph, and 1 for anywhere in the article text; they also get 3 points for a category match. This seems to work much better, and after some wrinkles are smoothed out this could perhaps be automated.
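The point scheme above can be sketched in Python as follows. The weights (4, 2, 1 and 3) come from the description; everything else is an assumption. In particular, this sketch assumes the keyword tiers are not cumulative (only the best position counts) and that only the best-scoring keyword counts when a sortpage has several. The function names are hypothetical.

```python
import re

FIRST_SENTENCE_POINTS = 4
FIRST_PARAGRAPH_POINTS = 2
ANYWHERE_POINTS = 1
CATEGORY_POINTS = 3


def contains_keyword(text, keyword):
    """Case-insensitive whole-word search for a keyword."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def keyword_points(first_sentence, first_paragraph, full_text, keyword):
    """Points for one keyword, based on the earliest place it appears."""
    if contains_keyword(first_sentence, keyword):
        return FIRST_SENTENCE_POINTS
    if contains_keyword(first_paragraph, keyword):
        return FIRST_PARAGRAPH_POINTS
    if contains_keyword(full_text, keyword):
        return ANYWHERE_POINTS
    return 0


def score_article(first_sentence, first_paragraph, full_text,
                  keywords, category_match):
    """Best keyword score for a sortpage, plus a bonus for a category match."""
    best = max(
        (keyword_points(first_sentence, first_paragraph, full_text, k)
         for k in keywords),
        default=0,
    )
    return best + (CATEGORY_POINTS if category_match else 0)
```

Under these assumptions a keyword in the first sentence plus a category match scores 7, and a keyword that appears only late in the text scores 1, so a threshold of "more than 4" requires either both signals or more evidence than a passing mention.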

Problems:
 * 1) Currently the "first-sentence" scan is searching for the first period after the first five words; this is not reliable (abbreviations like "St." throw it off, and not all deletion candidates are punctuated).  I'll probably change "first sentence" to "first 10 words" or something of that nature.  (Might also change "first paragraph" to "first 100(?) words in first paragraph.")
 * 2) Some categories are particularly difficult.  For instance, the script threw up a lot of false positives for "Authors" because almost every book for deletion starts with a sentence like "X is a ... book by the famous ... author Y."  This leads me to wonder whether we really need these categories ("authors," "athletes," etc.) -- do these sortpages really correspond to a community of interest?
 * 3) There are certain cases like "Ireland" vs. "Northern Ireland" (to say nothing of "Georgia" vs. "Georgia") that need to be sorted by hand.
 * 4) One has to be careful with keywords -- I foolishly added "player" as a keyword for "Athletes" and ended up with a pageful of false positives.
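The change proposed in #1, replacing punctuation-based sentence detection with fixed word counts, might look something like this in Python. The window sizes of 10 and 100 words are the tentative figures mentioned above; the function names and the blank-line paragraph split are assumptions made for this sketch.

```python
import re


def leading_words(text, count):
    """Return the first `count` whitespace-separated words of the text."""
    return " ".join(text.split()[:count])


def keyword_windows(article_text, sentence_words=10, paragraph_words=100):
    """Return (lead, opening) text windows taken from the first paragraph.

    No period detection is involved, so abbreviations such as "St." and
    unpunctuated articles cannot throw the boundaries off.
    """
    # Assumes paragraphs are separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", article_text) if p.strip()]
    first_paragraph = paragraphs[0] if paragraphs else ""
    return (
        leading_words(first_paragraph, sentence_words),
        leading_words(first_paragraph, paragraph_words),
    )
```

Capping the second window at the first paragraph as well as at 100 words also limits the damage from run-on opening paragraphs, since a keyword buried deep in one no longer earns the higher score.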

With the exception of the problematic categories in #2 & #3 above, this new approach is almost perfect for those articles scoring more than 4. When I skipped the "authors" and "athletes" sections, I found only one goofup in the first 17 results that scored above 4. And that one (due to a run-on first paragraph) can probably be eliminated with the tweaks described in #1. This would put us in a position to automate about 90% of the existing sortpages (./Flat), and then we can figure out what to do with the remainder. -- Visviva 06:32, 17 August 2006 (UTC)

2nd update
There have been some definite improvements in the last few runs. Most remaining problems are limited to a handful of pages that are problematic in any case, like "Internet" and "Businesses." I have turned off support for a number of those pages, which in many cases don't seem to correspond to any community of interest.

We're not quite ready for the big time yet, but getting there. There are still a handful of really bizarre false positives showing up which I can't quite figure out, like the battery Eneloop getting a high relevance score for "Schools." Aside from that, the remaining problems have more to do with sortpage definition than anything.

You can retrieve the current output at. Today's has already been dealt with, but I've set up a crontab that *should* automatically reset that page to the next day's output at around 00:30 UTC -- we'll see if it works. -- Visviva 12:59, 18 August 2006 (UTC)
 * It should work next time. :-) -- Visviva 03:36, 19 August 2006 (UTC)
 * All right, it still isn't working. Pathetic, huh?  :-) -- Visviva 11:59, 30 August 2006 (UTC)

Structural reform
OK, if we're going to make much more progress here we need to work on WikiProject Deletion sorting/Flat. This was just a rough draft in the first place, and really needs to be replaced with something a little more polished.

When this project started, the idea was to create a system where every candidate for deletion would be sorted onto at least one page. Thus, pages like "Sports" and "Internet" were created which don't really correspond to any definite community of interest. A tree-like structure of transcluded lists was envisioned (see WikiProject Deletion sorting/Beta). But that really hasn't worked out. Furthermore -- perhaps contrary to expectations -- I've found that a few big lists are more of a maintenance headache than a lot of little ones.

Going forward, we need to focus our energies on those pages which correspond to real communities of interest (whether or not those communities are especially active on Wikipedia).

I think it would be nice to have a sortpage for every country, US state and Canadian province. Often those with local knowledge can contribute a great deal to deletion discussions. We don't have to actually create those pages until there's a match, but the program should be looking for them. (Otherwise, how will we ever know that there have been Uzbekistan-related deletions?) The nice thing about geographic sortpages is that it's fairly easy to identify corresponding keywords and categories.

There are some pages which need to be eliminated, sidelined or redefined:
 * Social science
 * Computers
 * Internet
 * Sports
 * Arts
 * Education (?)
 * Events
 * Music
 * Business (possibly Businesses and Businesspeople too?)
 * Organizations and programs
 * People (and perhaps all subcategories; is there really a distinct community of interest for "Authors" as opposed to "Writing"?)

The problem with many of these is simply that they are over-broad. There is IMO no community of interest for "Music," but there are tens if not hundreds of Wikipedian communities of interest for specific musical genres. So in the long term, we may want to create more specific pages to replace those above. But in the short term there is no percentage in maintaining these large, unwieldy pages that serve no particular purpose. -- Visviva 04:30, 19 August 2006 (UTC)


 * Sorry I've not been around much recently, I've been digging through the backlog at CFD as well as feeding my new addiction. Anyway I made a list of possible categories at User:The wub/deletionsorting, though I get the feeling you're going to disagree with a lot of them. On things like Music I feel that we should gather opinions from people who do not necessarily know the particular genre etc. but who have a good knowledge/opinion of what makes a music article worthy or not of inclusion (WP:MUSIC, AMG presence etc.) Our standards should be similar for all music articles, and in the long term hopefully grouping discussions like this will allow us to develop better and firmer standards. Plus there is the other point of AFD often acting like 'a cleanup tag on steroids': if an article does turn out to be worthy of keeping then references are added, stuff gets wikified, and the more people around to help the better. I am very keen on keeping the regional categories though, even those that hardly get any traffic, I just haven't bothered to put them in that list. the wub "?!"  19:22, 23 August 2006 (UTC)


 * I've actually come around more to your way of thinking since writing the above. Particularly in the case of Music (although I think it's also useful to have smaller lists if there is demand -- I don't think there's too much danger of segmentation).
 * Some of the lists you're proposing are ones (like "People"/"Biography" and "Technology") that I've been inclined to abandon as unmaintainable. This is partly due to my experience from blundering toward some kind of semi-automated sorting system -- very broad categories like these don't seem to be easily defined by keywords.  For instance, the word "author" will also appear prominently in articles about books by "the famous Foovian author Ms. Bar."   And the obvious keywords like "person" and "biography" don't usually appear in bio-stubs at all!  So while such lists would be useful -- and allow us to dream of 100% sorting -- I don't think they can be easily implemented in practice.  (I'm happy to be persuaded otherwise, though).  ...of course, such lists can always be maintained by hand, but that kind of work gets very discouraging very quickly.  -- Visviva 18:01, 24 August 2006 (UTC)

Javascript Tagging Tool
I noticed that the folks over at CVG-related deletions have a nifty little tool, and I've adapted it for somewhat more general use. Unfortunately it still is limited to 1 deletion-sorting page, so doesn't do me much good... but it should be a help for those with a topic-specific interest (and they're our target audience anyhow).

See Template:Deltab for more details, code and instructions. I'm not much for JavaScript -- I just tweaked the CVG code a bit -- so if anyone can improve it please do so. -- Visviva 15:28, 19 August 2006 (UTC)

Accuracy reports
See: Wikipedia talk:WikiProject Deletion sorting/Accuracy reports


 * That page speaks of an automated tool; does anyone know where that tool can be found, and can anyone explain why it isn't in use? John Vandenberg 06:30, 11 June 2007 (UTC)

Prod et al.
I've been thinking about how best to integrate those forms of deletion that don't use subpages -- Prod, CfD, IfD, TfD and RfD (maybe others?). Some of the independently-maintained sorting pages have put these in-line with (usually above) the AfDs. However, I'm not sure if this is ideal, since these can't be "read" on-page in the way that AfDs and MfDs can be. I'm leaning towards some kind of sidebar. I've added a sidebar to ./Template -- comments and improvements are welcome. Especially improvements; I'm not good with tables.

I've also started turning my still-imperfect script loose on prods. See a sort of the Aug. 23 prods here: WikiProject Deletion sorting/Prodsort. I will try to keep dumping this daily, but probably won't actually distribute them to the individual lists myself; however, anyone is welcome to do so (and to comment on any strange errors in the sorting). -- Visviva 17:40, 24 August 2006 (UTC)

On hold
At present my contributions are on hold, until I get my Linux box hooked up again. Moving is such a hassle...

On that note, let me thank the maintainers of ./Japan, ./Anime and manga, ./India, ./Australia -- and others -- for their hard work in maintaining those topical pages. -- Visviva 17:09, 2 September 2006 (UTC)

AfD categories
AfD categories is now "official policy", and seems to be having some success. I'm not sure where this leaves this project :-/ the wub "?!"  10:43, 8 September 2006 (UTC)


 * I don't quite see the contradiction between the two... the AfD categories should relieve us of some burden for maintaining the big & messy categories, but do nothing at all in terms of more specific/user-relevant sorting. Certainly this would be a good time to reorganize things a bit further.
 * By the way, is back online, but still running from the old system.  -- Visviva 02:40, 30 September 2006 (UTC)

Project directory
Hello. The WikiProject Council has recently updated the WikiProject Council/Directory. This new directory includes a variety of categories and subcategories which will, with luck, potentially draw new members to the projects who are interested in those specific subjects. Please review the directory and make any changes to the entries for your project that you see fit. There is also a directory of portals, at User:B2T2/Portal, listing all the existing portals. Feel free to add any of them to the portals or comments section of your entries in the directory. The three columns regarding assessment, peer review, and collaboration are included in the directory for both the use of the projects themselves and for that of others. Having such departments will allow a project to more quickly and easily identify its most important articles and its articles in greatest need of improvement. If you have not already done so, please consider whether your project would benefit from having departments which deal in these matters. It is my hope that all the changes to the directory can be finished by the first of next month. Please feel free to make any changes you see fit to the entries for your project before then. If you should have any questions regarding this matter, please do not hesitate to contact me. Thank you. B2T2 13:35, 26 October 2006 (UTC)

Leaving
Hi,

Since I've been the principal contributor here, I think I should formally mention that I'm withdrawing from the project. I still think this can work, and would be a benefit to Wikipedia, but am not prepared to devote the time and energy that would be needed to make it work. If & when anyone tries to start this up again in a systematic way, please drop me a line -- I'm still happy to help out when I can.

Thanks to all the people who have worked on this project, and especially to those who continue maintaining various subpages. Best, -- Visviva 13:53, 2 November 2006 (UTC)

What is the precedent on userpages?
Hello, User:GabrielF/ConspiracyNoticeboard is currently up for deletion. My question: I am wondering what the precedent is on deleting userpages which encourage others to comment a certain way in AfDs and on wikipolicy.

Spam and User page don't seem to address this particular issue. I am simply asking what the AfD precedent is. Thanks in advance. Best wishes, Travb (talk) 02:18, 26 December 2006 (UTC)


 * Why not just move User:GabrielF/ConspiracyNoticeboard to Wikipedia namespace, as a subpage of this WikiProject? This particular page is a list of articles for deletion where the discriminant is whether an article is a conspiracy; however the subject of the page is a much heated debate.  Things like this shouldn't go in the user namespace anyways, and this user page can have a much better life as a part of this WikiProject.  Regards, Tuxide of WikiProject Retailing 07:34, 11 January 2007 (UTC)
 * This is starting to reach the point of harassment. If GabrielF wants to have a subuser page, let him have it.  The page has been through two MFDs already.  If you or someone else wants to maintain a Wikiproject by the same name, have at it.  Nobody is stopping you.  The Illuminated Master of USEBACA 16:51, 11 January 2007 (UTC)
 * I derived my proposal from those MFD discussions. Although the result of one of them was keep, it should read as kept without prejudice against a consensus move/redirect because that idea was brought up there.  Regards,  Tuxide 23:38, 11 January 2007 (UTC)
 * I have no objection to moving the page if the wikiproject wants it. GabrielF 12:57, 11 January 2007 (UTC)
 * I have created WikiProject Deletion sorting/Conspiracy theories. We'll see what happens. Personally, I liked being able to comment on AfDs and add misc. discussions, deletion reviews, and XfDs, but if this is the best way to do it... GabrielF 15:18, 11 January 2007 (UTC)

Wikipedia Day Awards
Hello, all. It was initially my hope to try to have this done as part of Esperanza's proposal for an appreciation week to end on Wikipedia Day, January 15. However, several people have once again proposed the entirety of Esperanza for deletion, so that might not work. It was the intention of the Appreciation Week proposal to set aside a given time when the various individuals who have made significant, valuable contributions to the encyclopedia would be recognized and honored. I believe that, with some effort, this could still be done. My proposal is to, with luck, try to organize the various WikiProjects and other entities of Wikipedia to take part in a larger celebration of its contributors to take place in January, probably beginning January 15, 2007. I have created yet another new subpage for myself (a weakness of mine, I'm afraid) at User talk:Badbilltucker/Appreciation Week where I would greatly appreciate any indications from the members of this project as to whether and how they might be willing and/or able to assist in recognizing the contributions of our editors. Thank you for your attention. Badbilltucker 18:41, 30 December 2006 (UTC)