Wikipedia talk:Articles written by a single editor/Archive 1

Obvious problems with this page
Let's see: "pick a "CLUMP" at random, and fix the broken articles." The first problem is that the directions make little to no sense. Clump? What? Broken? What exactly needs to be fixed here? The second problem is that we are picking articles at random. How can anybody be expected to "fix" articles that they don't know anything about? Do we really want this to happen? Editing for the sake of editing may be dangerous. I'm sure this page has good intentions, but really. And what's up with all the smiley faces? 76.93.90.245 (talk) 07:34, 21 June 2008 (UTC)
 * We changed the instructions a lot recently, so they should be much clearer now. We don't expect people to fix articles that they don't know anything about; we expect people to flag these articles appropriately, so that the topic experts can find and fix them :-) (I like smiley faces!) Nicolas1981 (talk) 06:04, 7 November 2008 (UTC)

columnating the lists?
This may be impossible or too much work for you, I don't know, but it would be nice if this script could at least columnate the lists, as right now the titles bleed into each other.
 * So far there is almost no formatting, so indeed I would be glad if someone came up with a way to make the list more usable. I think it will not be trivial, though, because some titles are very long. Nicolas1981 06:14, 17 August 2007 (UTC)
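
One possible way to columnate the list is to have the script emit a wikitable with one cell per column. The sketch below is only an illustration: the function name `columnate` and the default of 3 columns are assumptions, not part of the existing generator, and very long titles would still wrap inside their cell.

```python
from typing import List


def columnate(titles: List[str], columns: int = 3) -> str:
    """Render titles as wikitext: one table row, one bulleted list per column.

    Hypothetical helper; the real generator does not contain this function.
    """
    # Ceiling division, with a floor of 1 so an empty list does not break range().
    per_column = max(1, -(-len(titles) // columns))
    lines = ['{| style="width:100%"', '|- valign="top"']
    for start in range(0, len(titles), per_column):
        lines.append("|")  # open a new table cell
        lines.extend(f"* [[{t}]]" for t in titles[start:start + per_column])
    lines.append("|}")
    return "\n".join(lines)
```

The output can be pasted into a wiki page as-is; each cell holds an ordinary bulleted list, so titles no longer run into each other.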

Otherwise, I can just imagine how useful this would be in the hands of a band of deletionists! I'm going to watch this page, maybe I can do some AfDs based on it.
 * I am more of an inclusionist than a deletionist. My goal with this list is really to improve articles. I am optimistic; that's the way Wikipedia works in my opinion: the more people involved, the better the content. Nicolas1981 06:14, 17 August 2007 (UTC)

problem #1 found
See 4 yo information - the page was salted, but it came up in your list.

Maybe there's some way you can check each article in the list to see if it starts with the deletedpage tag?


 * Thanks for your report ! The list is not dynamic. It is produced on a given day, and some parts will probably be outdated on the next day. Also, it depends on the correctness of . I haven't figured out any way to fix this so far :-( It should not be a major problem if the list is updated regularly enough, let's say once per month. Nicolas1981 06:14, 17 August 2007 (UTC)
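
The check suggested above (skipping pages that start with the deletedpage tag) could look roughly like this. It is a minimal sketch: it assumes the generator already has each article's wikitext at hand, and the names `is_salted` and `drop_salted` are made up for illustration.

```python
import re
from typing import Dict, List

# Assumption: a salted page begins with a {{deletedpage}} template call,
# optionally followed by parameters.
_DELETED_PAGE = re.compile(r"^\s*\{\{\s*deletedpage\s*(\||\}\})", re.IGNORECASE)


def is_salted(wikitext: str) -> bool:
    """Return True if the wikitext begins with a {{deletedpage}} template."""
    return bool(_DELETED_PAGE.match(wikitext))


def drop_salted(articles: Dict[str, str]) -> List[str]:
    """Return the titles whose wikitext is not a salted placeholder."""
    return [title for title, text in articles.items() if not is_salted(text)]
```

This only helps on the day the list is generated; pages salted afterwards would still show up until the next refresh.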

Beyond A?
My first suggestion is to somehow give something beyond the abbreviations beginning with A. All of the articles currently listed have a high probability of not being fun to edit, while article titles that begin with words or names would be less intimidating, at least for this editor. --Evil1987 21:49, 9 August 2007 (UTC)


 * Indeed, I have processed only a tiny fraction of Wikipedia so far. I am traveling/couchsurfing in Japan right now, so it is not easy for me to run the script. If someone has CPU time available, it would be kind of them to run it on the rest. You will need to install Java and Groovy, then just run the script. Modify the INITIAL_INDEX_URL variable to start from AKK (disambiguation). Then edit the ASE page and append the produced list (found in the generated plaza.txt file). It works on Ubuntu Linux; I have never tried it on other systems, so let me know how it goes. Nicolas1981 06:14, 17 August 2007 (UTC)

Too large
This page is 238kb, which is making my browser very slow, and also frequently doesn't load properly (resulting in a completely blank page, with no source code transmitted, but the browser-status-bar claims the load is "done" (firefox-latest, linux, plenty RAM)). Needs to be split or something similar. -- Quiddity (talk) 06:41, 7 December 2007 (UTC)
 * I am using Firefox+Linux too, and indeed the page is often blank. And history diffs just never work :-( If anyone has an idea on how to split it, feel free to do so! It would be nice to leave at least some of the chunks on the main project page, though, in order to quickly show something to visitors and avoid expensive clicks. Nicolas1981 (talk) 18:39, 5 January 2008 (UTC)
 * Solved: Thanks to user ZooFari who split the data into different pages, there are no performance problems anymore :-) Nicolas1981 (talk) 06:39, 7 November 2008 (UTC)

Request for Deletion
This page is too long and too much of a problem. In my opinion, it has no use, and picking random articles on subjects you don't know about is pointless. An article could have been made by 2 editors, maybe even 5 editors, and it would still contain errors. Also, why do all the articles listed start with "A"? And as mentioned before, what's up with the icons and smiley faces? This page should either be deleted or made into something simpler. --ZooFari (talk) 22:47, 29 June 2008 (UTC)
 * I agree --Melly42 (talk) 13:30, 21 July 2008 (UTC)
 * I think the original idea behind this page might have been to inspect all the single-author articles; spam articles tend to be single-author, and so this page would be a big help in combatting spam at Wikipedia. But as it turns out, many good and useful articles are also single-author. So unfortunately, as ZooFari notes, we end up with a huge list even if we limit articles to the letter 'A'. If there was a veritable army of editors using this page, it'd be a valuable item to keep... I just don't know if the veritable army exists right now. AllGloryToTheHypnotoad (talk) 16:53, 21 July 2008 (UTC)
 * The project has not yet motivated an "army of editors", but hundreds of articles have already been checked. Cheers to the editor who recently split the huge list into separate pages; that makes the project more user-friendly and will probably attract more editors. Please share any thoughts on how to make the project more user-friendly ! :-) Nicolas1981 (talk) 10:58, 10 October 2008 (UTC)
 * Of course most articles in the list are good articles. But try checking 50 of them and you will find a number of articles that have no place on Wikipedia, or need to be tagged as stubs. And without this tool, such articles would probably stay like this for years in Wikipedia, unnoticed by editors. Nicolas1981 (talk) 10:58, 10 October 2008 (UTC)

What does this paragraph mean?
I just finished copy-editing this first page, but I could not figure out what this paragraph means: "If the link to an article is broken, be sure to remove those from the list, or fix them to link it to the correct article." Those? Them? There are no plural nouns for these pronouns to refer back to. In total confusion, I am yours sincerely, GeorgeLouis (talk) 07:36, 2 November 2008 (UTC)
 * Thanks for the notice. It is now fixed:)  Zoo Fari  17:00, 2 November 2008 (UTC)

Refresh
A question: are the chunks refreshed? Taking the first page of a chunk as an example, http://en.wikipedia.org/w/index.php?title=Angkor_Borei_District&action=history shows that 2 other editors have since edited it (assuming that adding a category also counts as some visual control). Good idea, by the way. Cheers Mion (talk) 10:31, 2 November 2008 (UTC)
 * Hi Mion ! That's a problem indeed: the list was generated a while ago and has not been refreshed since. A fresh new list will be provided when everything has been processed. If you are motivated, please generate a new list using the open source generator (preferably starting from letter B). That would be great :-) Until then, some articles have indeed been modified by various authors, which means they have higher chances of being fine... but still, it does no harm to check. When this project starts getting more popular, I will make significant improvements, including more frequent updates. Cheers :-) Nicolas1981 (talk) 11:38, 3 November 2008 (UTC)
 * Hi Nicolas, you are right that checking does no harm, but with this many pages, double checking is wasted time. Maybe adding a timestamp to the chunk name is a good option; if chunks get outdated, just remove them and replace them with fresh ones. Cheers Mion (talk) 12:23, 3 November 2008 (UTC)
 * Good idea ! Actually, the timestamp for all chunks is the same: 3 August 2007. Yes, quite old :-/ Nicolas1981 (talk) 23:14, 3 November 2008 (UTC)

Let's improve the list generator !
A significant portion of the single-editor articles are good articles about some plant or insect or small county. Those articles were generated automatically from a database by very experienced users, and they are very good even though nobody checked them. We're wasting time reviewing them. So I have just had an idea: the list generator could filter out articles whose creator has already created more than 100 articles. What do you think about it ? Any other ideas to improve the list generation algorithm ? Right now, the algorithm just picks up all articles which have a single non-bot author. For those interested, the current source code of the list generator is here; it is written in an easy cross-platform scripting language called Groovy and released under the GNU GPL. Any ideas welcome ! :-) Cheers, Nicolas1981 (talk) 23:26, 3 November 2008 (UTC)
 * I like it. This will cut the number of articles hugely, and we might be able to add articles beyond "A"! I'm not a very experienced editor, so I'm not sure how the generator works. But as Nicolas mentioned, anyone may volunteer:)  Zoo Fari  00:44, 4 November 2008 (UTC)
 * A hundred? Why not just fifty? Or fewer? Questioningly, GeorgeLouis (talk) 01:05, 5 November 2008 (UTC)
 * Well, let's guess... poor articles (good stubs are not poor articles) tend to be written by inexperienced wikipedians, who probably never created an article or created just a few ones... I guess you're right! Let's try with 5 creations before the creation of the article in question, sounds good ? Nicolas1981 (talk) 06:24, 7 November 2008 (UTC)
 * We could also take into consideration the number of wikilinks pointing to the article. Roughly put: the fewer the wikilinks, the fewer people have read the article, and the more it needs to be checked. From your experience, does it make sense to filter out from the list the articles which have more than a certain number of incoming wikilinks ? If yes, how many ? Nicolas1981 (talk) 06:24, 7 November 2008 (UTC)
 * Some tools that could be useful: "Pages containing wikilinks to a given article" and "List of creations by a given user". Of course it would be better to get this information from an offline dump than hitting servers... does anyone know where to find this data in Wikipedia dumps ? Nicolas1981 (talk) 07:03, 7 November 2008 (UTC)
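
To make the filtering ideas in this thread concrete, here is a rough sketch of how the three criteria (single non-bot author, inexperienced creator, few incoming wikilinks) might combine. Everything in it is an assumption for discussion: the `Article` record, the field names, and especially `MAX_INCOMING_LINKS`, for which no value was agreed above. The threshold of 5 prior creations is the one floated in the discussion. The real generator is written in Groovy; Python is used here only for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_PRIOR_CREATIONS = 5   # value floated in the discussion
MAX_INCOMING_LINKS = 10   # hypothetical; no number was agreed


@dataclass
class Article:
    title: str
    human_editors: int            # distinct non-bot editors
    creator_prior_creations: int  # articles the creator made before this one
    incoming_links: int           # wikilinks pointing to this article


def needs_review(a: Article) -> bool:
    """Keep single-author articles by new creators that few pages link to."""
    return (
        a.human_editors == 1
        and a.creator_prior_creations <= MAX_PRIOR_CREATIONS
        and a.incoming_links <= MAX_INCOMING_LINKS
    )


def build_list(articles: Iterable[Article]) -> List[str]:
    """Return the titles that pass every filter, in input order."""
    return [a.title for a in articles if needs_review(a)]
```

Keeping each criterion as a separate condition makes it easy to switch one off and compare how much each filter shrinks the list.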

Terminology
Can we use some terms other than "Chunk" and "Pile"? GeorgeLouis (talk) 01:12, 5 November 2008 (UTC)
 * Yes, sure ! As you probably guessed, I am not a native speaker so I have a hard time guessing what "sounds good" or not. Let's have this terminology debate here. Over time, I think we have already been using CLUMP, CHUNK, PAGE, PILE. What are the other candidates ? ;-) Also, a total reorganization is always possible, if anyone has ideas and time to experiment (preferably on a test page like WP:ASE/test1). Nicolas1981 (talk) 12:24, 5 November 2008 (UTC)

How about "Section" and "Subsection"? Also, what is the purpose of these groups in the first place? Are we supposed to edit all of the articles in one Subsection before we start on another one? Questioningly, GeorgeLouis (talk) 22:07, 15 November 2008 (UTC)
 * Interesting question ! As it is a different topic, I reply in a new discussion below. Cheers! Nicolas1981 (talk) 06:16, 16 November 2008 (UTC)

Debate: How to format/group the list of articles ?
At the beginning of the project I just posted the articles as a very very long list. However, after checking a number of articles, it was quite difficult to remember which ones to remove. So I divided them into groups of 30 or so, a manageable number that one can process in maybe 15 minutes, and then remove easily. Because the huge page was causing server problems, user ZooFari then split it in two pages, and also added another level of separation. A fresh new list will appear in a short time, so now is the time to debate what the most appropriate format is ! Please everyone write a paragraph describing what you think is the best formatting/grouping of the articles :-) Nicolas1981 (talk) 06:16, 16 November 2008 (UTC)

Here is my proposal: Each "PAGE" contains 10 "SECTION"s. Each "SECTION" contains 3 "SUBSECTION"s organized as columns (so that's only 3 columns, as opposed to 8 right now). Each "SUBSECTION" contains 30 articles, one line per article. Please write your own proposals too :-) Nicolas1981 (talk) 06:16, 16 November 2008 (UTC)

Here is my second proposal after more reflection: Only one page referred to as the "Articles list". It contains around 30 "Section"s of 5 articles only. 5 really needy articles will take the same time to process as 30 of the current articles, because the latter actually contain a lot of articles that do not need to be fixed. Nicolas1981 (talk) 10:50, 17 November 2008 (UTC)
 * In my opinion, 30 sections would be too much (or I guess too "cramped"). Maybe 20? Though I don't have much to say about proposals, I would be willing to reformat the pages:) So if you have come up with the final proposal, let me know!  Zoo Fari  04:14, 18 November 2008 (UTC)
 * Given that each section contains only 5 articles, I think 30 sections is not that much, it is under 200 lines. I will rerun the generator now to see how it looks. And it will refresh the articles a bit :-) Nicolas1981 (talk) 04:36, 18 November 2008 (UTC)
 * It would be great to see a small preview (not necessarily with all articles). But since each subsection would include only 5 articles, I guess it wouldn't be too long. By the way, would this really cover all the articles in the list, excluding the ones the bots will clear? Zoo Fari  04:17, 19 November 2008 (UTC)
 * As you have probably seen by now, the Articles list is exactly a preview of this :-) What do you mean by "excluding the ones the bots will clear" ? I don't intend to write a bot that would remove articles from the list, and I don't think anybody is currently writing one. Nicolas1981 (talk) 05:10, 19 November 2008 (UTC)
 * First experience: I processed a section yesterday, and I have to say I am quite happy with the current formatting :-) It took me 30 minutes for 5 articles. That's because all articles actually had to be fixed in a lot of ways, and also because the topics of the articles attracted my curiosity and I spent some time wikiwandering around them ;-) 30 minutes is a reasonable amount of time, and I am satisfied because I have brought 4 of the 5 articles from an unacceptable level to a good stub level, I had never achieved so much on Wikipedia in just 30 minutes ! The fifth article will need the help of an electronics expert to make it a good stub, so I just tagged it. Nicolas1981 (talk) 05:10, 19 November 2008 (UTC)
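
For reference, the grouping described in the second proposal (around 30 sections of 5 articles each) is simple to produce from a flat list of titles. The sketch below is illustrative only; the function names and the plain "Section N" headings are assumptions, not what the generator currently emits.

```python
from typing import List


def split_into_sections(
    titles: List[str], section_size: int = 5, max_sections: int = 30
) -> List[List[str]]:
    """Cut a flat title list into at most max_sections groups of section_size."""
    selected = titles[: section_size * max_sections]
    return [
        selected[i:i + section_size]
        for i in range(0, len(selected), section_size)
    ]


def render(sections: List[List[str]]) -> str:
    """Render the groups as numbered sections, one bulleted line per article."""
    lines: List[str] = []
    for number, section in enumerate(sections, start=1):
        lines.append(f"Section {number}")
        lines.extend(f" * {title}" for title in section)
        lines.append("")
    return "\n".join(lines)
```

With the defaults, a full page holds 150 articles plus 30 headings and 30 blank lines, which matches the "under 200 lines" estimate only if the blank separators are dropped.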

A note on the new planned list
I plan to, by December 20th, update the list of Articles with a Single Editor.

This change will most obviously mean:
 * Articles which have had a second non-bot editor since August '07 will no longer be included.
 * New articles with only one non-bot editor will be included.

It's also going to mean some filtering of some pages which probably do not need inspection:
 * Redirects.
 * Pages with authors with more than 100 edits to their name. (Nicolas' idea) This works under the concept that if they've been able to make 100 edits without being banned, most of those edits are probably okay, and they've been here long enough that they probably know what makes a decent page.

That said, we will still be left with, if my estimates are correct, anywhere from 50,000-160,000 pages. This is not an unmanageable number, but it is one we can afford to narrow down even further to the pages with symptoms most indicative of a problem page.

So, although this might be kind of controversial, I will *ONLY* be including in this list:
 * pages which have had two or fewer edits
 * pages without templates.

I understand that templated pages, pages with many edits, and pages by experienced wiki-community-members may still be vandalism, or in need of cleanup, or otherwise seriously in need of attention. But I am working off of the assumption that template-less pages with one or two edits from an inexperienced author, are much more likely than otherwise to be in need of attention.

We will still, I imagine, be left with thousands of pages. - Monk of the highest order (t) 10:37, 15 November 2008 (UTC)

P.S. I'm composing and narrowing down this list using a python bot which uses wikipedia-exported XML (so as to reduce wikimedia server workload). Once I'm done making the list I'll give out the source. (Sorry Nicolas1981, as much as I love borrowing code, I'm a python fan thru and thru, cause my perl is rusty. Cause I come from the city, but I live in the suburbs. That said, it's not based off your code, but you still deserve much props for making it and releasing it! Making a spider program is hard work!)
 * Regardless of the idea, editors should become more involved in this project. This list needs to be reduced soon so we can start listing modern articles and articles beyond "A". It's a good thing we have an editor who knows about code and scripts (so thank you). Hopefully this idea will reduce the number of articles by at least 25%. Again, thanks. Also, thanks to Nicolas1981 for sharing the scripts:)  Zoo Fari  16:10, 15 November 2008 (UTC)


 * I am delighted that you are willing to contribute to this project; it is much needed, and the more developers involved, the more sustainable the project will be :-) I don't know Python yet, but I don't mind switching to a new language to contribute to your code. To cooperate efficiently, we should start documenting clearly what a needy article is, for each main version of the software. Something like: v1 (current) with single editors, v2 looking at creator's experience, v3 additionally looking at templates, etc. This will help us deliver solid software while debating what future releases should feature. I have modified the note to sound less "nothing to see go away" and more "will be even better soon", because checking articles is still useful. Cheers :-) Nicolas1981 (talk) 06:59, 16 November 2008 (UTC)


 * You mentioned "a python bot which uses wikipedia-exported XML". I am very interested, because hitting the web servers is bad. We have to gain experience working with those files. The latest one seems to be enwiki-20081008-pages-meta-history.xml.bz2 but it is 40 GB, compared to only 10 GB for the 20080724 version. Any idea why it is so different ? Is one (or both) corrupted ? I am now in a Southeast Asian country with a very very slow internet connection, so there is no way for me to download them anyway. Did you download a dump and successfully unzip it ? How large is it then ? Thanks ! Nicolas1981 (talk) 08:55, 17 November 2008 (UTC)

First sentence
Lead_section says "The article should begin with a straightforward, declarative sentence that, as briefly as possible, provides the reader who knows nothing at all about the article's subject with the answer to two questions: "What (or who) is it?" and "Why is this subject notable?""

Additionally, new concepts should be wikilinked at first occurrence. I did not find an official guideline for this, but I guess everyone will agree. So here is my proposal for a new thing to check:

''The article must introduce its context. For instance, the Jazz article's first sentence must at least explain that Jazz is a kind of music, with a wikilink to the Music article. If it does not, paste   at the top of the article.''

I have found that most articles written by a single editor are so focused on the author's area of expertise that they forget to explain what the subject is. For instance, after reading the first sentence of Ambalatungan Group, I have no idea what the Ambalatungan Group is. The other day I was reading an article about someone, and only at the end did I realize it was not a real person but an obscure manga character. This is typically written by a domain expert who is very concentrated on his topic. A lot of ASE articles are like that, so we must explain how to deal with it. Ameen Faisal is an example of a good first sentence which unfortunately lacks wikilinks. Any comment on my proposal for this new thing to check ? Feel free to propose a better guideline ! Nicolas1981 (talk) 06:40, 18 November 2008 (UTC)
 * I agree that many lead paragraphs do not give the context, particularly some of the shorter articles that seem to be created by a bot. I mean, what the hell do bots know about context and meaning? In fairly high dudgeon tonight, sincerely, GeorgeLouis (talk) 08:14, 18 November 2008 (UTC)


 * Sure, bots don't guess the context... but humans tell them ;-) An article-creator bot is configured by a human to create a batch of articles about the same topic. See for instance Polbot, whose purpose in life is to generate articles for all plants in the iucnredlist.org database. All such articles are configured to begin with "___ is a species of plant ...". See for instance Annona ecuadorensis. Cheers :-) Nicolas1981 (talk) 10:18, 18 November 2008 (UTC)


 * Well, if you have no objection, I will add the new guideline as expressed above. Cheers :-) Nicolas1981 (talk) 11:18, 18 November 2008 (UTC)

Hm, I don't get it. Why shouldn't an editor simply provide the context himself or herself? Why ask another editor to do so? It would really be a simple job in most cases to add a few words to the lead giving the context. Hope to hear from you on this. Sincerely, GeorgeLouis (talk) 18:13, 18 November 2008 (UTC)
 * I believe this is in advance, and shouldn't be required to be patrolled in ASE. I may be wrong, but in my opinion, it is not so important to be mentioned in ASE. Zoo Fari  00:20, 19 November 2008 (UTC)


 * GeorgeLouis: I totally agree with you, it takes more time to tag the article and have someone else fix the first sentence than to fix it right away. I was thinking that it might be difficult if the article is about something I don't understand at all, so I hesitated between telling users to fix or to tag. Since you suggest fixing right away is better, I changed the instructions to tell users to try and fix it first. Nicolas1981 (talk) 03:12, 19 November 2008 (UTC)


 * ZooFari: What do you mean by "this is in advance" ? Do you mean that all articles have a valid first sentence already ? Nicolas1981 (talk) 03:12, 19 November 2008 (UTC)
 * Ah, I see what you guys are saying. It took me a while to comprehend this :) You are saying that editors should check and fix the first sentence to introduce the article correctly? I would agree with this, since no bots can fix this.  Zoo Fari  04:09, 19 November 2008 (UTC)
 * Yes, that's it. The first sentence of an article is quite important because it is the first thing people read, so Wikipedia has special rules for its format. Also, some other projects such as Faviki use the first sentence as a short definition of the described concept. Nicolas1981 (talk) 13:33, 15 December 2008 (UTC)

Refreshed list, without articles created by experienced editors
This morning I was trying various solutions described in the formatting debate above and finally I ended up running the generator, and even added the feature to filter out experienced editors. So here is the new Articles list. Generating it was sooo slow, we need your help Monk of the highest order ! :-) Especially for exploiting the data dumps instead of crawling Wikipedia.

Filtering out experienced editors results in a 95% cut ! As for the accuracy of this new list: I have read a dozen articles and ALL of them needed to be fixed or tagged.

So, please everyone process a section and share your impressions and comments below :-) Cheers Nicolas1981 (talk) 11:14, 18 November 2008 (UTC)
 * Wow, this is so great! Thanks! I will go ahead and process a page. Do you want me to use the above format you mentioned earlier? Zoo Fari  04:30, 19 November 2008 (UTC)
 * Actually, it is only a very short list, because the generator is very slow for now. So I just did the formatting as described in the second proposal of the formatting debate. We should get a much bigger list by late December, and THEN we will have a big formatting task. Until then, let's get some experience with using the second proposal's format :-) Nicolas1981 (talk) 04:45, 19 November 2008 (UTC)

Archive talk page
Since the AWE talk page has many discussions that took place a long time ago and are no longer updated, I was considering creating an archive. What do you guys think? Please let me know your opinion.  Zoo Fari  00:39, 20 November 2008 (UTC)
 * The talk page is not that long right now, but why not :-) We don't need closed topics, or only exceptionally, so an archive page would actually make talking easier, yes. We could move closed topics there twice per year or so. Can you handle this ? Nicolas1981 (talk) 03:05, 20 November 2008 (UTC)

Bots and first sentence: Example of Annona ecuadorensis
The article on Annona ecuadorensis is a good example of how a properly programed bot can provide context. (See discussion above under First sentence.) Sincerely, GeorgeLouis (talk) 01:23, 16 December 2008 (UTC)
 * Exactly, the article you mention was generated by a bot and was already good before any human read it. Please note the wikilink on plant. We could think, well, everyone knows what a plant is, so there is no need for a wikilink. But the wikilink is important to avoid any ambiguity that exists or could arise when humans create a homonymous concept. Because I am into semantics, I also think it is of utmost importance that concepts (articles) that can be generalized are linked to their generalized concept, but that's a different story. Cheers, Nicolas1981 (talk) 15:36, 16 December 2008 (UTC)

will update soon, sorry for not being on time
I will respond to messages shortly (think w/in a few days). I have had a busy month (finals. I am a COSI student...) and am traveling right now. Thank you. - Monk of the highest order (t) 02:28, 20 December 2008 (UTC)
 * Hi Monk, welcome back and looking forward to your help :-) Have a nice trip ! Nicolas1981 (talk) 12:33, 20 December 2008 (UTC)