Wikipedia:WikiProject U.S. Census

Back in Wikipedia's good ol' days, User:Ram-Man and his trusty bot, User:Rambot, created tens of thousands of articles on Census Designated Places (CDPs: towns, villages, unincorporated townships, etc.) in the United States, based directly on data from the 2000 United States Census. A few randomly sampled examples: Samburg, Tennessee; Shawneetown, Illinois; Garrison, North Dakota; Unalaska, Alaska; and Funkley, Minnesota.

My, how the decade has flown by! By 2011, data from the 2010 United States Census will be released, and this poses a challenge to the Wikipedia community. Because Wikipedia aims to be timely and accurate, the CDP articles will inevitably need to be updated with the new information, and herein lies the greatest challenge. Some of these articles (some Wikipedians contend many or most), such as Funkley, remain largely unchanged from their original forms; others, such as Unalaska, are drastically different; and the rest (Shawneetown, Garrison, Samburg) fall somewhere in between.

The Problems
 * Bots - A bot made the original articles, and ideally a bot could go through and fix everything, saving everyone a lot of work. However, because many of these articles have since been edited by hand, a bot can no longer simply replace "$2000censusdata" with "$2010censusdata": the surrounding wording that indicates what each piece of data represents is no longer necessarily present. For example, what originally said "[x households] had a female householder with no husband present" may have been manually reworded to say "there were [x households] consisting solely of a female, without the presence of a husband." Additionally, some data may have been removed completely.
 * Subsequent projections and historical data collection - Some articles include estimates of population and other demographics for years after the census (e.g., 2006), sometimes in place of and sometimes alongside the 2000 data. Whether to keep these estimates and the 2000 data in the article for historical purposes, and how a bot should handle them, remain open questions. (Then again, so does almost everything else in this project.)
 * Different census material - The new census may cover somewhat different material than its decade-old counterpart. It will also most likely include new CDPs.

The Solutions?

Numerous solutions have been suggested at Wikipedia:Village pump (miscellaneous)#2010 US Census. I've done my best to summarize them here, although I have probably misunderstood some of them or missed others entirely.
 * Bot does the easy ones - A bot surveys the CDP articles, determines which ones are unchanged enough to update by simply swapping in the new numbers, and updates them. It can also create articles for the new CDPs and update the infoboxes, which remain fairly uniform. Easy, and fairly doable with current bots and a little reprogramming!
 * Bot dumps info for manual insertion - If a bot cannot find the correct place to insert the new data, it can dump all of the 2010 data on a talk subpage and notify the relevant people so that they can insert it by hand. This can be done in conjunction with the previous solution.
 * Super-bots are developed, can do everything - Wikipedians develop bots capable of handling nearly all of the insertions, somehow working around the problem of reworded text. This would demand a lot of developer time and is purely theoretical at this point.
 * People do it all - All of the data is inserted manually by real, live Wikipedians. This is the opposite of the previous option.
 * Erase it all, replace it all - The entire 'Demographics' section is blanked and replaced with the new data.
 * Import data to Wikisource, use to make templates to use in articles - All of the data is put into Wikisource and then somehow converted, manually or by bot, into templates for use in the articles.
 * Ram-Man does it all - Ram-Man has stated that he intended from the start to update the articles, and he suspects that the vast majority of them remain unchanged. Given his depth of knowledge on the subject, Ram-Man's help will probably be part of whatever consensus is reached.
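As a rough illustration of how the first two solutions above might fit together, here is a minimal Python sketch. It assumes the bot knows the original sentence wording; the templates below are hypothetical and are not Rambot's actual text, and `update_or_dump` and the other function names are made up for this example. The 2010 figures are assumed to arrive already formatted as strings. If every templated sentence is still present, the bot swaps in the new numbers; if any has been reworded by hand, it leaves the article alone and produces a list for a talk subpage.

```python
import re

# Hypothetical sentence templates in the style of the original bot text.
# The exact wording is an assumption; {name} marks where a number goes.
TEMPLATES = [
    "As of the census of {year}, there were {population} people",
    "{female_householder} households had a female householder with no husband present",
]


def template_to_regex(template):
    """Compile a template into a regex: exact literal wording plus numeric groups."""
    parts = re.split(r"\{(\w+)\}", template)  # alternates literal text, field name
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)         # wording must be intact
        else:
            pattern += rf"(?P<{part}>[\d,]+)"  # a number such as 1,234
    return re.compile(pattern)


def is_easy(text):
    """An article counts as 'easy' only if every template still matches."""
    return all(template_to_regex(t).search(text) for t in TEMPLATES)


def update_or_dump(text, new_data):
    """Return ("updated", new_text) or ("dump", wikitext for a talk subpage)."""
    if not is_easy(text):
        # Wording was edited by hand: leave the article alone and
        # list the new figures for a human to insert.
        lines = [f"* {key}: {value}" for key, value in new_data.items()]
        return "dump", "\n".join(lines)
    for t in TEMPLATES:
        new_sentence = t.format(**new_data)
        # A function replacement avoids backslash handling in re.sub.
        text = template_to_regex(t).sub(lambda m: new_sentence, text, count=1)
    return "updated", text


if __name__ == "__main__":
    article = ("As of the census of 2000, there were 1,234 people. "
               "In the town, 56 households had a female householder with no husband present.")
    data_2010 = {"year": "2010", "population": "1,301", "female_householder": "61"}
    print(update_or_dump(article, data_2010))
```

The all-or-nothing check is deliberately conservative: a single reworded sentence sends the whole article to manual review, which reflects the concern about altered wording raised under Bots.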

This is a good opportunity for Wikipedia because it sets the bar for the next census, as well as for censuses in other places. If done right, this project could set a precedent for other very large-scale updating projects on Wikipedia and better prepare us for the future. It will be a while before the data is actually released, but the more time we have, the more likely we are to develop bots that can do everything for us, and thus the less manual work will be required. Pretty appealing, no?

Please leave your feedback on the talk page. Thanks!

Members
Members of WikiProject U.S. Census should place one of the following lines of Wikitext on their user page: