Wikipedia talk:WikiProject U.S. Census/Archive 1

Transcluded discussion
Transcluded from Village pump (miscellaneous):

Page moved to Village_pump_(miscellaneous)/Archive_20. Bob Amnertiopsis ∴ChatMe! 01:05, 23 July 2009 (UTC)

 So, eons ago, User:Ram-man programmed a bot to take data from the 2000 US Census and use it to create articles for all the CDPs (Census Designated Places) in the US of A. Well, it's been a while. 2000 has come and gone. Wikipedia has, shall we say, grown and matured. A lot has changed. Now, a new census is coming upon us. This, invariably, will create probz (problems) for Wikipedia.

As it is one of Wiki's goals to be current and timely, new US Census data should be incorporated into articles when it becomes available. However, it is no longer as simple as last time. In fact, it's a whole lot harder.

Many, most likely most, if not all, of Ram-man's original articles have undergone at least a few edits since their respective creations. This means a bot cannot simply go thru and replace what it identifies as "olddata" with "newdata". I suppose (I'm honestly very unqualified to talk about bots, so correct me if I'm wrong) it would be possible, up to a point, for a bot to go thru looking for certain phrases from the original articles (i.e., "the city has a total area of x square miles" or "For every p females there were d males"), but if that information had been reformatted in any way, it would be far more complicated, if not borderline impossible.

Now, I s'pose it'd be possible to manually go and edit the average per capita income and average family size on every single article, but it would be amazingly time consuming, astoundingly tedious and, in my opinion, dead boring.

So, fellow Wikipedians, I come to you with a question: what the schnitzel do we do? I'd love to get a jump on this problem now, and would love to hear your input. As always, thank you all! Bob Amnertiopsis ∴ChatMe! 02:33, 10 July 2009 (UTC)


 * Also up for renewal will be the "demographics" sections of thousands of municipalities and towns. Many of those have been altered since they were created. Perhaps the bot can see if the information is still in the original arrangement and if so just change it. For those that have changed significantly, perhaps the bot could place suggested text on the article talk pages, and leave it to live editors to modify it to suit.    Will Beback    talk    03:28, 10 July 2009 (UTC)


 * Yeah; it's especially the demographics sections that'll be needing this, although of all the sections, they might have been the least likely to have been edited, because of their statistical nature. I dunno. Maybe a survey of these articles is needed to see what needs doin'? Bob Amnertiopsis ∴ChatMe! 04:12, 10 July 2009 (UTC)


 * I suggest posting a more formally stated note with this issue outlined in brief at every US state's project talk, requesting project members come to this discussion and add their thoughts. This would provide a structured way to set up a 2010 requirements survey, with volunteers providing information about the needs of their respective states. I don't think a bot will be able to accomplish or update what the Ram-man bot did. Sswonk (talk) 04:28, 10 July 2009 (UTC)


 * First off, around what time does the actual data get published? I was thinking this material would not come out until around late 2010/early 2011 at the earliest, and even then we're not sure how much we're going to be off by on the various demographics. So I guess the way to look at this is to try to see: 1) What data (what parameters) we're going to have to change in the various infobox settlements? 2) Can we look for the original demographic text arrangement first, and modify that if possible? 3) If not, can we modify individual sentences/segments without throwing off a lot of the other material in the article in terms of context? 4) Can we specify some kind of opt-out parameter for this bot for articles which are already being manually maintained? I'm going to go notify User:Ram-Man, since he may have anticipated this on the initial run. From WP:CAL -Optigan13 (talk) 05:32, 10 July 2009 (UTC)
 * An opt-out (hidden category?) for pages is definitely needed. I just did a quick check of six CDPs and Bishop in Inyo County, California and all of the "Demographics" sections appear to be identical with respect to non-numeric content - stress "appear". If a routine could be coded to confirm that the section is identical to when it was originally created, then direct editing by the bot would be OK. Otherwise, I like the idea Will suggested of having a bot dump updated text sections on article talk subpages for manual insertion, and also create new stubs for new CDPs. I think bot infobox edits are also possible, but the likelihood of finding manual formatting w/line breaks, abbreviations etc. there is much higher. (WP:MASS) Sswonk (talk) 12:42, 10 July 2009 (UTC)
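A minimal Python sketch of the check Sswonk describes, assuming the bot still has the text it originally generated for each article (RM later says he probably still has the original data): blank out every number in both versions and compare what is left. The names `skeleton` and `unchanged_except_numbers` are hypothetical, and a real run would also have to cope with markup and reference tags.

```python
import re

# Integers and decimals with optional thousands separators: "2,432", "45.6", "2000".
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def skeleton(text: str) -> str:
    """Replace every number with '#' and collapse runs of whitespace."""
    return " ".join(NUMBER.sub("#", text).split())


def unchanged_except_numbers(current: str, original: str) -> bool:
    """True if the current section has the same wording as the bot-generated
    original, i.e. the two texts differ (if at all) only in their numbers."""
    return skeleton(current) == skeleton(original)
```

A section that passes this check could be edited in place by the bot; one that fails would fall through to the talk-subpage route.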


 * Is the 2010 Census even going to collect the same data that it did in 2000? I read they were doing a short form this time around. 2010 Census is Different. --JBC3 (talk) 13:47, 10 July 2009 (UTC)

Obviously the ideal scenario would be if we could simply plug the new data into the bot and have it go through and replace $olddata with $newdata, but the obstacles to this have already been stated: the data sets from the Census Bureau may not be the same, and the data format in the article may have changed and become unrecognizable to the bot. Also, it'd be ideal if Rambot could be used for this task again; I would like to mention that I have a (currently inactive) bot that was used to update the infoboxes in US cities and places, and I can reprogram it if Rambot cannot be used. If nothing else, I can start running a survey of random articles over the weekend so we can get an idea as to how hard it would be... Shereth 14:18, 10 July 2009 (UTC)
 * As not even an American citizen, I don't feel overly qualified to comment, but then again, we're all friends now so I'm going to anyway. :) If I were leading this "operation" (which I fear it must be), I think I would follow Optigan13's comments, and work in stages. That is to say: 1) Create new articles (ah! the easy bit; 0/10 difficulty) 2) Find articles where all the relevant sentences have remained totally unchanged and edit them accordingly. Where new data is never going to be available (questions not being asked and so forth), mark old data as being from 2000. (Difficulty: 2/10 realistically.) 3) Same as 2, but for articles where all relevant sentences remain unchanged, or only changed to a level through which the bot can penetrate. (Diff: 6/10) 4) Articles where some sentences are machine readable, but others aren't. Two options: involve humans to some degree directly, or try to get a bot to tag old sentences as being old. "In 2000, ...", "As of 2000, ..." (Difficulty: 8/10 either way you look at it) 5) Articles where virtually all sentences are impenetrable, the bot encounters some sort of contradiction, or some sort of arbitrary limit on "importance" of a page i.e. featured articles; these to be updated by hand. (Difficulty: A fair bit of work for humans). Anyhow, just my collected thoughts. - Jarry1250 [ humourous – discuss ] 14:40, 10 July 2009 (UTC)
 * Oh, the complications of it all! One more thing: Many CDP pages have estimates since 2000 ("...the 2009 population is estimated to be..."). What do we do about THAT?!?!? --Tim Sabin (talk) 14:49, 10 July 2009 (UTC)
 * Here is where it gets real fun: I can almost guarantee that the Census Bureau will be re-defining some of their CDPs, adding new ones, dropping old ones, etc. Unfortunately for us they aren't exactly set in stone from one census to the next. Shereth 15:08, 10 July 2009 (UTC)
 * Yay. :) You could probably still fit all these changes into my little plan with a little thought, hopefully with only a little manual assistance. Oh yeah, and then we get to go through it all again with the 2011 UK Census and the same for all other Commonwealth nations. Smaller tasks, admittedly, but quite probably harder to achieve by bot given that at least US middle-of-nowhere places have RamBot's uniformity as a starting point. Ah, the fun of it. - Jarry1250 [ humourous – discuss ] 15:23, 10 July 2009 (UTC)
 * Yeah that is a bit tricky. The new ones and discontinued ones should be easier to deal with, because we either create a new article or add the data to an existing hamlet article. If the Census Bureau puts out a list of the CDPs that already exist but changed boundaries, that could make things considerably easier to deal with. --JBC3 (talk) 15:36, 10 July 2009 (UTC)
 * The CDPs' estimates would no longer be needed and should be replaced with the actual count provided by the 2010 Census, no? --JBC3 (talk) 15:36, 10 July 2009 (UTC)
 * Is there anything in the Demographics section that requires being saved? Couldn't a bot just wipe out the section entirely (if it exists) and replace it with a "template" with the appropriate data filled-in? --JBC3 (talk) 15:36, 10 July 2009 (UTC)
 * Some articles have gained new demographic information (such as religious affiliations) that is not tracked by the Census - wiping out the demographics sections en masse would result in a loss of all this additional information. Shereth 15:38, 10 July 2009 (UTC)

OK, these new comments have me leaning strongly towards the "bot dumps new demographics paragraph on article talk subpage or section" idea. I am pretty sure that will become widely known and conscientious state project editors will follow up. This would keep the heuristics required and difficulty factor low. Sswonk (talk) 15:25, 10 July 2009 (UTC)
 * I can only see this being useful for the "problem" pages - the vast majority should be bot-editable (see my comments below). If all we did was have a bot plop the info on every talk page we'd wind up with a lot of outdated articles, as there are many thousands of minor CDPs and small towns that get very little attention from actual editors. Shereth 15:31, 10 July 2009 (UTC)
 * Probably so, I anticipated that but it made it simpler to throw the suggestion out there than to investigate on my own. In Massachusetts, Gosnold is the smallest town but as you can see the article has matured. I set up a demonstration diff to show the changes since Ram-man (keep confusing with Manny Ramirez, got to focus...). The verb tenses and other subtle edits exist - lots of them. Mass has 351 cities and towns, no unincorporated territory (true for most of New England and the northeast). I can see that these types of edits also may be common elsewhere, especially here where each place is fairly well populated and the articles have matured. That is why I prefer an opt-out or talk page solution, but I am getting a clearer picture of the difficulties involved in large sections of the midwest and west where the pages may not have changed much or at all since the bot started the articles. Sswonk (talk) 16:33, 10 July 2009 (UTC)

Ok, having taken another look at things, we can also break down the concerns by section. Essentially there are 4 places where the data is likely to change in any article when the new census data comes out. They are:
 * In the infobox. As infoboxes are almost universally standard these should be easy for a bot to edit with minimal problems.
 * In the lede. The standard format that exists in most of the articles will be "The population was XXX at the 2000 census." Those will be easily modified by a bot. I'm sure we can come up with a few common variations (such as "population of XXX as of the 2000 Census") to aid the bot, as well. Many of the larger/more prominent cities will have other constructs with updated estimates. Again, I am sure we can come up with several variants for the bot to search for.
 * In the geography section. The standard format is "According to the United States Census Bureau, the city has a total area of XXX square miles (XXX km2), of which XXX square miles (XX km2) of it is land and XXX square miles (XXX km2) (XXX%) of it is water." There are a few standard variations as well. For the most part, I anticipate these statements will be largely unchanged and easy for the bot to find, but of course there will be exceptions.
 * In the demographics section. There does exist a standard format that many articles will have and thus this should be relatively easy for the bot to change. However, due to the length of the demographic information, this section is the one I anticipate seeing the highest number of variations on and probably giving the bot the most trouble.
What we probably ought to do is to come up with a list of variants on the "standard format" statements that the bot can look for. I suspect that with a relatively small amount of work, more than 95% of the work can be done successfully with a bot. The small number of articles that have been sufficiently modified by editors so as to be unreadable by the bot are, naturally, the ones that get a lot of editor attention and those editors would be quick to manually update the information as needed. I for one am fairly optimistic about this being somewhat less daunting than it appears at first. Also - perhaps we should consider creating a dedicated page for this discussion, as I fear it may wind up overwhelming the page before too long. :) Shereth 15:31, 10 July 2009 (UTC)
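To make the "list of variants" idea concrete, here is a rough Python sketch for the lede sentence only, using just the two phrasings quoted above. The pattern list, the function name `update_lede`, and the rule of refusing to edit when the number no longer matches the 2000 figure are all assumptions for illustration; the real variants would have to come from a survey of the articles.

```python
import re

# Only the two lede phrasings quoted above; a real bot would need many more.
POP = r"(?P<pop>\d+(?:,\d{3})*)"
LEDE_PATTERNS = [
    re.compile(rf"The population was {POP} at the (?P<year>2000) census\."),
    re.compile(rf"population of {POP} as of the (?P<year>2000) [Cc]ensus"),
]


def update_lede(text: str, old_pop: int, new_pop: int) -> tuple[str, bool]:
    """Swap the 2000 population figure and year for the 2010 ones.

    Returns (text, changed). The text is left alone if no pattern matches or
    if the number found is not the expected 2000 figure (someone edited it).
    """
    for pattern in LEDE_PATTERNS:
        match = pattern.search(text)
        if match is None:
            continue
        if int(match.group("pop").replace(",", "")) != old_pop:
            return text, False
        # Replace the later span first so the earlier span's offsets stay valid.
        edits = sorted(
            [(match.span("pop"), f"{new_pop:,}"), (match.span("year"), "2010")],
            reverse=True,
        )
        for (start, end), value in edits:
            text = text[:start] + value + text[end:]
        return text, True
    return text, False
```

Returning a flag rather than raising keeps the bot loop simple: anything that comes back unchanged can be queued for a human.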


 * (after ec) Yes, CDPs are problematic (as I've said many times before). Apparently the Bureau will be using different criteria for naming and defining CDPs. See here for some details from the Federal Register. In general though, I suggest that all the demographic and statistical data be placed into some sort of standardized template format that can be transcluded onto articles. Such Census data should rarely need to be edited and any stylistic edits should be done with some consistency across the board. Perhaps there might even be a mechanism for importing the raw data to Wikisource and then developing templates that could display the data in pre-formatted ways by using the entity's Census/FIPS codes. older ≠ wiser 15:33, 10 July 2009 (UTC)
 * I remember a discussion not too long ago about trying to centralize the data and simply using templates (or some similar scheme) to display Census data on the articles themselves. If we can come up with a good way of doing this, I am very very strongly in favor of doing so. Shereth 15:38, 10 July 2009 (UTC)
 * Interesting - The UK will hold a census in 2011 and so will other European countries, e.g. Spain. I like the template idea if it can be made to work. Jezhotwells (talk) 17:17, 10 July 2009 (UTC)


 * I will look into drawing up a more formal page for this, and getting members of all the US States WikiProjects involved. Bob Amnertiopsis ∴ChatMe! 04:34, 11 July 2009 (UTC)

Brainwave
^ Arbitrary break. Regarding the evolution of language: if we get Ram Man back, or get his data, this job should be made a lot easier. Don't rely on the words. Instead, look at the numbers themselves. Sure, that wouldn't solve everything, but 2432 => 2564 is a fairly easy substitution to make regardless of the words that surround it. That should help with most numbers (where they're not duplicated), but it doesn't get away from the problem of wanting to ensure 100% goodness. Only some serious logic would help with that. - Jarry1250 [ humourous – discuss ] 16:45, 10 July 2009 (UTC)
 * Not a bad idea at all. The bot would have to search for both the 2000 figures and the most recent estimates, and naturally it wouldn't catch instances where someone has swapped out the census figures for other data - but yeah, this is actually a really good idea. Shereth 22:14, 10 July 2009 (UTC)
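Jarry1250's number-swapping idea, with the caveat about duplicated figures, might look roughly like this in Python. It is only a sketch: it assumes the bot has a mapping from each old figure (2000 count or later estimate) to its replacement, written exactly as it appears in the article, and it skips any figure that is missing or appears more than once so a human can check it. `substitute_numbers` is a hypothetical name.

```python
import re
from collections import Counter

# Whole numbers only, so "432" never matches inside "2,432".
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def substitute_numbers(text: str, changes: dict[str, str]) -> tuple[str, list[str]]:
    """Replace old figures with new ones wherever the old figure is unambiguous.

    Returns the updated text and the list of old figures that were skipped
    because they occur zero times or more than once.
    """
    counts = Counter(NUMBER.findall(text))
    skipped = [old for old in changes if counts[old] != 1]
    safe = {old: new for old, new in changes.items() if old not in skipped}

    def swap(match: re.Match) -> str:
        token = match.group(0)
        return safe.get(token, token)

    return NUMBER.sub(swap, text), skipped
```

Because it never looks at the surrounding words, this would survive rewording, but it cannot tell whether a matching number still means what it meant in 2000.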

I'm not sure a bot is necessary. Run a bot/script to gather the data, put it on a page somewhere, and let users introduce it into the articles. With, what, 55,000 Rambot pages? That could be easily done within a few weeks. --Golbez (talk) 17:24, 10 July 2009 (UTC)


 * Why delete data from prior decades? Isn't it useful to show the change in population? - Pointillist (talk) 22:22, 10 July 2009 (UTC)


 * Although the data is collected in 2010, it will not be available until 2012. And the information that is available will be in general the same as in 2000.  Who then was a gentleman? (talk) 22:32, 10 July 2009 (UTC)
 * Not that I doubt you, but do you have a source for the anticipated date of release of the information? It'd be highly useful. If the information isn't going to be released for over 2 years then it's probably safe to put a lid on this conversation for now ... Shereth 22:36, 10 July 2009 (UTC)
 * Well, the 2000 data was released in 2002, and this says that the data doesn't go to the President till December 31, 2010, so it won't be till at least some time in 2011. This says the reapportionment data has to go to the states in March 2011. This says that local population data will be released in 2012. Who then was a gentleman? (talk) 01:28, 11 July 2009 (UTC)

When I first created all those pages, I had planned to update them all when the 2010 data was published. That has always been my goal, and I always planned on doing it myself, since I created the work in the first place. I suspect that the vast majority of pages are still close enough to the original that a bot could be hand-programmed to replace the data. I have not touched the original data in many years, but I'm pretty sure I have it stored somewhere on my computer. I don't see the need for opt-out lists or anything like that. Anything that can't be done automatically and correctly will be pretty easy to determine. -- RM 00:22, 11 July 2009 (UTC)
 * When you say hand-programmed do you mean numbers matching within rewritten paragraphs could replace just the numbers without changing the prose? And, after the determination of go-nogo on bot updating of the passages, could your bot then still place the newly compiled paragraphs for the article on a talk page for the nogo articles? Sswonk (talk) 03:02, 11 July 2009 (UTC)

an aside
This is a quite interesting knowledge management problem with natural language processing and evolving data format accommodation elements. If properly described and addressed, one might be able to compose either an article for peer review or an abstract for submission to a conference related to this "operation" (I'm thinking about the annual ASIS&T meeting or the JASIST publication, for example). It would be quite a confidence builder for Wikipedia users if Wikipedia editors were to author a peer reviewed research article related to handling a real world knowledge management problem such as this. --User:Ceyockey ( talk to me ) 02:38, 11 July 2009 (UTC)

Page Break
Now that this has moved to its own dedicated page, I'd like to make a few comments: -- RM 21:05, 11 July 2009 (UTC)
 * After the pages were created a number of people suggested using templates to display the data. This was ultimately struck down by consensus.  The basic premise was that we want to use templates as little as possible because it is not as easy to edit templates. I think the idea is that while templates are easier from a maintainer standpoint, they are not easier from a user standpoint and we should favor the users, as that is the ultimate goal of the project.  I think that if we went this route, we'd have to garner sufficient support for a change in the way this is done.  I suspect it would be quite difficult.
 * Changing information in info boxes and other areas (instead of where I put it originally) is certainly not going to be easy. However, that information should already be cited appropriately with the proper date, so I don't see the problem with having data from multiple censuses.  It will be up to the individual editors of those already heavily edited articles to update them manually.  It won't be inaccurate, it will just be out-of-date for a little while, but I don't see that as a serious problem.  To the highest extent possible, the parser bot should flag any such articles so we can speed up updating.
 * There is no reason to keep the old data around. If an article on a city has a sub-article containing historical census information, then it is already in the proper place and does not need any changes.  Other articles should contain the most up-to-date statistics.
 * We might want to consider changing the types of information displayed. I selectively chose some information over other information.  Of course it may be easiest to just stick with what is there, but this is certainly something to consider.
 * Just so everyone is clear, the rambot had to be custom hand-programmed on numerous iterations to work the way everyone wanted it to work. It was by no means a generic process.  Even early in the process there were many special cases that had to be adjusted.
 * I highly suggest we try to avoid manual editing as much as possible. It would be a huge waste of real time to have to manually edit things, unless there is a very good reason for it.
 * Lastly, when I was doing update cycles in the past using the bot (after creation, the bot was used a number of times to update the articles to fix various problems with the originals), I got a lot of feedback from users whenever problems happened. A lot of people have already come up with various problems that could and will occur. When the time comes, I suggest we just go ahead and start the process and address those issues as they come up. Stopping the bot to handle new situations is not going to be a problem. For any article that can't be done automatically, a talk page post will likely be the simplest solution.

For the demographics sections, would it not be possible to program a bot to completely remove the current demographics sections and replace them with an updated demographics section, so that all the data in the section would be updated at once? It would also have the benefit of standardizing demographics sections. Ks0stm ( T • C ) 03:48, 15 July 2009 (UTC) Basically, could a bot do the "Erase it all, replace it all" method, thus saving much of the work? Ks0stm ( T • C ) 03:53, 15 July 2009 (UTC)
 * I think that makes the most sense. Anything not part of the original Census 2000 data that has otherwise been added to the demographics section could easily be restored by another editor watching the article. --JBC3 (talk) 16:59, 15 July 2009 (UTC)
 * Absolutely not. A bot should never create more work for human editors; their entire purpose is to create less work for them. Bots that require manual edits to go back and clean up after them are poorly coded, wasteful and frankly are not even worth considering. Shereth 14:24, 16 July 2009 (UTC)
 * There's a difference between a bot doing something that creates more work for human editors and a bot not doing something that creates more work for human editors. My point was that the number of Demographics sections with added content is probably relatively small. For those sections that are added to, a human can re-add the deleted (and probably outdated) information. What is the alternative you suggest? Having humans do everything in those articles that have been altered? I'm open to alternatives. --JBC3 (talk) 19:44, 16 July 2009 (UTC)
 * Well, before an actual determination is made, there is a broader issue to tackle. It needs to be decided what demographic data will be culled from the 2010 Census and added to articles. If we are going on the assumption that we will be using the same data, then the preferable solution is to take the extra time/effort to program the bot to parse existing demographics sections, find the old data and replace it. I understand this will require more work on the part of the person(s) operating the bot, but it is worth it when the net result is less work overall. It is also unreasonable for us to assume that all of the affected articles are diligently watched and the lost material will be restored. Every effort should be made to ensure that a bot does not remove information from articles with the expectation that live editors will "fix" the problem after the fact. Shereth 20:16, 16 July 2009 (UTC)
 * Which is fine, assuming there is an alternative that makes less work for the human editor. --JBC3 (talk) 01:09, 17 July 2009 (UTC)
 * If the bot was programmed as I said earlier, and completely blanked the demographics section, then replaced it with the new one, there wouldn't be anything from the previous demographics section for editors to clean up...all that would be left to do is update infoboxes and the like; the demographics sections would be done. Ks0stm ( T • C ) 17:46, 16 July 2009 (UTC)
 * The problem arises when the demographics section has been added to. Take for example Phoenix, Arizona, which has religious statistics, or Houston, which has significant commentary regarding specific nationalities and information regarding sexual orientation. This type of data is not monitored by the Census; it comes from other sources. Blanking the Demographics section entirely would cause this information to be lost and require editors to go back and re-add it after the bot has passed through. Blanking and re-adding is only acceptable when the Demographics section contains only the original Census data. Shereth 17:54, 16 July 2009 (UTC)
 * That's a problem I hadn't noticed...is it possible to get a bot programmed to recognize what data is and isn't from the census, then replace only what is? The only complication with this is that the wording on that data is likely to have changed from what it originally was... --Ks0stm ( T • C ) 01:05, 17 July 2009 (UTC)
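One way to make Shereth's rule mechanical, and to answer Ks0stm's question, is to check every sentence of the section against the sentence shapes the bot originally produced, with the numbers masked out. The Python sketch below assumes such a set of shapes (`census_shapes`) could be rebuilt from the rambot's original templates; the function names and the three outcome labels are hypothetical.

```python
import re

NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")
# Split after a full stop followed by whitespace (naive, but enough for bot prose).
SENTENCE_BREAK = re.compile(r"(?<=\.)\s+")


def shape(sentence: str) -> str:
    """A sentence with its numbers masked and whitespace normalized."""
    return " ".join(NUMBER.sub("#", sentence).split())


def triage(section: str, census_shapes: set[str]) -> str:
    """Decide how a bot should treat one Demographics section."""
    sentences = [s for s in SENTENCE_BREAK.split(section.strip()) if s]
    known = [shape(s) in census_shapes for s in sentences]
    if known and all(known):
        return "replace_whole_section"  # only original Census sentences remain
    if any(known):
        return "replace_census_sentences_only"  # keep the added material
    return "post_to_talk_page"  # nothing recognizable; leave it to editors
```

Reworded census sentences would land in the last two buckets, which matches the concern that the wording may have drifted.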

Make separate section
[unindent] Why do we want to remove the 2000 demographics? Wouldn't it make more sense to have two separate sections, as follows?

==Demographics==

===Most recent census===

===2000 census===
As long as information is valid, relevant, and sourced, I can't see the benefits of removing any information from any article except to split it out because the page is too large. If we were to adopt this format, we'd have no need to discuss how to implement the new data: we could program the bot simply to add the "Most recent census" header, the 2010 information, and the "2000 census" header. Shereth's examples of communities with more information, such as Phoenix and Houston, could easily be split by an editor into separate subsections. Nyttend (talk) 15:36, 20 July 2009 (UTC)
 * This is a step in the right direction. The only immediate problem I can see is in the labeling of the "old" data as "2000 Census" data, when in many cases it will contain newer data (the American Community Survey gets cited relatively often) and it would be a misnomer to label them as such. Perhaps some less specific way of referencing the data, calling it "historical", "previous" or something less constraining than "2000 Census". Shereth 16:30, 20 July 2009 (UTC)
 * Good point. However, estimates such as the ACS provide far less data, so perhaps we could decide generally to have a separate subheader for those, putting them between the 2000 data and the 2010?  We can't expect the bot to know what to do here, so we'd have to have a header such as your suggestions, but it would be a good recommendation for editors.  Nyttend (talk) 03:20, 21 July 2009 (UTC)
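Nyttend's layout is simple enough for a bot to produce without parsing the old prose at all. A rough Python sketch, assuming the article has a level-2 "Demographics" heading and that the 2010 paragraph has already been generated; the `old_label` parameter lets the older block carry a looser name such as "Previous data", per Shereth's point. The function name is hypothetical.

```python
import re

# Matches "==Demographics==" or "== Demographics ==" on a line of its own.
DEMOGRAPHICS_HEADING = re.compile(r"^==\s*Demographics\s*==[ \t]*$", re.MULTILINE)


def add_census_subsections(wikitext: str, new_text: str, old_label: str = "2000 census") -> str:
    """Insert a 'Most recent census' subsection with the new data directly
    under the Demographics heading, and label the existing text below it."""
    match = DEMOGRAPHICS_HEADING.search(wikitext)
    if match is None:
        raise ValueError("no ==Demographics== heading found")
    head, rest = wikitext[: match.end()], wikitext[match.end():]
    return (
        head
        + "\n===Most recent census===\n"
        + new_text.strip()
        + f"\n\n==={old_label}==="
        + rest
    )
```

Articles without the heading raise an error instead of being guessed at, so they could be listed for manual handling.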


 * Yes, there should be a separate section. Because the previous data is in a variety of states, maybe that should be handled case by case by human editors. The 2010 data could go at the end of the demographics section, under "2010 census" or somesuch. Maurreen (talk) 09:00, 10 February 2010 (UTC)

CDPs only
Why are we discussing this just for CDPs? Shouldn't we be applying whatever decisions we make here to all municipalities as well: boroughs, cities, towns, townships, and villages, plus counties? Nyttend (talk) 15:30, 20 July 2009 (UTC)
 * I don't think anyone really means to restrict this to CDPs at all. They tend to get brought up in discussion due to the special issues that will arise when comparing data from the 2000 to the 2010 census. For the most part, the other entities you have mentioned will have fewer issues to tackle. Shereth 16:31, 20 July 2009 (UTC)

Suggestion
It seems that a bot may do a fair bit of the updates with the 2010 Census data. Could it please quit linking to Poverty line and instead use Poverty line in the United States? The link to (a redirect to) the 'global' article seems to have confused at least one reader.

Also, technically, the Census Bureau calls this the poverty threshold; the poverty guideline is set by the HHS (and the numbers never match). WhatamIdoing (talk) 00:47, 14 October 2009 (UTC)

Format
If it's not too much trouble, I'd like to suggest that new data be added in a list or tabular format. It's easier to read, and the sentences don't add anything meaningful. Maurreen (talk) 09:11, 10 February 2010 (UTC)

October 2010
The count is over and the results will start coming out soon. The first significant releases seem to be scheduled for February 2011, and they'll mostly be done by November 2011. Is it time to start thinking about this again?  Will Beback   talk    09:28, 4 October 2010 (UTC)
 * I think so...I brought this up on the Village Pump a while back, but to no avail. What's the proper way to bring attention to this? Perhaps a watchlist notice or something? Ks0stm (T•C•G) 13:36, 4 October 2010 (UTC)
 * Why not just add a section on 2010 census results to the end of each identified article - and let editors weave the data in where needed, and leave it as stand alone info if no such need is found for an area? Sort of a modified system for how it was handled in the past.  Collect (talk) 20:45, 4 October 2010 (UTC)
 * It seems like most of this job will ultimately fall to bot writers. I'll leave a message over at Bot Approvals Group. One tricky issue is how to handle demographic sections that have been rewritten significantly - should that material be overwritten? Should the new material be added after the 2000 information? Another issue is deciding which data to include.   Will Beback    talk    21:14, 4 October 2010 (UTC)

I am not a bot operator and I don't even work much with US geography articles, but I do want to offer a thought for consideration. While Wikipedia should strive to include up-to-date information, verified information from past years can still be useful in an article. For instance, a good "Demographics" section for a major city should indicate not only the current population of the city (or the most recent estimate), but also the population of the city in past years; for an example, see the table at Charlotte, North Carolina. So, new information from the 2010 US Census must absolutely be added to articles, but it should not entirely replace or overwrite information from past censuses. Thank you, -- Black Falcon (talk) 21:49, 4 October 2010 (UTC)

New CDPs
The new CDPs for 2010 are now listed in the GNIS (search in the Census class for anything with a 2010 date), and it looks like there's a lot of them. These create a special set of difficulties, as the articles for these places are in widely varying conditions. Many of them don't even have articles, many have articles which list them as unincorporated communities (or some variant on that) instead of CDPs, some have articles with demographic data (from ZIP codes, usually), some are redirects to a county/township page, some have articles under variant names, and there's probably some other things I haven't even thought of. This raises a few points: I know this is a lot for one comment, but it has to be sorted out eventually anyway. TheCatalyst31 Reaction•Creation 10:04, 23 October 2010 (UTC)
 * Can a bot create articles for most of the new CDPs without articles, or will this have to be done manually? Many of the CDPs are listed as populated places in the GNIS, but many others are listed only under the Census class, so getting consistent coordinates/elevations/etc. could be difficult.
 * If the above will mostly be done manually, should we wait until the census comes out to write articles on new CDPs without a GNIS listing as a populated place, or can these be written on the Census listing alone?
 * Can the existing unincorporated community articles be updated now to say they're CDPs, and moved to the CDP section of the county templates (and anywhere else it's relevant), or should this wait until the Census is published? We know they're CDPs now, so it would probably be easier to start changing them now, but the Census Bureau itself seems to only have announced them to the GNIS.
 * Since most of the unincorporated community articles will have no existing demographic data, a bot can presumably insert that, but is there any way for a bot to fix the classification once the census data comes out, if it hasn't been done manually?
 * What should we do about the redirects and variant names? These will probably be the most problematic; the redirects to county/township pages deserve independent articles now that they're CDPs, but it will be rather time-consuming to check these manually. Since some of the redirects will be to variant names, a bot can't create articles for every redirect; can a bot list all the redirects somewhere so they can be manually checked without going through everything else?
 * Should communities with variant names be moved to the CDP names or left at the original name? (This should probably be handled case-by-case.) Also, is there any good way of checking for variants so we don't get duplicate articles?

New Census Figures Jan 2011
For what it's worth...

The U.S. Census Bureau will release the 2010 census figures in January 2011, and all pages containing 2000 demographics will then be out of date. Not all page demographics have been neatly worked into a separate demographics section or other designated section. Some have populations woven into the lead section or other places. Some only mention the overall population. Some have the breakdown. And this does sometimes include unincorporated areas.

Additionally, I have noticed two different existing methods on the pages, one possibly due to human error. Take Scotts Bluff, Nebraska, as an example:
 * A lot of those pages correctly reference the U.S. Census Bureau Fact Sheet for their demographics.
 * Others, such as Scotts Bluff, Nebraska, don't reference the correct URL to pick up the figures for the ZIP code demographics, so clicking the reference only brings up the American Fact Finder main page.
 * Maile66 (talk) 21:44, 7 November 2010 (UTC)


 * An update: The census bureau is releasing population data at 11:00 EST today, 21 December 2010. FYI. UltraExactZZ Said~ Did 15:41, 21 December 2010 (UTC)


 * FYI - I've had communication from Ram-Man saying he will be unable to participate in the updating process. Maile66 (talk) 16:07, 21 December 2010 (UTC)

Pie chart?
A user has suggested the creation of a template that displays data as a pie chart at Village pump (idea lab). I don't know if this is possible, but if it is, then I could envision it being used for census-related data. If you know anything about this kind of thing, please comment there. Thanks, WhatamIdoing (talk) 22:04, 25 November 2010 (UTC)

Templates
As I understand it, the Census or the USGS assigns a control number to each place. If we stored that control number as a separate parameter of the Geobox template, a bot could update at least the Geobox templates quickly and accurately. Perhaps there are other similar articles with infobox templates that could also use such a parameter. I am a bit troubled that the universe of articles needing an update may not be coextensive with the set of CDP articles created by Ram-Man from the 2000 data; for example, state township articles that use Infobox settlement.

This is important if the Census releases updated data and of course for 2020.

Thanks, Racepacket (talk) 15:16, 7 December 2010 (UTC)
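A minimal Python sketch of the control-number idea above, under stated assumptions: `census_id` is a hypothetical parameter name (it is not claimed to exist in Geobox), the `population` value is assumed to sit on its own line, and the lookup table holds placeholder values, not census figures.

```python
import re

def update_infobox(wikitext: str, lookup: dict[str, int]) -> str:
    """Replace the population value when the article's control number has new data.

    Assumes (hypothetically) lines of the form `| census_id = ...` and
    `| population = ...` inside the infobox.
    """
    id_match = re.search(r"^\|\s*census_id\s*=\s*(\S+)\s*$", wikitext, re.MULTILINE)
    if not id_match or id_match.group(1) not in lookup:
        return wikitext  # no key or no new figure: leave the article untouched
    new_value = lookup[id_match.group(1)]
    # Keep the parameter name and spacing, swap only the value.
    return re.sub(
        r"^(\|\s*population\s*=\s*)\S.*$",
        lambda m: f"{m.group(1)}{new_value}",
        wikitext,
        count=1,
        flags=re.MULTILINE,
    )
```

Keying on an identifier rather than on the article title avoids the variant-name and redirect problems raised earlier, since the number does not change when a page is moved.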

AWB?
Would AWB be able to help, using "find and replace"? If there is a way to incorporate this, we could have some kind of AWB drive (like WikiProject Wikify's). With barnstars and good promotion, this could be done in a one-month drive and be over for another 10 years. Sumsum2010 · T · C · Review me! 03:21, 26 January 2011 (UTC)

What's the plan?
I noticed that people have been talking about this subject since 2009, so what's the plan? • Sbmeirow  •  Talk  •  03:50, 8 February 2011 (UTC)
 * Since the Spanish Wikipedia is creating all U.S. locations from scratch, we are using the 2010 Census, but we are planning to use a bot soon to update all the U.S. locations that are currently using the 2000 Census. I'll ask the bot's operator (User:_jem_ on the Spanish Wiki) if we can use the same bot here; he said it is possible and easy to update as long as the data is in .csv format, and in fact you can download all the data from factfinder2.census.gov in that format. I'll let you know when we're done (maybe in April or May, when all the data is released and available). And while creating the list of locations for each state using the 2010 Census, I noticed that the Census Bureau created more CDPs, doubling in some cases like Virginia or Louisiana from incorporated areas, and now those articles need to be created. You can take a look at List of cities, towns, and villages in Louisiana; I updated the table from the Spanish Wiki, and you can see that there are new CDPs in Louisiana. --Vrysxy! (talk) 07:26, 23 February 2011 (UTC)
 * Have the 2010 zip code demographics been posted on FactFinder2 yet? If so, I have been unable to find anything later than the 2009 estimates on there. They've made the process complicated. Maile66 (talk) 10:25, 23 February 2011 (UTC)
 * Not sure about factfinder2, but the raw data is available for some states.  They're all going to be published by April 2011. --ChrisRuvolo (t) 16:31, 23 February 2011 (UTC)
 * I've seen people updating local, city-level populations for the states released (For example ) but for the life of me I cannot find this level of strictly population data anywhere (not on Factfinder2 searching for Moore, Oklahoma anyway). Where can I find this level of data to fact-check such changes? <b style="color:#009900;">Ks0stm</b> (T•C•G) 20:26, 23 February 2011 (UTC)
 * It was a nightmare trying to find local data for small towns, but all you have to do is:


 * 1-Go to the main Factfinder page (http://factfinder2.census.gov/faces/...es/index.xhtml)
 * 2-Click the Geographies tab to your left
 * 3-Close the "Select Geographies" pop-up if it shows up. You should now have the annoying list of tables.
 * 4-Then the sixth from the bottom should be called "Population and Housing Occupancy Status: 2010 - State -- Place", click it (click on "Population and Housing Occupancy Status: 2010 - State -- County subdivision" to see data for townships, etc.).
 * 5-Now you should get a scrolling list of cities in Arkansas. Where you see Arkansas you can change the state to see local data. --Vrysxy! (talk) 20:36, 23 February 2011 (UTC)
 * "Nightmare" is an understatement. What was the government thinking when they designed this? Thanks for the directions. I got as far as Texas with your instructions. Not all the towns that had 2000 population figures in the old Fact Finder are on this one. Isn't that interesting? However, it would be good if we could figure out how to get the full demographics breakout, like the old Fact Finder gave, by just inputting the ZIP code. Maile66 (talk) 21:14, 23 February 2011 (UTC)
 * The old Fact Finder was waaay easier. Anyway, this Friday I'll create a table (for the states with data available) listing every city, town, village (if any), and CDP with the 2010 Census. And btw, maybe the towns that you said are not on the 2010 Fact Finder just merged into a city? --Vrysxy! (talk) 21:44, 23 February 2011 (UTC)
 * Re the small cities, it's hard to say if they merged or not on the census, but I doubt they did geographically. They are small farm communities ten or twenty miles from bigger places, with lots of acreage in between. One I had in mind was Doss, Texas, and others in Gillespie, Mason, Llano, etc. Texas Hill Country towns. I believe grass grows faster than FactFinder2 can download data just to scroll to the next set of towns. I anxiously await the Saturday Night Live parody of someone trying to retrieve census data from that site. Maile66 (talk) 21:58, 23 February 2011 (UTC)
 * I found something a couple of weeks ago that said they are releasing the information gradually, groups of states at a time, and are supposed to be finished by April or May? •  Sbmeirow  •  Talk  •  23:22, 23 February 2011 (UTC)
 * If the rest of the Texas communities are like Doss, they probably aren't CDPs and their 2000 census data was taken from the ZIP code tabulation area. From what I can tell, 2010 census data for ZCTAs isn't out yet, though considering how terrible factfinder2 is it may be on there and I just can't find it. I don't know when it's coming out, but it probably isn't covered by the April/May deadline since that appears to be for releasing the basic data for the remaining states. TheCatalyst31 Reaction•Creation 23:49, 23 February 2011 (UTC)
 * Thanks for the navigation tip for factfinder. I was able to pull out the NJ data to create this table to begin working from:  WikiProject New Jersey/Census 2010.  Looks like a bunch of the CDPs changed.  Thanks again. --ChrisRuvolo (t) 01:15, 24 February 2011 (UTC)

Yeah, and btw, did you do that table by hand? Or did you use Excel or some other tool? Btw, those new CDPs were created from unincorporated areas, and some of them were split from a single CDP, like Great Meadows-Vienna, New Jersey into Great Meadows, New Jersey and Vienna, New Jersey. --Vrysxy! (talk) 05:23, 24 February 2011 (UTC)
 * I downloaded the CSV, then pulled out fields and used text editor macros to build the table. Yes, I did notice that some of the CDPs have been split up.  More research will be needed to see if the new split CDPs are coterminous with the previous CDPs, or if they perhaps contain additional territory.  I also noticed some communities that were previously ZCTAs are now CDPs, but with differing geography.  Some more info here: Wikipedia talk:WikiProject New Jersey.  Thanks. --ChrisRuvolo (t) 18:40, 24 February 2011 (UTC)
 * Thanks. I also noticed that some states have districts as their subdivisions, just like townships in other states. Why is that? Are they new? Are they actually subdivisions of those states? MS, for example, has over 300 districts, like District 1, etc. I tried to look them up on Google Maps and none of them were there, so I'm guessing they were created during this census. Anyone from Alabama, MS, VA or LA? --Vrysxy! (talk) 05:02, 25 February 2011 (UTC)
 * These are basically alternate forms of county subdivisions for states without active minor civil divisions. Alabama uses census county divisions, and according to this, LA uses Parish Governing Authority Districts, MS uses Supervisors' Districts, and VA uses magisterial districts. The latter three seem to be defunct, and with a few exceptions they generally don't have articles, so mass creation of these probably isn't necessary right now. TheCatalyst31 Reaction•Creation 07:06, 25 February 2011 (UTC)
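For anyone who would rather script the CSV-to-table step described above than use text editor macros, here is a minimal Python sketch using the standard `csv` module. The column names passed in are assumptions; check them against the header row of the file you actually download.

```python
import csv
import io

def csv_to_wikitable(csv_text: str, columns: list[str]) -> str:
    """Build a sortable MediaWiki table from selected columns of a CSV export.

    `columns` must match the CSV header exactly; the names used in any
    example are hypothetical, not the Census Bureau's real headers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = ['{| class="wikitable sortable"']
    lines.append("! " + " !! ".join(columns))  # header row
    for row in reader:
        lines.append("|-")  # new table row
        lines.append("| " + " || ".join(row[col].strip() for col in columns))
    lines.append("|}")
    return "\n".join(lines)
```

Columns not listed are simply dropped, which mirrors "pulling out fields" before building the table.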

From the Census 2010 folks
Received the email below today in response to a query I'd sent quite some time ago. Maile66 (talk) 21:42, 3 March 2011 (UTC) "Thank you for your feedback and please forgive the delayed response. I want to let you know that Zip Code Tabulation Area (ZCTA) data will not be available until the June - August 2011 time frame."

- American Fact Finder Staff,US Census Bureau

What breakdowns?
In the last census (2000), we reported population (as does the US Census) by race (as defined by the US Census) and then Hispanic & Latino (separately, as does the US Census). To make comparisons between 2010 and 2000 data meaningful and to avoid pushing a POV, I assume we do the same this time around (as I have been doing in the articles I am updating). Another editor takes the Hispanic & Latino populations out of all race categories, or just from Whites, or just from Whites & African Americans, and presents that data (often wrongly). Another seems to break the Asian American populations down by Chinese American, Japanese American, etc., but not consistently. Another editor takes pains to inform us in various places that "Middle Easterners", "Iranians", "Armenians" and "Jews" are "White", but again not consistently. Can't we just stay with the tried and true breakdowns, to make things simple, comparable, consistent, and NPOV? Carlossuarez46 (talk) 21:05, 16 June 2011 (UTC)
 * It seems like, ideally, you take out the Hispanics, and you then list "non-Hispanic Whites," and "non-Hispanic Blacks". It's misleading to read about cities that are 80% white when, in fact, there are almost no people who are White by the most used definition (i.e., non-Hispanic Whites). john k (talk) 19:52, 1 July 2011 (UTC)

Why no updates?
It's rather sad to see how many articles haven't yet been updated with the new census data. Why hasn't a bot been created to do this? john k (talk) 19:54, 1 July 2011 (UTC)
 * Most articles have not. I just checked the big city in my major metro, Boston, which has not been updated with the 2010 census data in the demographics section.  And neither has any other city I've checked.  What we need is a complete REPLACEMENT of the "Demographics" section with new information by a bot.  That's what I vote for.  A bot should be able to replace a good deal of the info in the infobox as well.  But I think replacing the entire demographic section with 2010 data using a bot is a no-brainer.  — Preceding unsigned comment added by Midtempo-abg (talk • contribs) 06:33, 17 October 2011 (UTC)

How to handle mixed 2010 and 2000 information
In the hopes that someone is watching this page: I notice some people are (helpfully!) adding 2010 data. I'm very glad to see this. However, I see a problem with retaining some 2000 information which evidently cannot yet be updated (please correct me if I am wrong). Here is a typical edit. The 2010 demographic data is now correct, but the paragraph on income "The median income for a household in the city was $13,750, and the median income for a family was $11,250..." is from the 2000 census, and this is not clear from the text or the paragraph heading, which actually implies it is from the 2010 census. What is the best way to fix this? Program a bot to follow after, adding "According to the 2000 census, ..." to the beginning of that paragraph, or a "2000 income data" subsection header? Program a bot to add 2010 income data when (or if) it is available? Removing it altogether would, in my opinion, be a bad idea. Thanks, Antandrus (talk) 01:59, 20 June 2012 (UTC)

My line of thought
I'm the person who is doing the editing, at least for Iowa and now North Dakota. Before I started coding my bot, I looked around to try to find the recent income data and was unable to find it; if anybody does, I can add it to the bot. Actually, if I'm not mistaken, I think the data might be from around 2005 and not related to the census. I can easily modify my bot to start changing that paragraph to include the date of the data at the beginning, and if I know for sure what the date is, I'll do that when I move on to my next state.


 * It's from the 2000 census, except for the cases where someone manually updated it with more recent data. Here, for example, is the Hansboro page from 2002; you can see the numbers are the same as they are in your recent edit.  I have not found income data myself at the local level for 2010; obviously using that would be a good solution (but may not be possible for a while yet). Antandrus  (talk) 02:19, 20 June 2012 (UTC)


 * I looked for the 2010 data before I started and couldn't find it, so I don't think it's out yet. I'll start modifying that paragraph to state the info is from 2000 when I move on to a new state (probably South Dakota), and come back to Iowa and North Dakota when I'm done to fix that paragraph.


 * Thank you! That would fix it. Antandrus  (talk) 02:50, 20 June 2012 (UTC)
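A minimal sketch of the fix agreed on above: prefix the stock income paragraph with the year its figures come from. It assumes the paragraph still opens with the unedited sentence "The median income for a household in the"; reworded paragraphs are deliberately left for human editors.

```python
import re

# Assumed opening wording, taken from the standard sentence quoted above.
INCOME_OPENING = re.compile(r"^The median income for a household in the", re.MULTILINE)

def date_income_paragraph(wikitext: str, year: int = 2000) -> str:
    """Prefix the stock income paragraph with its census year (first match only)."""
    return INCOME_OPENING.sub(
        f"According to the {year} census, the median income for a household in the",
        wikitext,
        count=1,
    )
```

Because the rewritten paragraph no longer starts with "The median income", running the bot a second time over the same article changes nothing, which matters if Iowa and North Dakota are revisited.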

2010.census.gov has been retired
http://2010.census.gov has been retired for http://www.census.gov. Special:LinkSearch/http://2010.census.gov currently shows 1550 links including all namespaces. Most links I tried would work if 2010.census.gov is replaced by www.census.gov. I guess a request to update links should be made at Bot requests. PrimeHunter (talk) 13:13, 6 March 2013 (UTC)
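A sketch of the link rewrite a bot might apply, assuming a plain host swap is enough. Since only most of the links tried resolved after the swap, a real run would still need to check each rewritten URL.

```python
import re

# Match the retired host only when it is the whole hostname
# (so e.g. "2010.census.gov.example.org" is left alone); keep the scheme.
OLD_HOST = re.compile(r"(https?://)2010\.census\.gov(?![\w.-])")

def retarget_census_links(wikitext: str) -> str:
    """Point links at www.census.gov instead of the retired 2010.census.gov."""
    return OLD_HOST.sub(r"\1www.census.gov", wikitext)
```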

Activity
This project is marked as inactive. It is unclear to me, but did the issue in question get dealt with? Or is the bot transcription of census data ongoing? Dysklyver 11:47, 15 October 2017 (UTC)

RfC about mass changes to California census figures
Should the mass changes in the California census figures made recently by User:Carlossuarez46 be maintained or rolled back? BeenAroundAWhile (talk) 19:51, 30 September 2018 (UTC)
 * If you go to Special:Contributions/Carlossuarez46, you will see a full page of changes to California census figures which are unnecessary and break the flow of the sentences. Also, the Edit summaries are quite incomplete, explaining nothing. I believe all these changes, made without consensus, should be rolled back and a full discussion made as to their necessity. Yours, BeenAroundAWhile (talk) 19:51, 30 September 2018 (UTC)
 * The current wording of the demographics of California places based on 2010 census reports shows numbers and percentages of owner-occupied vs. renter-occupied housing units. The percentages are based on occupied housing units, not total (there are vacant units, naturally). It was pointed out that a count of occupied units was missing, and it is needed for the percentages to match. Now, it may or may not be necessary to clarify, since owner-occupied and renter-occupied units together necessarily form the entirety of "occupied" rather than "total" housing units. I'll leave that to the community. Feel free to roll back all my changes if you think that's best, as you propose. If clarification is not needed, so be it. If it is, I have made the clarifications through Inyo County alphabetically and someone else can carry on. Carlossuarez46 (talk) 20:40, 30 September 2018 (UTC)
 * See Talk:Furnace Creek, California for the background of what started this series of edits. Bottom line: there are thousands of articles with incorrect 2010 census data about housing, due to a mistake in a mass edit from 2012. Carlossuarez46 is attempting to fix them (which I think is admirable: I could not figure out a way to do it using AWB). I believe that these edits are necessary.
 * Now, I agree with BeenAroundAWhile that the phrasing of the correction is awkward. For example, in Winterhaven, California, the corrected sentence reads
 * There were 186 housing units at an average density of 781.4 per square mile (301.7/km²), of which 151 were occupied, of which 62 (41.1%) were owner-occupied, and 89 (58.9%) were occupied by renters.
 * The two sequential "of which" are awkward, IMO. Carlos, would it be possible to break into two sentences? Like this:
 * There were 186 housing units at an average density of 781.4 per square mile (301.7/km²), of which 151 were occupied. Of the occupied housing, 62 (41.1%) were owner-occupied and 89 (58.9%) were occupied by renters.
 * I realize that Carlos is manually performing these edits, so adding complexity to the edits is probably quite painful. What do editors think? —hike395 (talk) 01:58, 1 October 2018 (UTC)
 * The second "which" could be changed to "these", but I leave it to you guys to decide whether to do anything and what to do. Carlossuarez46 (talk) 17:44, 1 October 2018 (UTC)
 * I hope this has been fixed. BeenAroundAWhile (talk) 05:02, 6 December 2018 (UTC)
 * I hope that Carlos decided to finish the edits --- better to have awkward phrasing than incorrect data in Wikipedia. —hike395 (talk) 11:26, 16 December 2018 (UTC)
 * He was going to finish it. Carlossuarez46 (talk) 19:07, 17 December 2018 (UTC)
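The arithmetic behind the correction: the owner and renter percentages are shares of the 151 occupied units, not of all 186 housing units. A small check using the Winterhaven figures quoted above (measured against all 186 units, the owner share would instead be 33.3%, which is why the occupied count needs to appear in the sentence):

```python
def tenure_shares(owner: int, renter: int) -> tuple[float, float]:
    """Owner/renter percentages of *occupied* units, rounded to one decimal."""
    occupied = owner + renter  # vacant units are excluded by definition
    return round(100 * owner / occupied, 1), round(100 * renter / occupied, 1)

# Winterhaven, California: 62 owner-occupied and 89 renter-occupied of 151 occupied units.
print(tenure_shares(62, 89))  # (41.1, 58.9)
```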

A new newsletter directory is out!
A new Newsletter directory has been created to replace the old, out-of-date one. If your WikiProject and its taskforces have newsletters (even inactive ones), or if you know of a missing newsletter (including from sister projects like WikiSpecies), please include it in the directory! The template can be a bit tricky, so if you need help, just post the newsletter on the template's talk page and someone will add it for you.
 * – Sent on behalf of Headbomb. 03:11, 11 April 2019 (UTC)