Wikipedia:Reliability of GNIS data

Wikipedia has thousands of "populated place" stubs which were mass-created from the United States government's Geographic Names Information System (GNIS) database. Unfortunately, a major flaw has been found in this source: GNIS has labeled many locations as "populated places" in error rather than as a locale or another more accurate category. There are countless instances of discrepancies between the GNIS and print versions of the National Gazetteer, a publication of the USGS with the same entries. This means that everything from small homesteads to railroad junctions to river crossings has been mislabeled as a "populated place".

Feature classes
The Geographic Names Information System is the official repository for place names in the United States, with a database of over 2 million natural and man-made features. Entries are compiled from sources such as atlases, gazetteers and topo maps.

Each place is assigned an official name and a "feature class" such as Park, School, Dam, Populated Place or Locale. Locale is meant to encompass miscellaneous human-made features such as battlefields, campgrounds, farms, railroad sidings, windmills, etc. However, since the topo maps that provide the bulk of GNIS entries do not clearly distinguish between locale-type features and cities/towns/villages/hamlets, many of these were incorrectly transcribed as "populated places", a label that is supposed to apply to "... a named community with a permanent human population, usually not incorporated and with no legal boundaries, ranging from rural clustered buildings to large cities and every size in between." That's right: Many of our "populated place" articles are only labelled as such because an employee poring over a map missed a subtle difference in typeface.

It's difficult to prove that there was never a human settlement at a given location, but in many cases it's been found that the place name has only been used in conjunction with a railroad siding, ranch, windmill or other feature. For example, Haberman, NY was the location of a train station built to serve the Haberman Manufacturing Company in Queens, and the USGS employee who added the location to the database failed to recognize the subtle difference in spacing which was used to distinguish a train station from a community on the topo map. This particular error doesn't seem to have been repeated by Wikipedia since we already had a Haberman station article based on a different source, but it did appear in other GNIS-derived sources such as Google Maps.

Propagation of errors
Errors quickly propagate to other online sources which rely on GNIS for location data. Our AfD for Jolly Dump, South Dakota shows that it was never anything more than a place where railroad cars were loaded and unloaded, yet a Google search brings up the "Things to do in Jolly Dump" Facebook page, a list of nearby FedEx locations, a "Populated Place Profile" with coordinates and elevation copied from GNIS, nearby hotels ("lastminute.com has a fantastic range of hotels in Jolly Dump, with everything from cheap hotels to luxurious five star accommodation available!"), a weather forecast and daylight saving time information. Although this type of coverage is sometimes presented as evidence of notability, it doesn't meet our "significant coverage" requirement since it's simply copied from another source by an automated program. Wikipedia also forms a link in this chain of errors: When we describe a place as an "unincorporated community", a label that is often completely unsourced, Google Maps copies it as a description of the place.

GNIS itself has been found to propagate questionable information from other sources. Most entries were taken from USGS topographic maps at the largest scale (1:24,000 or 1:25,000), but we have also found entries copied from NOAA navigational charts, from Forest Service maps, from promotional maps, from Rand-McNally atlases, from books of place names, and even from a philately journal, as well as items copied from smaller-scale topographic maps. One can readily deduce that these entries are not reflected in the large-scale topographic maps, which already adds an element of doubt. In the case of the nautical charts, which can be verified online, we have found that the charts were sometimes misread and sometimes bore name labels on shore which could not be reconciled with other maps. Promotional maps tend to list non-notable subdivisions; other sources report 4th class post offices, which were typically just a place in a store or railroad station or even a private residence where people could come to post and pick up their mail.

Official standards
Although GNIS provides the official name of a place, the "feature class" labels do not carry the same official standing. They're simply used for "efficient data search and retrieval purposes" and "have no status as standards". In fact, GNIS specifically does not involve itself in such geographic minutiae as the differences between hills and mountains, lakes and ponds or rivers and creeks. As editors we need to be aware of the purpose and shortcomings of GNIS, using it as a resource where it excels (name and coordinates) while relying on other sources for notability and feature type. After all, our research and editorial discretion are what distinguish Wikipedia from machine-generated gazetteers such as Hometown Locator.

Feature classes abandoned in 2014
In 2017 the USGS made this announcement: "Data Content: Since GNIS staff has been unable to maintain Domestic administrative names for quite some time (since October 1, 2014), these records will be archived from GNIS database and will [no] longer be available through the GNIS search application. The following feature classes will be archived: Airport, Bridge, Building, Cemetery, Church, Dam, Forest, Harbor, Hospital, Mine, Oilfield, Park, Post Office, Reserve, School, Tower, Trail, Tunnel, and Well." Wikipedia articles that were bulk-added in earlier years on the basis of these archived records now link to blank records on the https://edits.nationalmap.gov/apps/gaz-domestic/public/search/names interface to the "gaz-domestic" (NGNDB) database.

Reliability of locations
While the GNIS coördinates are generally considered accurate (notwithstanding several AfD discussions that were derailed by what turned out to be a single-digit typing error on the part of a data entry clerk), they may not be appropriate for Wikipedia's purposes. This is because Wikipedia's rules differ from the GNIS rules. Further complicating matters, there were alternative forms of the database that substituted coördinate information from the National Map database.
 * Per WikiProject Geographical coordinates/Linear, Wikipedia wants the mid-point of linear features. However, the rules for the GNIS data compilation were that the primary coördinate be the "mouth" of the feature and secondary coördinates be any point on the feature, as long as it indicated what (other) map(s) the feature crossed.
 * Per WikiProject Geographical coordinates, Wikipedia wants the centres of towns and cities. In Payne's own words in the USGS report on GNIS phase 1, the selection of a coördinate for a big town or city is "subjective", and the GNIS rule was, in contrast, to pick a prominent civic feature (town hall, main intersection, main public library, and so forth) rather than attempt a geometric centre.
 * While in phase 1 coördinates were read straight from the markers on the maps, in phase 2 coördinates were interpolated, using contour lines.
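
To give a feel for how much the first rule matters, here is a minimal Python sketch comparing a GNIS-style "mouth" coördinate with a Wikipedia-style mid-point for a linear feature. The creek and all of its coördinates are made up for illustration, and the mid-point is found by simple interpolation along the traced path, which is an assumption of this sketch rather than any procedure documented by GNIS or by the WikiProject. The last line also shows the effect of a single mistyped digit in a latitude.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def midpoint_along(path):
    """Point halfway along a polyline, measured by distance travelled."""
    segments = list(zip(path, path[1:]))
    lengths = [haversine_km(p, q) for p, q in segments]
    remaining = sum(lengths) / 2
    for (p, q), length in zip(segments, lengths):
        if remaining <= length:
            t = remaining / length if length else 0.0
            # Straight-line interpolation is adequate for short segments.
            return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
        remaining -= length
    return path[-1]


# Hypothetical creek traced from source to mouth (made-up coordinates).
creek = [(44.10, -103.50), (44.12, -103.42), (44.15, -103.35), (44.20, -103.30)]

mouth = creek[-1]               # GNIS-style primary coordinate
middle = midpoint_along(creek)  # Wikipedia-style coordinate
gap_km = haversine_km(mouth, middle)

# One mistyped digit in the latitude (44.20 entered as 44.70).
typo_km = haversine_km(mouth, (44.70, -103.30))

print(f"mouth vs mid-point: {gap_km:.1f} km; one-digit typo: {typo_km:.1f} km")
```

Even for this short invented creek the two conventions land several kilometres apart, and a single wrong digit in the first decimal place of a latitude moves the point by tens of kilometres, which is enough to put a feature in the wrong valley or the wrong county.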

FAQ

 * Q: Aren't government sources always reliable?
 * A: They're generally accurate, but like any reliable source they're susceptible to errors.


 * Q: What's the harm in keeping these stubs?
 * A: Wikipedia is a trusted source that many organizations rely on. For example, some of these places appear on Google Maps with descriptions such as "Jones Windmill is an unincorporated community in Smith County", even though the "unincorporated community" designation has never appeared in a reliable source - it was applied by a Wikipedia editor, based on their own interpretation of an erroneous "populated place" label. When we keep these stubs, we play an active role in creating and propagating false information.


 * Q: But it returned 6,000 Google search results - There's even a FedEx office there!
 * A: Many websites use GNIS for automated location data. When you search for real estate listings, store locations or weather reports, the name is used to mark a point on a map and return the requested information. The source isn't saying that the location is notable, probably doesn't do business there and most likely isn't even aware of its existence.


 * Q: If it's listed in GNIS, wouldn't that make it a "populated, legally recognized place" and therefore presumed notable per WP:GEOLAND?
 * A: According to the USGS, "populated place" is a designation for places that are generally not legally defined or recognized: "An entry with Feature Class = Populated Place represents a named community with a permanent human population, usually not incorporated and with no legal boundaries, ranging from rural clustered buildings to large cities and every size in between. The boundaries of most communities classified as Populated Place are subjective and cannot be determined." Wikipedia doesn't have a specific definition of what qualifies as a "legally recognized populated place", but repeated discussions have concluded that simply being listed in a government database or appearing on a map does not meet the requirement.

Relevant AfDs
To illustrate the range of misidentified places, here is a list of AfD discussions of GNIS "populated places":
 * Wikipedia:Articles for deletion/Susie, Washington – 15 industrial rail spurs within the Hanford Site in Washington State, shown to be named after various railroad employees
 * Wikipedia:Articles for deletion/Headquarters, Arizona – The headquarters building for Petrified Forest National Park in Arizona
 * Wikipedia:Articles for deletion/West Junction, Illinois – A railroad junction west of the city
 * Wikipedia:Articles for deletion/Road Junction Windmill, Arizona – Several windmills
 * Wikipedia:Articles for deletion/Bormister, California – Numerous individual ranches
 * Wikipedia:Articles for deletion/Monkey Box, Florida – A literal box in the marshes of Lake Okeechobee, listed as a community
 * Wikipedia:Articles for deletion/Willy Dick Crossing, Washington – A river crossing
 * Wikipedia:Articles for deletion/Saint Joseph Youth Camp, Arizona – A summer camp
 * Wikipedia:Articles for deletion/Fish Pond, Kentucky – A literal pond, listed as a populated place
 * Wikipedia:Articles for deletion/Caldwell Pines, California – A stand of trees
 * Wikipedia:Articles for deletion/Aurant, California (2nd nomination) – Formerly a railroad station, now a railyard
 * Wikipedia:Articles for deletion/Silver Hill, Charlton County, Georgia – A deserted swamp with no buildings mislabeled as a community
 * Wikipedia:Articles for deletion/Scoria Point Corner, North Dakota – A scenic overlook in a national park, listed as a "populated place"

Cleanup efforts

 * User:SportingFlyer/Arizona placenames cleanup – includes a list of river crossings
 * WikiProject California/GNIS cleanup task force, projects include
 * User:Hog Farm/springs – an effort to deal with all of the "Something Springs, California" articles, begun once we found a book that documented the histories of the springs in California. It is now mainly done, with a few articles still requiring more detailed attention because the book gave only a passing mention, did not cover the spring at all, or for some other reason
 * Articles for deletion/Acors Corner, Virginia, Articles for deletion/Allen Shop Corner, Virginia – An old surveying technique in centuries past was the use of marker trees, variously called corner trees and line trees, to mark the boundaries between properties. Many (but not all) "corners" that have survived on maps at intersections and boundaries are not populated places at all, but rather the marker trees that stood between the populated places, usually named after the owner of the property on which they primarily stood. (Interesting factoids: In the 19th century it was expressly illegal under federal law, and under state law in states such as West Virginia, to remove a corner tree; and one important point of West Virginian law was whether one could take the word of a dead person about which tree was a marker tree.)
 * WikiProject Kentucky/GNIS cleanup
 * User:Hog Farm/Kentucky – articles that from their names alone might be problematic
 * WikiProject Minnesota/GNIS cleanup
 * WikiProject Washington/GNIS cleanup
 * User:Hog Farm/Missouri attention needed

Books to check against
There are usually Arcadia Publishing books for a particular locality. Arcadia books are not the be-all and end-all, but they do point the way, and they are generally the work of local historians who have already done the poring over old maps, records, and photographs for us. Arcadia (and other local history) books helped sort out and Escalle, Larkspur, California; helped identify what  actually was; and conversely made the cases stronger against the likes of. All of these were two-sentence GNIS-only stubs at the time of deletion nomination, all claiming "unincorporated community".


 * Gazetteers: These are useful for telling whether an "unincorporated community" that is just a dot nowadays was historically a post-town/post-village or only a post office, which then might be found in local county/state histories. Lippincott's, in particular, has a uniform scheme for this. Take care about dates, of course.


 * Books of place names: In many states people took it upon themselves to identify the origins of the names of places within the state. These vary in quality but have often helped to clarify matters by giving a more specific characterization of the places in question. We have found these used as GNIS sources, often quite badly.


 * Old local histories: As with the place names books, quality is variable, and those from around 1900 tend to be a bit gushing in their praises of the forefathers and heavy on the anecdotes. That said, their age (typically within a few decades of the foundation of the places, at least outside the east coast) and attention to detail can help resolve matters.