Wikipedia:Geographical names

Wikipedia has over 700,000 articles about geographical entities such as villages, districts, lakes, rivers, mountains and protected areas. Their infoboxes vary considerably in layout and the information they support. The article title holds the common English form but the article may also give the common names used in the local language(s), official names, former names, other names and nicknames. Non-Latin script may be followed by a romanized or phonetic form.

All non-English forms of a name should be marked up so they are rendered correctly by a screen reader. This essay proposes standard ways to gather, validate and format the different names in the article text and in infoboxes, and outlines a migration approach. The core proposal is to adapt all the geographical entity infoboxes to use a standard child template, infobox geonames, which will undertake validation and formatting of the names.

Current situation
There are several hundred geo-infoboxes used in over 700,000 articles about geographical entities. As of February 2022 Infobox settlement was used in over 543,000 articles, Infobox river in 28,870, Infobox mountain in 26,448, Infobox building in 24,502, and so on down to a long tail of infoboxes like Infobox Tibetan Buddhist monastery (286 articles) or Infobox dive site (18 articles). As shown in the sample infobox templates section below, the infoboxes are very inconsistent in the name-related parameters they accept, and as shown in the current usage examples below, they are also very inconsistent in the format they render.

Non-English names are common even in countries where English is the national language. A place in California might have former names in Spanish and indigenous languages. A place in England may have former names in Common Brittonic or Old English. In France, there may be variants of local names in Breton, Occitan or Corsican. India has a wealth of languages and scripts. Due to the lack of consistent support for non-English names, editors may struggle with the default formatting, as in these examples:
 * |native_name = 四国
 * |native_name = Anadolu Selçuklu Devleti سلجوقیان روم Saljūqiyān-i Rūm

Introducing standard validation and formatting for names in all geo-infoboxes will give a more consistent reader experience, reduce accessibility problems with screen readers, and make life easier for editors.

Identifying languages
Non-English names are often formatted using lang or native name. However, both these templates require a 2- or 3-letter ISO 639 language code. Many editors do not know these codes, and many former place names are in languages that do not have an ISO code. For example, the River Derwent (Tasmania) was originally called timtumili minanya in the Mouheneener language. Sometimes the language is unknown: an explorer may have recorded what the "natives" called the place, but failed to record the natives' ethnic group.

The solution is to enhance the lang and native name templates, or create a new template, to allow the full name of a language as an alternative to its ISO code. Thus a call that identifies the language by its ISO code and an otherwise identical call that identifies it by its full name should both be accepted and render the same result. infobox geonames would implement the same logic. The enhanced or new template should also accept and display a romanized or phonetic version of the name. The language lookup could work like this:
 * If a language is not found in the list of ISO codes that gives corresponding language names, check for it in a list of language names that gives corresponding ISO codes
 * The second list may include languages such as Chirr, Phuthi or Erzgebirgisch with ISO code "mis", meaning they have no ISO code
 * Both lists will also include the name of the Wikipedia article for the language, for use as a link
 * If the language is not known, use the language code "und"
 * Use the ISO code for HTML tagging and the corresponding language name for display purposes
 * Flag articles with unrecognized languages for manual follow-up
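
For example, an Arabic name entered with the language given either as the ISO code "ar" or as the language name "Arabic" would render identically, with the non-Latin name tagged with the HTML attribute lang="ar".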
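The lookup steps above can be sketched in Python. This is a minimal, non-authoritative sketch: the language table is a tiny hypothetical sample standing in for full ISO 639 data, the names resolve_language and tag_name are invented for illustration, and tagging an unrecognized language as "und" is an assumption rather than part of the proposal.

```python
import html
from dataclasses import dataclass
from typing import Optional

# Hypothetical sample data standing in for the two lookup lists.
# Each record is (ISO code, display name, Wikipedia article for the language).
LANGUAGES = [
    ("ar", "Arabic", "Arabic language"),
    ("fr", "French", "French language"),
    ("ja", "Japanese", "Japanese language"),
    ("mis", "Erzgebirgisch", "Erzgebirgisch"),  # "mis": no ISO code of its own
]

# List 1: ISO code -> record ("mis" is shared, so it is not a usable key).
BY_CODE = {rec[0]: rec for rec in LANGUAGES if rec[0] != "mis"}
# List 2: language name -> record.
BY_NAME = {rec[1].lower(): rec for rec in LANGUAGES}


@dataclass(frozen=True)
class Language:
    code: str               # ISO code, used for HTML lang tagging
    name: str               # language name, used for display
    article: Optional[str]  # article to link to, if known
    recognized: bool        # False means "flag for manual follow-up"


def resolve_language(value: Optional[str]) -> Language:
    text = (value or "").strip()
    if not text:
        # Language not known: use ISO 639 "und" (undetermined).
        return Language("und", "", None, True)
    # Try the value as an ISO code first, then as a language name.
    rec = BY_CODE.get(text.lower()) or BY_NAME.get(text.lower())
    if rec is not None:
        return Language(rec[0], rec[1], rec[2], True)
    # Unrecognized (assumption): display the editor's text, tag it "und",
    # and mark it so the article can be put in a tracking category.
    return Language("und", text, None, False)


def tag_name(name: str, lang_value: Optional[str]) -> str:
    """Wrap a name in a span carrying the lang attribute screen readers use."""
    lang = resolve_language(lang_value)
    return f'<span lang="{lang.code}">{html.escape(name)}</span>'
```

With this sketch, tag_name("نينوى", "ar") and tag_name("نينوى", "Arabic") produce the same span with lang="ar", which is the behaviour the proposal asks for.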

Standard infobox parameters
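See the sample infobox templates section below for the parameters used in different infoboxes. Assuming the parameter names used in infobox settlement will prevail, and that official names, native names and other names can all have languages and may all have romanized forms, the parameters could follow one pattern for each kind of name: the name itself, a matching _lang parameter for its language and a matching _roman parameter for its romanized form, as in official_name, official_name_lang and official_name_roman.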
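A minimal sketch of such a parameter scheme, assuming each kind of name gets a trio of parameters following the official_name / official_name_lang / official_name_roman pattern used in the comparison that follows. The helper names standard_parameters and collect_names are hypothetical, and real infobox templates are written in wikitext or Lua, not Python.

```python
# Hypothetical parameter scheme: one trio of parameters per kind of name.
NAME_KINDS = ("official_name", "native_name", "other_name")
SUFFIXES = ("", "_lang", "_roman")


def standard_parameters() -> list[str]:
    """All standard name-related parameter names, in a fixed order."""
    return [kind + suffix for kind in NAME_KINDS for suffix in SUFFIXES]


def collect_names(args: dict[str, str]) -> dict[str, dict[str, str]]:
    """Group raw infobox arguments into one record per kind of name supplied."""
    names = {}
    for kind in NAME_KINDS:
        value = args.get(kind, "").strip()
        if value:
            names[kind] = {
                "name": value,
                "lang": args.get(kind + "_lang", "").strip(),
                "roman": args.get(kind + "_roman", "").strip(),
            }
    return names
```

Grouping the trio up front means every later step (validation, formatting) can work on one record per name rather than on loose parameters.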

Comparison of alternatives
In both alternatives the editor must enter the same information. In the first, each item has its own parameter:
 * official_name      = name
 * official_name_lang = language
 * official_name_roman = roman form

In the second, the items are wrapped in a formatting template:
 * official_name      = {{lang2| language | name | roman form }}

The first format is probably slightly easier for novice editors, who may be put off by the curly brackets and vertical bars in the second. Articles about major geographical entities like Cairo, Brahmaputra River or Mount Everest attract seasoned editors who can deal with formatting issues. But the majority of geographical articles are stubs like Orto, Corse-du-Sud, Maquan River or Klinkit Creek Peak, whose editors may find even a simple infobox a challenge.

The first form also makes it easier to ensure that languages are rendered correctly, since the infobox geonames template can see and validate every parameter, for example checking a name for unusual characters such as ":" or "(" that may indicate an attempt to pre-format it. With the second approach, infobox geonames can only see the output of the formatting template, and cannot be sure that only the correct formatting template has been used. This essay therefore recommends the first, explicit alternative.
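The character check described above can be sketched as follows, assuming the explicit first alternative, so that every name, language and romanized value arrives as its own parameter. Only ":" and "(" come from the text; the other brackets in the set are illustrative additions, and find_preformatting is a hypothetical name.

```python
# Characters that may signal a pre-formatted value. ":" and "(" come from the
# essay; the other brackets are illustrative additions.
SUSPICIOUS_CHARS = frozenset(":(){}[]<>")
NAME_KINDS = ("official_name", "native_name", "other_name")


def find_preformatting(args: dict[str, str]) -> list[tuple[str, str]]:
    """Return (parameter, offending characters) for each suspicious value."""
    problems = []
    for kind in NAME_KINDS:
        for param in (kind, kind + "_lang", kind + "_roman"):
            bad = sorted(SUSPICIOUS_CHARS.intersection(args.get(param, "")))
            if bad:
                problems.append((param, "".join(bad)))
    return problems
```

A check like this is only possible because the raw values are visible; with the second alternative the template would receive already-rendered markup and could not tell a legitimate name from pre-formatting.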

Rendered layout
See the current usage examples below for the various ways in which geographical infoboxes render name information. There is no reason why they should be so inconsistent. The obvious way to standardize the collection, validation and rendering of name data is to use a child infobox that can be shared by all the geographical entity infoboxes. To demonstrate, Infobox geonames parent embeds the child infobox geonames, which formats the names. This is just a crude mock-up of the alternative 2 format, with no real validation or formatting, but it illustrates the concept.

This is a rough first cut. The format rendered by infobox geonames should be carefully reviewed and adjusted. Logic must be added to validate the languages and ensure that names, languages, non-Latin scripts and lists of names are formatted correctly, and titles must be pluralized as needed. Once this is done, the standard validation and formatting will be picked up automatically by all geo-infoboxes that embed infobox geonames.
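Part of the formatting logic listed above (tagging non-Latin scripts, formatting lists of names, pluralizing titles) might look like this. It is an illustrative sketch, not the template's actual behaviour: the row titles, the italics for Latin-script names and the BCP 47 "-Latn" tag for romanized forms are assumptions, and format_rows takes already-resolved ISO codes as input.

```python
import html
import unicodedata

# Hypothetical row titles for each kind of name.
TITLES = {
    "official_name": "Official name",
    "native_name": "Native name",
    "other_name": "Other name",
}


def is_latin(text: str) -> bool:
    """True if every letter in text belongs to the Latin script."""
    return all("LATIN" in unicodedata.name(ch, "") for ch in text if ch.isalpha())


def format_entry(name: str, code: str, roman: str = "") -> str:
    # Latin-script names are italicized; non-Latin scripts are not.
    tag = "i" if is_latin(name) else "span"
    out = f'<{tag} lang="{code}">{html.escape(name)}</{tag}>'
    if roman:
        # BCP 47 "-Latn" subtag marks the romanized form as Latin script.
        out += f' (<i lang="{code}-Latn">{html.escape(roman)}</i>)'
    return out


def format_rows(names: dict[str, list[tuple[str, str, str]]]) -> list[tuple[str, str]]:
    """Build (title, content) rows, pluralizing titles for lists of names."""
    rows = []
    for kind, entries in names.items():
        if not entries:
            continue
        title = TITLES[kind] + ("s" if len(entries) > 1 else "")
        rows.append((title, "<br>".join(format_entry(*e) for e in entries)))
    return rows
```

Because every geo-infobox would call the same shared formatter, a fix to rules like these would apply everywhere at once.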

General migration approach
The lang and native name templates should be enhanced to support language names as an alternative to language codes, and to support romanized or phonetic forms. This can be done at any time and will have no impact on existing articles.

Migration to a more standard way of collecting, validating and formatting names can be done infobox by infobox. Two types of change may be introduced independently:
 * The geo-infobox is changed to use the new infobox geonames
 * The geo-infobox is changed to eliminate non-standard parameter names

In both cases:
 * Every effort should be made to minimize disruption.
 * A geo-infobox change that introduces red error messages in the text of many articles where there were no error messages before is unacceptable.
 * The preferred approach is to flag issues using a hidden tracking category, and allow gnomes to work through the flagged formatting, replacing it with the new standard. Once almost all the non-standard formatting has been eliminated, the geo-infobox may start to render red error messages.

Converting to {{infobox geonames}}

 * The first step for each geo-infobox is to obtain agreement on its talk page and associated project talk page to migrate to the standard infobox geonames
 * A version of the geo-infobox using infobox geonames is prepared and carefully tested
 * This version will use the standard parameter names, but will also accept variants to provide backward compatibility
 * Assuming no problems, the standardized geo-infobox template will be deployed, passing "mode=transition" to infobox geonames. In this mode, infobox geonames will attempt to format the data provided and will add articles with problems to tracking categories, but will not generate red error messages.
 * Once the tracking categories have mostly been cleared, the geo-infobox will start passing "mode=strict" to infobox geonames. In this mode, infobox geonames will generate red error messages.
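The difference between the two modes can be sketched as a single decision point. This is a hypothetical illustration: the tracking category name and apply_mode are invented, and the red message uses a class="error" element, which is a common Wikipedia convention but here only an assumption about how infobox geonames would do it.

```python
import html
from dataclasses import dataclass, field

# Hypothetical tracking category name.
TRACKING_CATEGORY = "Category:Pages with geo-infobox name problems"


@dataclass
class Result:
    text: str
    categories: list[str] = field(default_factory=list)


def apply_mode(formatted: str, problems: list[str], mode: str = "transition") -> Result:
    """Decide how validation problems surface under each mode."""
    if mode not in ("transition", "strict"):
        raise ValueError(f"unknown mode: {mode!r}")
    result = Result(formatted)
    if problems:
        # Both modes record problems in a hidden tracking category.
        result.categories.append(TRACKING_CATEGORY)
        if mode == "strict":
            # Only strict mode adds visible red error messages.
            result.text += "".join(
                f'<strong class="error">{html.escape(p)}</strong>' for p in problems
            )
    return result
```

Keeping the tracking category in both modes means gnomes can keep working through the backlog even after the switch to strict mode.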

Standardizing parameter names
In the long run, it will be easier for editors if all geo-infoboxes use the same names for the same parameters. Providing support for the standard parameter names is important. Removing variant usage is less important, and should not be allowed to get in the way of the main thrust to standardize name validation and formatting.
 * The geo-infobox passes parameters to infobox geonames under the standard names, but still accepts the old parameter names as fallbacks.
 * The documentation is changed to show both parameter names.
 * At some point, the old name is deprecated, with articles that use it put into maintenance categories
 * Gnomes work through changing to the standard parameter names
 * Eventually the old parameter names are dropped, and flagged as errors when the article is in edit mode
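The parameter-renaming steps above can be sketched as a function whose behaviour depends on the migration stage. The alias table, category name, stage labels and normalize_params are all hypothetical; real geo-infoboxes would express the fallback in wikitext or Lua.

```python
# Hypothetical old-to-standard parameter mappings and category name.
ALIASES = {"name_official": "official_name", "local_name": "native_name"}
DEPRECATED_CATEGORY = "Category:Geo-infoboxes using deprecated parameters"
STAGES = ("accept", "deprecate", "drop")


def normalize_params(args: dict[str, str], stage: str = "accept"):
    """Map old parameter names to standard ones according to the migration stage.

    Returns (parameters, maintenance categories, preview-only error messages).
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    params, categories, preview_errors = {}, [], []
    for key, value in args.items():
        standard = ALIASES.get(key)
        if standard is None:
            # Standard (or unrelated) name: always wins over an old alias.
            params[key] = value
        elif stage == "drop":
            # Old name no longer supported: collected separately so the
            # caller can show it only while the article is being previewed.
            preview_errors.append(f"Unsupported parameter {key}; use {standard}")
        else:
            # Old name still works, but never overrides the standard name.
            params.setdefault(standard, value)
            if stage == "deprecate" and DEPRECATED_CATEGORY not in categories:
                categories.append(DEPRECATED_CATEGORY)
    return params, categories, preview_errors
```

Letting the standard name win whenever both are supplied keeps the result predictable while articles are half-migrated.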

Sample infobox templates
See Category:Place infobox templates for the complete set.

Miscellaneous infoboxes not reviewed (number of articles using each):


 * Infobox attraction 690
 * Infobox border 61
 * Infobox campground 102
 * Infobox cave 786
 * Infobox climbing area 29
 * Infobox climbing route 53
 * Infobox cycling path 173
 * Infobox dive site 18
 * Infobox farm 57
 * Infobox fictional location 342
 * Infobox forest 341
 * Infobox political division 74
 * Infobox port 778
 * Infobox port-of-entry 179
 * Infobox property development 74
 * Infobox seamount 201
 * Infobox ski area 807
 * Infobox ski jumping hill 110
 * Infobox spring 253
 * Infobox terrestrial impact site 152
 * Infobox urban feature 201
 * Infobox waterlock 151
 * Infobox water park 142

Not checked:


 * Category:Place infobox templates by country (33 subcategories)
 * Category:Buildings and structures infobox templates (5 subcategories, 67 pages)
 * Category:Constituency infobox templates (22 pages)
 * Category:Country infobox templates (10 pages)
 * Category:Country subdivision infobox templates (5 subcategories, 3 pages)
 * Category:IUCN Protected Area infobox templates (3 pages)
 * Category:Templates calling Infobox settlement (26 pages)

Current usage examples
The examples below are taken from articles as of February 2022, with the infoboxes edited to remove information other than names, and to show a standard image. They illustrate the varied visual styles and approaches to presenting names, partly imposed by the infobox templates, and partly chosen by the editors.

Island

Borneo (Kalimantan) is the third-largest island in the world and the largest in Asia. At the geographic centre of Maritime Southeast Asia, in relation to major Indonesian islands, it is located north of Java, west of Sulawesi, and east of Sumatra.

Country

Albania (Shqipëri or Shqipëria), officially the Republic of Albania (Republika e Shqipërisë), is a country in Southeastern Europe. It is located on the Adriatic and Ionian Sea within the Mediterranean Sea and shares land borders with Montenegro to the northwest, Kosovo to the northeast, North Macedonia to the east and Greece to the south. Tirana is its capital and largest city, followed by Durrës, Vlorë and Shkodër.

Settlement

Brussels (Bruxelles; Brussel), officially the Brussels-Capital Region (Région de Bruxelles-Capitale), is a region of Belgium comprising 19 municipalities, including the City of Brussels, which is the capital of Belgium. The Brussels-Capital Region is located in the central portion of the country and is a part of both the French Community of Belgium and the Flemish Community, but is separate from the Flemish Region (within which it forms an enclave) and the Walloon Region. Brussels is the most densely populated and the richest region in Belgium in terms of GDP per capita. The five times larger metropolitan area of Brussels comprises over 2.5 million people, which makes it the largest in Belgium. It is also part of a large conurbation extending towards Ghent, Antwerp, Leuven and Walloon Brabant, home to over 5 million people.

Airport

Frankfurt Airport (Flughafen Frankfurt Main, also known as Rhein-Main-Flughafen) is a major international airport located in Frankfurt, the fifth-largest city of Germany and one of the world's leading financial centres. It is operated by Fraport and serves as the main hub for Lufthansa, including Lufthansa CityLine and Lufthansa Cargo as well as Condor and AeroLogic. The airport covers an area of 5683 acres of land and features two passenger terminals with capacity for approximately 65 million passengers per year; four runways; and extensive logistics and maintenance facilities.

Ancient site

Nineveh (نَيْنَوَىٰ Naynawā) was an ancient Assyrian city of Upper Mesopotamia, located on the outskirts of Mosul in modern-day northern Iraq. It is located on the eastern bank of the Tigris River and was the capital and largest city of the Neo-Assyrian Empire, as well as the largest city in the world for several decades. Today, it is a common name for the half of Mosul that lies on the eastern bank of the Tigris, and the country's Nineveh Governorate takes its name from it.

Bridge

The Band-e Kaisar, Pol-e Kaisar ("Caesar's bridge"), Bridge of Valerian or Shadirwan was an ancient arch bridge in Shushtar, Iran, and the first in the country to combine it with a dam. Built by the Sassanids, using Roman prisoners of war as workforce, in the 3rd century AD on Sassanid order, it was also the most eastern example of Roman bridge design and Roman dam, lying deep in Persian territory. Its dual-purpose design exerted a profound influence on Iranian civil engineering and was instrumental in developing Sassanid water management techniques.

Building

The Palace of Versailles (Château de Versailles) is a former royal residence located in Versailles, about 12 mi west of Paris, France. The palace is owned by the French Republic and has since 1995 been managed, under the direction of the French Ministry of Culture, by the Public Establishment of the Palace, Museum and National Estate of Versailles. 15,000,000 people visit the Palace, Park, or Gardens of Versailles every year, making it one of the most popular tourist attractions in the world. However, due to the COVID-19 pandemic, the number of paying visitors to the Chateau dropped by 75 percent from eight million in 2019 to two million in 2020. The drop was particularly sharp among foreign visitors, who account for eighty percent of paying visitors.

Historic site

Diocletian's Palace (Dioklecijanova palača) is an ancient palace built for the Roman emperor Diocletian at the turn of the fourth century AD, which today forms about half the old town of Split, Croatia. While it is referred to as a "palace" because of its intended use as the retirement residence of Diocletian, the term can be misleading as the structure is massive and more resembles a large fortress: about half of it was for Diocletian's personal use, and the rest housed the military garrison.

Mountain

The Central Eastern Alps (Zentralalpen or Zentrale Ostalpen), also referred to as Austrian Central Alps (Österreichische Zentralalpen) or just Central Alps, comprise the main chain of the Eastern Alps in Austria and the adjacent regions of Switzerland, Liechtenstein, Italy and Slovenia. South of them are the Southern Limestone Alps.

Body of water

Lake Sevan (Սևանա լիճ) is the largest body of water in both Armenia and the Caucasus region. It is one of the largest freshwater high-altitude (alpine) lakes in Eurasia. The lake is situated in Gegharkunik Province, at an altitude of 1900.44 m above sea level. The total surface area of its basin is about 5000 km², which makes up roughly one-sixth of Armenia's territory. The lake itself is 1264 km², and the volume is 32.8 km³. It is fed by 28 rivers and streams. Only 10% of the incoming water is drained by the Hrazdan River, while the remaining 90% evaporates.

River

The Nile is a major north-flowing river in northeastern Africa. It flows into the Mediterranean Sea. The longest river in Africa, it has historically been considered the longest river in the world, though this has been contested by research suggesting that the Amazon River is slightly longer. The Nile is amongst the smallest of the major world rivers by measure of cubic metres flowing annually. About 6650 km long, its drainage basin covers eleven countries: Tanzania, Uganda, Rwanda, Burundi, the Democratic Republic of the Congo, Kenya, Ethiopia, Eritrea, South Sudan, Republic of the Sudan, and Egypt. In particular, the Nile is the primary water source of Egypt, Sudan and South Sudan. Additionally, the Nile is an important economic river, supporting agriculture and fishing.

Valley

The Alay Valley (Алай өрөөнү) is a broad, dry valley running east–west across most of southern Osh Region, Kyrgyzstan. It spreads over a length of 174 km east–west. Its north–south width is 27 km in the west, 40 km in the central part and 3–7 km in the east. The altitude of the valley ranges from 2,440 m near Karamyk to 3536 m at Toomurun Pass with an average altitude of about 3000 m. The area of the valley is 8400 km². The north side is the Alay Mountains, which slope down to the Ferghana Valley. The south side is the Trans-Alay Range along the Tajikistan border, with Lenin Peak (7134 m). The western 40 km or so is more hills than valley. On the east there is the low Tongmurun pass and then more valley leading to the Irkestam border crossing to China.