Wikipedia:Wikipedia Signpost/2017-06-09/Op-ed

In the 9th edition of the Encyclopædia Britannica, editor Thomas Spencer Baynes introduced the convention of including a person's birth and death year after their name in all biographical articles: "CAMPBELL, John, LL.D. (1708–1775), a miscellaneous author, was born at Edinburgh, March 8, 1708." This allowed a reader to more easily distinguish between the 100+ notable people named John Campbell (only one of whom was actually lucky enough to get an article in the 9th edition). Although this convention was a bit awkward and redundant, it served a useful purpose (in the absence of disambiguation pages), and was kept in all subsequent editions.

When Wikipedia was created in 2001, it sought to emulate the successful model of the Encyclopædia Britannica, and many editors adopted the convention of including birth and death years in the lead sentence. Here is the lead sentence for Christopher Columbus as it appeared on June 13, 2001: "Christopher Columbus (1451?–1506) was a probably Genovian sailor who crossed the Atlantic in service of Spain." Little did Thomas Spencer Baynes realize that Wikipedia editors would eventually expand on his convention, including not only birth and death years, but entire birth and death dates, birth and death dates in alternate calendars, birth and death locations, alternate names, maiden names, foreign names, pronunciations, foreign pronunciations, and transliterations. Here's what Christopher Columbus's lead sentence had become: "Christopher Columbus (Cristoffa Combo; Cristoforo Colombo; Cristóbal Colón; Cristóvão Colombo; Christophorus Columbus; born between 31 October 1450 and 30 October 1451 in Genoa – died on 20 May 1506 in Valladolid) was an Italian explorer, navigator, colonizer, and citizen of the Republic of Genoa."

What began as a concise, encyclopedic sentence had slowly grown into a sprawling mess of multiplying metadata, a sentence so densely packed as to be unreadable. This isn't just a subjective opinion. If you chart the Flesch Reading Ease score of the sentence over the years, you'll see an almost continuous decline since 2002. Nor is this an isolated example. The metadata virus has spread from biographical articles to other subjects as well, like geography: "Israel (יִשְׂרָאֵל Yisrā'el; إِسْرَائِيل Isrāʼīl), officially the State of Israel (מְדִינַת יִשְׂרָאֵל ; دَوْلَة إِسْرَائِيل Dawlat Isrāʼīl ), is a country in the Middle East, on the southeastern shore of the Mediterranean Sea and the northern shore of the Red Sea."
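For readers who want to try this themselves, the Flesch Reading Ease score is 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), with higher scores meaning easier text. The following is only a rough sketch of that formula, not the tool used to chart the Columbus sentence: the function names are made up for illustration, the syllable counter is a crude vowel-group heuristic that ignores accented vowels, and digits are not counted as words.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): one syllable per run of plain vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Apply the standard Flesch Reading Ease formula to a piece of text."""
    # Naive sentence split on ., ! and ?; empty fragments are discarded.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters (including accented ones); digits are skipped.
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


short_lead = "Christopher Columbus was a sailor who crossed the Atlantic."
long_lead = (
    "Christopher Columbus (Cristoforo Colombo; Cristóbal Colón; "
    "Christophorus Columbus; born between 31 October 1450 and "
    "30 October 1451 in Genoa) was a sailor who crossed the Atlantic."
)

print(round(flesch_reading_ease(short_lead), 1))
print(round(flesch_reading_ease(long_lead), 1))
```

Even with a heuristic this crude, the version stuffed with parenthetical names and dates scores well below the plain one, because both the sentence length and the syllables per word go up.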

The problem has become so noticeable that many reusers of Wikipedia content (including the WMF itself) have started stripping out parenthetical phrases from the lead sentence in certain contexts. If you search for "Christopher Columbus" on Google, you'll see a much more digestible description, both in the Knowledge Graph and under the Wikipedia search result. If you turn on the Page Previews beta feature in your preferences and hover over Christopher Columbus, you'll also see a much shorter version. The Wikipedia apps even experimented with removing parenthetical phrases from the lead sentences in the articles themselves. This has led to heated debates about whether or not we are potentially removing important information (as some parenthetical phrases consist of content other than metadata). Without a clear way to identify which parenthetical phrases are useful and which are detrimental, I'm sure these issues will remain unresolved. What's really needed is a vigorous debate by the Wikipedia community about how to bring this problem under control and make our articles readable again.
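To make the trade-off concrete, here is a minimal sketch of what blindly stripping parenthetical phrases might look like. It is not the code that Google, Page Previews, or the Wikipedia apps actually use; `strip_parentheticals` is a hypothetical helper written for this illustration, and it handles only round brackets.

```python
import re


def strip_parentheticals(sentence: str) -> str:
    """Drop everything inside round brackets, including nested brackets."""
    kept = []
    depth = 0
    for ch in sentence:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            # Only characters outside every bracket pair are kept.
            kept.append(ch)
    # Collapse the doubled spaces left behind by the removed phrases.
    text = re.sub(r"\s+", " ", "".join(kept)).strip()
    # Remove any space stranded in front of punctuation.
    return re.sub(r"\s+([,.;:])", r"\1", text)


lead = (
    "Christopher Columbus (1451?–1506) was a probably Genovian sailor "
    "who crossed the Atlantic in service of Spain."
)
print(strip_parentheticals(lead))
```

A rule like this cannot tell metadata from substance: it throws away the birth and death years in the 2001 sentence just as readily as it would throw away a list of transliterations, which is exactly why the approach has been contentious.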

If we don't take significant steps to address this problem, the metadata disease is only going to keep multiplying and spreading. If left unchecked, I fear this is what our future will look like:

[Excerpt from the Americapedia article about Wikipedia, copyright 2034, used with permission.]

...Like frogs in a pot of boiling water, the proliferation of lead sentence metadata happened so slowly that no one noticed until 2021, when John Seigenthaler's son published a devastating video on ClickNews in which he read aloud the lead sentence of his Wikipedia article, and then wept for 3 minutes. "John Michael Seigenthaler (born December 21, 1955 in Nashville, Tennessee, current resident of Weston, Connecticut (as of 2008), not yet deceased), also known as John Seigenthaler Jr. (German: John Seigenthaler jünger), is an American news anchor, most recently working for ClickNews." Seigenthaler's video caught the attention of the recently re-elected Donald Trump, who only weeks before had dissolved The New York Times and Washington Post by executive order. Trump immediately posted a flurry of tweets eviscerating the venerable online encyclopedia. By the next day, Wikipedia was no more. Let's avoid this sorry fate and make Wikipedia great again!