User:Herostratus/The Hundred

We looked at 100 random articles to see how many met WP:GNG ("the GNG"), and here's what we found.

Executive summary
34 meet the GNG, 66 don't. 56 meet the GNG or an SNG (Special Notability Guideline, like WP:NSPORTS etc.). Guessing (and it's kind of a wild guess), it's maybe like this: some of the ones in trouble could meet the GNG with extra effort (finding offline refs that might exist only in a local newspaper archive, or in an offline book in an Italian library, stuff like that) and some couldn't.
 * 49 covered by the GNG or (we guess) could be without too much trouble.
 * 22 covered by an SNG
 * 29 in trouble.

Of the 29 in trouble, most don't actually suck. Most of them are OK articles on some level.

If we cut out 29% of our articles, that'd be 1.5 million, dropping us from 5.4 million to 3.9 million articles, still more than the two next-biggest Wikipedias combined. Whether that's desirable we don't know; it's a matter of opinion, we guess. We would not lose 29% of our readership. We would look a little less like an academic publication and a little more like a popular publication: a lot of our more obscure science stuff doesn't meet the GNG, while celebrity and popular-culture stuff is better documented, and many athletes are covered by an SNG.

You're keeping In the Flesh (Pink Floyd tour) and 1977 in video gaming and Southridge Mall (Iowa) and Hell Is Empty and All the Devils Are Here and Caterina Scorsone and so forth. You're getting rid of Kanichee layered intrusive complex and Literacy in Tokelau and Gogana conwayi and List of people on the postage stamps of Nigeria and Grant of arms and so forth.

How desirable this is remains a matter of opinion, and depends on your vision of what we are trying to accomplish here. It is what the GNG would do for you, though, if it were policy.

Background
The lede sentence of the GNG states "If a topic has received significant coverage in reliable sources that are independent of the subject, it is presumed to be suitable for a stand-alone article or list". (Although the GNG doesn't say so, many (but not all) Wikipedians assume a second sentence: "And if not, probably not".)

The GNG then gives details on what is meant by "reliable sources" and "independent of the subject", valorizes secondary sources, and explains what is meant by "presumed" (an assumption, not a guarantee, that a subject should have an article).

Note that the GNG says reliable and not notable. Contrast a subject covered by articles in a reliably fact-checked local paper with a circulation of a few hundred souls with one covered in venues such as the Daily Mail (circulation 1.5 million). The latter is far more famous, but only the former is "notable" in our sense. "Notable" as the GNG uses it does not mean "famous", "important", or even "notable" in the dictionary sense. It means "covered in peer-reviewed, fact-checked, or otherwise reliable sources".

However, it is our personal opinion that there is some intended relationship between our use of "notable" and the dictionary word "notable" (notable, adj.: Worthy of notice; remarkable; memorable; noted or distinguished; prominent). Otherwise a different word would have been chosen, we figure. How close the relationship (if, indeed, there is any) is intended to be is a matter of opinion.

Anyway, as a practical matter, many or even most Wikipedians will disparage local papers per the essay WP:LOCAL, and some other low-readership special-interest publications also. Consider these, for instance (assuming they're all shown to be reliable):
 * a local town news website with an estimated ~800 regular viewers (almost all concentrated in one small town)
 * a Moody Blues-themed website with an estimated ~800 regular viewers, spread throughout the Anglosphere
 * a Franco-Prussian War-themed website with an estimated ~800 regular viewers, spread throughout the world
 * a narrowly focused scientific journal (Indian Fern Review, say) with an estimated ~800 readers

Whether there's a difference between these is a matter of opinion and the details of the case, we guess. Whether they are equal to the New York Times or Nature is also a matter of opinion, but many editors would probably say "not necessarily". The bottom line is that there's probably, de facto, a different weighting for different sources rather than a simple YES/NO status for potential sources. Different weighting means different opinions and different judgements.

The GNG also tries to address the key matter of what is meant by "significant coverage" (or "in-depth coverage", same thing), but fails. It says that:
 * an entire book about the subject is "significant coverage", and
 * a passing mention in part of a single sentence is not "significant coverage".

It offers no discussion of how to handle material in the huge gap between these two examples (which probably includes >99% of cases).

Left thus at sea, for our part we personally use the two-paragraph rule: two short paragraphs, or one long one, generally provide enough information to write at least a very short article, and are thus eligible to be considered "significant coverage". It depends on the subject and what's in the material, of course. The GNG says "If a topic has received significant coverage in reliable sources", not "a reliable source", so we assume that, generally, two sources are wanted, which seems reasonable. (WP:BIO, the SNG (Special Notability Guideline) for all people, makes this manifest: "significant coverage in multiple published secondary sources" is required. That is only for people, though.)

The GNG is contained within WP:N, which has a great deal more to say about details and special circumstances and considerations. Generally, people just use the GNG except in special cases. There are also many SNGs (Special Notability Guidelines) listed at the top of WP:N. The relationship of SNGs to the GNG (whether they override the GNG, are co-equal, or are subsidiary) is a matter of some contention.

So to summarize, to meet the GNG's requirements, in our personal opinion:
 * You want at least two sources, usually.
 * That cover the subject with something on the order of at least a couple short paragraphs, or equivalent.
 * In a publication that is reliable, but with a weighting difference between a publication that has 80 full-time fact checkers on staff (Der Spiegel) and one that has a lot fewer, and a weighting difference between a publication with a daily circulation of 150,000 and one with a quarterly circulation of 500, since we assume our "notable" has some relationship to the common adjective "notable".

But there are several ways to skin a cat, and being slightly inclusionist, we sometimes make arguments along the following lines. Sometimes they help and sometimes they don't. This is just our own idiosyncratic personal standard and in no way official.
 * If there is one source that greatly exceeds the requirements (say, a long article in a very reliable and widely read magazine), we can count that as going a long way toward establishing notability, and look for smaller bits of supporting coverage to help seal the deal, if there are enough of them, even if individually they are not that great.
 * Although mere mentions don't count for much, they're not nothing. If there are a lot of them, that might count for something, especially if they are in highly notable publications and they are full sentences addressing the entity rather than mere listings or passing name-checks.
 * Although you want somewhat notable publications, there's no bright line, and if there are several instances of truly in-depth coverage in special-interest publications -- death-metal magazines, Brazilian botany journals, model railroader magazines, etc. -- that might count for something.

Results

 * 25 "Yes" or "Yes, probably" or "Yes, possibly" (meeting the GNG)
 * 11 "No, probably"
 * 64 "No"

We'll count the "No, probably" as "Yes", to be liberal. That makes 36% Yes.

In: 5 athletes, 2 musicians, a singing duo, 2 actors, a businessman, a judge, "1977 in video gaming", a machine, a drug, a medical device, 3 films, 2 albums, a band tour, a detail of a music contest, an historical event, 2 populated places (one defunct), an ancient site, a school, a shopping mall, a butterfly, a fungus, a plant, a fluid dynamics phenomenon, a cosmic gas cloud, an electromagnetic wavelength, and a cheese.

Out: 6 athletes, a soldier, 2 actors, an entertainer, a priest, a bishop, a politician, an artist, a chess player, an historian, a civil servant, a list of people, 2 schools, 3 buildings, a rail station, a train, 17 populated places, a list of prisons, an organization, a beauty pageant, 5 sporting events, 2 rock formations, a mountain, a fungus, 2 moths, a butterfly, an extinct genus, a census survey, an election, a business term, a game mod, a concept in heraldry, and some comic books.

A lot of those are covered by SNGs (Special Notability Guidelines). The relationship between SNGs and the GNG (whether SNGs supersede the GNG, are subsidiary to it, are supplemental, or are in some other relationship) is a fraught question, and the answer depends on who you talk to. WP:ATHLETE, a significant SNG, lays down a marker in its first sentence, contradicts it in the second, and contradicts both of those in the third. It then goes on to muddy the waters with more temporizing and hedging before it peters out. Other SNGs say different things. Some claim to supersede the GNG and some don't.

As a fact on the ground, sometimes SNGs are taken to supersede the GNG and sometimes not. (In the 2017 Magdalena Zamolska case, the subject met the SNG WP:NCYCLING but not WP:GNG; the article was deleted anyway on the grounds that WP:GNG supersedes WP:ATHLETE, and the deletion was upheld at Deletion Review. That indicates it's a debatable situation, at least at the margins.)

Anyway, continuing... of the group not meeting the GNG, SNGs cover 14 of the 17 populated places; the politician; potentially 3 of the 6 athletes (the footballer, the gymnast, and the wrestler, who was in the Olympics although there's no wrestling SNG, but not the squash player, the race car driver, or the wheelchair racer); the mountain (mountains have a very liberal SNG); one of the five sports events (it was an Olympic event); and 2 of the 3 buildings. There's a criterion for organizations, but it's as stringent as the GNG, if not more so. There's an SNG for web content, but we don't think our game mod meets it. There's no SNG for soldiers, but there is a guide made up by the military history WikiProject, WP:SOLDIER; our soldier doesn't meet it anyway.

So that leaves:
 * 34 covered by the GNG
 * 22 covered by SNG (14 of these are populated places)
 * 44 out in the cold

Of the 44 out in the cold, many could probably be ref'd; fewer than half, probably. Some could be ref'd only with difficulty, such as by sources existing only in paper form and hard to get to: maybe Saidali Iuldachev, if you went to Uzbekistan, dug up hard copies of local papers, and could read Uzbek, that sort of thing. Let's say 1/3 could be ref'd with a reasonable level of effort. We don't know if it's that high, but let's say.


 * 34 covered by the GNG
 * 22 covered by SNG
 * 15 probably could be ref'd with a reasonable effort
 * 29 on the ice

Commentary
29 on the ice plus 21 covered only by SNG. That's 50% that don't, and maybe can't easily, meet the GNG; we have 5.4 million articles, so that extrapolates to 2.7 million. On the ice and not covered by an SNG: 1.6 million. So even if we took all SNGs as gospel, we could go from 5.4 million down to 3.8 million. That's still as many as the German and French Wikipedias combined, and those are considered successful projects. If we go strictly by the GNG, we could go down to 2.7 million. That's still much more than any other language Wikipedia. It may be that these other language projects have the right idea and we've metastasized out of control. Those other Wikipedias also probably have less vandalism and need fewer admins, fewer new page patrollers, and so forth.
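The extrapolation above is simple proportional scaling. Here it is as a minimal Python sketch, assuming (as the paragraph does) that the random sample of 100 is representative of all 5.4 million articles. It works from the final four-way tally (34, 22, 15, 29); the category labels in the code are our own shorthand, not official terms.

```python
# Minimal sketch: scale the 100-article sample up to the whole encyclopedia.
# Assumption: the random sample is representative of all articles.
TOTAL_ARTICLES = 5_400_000
SAMPLE_SIZE = 100

# Final tally from the sample (labels are our own shorthand).
tally = {
    "gng": 34,      # covered by the GNG
    "sng": 22,      # covered by an SNG
    "refable": 15,  # probably could be ref'd with a reasonable effort
    "on_ice": 29,   # in trouble
}
assert sum(tally.values()) == SAMPLE_SIZE


def extrapolate(count: int) -> int:
    """Scale a count in the sample to the full article total (exact integer math)."""
    return count * TOTAL_ARTICLES // SAMPLE_SIZE


# Taking all SNGs as gospel: drop only the articles on the ice.
keep_with_sng = extrapolate(tally["gng"] + tally["sng"] + tally["refable"])
# Going strictly by the GNG: keep only what meets, or could be ref'd to meet, it.
keep_gng_only = extrapolate(tally["gng"] + tally["refable"])

print(f"On the ice (removed): {extrapolate(tally['on_ice']):,}")
print(f"Kept, SNGs as gospel: {keep_with_sng:,}")
print(f"Kept, strict GNG:     {keep_gng_only:,}")
```

Run as-is, it reports 1,566,000 articles on the ice, 3,834,000 kept if SNGs are taken as gospel, and 2,646,000 kept under a strict GNG reading. The prose works from an even 50%, which is where its 2.7 million comes from; with the tally's 22 SNG articles the strict-GNG figure comes out slightly lower.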

Of course, even with SNG you're getting rid of St Francis Xavier's Cathedral, Wollongong. It's far more important than the two buildings we're keeping (one is just a private house), but those two are on the American National Register of Historic Places list (which is large) and the Cathedral isn't on a similar Australian list (as far as we can tell).

This shows one of the problems with SNGs: they are a bit arbitrary. Another problem is that, like a lot of rules here, they're sometimes cobbled together in a somewhat random fashion. Somebody adds a sentence, there are some desultory objections but nobody rolls it back; or there's a flurry of edits or a little edit war and it ends up with a change that nobody really notices or understands; or there's a sparsely attended discussion and the SNG is pushed through; or whatever. Another problem is that it's kind of random what has an SNG: astronomical objects do, animals don't. Professors have an SNG; businessmen and civil engineers don't. Porn actors have an SNG; religious figures don't. Numbers have an SNG, and so do books, but historical events and chemical compounds don't.

Another problem is de facto SNGs. Animals don't have an SNG, but as a fact on the ground animal species articles aren't deleted; they have a de facto SNG. This applies to plant and fungus species also, we think. Soldiers don't have an SNG, but they have WP:SOLDIER, which is written like an SNG and cited in discussions like one, even though it's never been adopted into WP:N. Secondary schools don't have an SNG, but they have WP:SCHOOLOUTCOMES, which is treated kind of like an SNG.

Because of this, the various SNGs can't really all be treated the same, and should be neither taken as gospel nor ignored. It's reasonable to say some SNGs make a lot of sense and some are nonsense. It's reasonable to say WP:BASEBALL/N should be treated with more deference than WP:NCURLING. It's reasonable to say that WP:PORNBIO is not more important than the de facto traditions defending train stations, fungi, and high schools, notwithstanding that one is written down and the others aren't. It's reasonable to say that a given criterion of a given SNG is silly and you don't think people should pay attention to it.

It's also reasonable to not say these things. It's reasonable to hold that all parts of all SNGs are equally valid and that de facto traditions mean nothing, if that's how you roll. It's reasonable to maintain that SNGs mean little and only the GNG counts; it's reasonable to maintain that SNGs supersede the GNG; and it's also reasonable to aver that the GNG itself is just a guideline and a place to start.

In our opinion the only guideline that matters is "Is this good for the wikipedia, does it make sense for us, does it fit with our mission?" Everything else is noise.

As to trimming the project by 20% or 50% or whatever (which would take years, but the years are going to pass anyway)... we're skeptical that that's a good idea, but we're not sure it isn't either. It depends on one's answer to the question "What are we trying to accomplish here?" and that's a matter of opinion.

How many suck?
How many of these 100 articles actually suck and should not exist? Somewhere between 1 and 6, in our view. Your results may differ.

There weren't really any articles that were horrible -- speedy-deleteable, or just promotional, or egregious BLP violations, or full of probable falsehoods, or anything like that. That's cheering.
 * Definitely sucks


 * Probably sucks
 * Alan Taylor (racing driver) is basically a BLP with no refs that devolve, so it's not allowable and should go on those grounds alone. It's a nice-looking article with a nice table and infobox and some paragraphs, but Taylor never won a race and is very obscure, and there's a mention of his restaurant, so it's maybe a little promotional. It's not much of an asset to the encyclopedia.


 * Possibly sucks
 * Loren Stuckenbruck is probably "on the ice". He's a professor at a proper big university and he's written a bunch of books and stuff (on extremely obscure, but very intellectual, topics). It's not like he's a hobo. It's not an unref'd BLP because there's his faculty bio (and that's all there is). Article is not hurting anyone, but if he's in then most any publish-or-perish professor is in (which might be OK). It's not a huge asset to the encyclopedia.
 * Kanichee layered intrusive complex is probably "on the ice". It "is a layered intrusion in Northeastern Ontario, Canada, located in the central portion of Strathy Township about 6.5 km (4.0 mi) northwest of the town of Temagami", and it is hard to imagine anything more obscure. It's two sentences and probably always will be (there is one ref). On the other hand it is "scientific", if that matters. Geocruft, maybe, but some people like science stuff. It's not a huge asset to the encyclopedia.
 * Literacy in Tokelau might suck. It's "on the ice" because its only ref is one primary source, and there might well be no other refs that are easy to get. It's an interesting and useful article for all that, if you're interested in the subject. On the other hand, 1,500 people live in Tokelau, so who cares, actually. More people live in Otis, Massachusetts, and we don't care how many of them can read. But Tokelau is a country and Tokelauan is a language, so maybe it's different. Possible mergebait. The article averages one reader per day, and we suppose you could say that that's too small a readership to bother servicing. It's not a huge asset to the encyclopedia.
 * SLATES. A made-up term, and while there are people who use it, according to Google it's not a lot of people. The article is poor and poorly ref'd, and quite possibly "on the ice", although maybe not. Arguably businesscruft. Gets nine views a day, probably mostly from people wearing suits. Whether we want to cater to the wearing-a-suit demographic or would rather leave those types in the dark is a matter of opinion.
 * Miss Orlando. It's probably "on the ice". It's essentially a list of winners with some other info. OK article, interesting enough if you're into this stuff -- if the info is true, which we don't know due to lack of refs. Probably sucks since we don't know if it's true.


 * We don't think it sucks, but we might be in the minority
 * Mitrulinia. We don't think it sucks, but we like science stuff. On the other hand, it is two sentences, clearly does not meet the GNG, has no SNG protection that we know of, and averages one reader a day (probably mostly Poindexter types, to be brutally frank, if that matters), and if we're looking to make the project more manageable, pages like this are possibly a good place to start. If many people wanted articles like this, there would at least be an SNG. If there is one, we haven't found it.
 * Venusia sikkimensis. See Mitrulinia. Arguably biocruft.
 * Virachola isocrates. See Mitrulinia. Arguably biocruft.
 * Gogana conwayi. See Mitrulinia. Arguably biocruft.
 * Coriolano Vighi. We have no idea. 19th-century Italian painter. Do we want to document extremely obscure 19th-century painters, or not? Does not have an article on the Italian Wikipedia. Probably "on the ice". The article doesn't say much useful about Vighi, but it didn't offend us either. Might suck. We're partial to art history, so we're OK with it.

We assumed all the articles that meet or might meet the GNG, or are SNG-protected articles like populated-place articles, don't suck. This means that In the Flesh (Pink Floyd tour) and 1977 in video gaming and Southridge Mall (Iowa) and Hell Is Empty and All the Devils Are Here and Caterina Scorsone and so forth automatically don't suck.
 * Probably don't suck

There were two articles that were unref'd but we still don't think they suck:
 * Lynfield, New Zealand has no references whatsoever, but it doesn't suck: it's a well-done article, the town has 9,000 people so it's no hamlet, and it's all almost certainly true. Strictly, it's not covered by the SNG, because the article doesn't prove the place exists, but assuming it does exist, that bare fact ought to be demonstrable.
 * List of people on the postage stamps of Nigeria also has no references whatsoever, but it doesn't suck: it's a nice-looking list and is most likely all true. It is a harder case, though. "Nigeria", "postage stamp", and "human being" are all important concepts; whether the intersection of those three is worth knowing about is debatable, but it's part of Nigerian history, so maybe. Ref'ing this article might be difficult, but we doubt that the writer just made up the material, so there are probably refs somewhere: Scott's stamp catalog or whatever.

Ryo Fukawa: the article as it stands pretty much sucks. It's a tiny article, there are no refs except his website, and it makes him sound obscure ("On television he generally takes minor roles"). However, we feel it's worth making an exception in his case, because his Japanese article is extensive (and includes a section titled "Hairstyle" (!)) and has 20 refs, although it is tagged for poor sourcing. And he has many albums on major labels. So while the article does suck, we wouldn't say it should be deleted; it just needs to be tagged with Expand Japanese. The case could be made that he's only notable in Japan (apparently true) and doesn't rate an article in the Anglosphere Wikipedia, but the GNG doesn't say anything like this, and all WP:N has to say is "Sources do not have to be... written in English".

Caricature (comics) has no references whatsoever, but while it's "out in the cold", it is not "on the ice": we checked, and it is probably ref'able. There is a review at the bluelinked The A.V. Club, so you're halfway to the GNG right there. (We don't know if The A.V. Club is reliable, but a review is an opinion, and all entities are reliable sources for their own contents.) The article itself is not crap; it's a short but OK article, assuming it is true. The author has an article and even his own navbox (he wrote Ghost World), so that helps with importance if not notability.

Anders Beggerud is probably "on the ice", but he was director of the Norwegian Press Directorate in Quisling's Norway (and got in trouble for that later), so it's not like he's the guy at the Pump-n-Pay down the corner. The article is five sentences. Pretty obscure, but it's a bit of a stretch, in our view, to say it sucks. The article is not hurting anyone. Arguably Norwaycruft.

Ballyroney railway station. This defunct rail station is probably "on the ice". Beyond "Ballyroney was one of the principal loading points for cattle bound for the ports of NW England" (unref'd), it's railcruft. Nice picture and layout, though, and lots of railcruft details. We enjoyed looking at the picture and reading the article, so we'd be hard-pressed to say it sucks. People like railroads. It doesn't really bear on the question of whether the article sucks, but even though there's no SNG for train stations, clearing out all these station articles would be a herculean task and the howls would be heard in Hades, so there's little practical point in considering it. We guess you could figure there's a de facto SNG for stations, if you want.

1976 Austrian Grand Prix. Motor race. Nice-looking article, doesn't offend us. Needs refs.

All the others, we think, are OK or meet the GNG or an SNG.

The 100
Note on terminology: "404s" means the linked website does not load at all. "Doesn't devolve" means the website loads but doesn't devolve to the intended article, such as when a link to a newspaper article just shows today's front page (this typically happens if the target page has been moved or deleted within the website). In the first case the page is possibly available through an internet archive, and in the latter by searching within the website, but we didn't go to this level of effort.