Wikipedia:Wikipedia Signpost/2015-04-08/Op-ed

I used to think of myself as an inclusionist. I used to write articles. I still do, certainly. However, I recently came to the sad realization that I am spending less and less time creating new content, and more and more time deleting things. Let me tell you a slightly worrisome story of how this came to be. Since 2007 I have been regularly monitoring the list of new articles related to WikiProject Poland. This started as a (moderately successful) attempt to recruit people for the WikiProjects I am involved in. Over time I sought to automate this process (reviewing all of those articles and reacting to them can take several hours each week). To this end I developed a few templates. At first, they were only invitations – to WikiProjects, DYK, and the like. But looking at them now, a big chunk of my tools are paste-in prods for "ARTicles that are merely SPAM" (aka "advertisements masquerading as articles", ADMASQ), most commonly in the categories for biographies and companies/products. It is a sad testament to the gap between what I thought I would need (rewards and words of encouragement) and what I ended up needing (in essence, words of discouragement). I haven't kept specific numbers, but for the past few years, at least, each week I have to prod/AfD articles, whereas I use my WikiProject/DYK invitations maybe once or twice.

Not all of my deletion nominations come from the new article reports. In fact, if I were limiting myself to those, I would not be here, calling for your attention. A few years ago I started to realize that many of the articles I prod/AfD share similar topics. What do they have in common? Biographies (primarily artists failing WP:CREATIVE). Music bands, songs, and tours failing WP:MUSIC. But I can stomach them; perhaps it's what remains of my inclusionist sentiment. I will prod those articles with no mercy, but the poor fame-starved artists are not whom I want to draw your attention to. No, we have a bigger problem, or perhaps a fatter, juicier and more problematic target. Those of you who have followed The Signpost for a while know well the recurring theme of paid editors and promotional advertising of products and companies. I personally don't have a problem with paid editing if our policies and guidelines are respected. Unfortunately, they – namely, Notability and its child guidelines – are not. I would go as far as to say that they are in fact rampantly disregarded. They are disregarded by WP:VANITY-seeking individuals, but even more so by those creating articles about products and companies (and here I sadly have to concede that the majority of such articles are almost certainly the work of people who were paid to create them).



What I am looking at right now are several categories: Category:Business software, Category:Websites, Category:Law firms, Category:Internet marketing companies, Category:E-commerce. They are gateways to many related categories, and I estimate that they are filled with lots of spammy articles. Let me define spam in the context of this op-ed as advertisements masquerading as articles (in short, artspam) rather than external link spamming. The latter is more easily identifiable through automated tools, and WikiProject Spam and others seem to be managing it well enough, as far as I can tell. What I am concerned with is the former: articles that fail notability criteria and aim to promote a certain topic, not (only) through biased wording, but through their very existence ("I/we/our product is/are on Wikipedia, hence we are important/respectable/famous/encyclopedic").

I said just now that those categories are filled with lots of artspam. By that I mean that I estimate that between 25% and 75% of the entries in them would not survive PROD/AfD. And those are not the worst categories; I am afraid they represent an average of hundreds of categories related to companies and certain types of products (websites, software, etc.). After a while – having reviewed hundreds of such articles – you learn to recognize patterns. Few are created by editors active across numerous topics. Most are the work of single-purpose accounts, focused either on a single article or on a group of them. A small percentage are so bad that they qualify for near-speedy deletion (zero references, for example) – but those are rare; as the proverbial low-hanging fruit of deletionists, they don't survive long. Though just a few days ago I stumbled upon an unreferenced product stub from 2005, so... Distressingly, in the last year or so I have noticed that a significant proportion (<20%) of problematic entries have passed through the Articles for Creation process or similar.

This leads me to the conclusion that (as observed by some prior research on the subject) many Wikipedians (myself included) often make quick assessments of articles by looking at the reference list. If there are many references (bonus points for being formatted), we mark the article as "probably ok" and move on. This is understandable (we are all busy), but it is a problem: many sources fail the reliability requirements, while others mention the company just in passing (notability requires in-depth coverage). This is a trick that artspammers have learned to use against us, and, it appears, very successfully. Most of the company and product pages I nominate for deletion have several, if not dozens of, inline references. Many are to their own pages (in other words, self-published), but quite a few are masked better. It is quite common for slightly smarter artspammers to use other websites. Such services are cheaply offered by various PR companies, which maintain extensive portals, such as PRWeb, filled with dime-a-dozen press releases. Many of these releases are distributed through news sites and appear in search engine results, giving them a surface appearance of legitimacy.

Here's a case study. "www.reuters.com/article/" looks nice, until you notice the literal small print: "Reuters is not responsible for the content in this press release". The article about the associated product, Faircoin, has been deleted twice so far. Those articles are often rich mines of bad sources: I have seen everything from Twitter, YouTube, Facebook, and irrelevant awards (another PR trick), to numerous blogs and the myriad of low-key promotional websites masquerading as professional press. Such websites might sport names that imply reliability, but usually are quite WP:QUESTIONABLE. Those, at some point, shade into reputable sources (magazines maintained by professional associations), but with tens of thousands of websites out there, it's a pain to figure out which are good and which are bad (i.e. to tell the ones that do fact checking and have editorial oversight from the ones that will publish anything for a few dollars); we desperately need more initiatives like the few found in Category:WikiProject lists of online sources. For now, however, thousands of articles about organizations or products linger in the mainspace, sustained by nothing but the false appearance of being well referenced, where "well" means nothing more than "numerous". Worse still, we have articles that clearly sport no reliable references (usually referencing only their own websites), or no references at all. Notability may be "just" a guideline, but Verifiability is a policy. Yet it is a policy with patchy enforcement, and numerous artspam entries survive happily with no references to speak of.

What's the scope of this problem? Category:All articles with topics of unclear notability has about 63,000 entries, but fewer than 20% (and that's a generous estimate) of the articles I prod/AfD carry that tag; ditto for the nearly 18,000 articles tagged as having a promotional tone. Of course, not all categories have similar levels of artspam, but I am afraid that we are looking at a number of up to, maybe, 300,000 such articles. Now, this is a back-of-the-napkin calculation, based on extrapolating from the few, very rough, statistics presented here ("if out of five artspam articles, only one is tagged as such, and we have about 60,000 tagged ..."). Yes, I am well aware that not everything with a notability tag on it will fail notability once some research is done, but if, let's say, just about a half will, then the napkin equation ends up with 150,000. That's something like 3% of our total articles. Even if I am grossly exaggerating this, and we have just a few thousand entries to clean up, this is a significant number – and there's no way the few of us working on this can make any sizeable dent in this amount of artspam. Worse, I am afraid we are losing: our backlog in notability topics alone goes back seven years, and the one for promotional tone is about the same. If you think that we are doing better with unreferenced content, the backlog for Category:Articles lacking sources goes back to 2006 and lists over 200,000 entries (including over 2,000 in Category:All unreferenced BLPs)!
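For readers who want to check the napkin calculation above, here is a minimal sketch in Python. The function and variable names are made up for illustration, the inputs are the rough guesses quoted in the text rather than measured data, and the "implied total" is simply backed out of the 3% figure, not taken from any official article count.

```python
def napkin_estimate(tagged, articles_per_tagged, fail_share):
    """Extrapolate from tagged articles to a rough artspam count.

    tagged:              articles carrying the notability tag (rough figure from the text)
    articles_per_tagged: suspected artspam articles for every tagged one (5 = "one in five is tagged")
    fail_share:          assumed fraction that would actually fail notability
    """
    suspected = tagged * articles_per_tagged  # upper bound on suspect articles
    likely = suspected * fail_share           # those assumed to fail once researched
    return suspected, likely


# Rough inputs quoted in the op-ed; all of them are guesses, not statistics.
TAGGED = 60_000
ARTICLES_PER_TAGGED = 5
FAIL_SHARE = 0.5

suspected, likely = napkin_estimate(TAGGED, ARTICLES_PER_TAGGED, FAIL_SHARE)
print(suspected)  # 300000
print(likely)     # 150000.0

# Total article count implied by "something like 3%" (an inference, not a source figure).
implied_total = likely / 0.03
print(round(implied_total))
```

Changing any one input shows how sensitive the final number is: if one in three artspam articles were tagged instead of one in five, the upper bound would drop from 300,000 to 180,000.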

This shouldn't come as a surprise. Artspam, by its very definition, is about things nobody else cares about; it is advertising. Neither experienced editors nor newbies visit such pages often. They are underlinked, hidden in the dusty corners of our project, with the scope of the issue visible only in a few cleanup backlogs, or during category reviews. Many die early, when they are spotted by recent changes patrollers, but those that survive the first few weeks can feel pretty secure, particularly if (counter-intuitively) they were created by an SPA whose further actions won't draw scrutiny to their prior creations. In short, through their lack of encyclopedic value and their obscurity they become the proverbial bugs not seen by many eyeballs. And so they linger, bloating numerous categories, which are quietly becoming little more than business and product listings with scant concern for notability.

Enough is enough, I say. It is the spring of 2015. Wikipedia has been gravitating towards becoming a vehicle for business and product promotion for too long. We need a major artspam cleanup drive, a literal purge of promotional articles, and a push for the development of tools and frameworks to stem the tide of such articles in the future. Perhaps something similar to WikiProject Unreferenced Biographies of Living Persons, an effort which a few years back cut the number of unreferenced BLPs, from 50,000 or so, more than tenfold.

Either way, it is high time for some spring cleaning. Please help out: go to a category for a business type or product of your choice, and start enforcing notability, with fire. Prod an artspam article each day, and save this project before we become a Yellow Pages clone with a small encyclopedia attached to it.




 * Piotr Konieczny is a Polish sociologist at Hanyang University in South Korea, specializing in Internet studies and wikis. He edits Wikipedia as .
 * The views expressed in this op-ed are those of the author alone; responses and critical commentary are invited in the comments. Editors wishing to submit their own op-ed should look at our opinion desk.