User:Mr.Z-man/analysis

This page estimates how many problematic biographies of living persons (BLPs) the English Wikipedia has hosted.

The analysis here makes the following assumptions:
 * 1) The ratio of BLPs to all articles has stayed constant since 2005.
 * 2) The percentage of BLPs that are problematic enough to potentially generate a complaint has remained constant since 2005.
 * 3) For every BLP that actually generates a complaint to OTRS, 1.5 more are problematic but unreported. (If the number of complaints is c, the actual number of problematic BLPs is 2.5c.)
 * 4) Wikipedia's reach before 2005 was low enough that problems on BLPs were not nearly as significant as they are today.

Data
A search of the OTRS "quality" queue for tickets created between the beginning of July 2009 and the end of December 2009, excluding those closed in a way that suggested they were spam or duplicates, shows that Wikimedia receives about 6.6 complaints per day regarding BLPs. At the time of this search, Wikipedia had approximately 430,000 BLPs and 3,172,000 articles.

Historical data for the number of articles comes from Size of Wikipedia.

Analysis
Using this information, we can find:
 * About 13.6% of articles are BLPs (430,000 / 3,172,000).
 * About 0.0015% of BLPs generate a complaint on any given day (6.6 / 430,000).
 * Because the number of articles has grown roughly linearly since 2005, the number of "bad BLPs" per day can be approximated by a line:
 * $$B = 0.0083d + 1.9755$$
 * where B is the number of potentially-complaint-inducing BLPs per day and d is the number of days since 1 January 2005.
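
The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the constants are the ones quoted in the Data section, while the helper name `bad_blps_from_articles` and the reading that B equals article count × BLP share × daily complaint rate × 2.5 are assumptions, since the page does not spell out how the line was fitted.

```python
# Figures quoted in the Data section.
TOTAL_ARTICLES = 3_172_000
TOTAL_BLPS = 430_000
COMPLAINTS_PER_DAY = 6.6

# Assumption 3: problematic BLPs = 2.5 x reported complaints.
UNREPORTED_MULTIPLIER = 2.5

# Share of all articles that are BLPs (assumption 1 holds this constant).
blp_fraction = TOTAL_BLPS / TOTAL_ARTICLES

# Chance that a given BLP draws a complaint on a given day (assumption 2).
daily_complaint_rate = COMPLAINTS_PER_DAY / TOTAL_BLPS


def bad_blps_from_articles(article_count):
    """Hypothetical helper: problematic BLPs per day for a given article count."""
    return article_count * blp_fraction * daily_complaint_rate * UNREPORTED_MULTIPLIER


def bad_blps_per_day(d):
    """The fitted line from the text; d is days since 1 January 2005."""
    return 0.0083 * d + 1.9755


print(f"BLP share of articles: {blp_fraction:.2%}")
print(f"Daily complaint rate per BLP: {daily_complaint_rate:.4%}")
print(f"Bad BLPs per day at the sampled size: {bad_blps_from_articles(TOTAL_ARTICLES):.1f}")
```

Feeding historical article counts from Size of Wikipedia into `bad_blps_from_articles` and fitting a straight line against d would be one way to arrive at coefficients like those above.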

Results
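Integrating this line from d = 0 to d = D, the number of days elapsed since 1 January 2005, gives an estimate of the number of potentially-complaint-inducing BLPs since the beginning of 2005:
$$\int_0^D (0.0083d + 1.9755)\,\mathrm{d}d = 0.00415D^2 + 1.9755D$$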
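
As a rough sketch of that integration, the area under the line can be evaluated in closed form. The end date used here (31 December 2009, the end of the complaint sample) is an assumption, as the page does not state which upper limit was used, and `cumulative_bad_blps` is a hypothetical helper name.

```python
from datetime import date

SLOPE = 0.0083        # from B = 0.0083d + 1.9755
INTERCEPT = 1.9755
START = date(2005, 1, 1)  # d = 0 in the text

# ASSUMPTION: the upper limit is not stated in the text; the end of the
# complaint sample window is used here purely for illustration.
ASSUMED_END = date(2009, 12, 31)


def cumulative_bad_blps(days):
    """Area under B(d) from 0 to `days`: SLOPE/2 * days^2 + INTERCEPT * days."""
    return 0.5 * SLOPE * days ** 2 + INTERCEPT * days


days_elapsed = (ASSUMED_END - START).days
print(days_elapsed, round(cumulative_bad_blps(days_elapsed)))
```

Under that assumed end date the total comes to roughly 17,400; a different upper limit changes the figure accordingly.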

Note that this is only a very rough estimate. Changing one of the parameters, such as the ratio of unreported to reported problems, can raise or lower the final result by several thousand.
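
The sensitivity to the unreported-problem multiplier can be illustrated with a short sketch. Because B is 2.5 times the complaint count under assumption 3, the total scales in proportion to the multiplier. The 1,825-day span (1 January 2005 to 31 December 2009) and the alternative multipliers 2.0 and 3.0 are illustrative assumptions, not values taken from the analysis.

```python
BASE_MULTIPLIER = 2.5            # assumption 3 in the text
SLOPE, INTERCEPT = 0.0083, 1.9755
ASSUMED_DAYS = 1825              # hypothetical span: 1 Jan 2005 to 31 Dec 2009


def cumulative_bad_blps(days, multiplier=BASE_MULTIPLIER):
    """Integral of B(d) over [0, days], rescaled for a different multiplier."""
    scale = multiplier / BASE_MULTIPLIER
    return scale * (0.5 * SLOPE * days ** 2 + INTERCEPT * days)


for multiplier in (2.0, 2.5, 3.0):
    total = cumulative_bad_blps(ASSUMED_DAYS, multiplier)
    print(f"multiplier {multiplier}: about {total:,.0f}")
```

Moving the multiplier by 0.5 in either direction shifts the total by about 20%, which is a few thousand articles over this span.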