User:Periglio/Wikidata

An RfC has come along proposing to do away with Persondata: Wikipedia:Village pump (proposals)#RfC: Should Persondata template be removed from articles? As someone who has invested a lot of time in using Persondata, I am in principle against this proposal.

The alternative, Wikidata, is certainly the way forward, but my impression is that it will not be kept up to date. This is just a gut feeling, so I have produced statistics to establish the actual position.

My database extracted from Persondata
I have a personal database of every article that holds a Persondata template, currently 1153424 records. For easier handling, each record is randomly allocated to one of 255 blocks. I have randomly chosen block 177 for my analysis, which contains 4548 records.
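The text does not say how records are allocated to blocks. One hypothetical way to make the allocation random-looking yet reproducible is to hash each article title and take the result modulo 255. The names `block_for` and `sample_block`, the SHA-256 hash and the 1..255 numbering are all assumptions for illustration, not a description of the actual database.

```python
import hashlib

NUM_BLOCKS = 255  # block count stated in the text


def block_for(title: str) -> int:
    """Map an article title to a block numbered 1..255 (hypothetical scheme).

    Hashing gives an allocation that looks random but is repeatable, so a
    record lands in the same block every time the database is rebuilt.
    """
    digest = hashlib.sha256(title.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BLOCKS + 1


def sample_block(titles, block):
    """Return every title allocated to the given block."""
    return [t for t in titles if block_for(t) == block]
```

Under this scheme, `sample_block(all_titles, 177)` would return the titles in the block analysed here, with `all_titles` standing in for the full list of Persondata articles.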

The current status of this block is:
 * 1226 Complete living: full date of birth
 * 514 Complete non-living: full date of birth and full date of death
 * 204 Validated: all checks pass, but accurate birth/death dates are not known
 * 2604 Various errors
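The status categories can be read as a small decision procedure. The sketch below assumes each record carries its dates as ISO-style strings (`YYYY-MM-DD`), a living flag and a list of validation errors; the `Record` layout and the date format are hypothetical. Following the rule stated in the text, a date that is only a year does not count as a full date.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed storage format: a full date is YYYY-MM-DD; "1850" alone is not full.
FULL_DATE = re.compile(r"\d{1,4}-\d{2}-\d{2}")


def is_full_date(value: Optional[str]) -> bool:
    """True only for a complete day-month-year date."""
    return value is not None and FULL_DATE.fullmatch(value) is not None


@dataclass
class Record:
    title: str
    birth: Optional[str]
    death: Optional[str]
    living: bool
    errors: List[str] = field(default_factory=list)  # validation failures


def classify(rec: Record) -> str:
    """Assign one of the four status categories used for block 177."""
    if rec.errors:
        return "Various errors"
    if is_full_date(rec.birth) and (rec.living or is_full_date(rec.death)):
        return "Complete living" if rec.living else "Complete non-living"
    # All checks pass, but accurate birth/death dates are not known.
    return "Validated"
```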

Apart from extracting the birth and death dates, my software performs various validations, such as checking for categories that do not agree with the data. 2604 (57% of the block) seems a high proportion, but most of these errors are minor, such as an article with no place of death that is not in the missing-place-of-death category. Note that dates consisting only of a year are counted as not known.
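A category-consistency check like the place-of-death example can be sketched as follows. The category name `Place of death missing`, the function name and its parameters are assumptions for illustration, and the second (converse) check is an extra example rather than one described in the text.

```python
from typing import Iterable, List, Optional

MISSING_POD = "Place of death missing"  # assumed category name


def place_of_death_warnings(place_of_death: Optional[str], living: bool,
                            categories: Iterable[str]) -> List[str]:
    """Flag disagreements between the place-of-death field and the categories."""
    cats = set(categories)
    warnings: List[str] = []
    if living:
        return warnings  # no place of death is expected for living people
    if not place_of_death and MISSING_POD not in cats:
        warnings.append("no place of death, but not in missing-place-of-death category")
    if place_of_death and MISSING_POD in cats:
        # Illustrative converse check: data present but still tagged as missing.
        warnings.append("place of death given, but still in missing-place-of-death category")
    return warnings
```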

Comparison of Persondata and Wikidata
I took all the articles in block 177 and looked up their birth/death dates on Wikidata.

 * 96 birth dates do not match between databases
 * 141 birth dates only on Persondata
 * 187 birth dates only on Wikidata
 * 424 birth dates in error (somewhere) in total

 * 40 death dates do not match between databases
 * 47 death dates only on Persondata
 * 125 death dates only on Wikidata
 * 212 death dates in error (somewhere) in total

Out of the 4548 records analysed:
 * 3140 records have a full date of birth in Persondata, Wikidata or both
 * 1344 records have a full date of death in Persondata, Wikidata or both
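The comparison above sorts each date pair into one bucket, and the records "in Persondata, Wikidata or both" are simply every pair where at least one side has a full date. A minimal sketch, assuming both dates have already been normalised to the same string format (`None` meaning no full date in that database); the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def compare(persondata: Optional[str], wikidata: Optional[str]) -> str:
    """Classify one birth (or death) date pair across the two databases."""
    if persondata is None and wikidata is None:
        return "unknown"
    if persondata is None:
        return "only Wikidata"
    if wikidata is None:
        return "only Persondata"
    return "match" if persondata == wikidata else "mismatch"


def tally(pairs: Iterable[Tuple[Optional[str], Optional[str]]]) -> Counter:
    """Count each bucket, plus the 'in error' and 'known' totals."""
    counts = Counter(compare(p, w) for p, w in pairs)
    errors = counts["mismatch"] + counts["only Persondata"] + counts["only Wikidata"]
    counts["in error"] = errors
    # Known = a full date exists in Persondata, Wikidata or both.
    counts["known"] = errors + counts["match"]
    return counts
```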

The conclusion is that, where a birth/death date is known in at least one database, 424 of 3140 births (about 13.5%) and 212 of 1344 deaths (about 15.8%) show a discrepancy between the two databases.
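As a quick check of the arithmetic, dividing the error totals by the number of records with a known date gives the discrepancy rates for block 177:

```python
# Counts for block 177, taken from the figures above.
births_in_error, births_known = 96 + 141 + 187, 3140
deaths_in_error, deaths_known = 40 + 47 + 125, 1344

birth_rate = births_in_error / births_known
death_rate = deaths_in_error / deaths_known
print(f"births: {births_in_error}/{births_known} = {birth_rate:.1%}")  # 424/3140 = 13.5%
print(f"deaths: {deaths_in_error}/{deaths_known} = {death_rate:.1%}")  # 212/1344 = 15.8%
```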

The full list of discrepancies is given below. My plan is to correct these and note where the fault lies. This should give an indication of the work required to make Wikidata reliable.

Results
I manually checked each article to see why there is a difference and added a comment. The Correct column shows which database was right: P = Persondata, D = Wikidata, ? = unknown due to conflicting information.

Conclusion
My main conclusion is that, as far as birth/death date completeness and accuracy go, there is no significant difference between Wikidata and Persondata: if you want to find a date, both offer about a 90% chance of it being available and accurate. I have personally decided to switch to using Wikidata, as I believe it has the potential to improve, whereas Persondata keeps being threatened!

Faults with the Wikidata database
Assuming I will be working with Wikidata from now on, I list below the problems that currently exist and that I hope to help address.


 * Wikipedia is the main source for Wikidata, but nothing seems to be in place to keep Wikidata synchronised
 * There are Wikipedia articles with birth/death dates that have not yet been added to Wikidata
 * There are Wikipedia articles with revised birth/death dates which are not being updated on Wikidata
 * Articles that are deleted on Wikipedia (e.g. as not notable) leave behind an entry on Wikidata
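The first group of problems amounts to a synchronisation check that could, in principle, be run periodically. The sketch below is hypothetical: it assumes two dictionaries keyed by English Wikipedia article title, one holding the date currently extracted from each article and one holding the date on the linked Wikidata item, taken from a snapshot old enough to still include titles of since-deleted articles. It treats Wikipedia as the source, as stated above, but any "update" it proposes would still need checking by hand.

```python
from typing import Dict, List, Optional


def audit_sync(wikipedia: Dict[str, Optional[str]],
               wikidata: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Produce to-do lists for keeping Wikidata in step with Wikipedia."""
    report: Dict[str, List[str]] = {
        "add to Wikidata": [],       # date in the article, none on Wikidata
        "update Wikidata": [],       # article date revised, Wikidata differs
        "orphaned on Wikidata": [],  # article no longer exists on Wikipedia
    }
    for title, wp_date in wikipedia.items():
        if wp_date is None:
            continue  # nothing to copy across
        wd_date = wikidata.get(title)
        if wd_date is None:
            report["add to Wikidata"].append(title)
        elif wd_date != wp_date:
            report["update Wikidata"].append(title)
    report["orphaned on Wikidata"] = sorted(set(wikidata) - set(wikipedia))
    return report
```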


 * Wikipedia articles have problems that are not flagged, so Wikidata can pick up a date that is not necessarily correct
 * Some Wikipedia articles have conflicting dates within the article
 * Some Wikipedia articles have different dates between different language wikipedias
 * Julian and Soviet calendars are not always handled correctly
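Calendar mix-ups show up as dates that differ by a fixed number of days (13 days for 20th-century dates, fewer for earlier centuries). Wikidata records a calendar model alongside each date value, so an Old Style date entered as if it were Gregorian, or the reverse, produces exactly this kind of mismatch. The sketch below converts a Julian-calendar date to the proleptic Gregorian calendar via the Julian Day Number; it covers the Julian case only, and the function names are illustrative.

```python
from datetime import date


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a Julian-calendar date (standard integer formula)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert an Old Style (Julian) date to the proleptic Gregorian calendar."""
    # date.fromordinal(1) is 0001-01-01, whose Julian Day Number is 1721426.
    return date.fromordinal(julian_to_jdn(year, month, day) - 1721425)
```

For example, `julian_to_gregorian(1917, 10, 25)` returns `date(1917, 11, 7)`, so a Persondata/Wikidata pair 13 days apart around that period is a strong hint that one side is Old Style.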


 * Birth dates are rarely referenced on Wikipedia, which makes it impossible to resolve conflicting information. I raised this at WikiProject Biography but did not get much feedback.