User:Dsp13/matching from list of people by name

Wikipedia / LC Name Authority matching - earlier effort
An earlier effort to match Wikipedia biographical articles with the LC Name Authority File used the List of people by name on Wikipedia to produce the tables linked to below. In these tables, a match with the LC Name Authority File was only returned if it was a unique match for the name (in which case it was labelled ExactMatch) or for the name and at least one birth/death date (in which case it was labelled ExactMatchWithDates). Between a third and a half of the people in Wikipedia appeared to be matchable to the LC Name Authority File. For example, of the 191 people with names beginning with Z, 93 (just under 50%) were found.
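The two match labels can be illustrated with a minimal sketch. It is not the code that produced the tables: the `AuthorityRecord` type, the `classify_match` function, the sample records and the choice to try the dated match before the name-only match are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AuthorityRecord:
    """Hypothetical, simplified stand-in for one name authority record."""
    name: str
    birth: Optional[int] = None
    death: Optional[int] = None


def classify_match(
    name: str,
    birth: Optional[int],
    death: Optional[int],
    records: List[AuthorityRecord],
) -> Tuple[Optional[str], Optional[AuthorityRecord]]:
    """Return (label, record) for a unique match, or (None, None)."""
    same_name = [r for r in records if r.name == name]

    # Assumed order: try name plus at least one agreeing date first.
    dated = [
        r for r in same_name
        if (birth is not None and r.birth == birth)
        or (death is not None and r.death == death)
    ]
    if len(dated) == 1:
        return "ExactMatchWithDates", dated[0]

    # Otherwise accept the name alone, but only if it is unambiguous.
    if len(same_name) == 1:
        return "ExactMatch", same_name[0]

    # No candidates, or several candidates that cannot be told apart.
    return None, None


# Made-up sample data, for illustration only.
sample = [
    AuthorityRecord("Example, Alice", 1900, 1980),
    AuthorityRecord("Example, Alice", 1950, None),
    AuthorityRecord("Sample, Bob", 1920, 1990),
]
print(classify_match("Example, Alice", 1950, None, sample))
print(classify_match("Sample, Bob", None, None, sample))
```

An ambiguous name with no usable dates returns no match at all, which is the conservative behaviour described above.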

There were several caveats with these tables: Dsp13 12:09, 27 July 2006 (UTC)
 * 1) The list of people on Wikipedia I used was not quite up to date: it was taken six months ago from a mirror site copy of List of people by name, after I had problems parsing Wikipedia's XML download. This list was incomplete: in fact, it only included around 15% of those categorized by date of birth.
 * 2) Unicode wrinkles meant that the LC recommended name form column in the tables had a little Unicode garbage in it.
 * 3) Within each table, names were mostly in alphabetical order, at least locally, but some were out of sequence, e.g. in the table for W.
 * 4) Recent changes in the OCLC name authority service meant a little loss of flexibility in date search

A    B     C     D     E     F     G     H     I     J-manually corrected     K     L     M     N     O     P     Q     R     S     T     U     V     W     X     Y     Z

Dsp13 02:43, 17 July 2006 (UTC)

Earlier effort at automatic matching: manual check of names starting with J
I manually checked the results for the case of names beginning with J. (Since there are plenty of common surnames beginning with J - Jackson, Johnson, Jones etc. - I do not think that this is an unrepresentatively easy sample.)

Out of 640 names taken from the List of people by name (see caveats above), 357 (or 56%) were automatically matched against the LC name authority file. Manual checking of the matches found was a little disappointing: 42 (or 12%) were erroneous.

I marked manual changes in the corrected table for J, although this corrected table does not show the names which needed to be manually removed because in fact the people were not in the LC Name Authority File at all.

The 42 erroneous matches are displayed in the table below. There are three kinds of error, requiring three corresponding kinds of manual repair:
 * 1) Disambiguation. The need for this arises when the name in the List of people by name does not point to an individual biographical page but only to a disambiguation page. (This is essentially an internal Wikipedia problem, rather than a problem arising from record linkage per se. Such cases could be automatically removed from the table by removing links to disambiguation pages.)
 * 2) Correction. The need for this arises when an LC name authority record for the individual concerned exists, but the automated matching finds another one.
 * 3) Removal. The need for this arises when there is no true LC name authority record for the individual concerned (although the automated matching has mistakenly found one).
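The three kinds of repair can be expressed as a small decision function. This is only a sketch of the classification described above: the function name `repair_kind`, its parameters and the record identifiers are hypothetical, and deciding the "true" record is still a manual step.

```python
from typing import Optional


def repair_kind(
    is_disambiguation_page: bool,
    matched_id: str,
    true_id: Optional[str],
) -> Optional[str]:
    """Classify an automatic match after manual checking.

    is_disambiguation_page: the listed name links to a disambiguation page.
    matched_id: identifier of the record the automatic matching returned.
    true_id: identifier of the correct record found by hand, or None if
             the person has no record in the authority file.
    Returns the kind of repair needed, or None if the match was correct.
    """
    if is_disambiguation_page:
        # The list entry does not identify a single person at all.
        return "disambiguation"
    if true_id is None:
        # The matcher found a record, but no true record exists.
        return "removal"
    if matched_id != true_id:
        # A true record exists, but the matcher picked a different one.
        return "correction"
    return None


# Made-up identifiers, for illustration only.
print(repair_kind(False, "rec-001", "rec-002"))
```

Checking for a disambiguation page first mirrors the point made in 1): such entries can be filtered out before any record linkage is attempted.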

ExactMatchWithDates results are far more reliable than ExactMatch results: of the 198 searches which produced an ExactMatchWithDates result, only 4 (2%) were erroneous. Three of these errors are disambiguation errors; if these are left out of account, the error rate is a very respectable 0.5% (1 in 198).
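The rates for the J sample can be recomputed from the counts given above. The helper `pct` is a hypothetical name, and rounding to one decimal place is a choice made for this sketch.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


names_checked = 640   # names beginning with J taken from the list
auto_matched = 357    # automatically matched against the authority file
erroneous = 42        # matches found to be wrong on manual checking
dated_matches = 198   # searches producing ExactMatchWithDates
dated_errors = 4      # erroneous ExactMatchWithDates results
dated_disambig = 3    # of those, errors due to disambiguation pages

match_rate = pct(auto_matched, names_checked)
error_rate = pct(erroneous, auto_matched)
dated_error_rate = pct(dated_errors, dated_matches)
# Error rate once the disambiguation errors are left out of account.
dated_error_rate_net = pct(dated_errors - dated_disambig, dated_matches)

print(match_rate, error_rate, dated_error_rate, dated_error_rate_net)
```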

Moved from User:Dsp13 by Dsp13 19:42, 9 August 2006 (UTC)