User talk:Quale/List of chess grandmasters

The List of Chess GMs Manifesto
To improve the maintainability and sourcing of List of chess grandmasters:
 * Updates to List of chess grandmasters will be supported by easy-to-use, reliable tooling.
 * Updates to add new GMs will be accurate and complete and require only a few minutes per month.
 * The tools will use FIDE data to automatically identify any living GMs missing from the GMs list and produce correctly formatted table entries that are ready to paste into the article.
 * To the extent that is practical, tooling will be developed to verify entries in the table. (Some table data may be very difficult or impractical to check automatically such as date of death.)
 * Tools will be built on free software.
 * Code will be publicly available to reduce duplicated effort and allow the community to inspect for correctness and make improvements.
 * Tools will be usable by non-programmers, although some computer aptitude may be required.
 * Every GM entry will be reliably sourced, and non-specialists will be able to verify it by hand if necessary.
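
As a rough illustration of the "find missing GMs and produce paste-ready entries" goal, here is a minimal Python sketch. It is not an existing tool: the record fields, the column order of the output row and the profile URL pattern are all assumptions made for the example, and the players are invented.

```python
# Hypothetical sketch: find GMs present in FIDE data but absent from the table,
# keyed on FIDE ID, and format each one as a wikitext table row.
# The column order and the profile URL pattern are assumptions.

PROFILE_URL = "https://ratings.fide.com/profile/{fide_id}"  # assumed URL pattern


def missing_gms(fide_gms, table_ids):
    """Return FIDE records whose ID is not yet in the table, sorted by name."""
    missing = [gm for gm in fide_gms if gm["fide_id"] not in table_ids]
    return sorted(missing, key=lambda gm: gm["name"])


def wikitext_row(gm):
    """Format one record as a wikitext table row (assumed column layout)."""
    link = "[" + PROFILE_URL.format(fide_id=gm["fide_id"]) + " " + gm["fide_id"] + "]"
    cells = [gm["name"], link, gm["birth_year"], gm["federation"]]
    return "|-\n| " + " || ".join(cells)


# Invented sample records standing in for GMs extracted from a FIDE ratings list.
fide_gms = [
    {"fide_id": "1000001", "name": "Example, Alice", "birth_year": "1999", "federation": "USA"},
    {"fide_id": "1000002", "name": "Sample, Boris", "birth_year": "2001", "federation": "GER"},
]
table_ids = {"1000001"}  # FIDE IDs already present in the article table

for gm in missing_gms(fide_gms, table_ids):
    print(wikitext_row(gm))
```

The set difference only works if the article table carries FIDE IDs, which is one reason a FIDE ID column is proposed further down.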

Problems with the grandmasters table in April 2019

 * FIDE can mint 50 or more new GMs every year, so the page requires a lot of updates.
 * All update work is done by hand. I think some editors have used some tooling to help, but I don't think we have a systematic and reliable method to make sure we keep the page up to date.  (My guess is that a few editors have searched the monthly FIDE ratings lists to find players with GM titles and compared to our article to find missing entries.  I've done this myself, but it's still a lot of manual work. Also we haven't shared any of our automation, so we duplicate work.)
 * Some columns are simply not maintained such as "Most recent federation". Changes in federation are just more updates that need to be identified and performed = more work.
 * Sourcing is unclear. A few years ago someone decided to add "(FIDE)" after the names of the living players to indicate that FIDE was the source for the GM claims.  This is ugly (it is not part of the player name and does not belong there) and in some cases untrue, as FIDE is not the source for death dates and for many player entries FIDE wasn't the source for some of the other information either.
 * Entry coding is inconsistent and makes it hard to mechanically extract reliable information from the table, for example to convert it to a CSV that someone could use for another purpose. (This refers primarily to the coding to allow proper name sorting and to link names to bio pages, but the "(FIDE)" crap after the player names also mucks this up.)
 * The information provided is insufficient to allow reliable mechanical comparison to FIDE data, and the inconsistent entry coding makes this still harder. For example, if FIDE ID were provided in the table it could provide both a source (as a link to the FIDE Chess Profile page) and provide a unique key to allow a programmatic or manual match against FIDE data.
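
To show why a unique key matters, here is a small sketch of a mechanical comparison keyed on FIDE ID. The field names and sample records are invented for illustration; real table and FIDE data would need to be parsed into this shape first.

```python
# Hypothetical sketch: compare table rows with FIDE records using FIDE ID as the key.
# Field names ("fide_id", "federation", "birth_year") are illustrative assumptions.

def find_mismatches(table_rows, fide_records, fields=("federation", "birth_year")):
    """Return (fide_id, field, table_value, fide_value) tuples for differing fields.

    Rows whose FIDE ID is not found in the FIDE data are reported with the
    field set to None so they can be checked by hand.
    """
    fide_by_id = {rec["fide_id"]: rec for rec in fide_records}
    problems = []
    for row in table_rows:
        rec = fide_by_id.get(row["fide_id"])
        if rec is None:
            problems.append((row["fide_id"], None, None, None))
            continue
        for field in fields:
            if row[field] != rec[field]:
                problems.append((row["fide_id"], field, row[field], rec[field]))
    return problems


# Invented sample data.
table_rows = [
    {"fide_id": "1000001", "federation": "USA", "birth_year": "1999"},
    {"fide_id": "1000002", "federation": "GER", "birth_year": "2001"},
    {"fide_id": "1000009", "federation": "FRA", "birth_year": "1950"},
]
fide_records = [
    {"fide_id": "1000001", "federation": "USA", "birth_year": "1999"},
    {"fide_id": "1000002", "federation": "AUT", "birth_year": "2001"},
]

for problem in find_mismatches(table_rows, fide_records):
    print(problem)
```

Without the ID, the same check would have to match on names, which is much less reliable given the transliteration issues discussed below.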

Player name transliterations
Should the GMs table use the FIDE names of record (ASCII), "correct" names with full diacritics, or something in between? Verdict: Follow WP:NCP and use the title a Wikipedia bio article would use. Really this seems the only acceptable choice.
 * Use FIDE ASCII names
 * Pros: easiest to search and verify against FIDE records
 * Cons: ASCII in the 21st century is awful and widely distant from preferred transliterations
 * Use "correct" names with full diacritics
 * Pros: the closest possible representation to the native name
 * Cons: often not the form commonly used in English-language sources (WP:COMMONNAME); can be harder to search; inevitably leads to regional and national fights over the one-true-transliteration for a particular country; some transliteration is always necessary since we don't use Cyrillic, Arabic or even full Icelandic orthography
 * Use the Wikipedia bio article title
 * Pros: closer representation to native orthography while comporting with English-language sources
 * Cons: many GMs do not have a Wikipedia bio; slightly harder to reconcile to FIDE records; some regional fighting over correct article titles occurs but this would be inevitable no matter what we do and can be resolved on the individual bio articles

Name order
Should names in the table be "First Last" ("Robert Fischer", current practice) or "Last, First" ("Fischer, Robert")? (I'm not referring here to questions about Eastern name order vs. Western name order. Those are somewhat case-by-case, so for example the English convention for Hungarian names is to always use Western name order, but for some Indian names the name order used in English varies without any obvious rule.) Verdict: no obviously superior choice, but I'd like to try "Last, First". As a programmer I think a single fact is almost decisive in favor of "Last, First": the "Last, First" order can be reliably and mechanically transformed into "First Last", but the inverse is not possible.
 * Use "First Last"
 * Pros: continues our current practice; most natural way to read names; can agree directly with article titles often allowing unpiped links; possibly helpful for names using Eastern name order
 * Cons: sort is implicit and non-obvious especially with the combination of Eastern and Western name orders and with non-binomial names (Spanish naming customs)
 * Use "Last, First"
 * Pros: sort order is explicit; easier to distinguish given and family names which 1) makes it easier to distinguish names using Eastern name order, 2) can be helpful to humans, and 3) helps computer programs trying to mechanically process the table data; closer match to FIDE records may aid verification
 * Cons: breaks with our current practice; less natural way to read names; will always require piped links; might still require explicitly coding sort keys if table sorting for Unicode doesn't work well [research item: explicit sort keys are/were needed for non-ASCII category sorts, is it needed for the table sorting done in the client browser?]
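
The asymmetry between the two name orders can be shown with a short Python sketch. The function names are made up for this example, and the "naive" inverse is deliberately simplistic to show where guessing goes wrong.

```python
def to_first_last(name):
    """Convert "Last, First" to "First Last"; names without a comma are unchanged."""
    last, sep, first = name.partition(",")
    if not sep:
        return name.strip()
    return f"{first.strip()} {last.strip()}"


def naive_last_first(name):
    """Guess "Last, First" by treating the final word as the family name.

    The guess is wrong for many names, which is the point of the example.
    """
    given, _sep, family = name.rpartition(" ")
    return f"{family}, {given}" if given else family


print(to_first_last("Fischer, Robert"))
print(to_first_last("Vallejo Pons, Francisco"))
# Going the other way loses the family-name boundary for a two-part surname:
print(naive_last_first("Francisco Vallejo Pons"))
```

The comma records where the family name ends, so the forward transformation needs no knowledge of naming customs. The reverse direction has to guess, and it guesses wrong for names such as those following Spanish naming customs.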

Presentism vs Historical Record
Before presenting arguments I should say that I think WP:RECENTISM is a serious problem, so I may weigh historical interests more heavily than others would.

Present vs. Past obviously matters only for info that changes after the player earns the GM title, and in fact it affects only two columns in the GMs table: Federation and Name. I don't think we have to use the same answer for both columns, but there is a consistency argument to be made in favor of it.

When we add FIDE ID to the table we may find a small number of GMs have had more than one ID because FIDE has reassigned ID numbers in the past. (Yes, it's difficult to understand why FIDE did this.) This would raise the question whether to use the player's FIDE ID at the time the title was issued or to use the most recent ID.

Federation changes
Verdict: Although I'm not adamant about this, my choice is federation when the player earned the title. I fully recognize that this might not be the majority opinion.
 * Use most recent federation
 * Pros: if we don't, drive-by editors will update the federations to "fix" them anyway; some readers will expect that the federation is current; supposedly a continuation of current practice but in reality the "Most recent federation" column is an unmaintained disaster
 * Cons: extra maintenance burden and churns extra table updates; it isn't maintained now, why do we think that will change?; not obvious why the final federation of a deceased player is most important; loses historical information of federation when the player earned the title (the current federation for living players can be found at ratings.fide.com)
 * Use federation when the player earned the title
 * Pros: reduces maintenance burden and article churn (never needs an update); retains historical information
 * Cons: if the player changes federations then drive-by editors will "fix" it; some readers will reasonably expect that the current federation is listed; current federation of living players is much easier to verify either by hand or mechanically; (sort of) not the current practice and change is scary
 * Two federation columns, title earned and current
 * This idea is so horrible I won't comment further

Name changes
Name changes may come up in a few different cases: name change due to marriage or divorce, preferred transliteration changes over time (often with Russian names, including when a player moves, e.g. Aleksandar Berelovich (RUS) => Alexander Berelowitsch (GER)), or the player chooses a name change (think Cassius Clay). Verdict: Really have to follow Wikipedia naming conventions for people I think, using the name people would expect today rather than when the title was earned. It's a harder decision perhaps with cases such as Susan Polgár as she was widely known as Zsuzsa Polgár. Since name changes can be a big deal, perhaps the table should have a Comments column which could note name changes. There wouldn't be too many.
 * Use name and transliteration that would be used in a Wikipedia bio (WP:COMMONNAME)
 * Pros: probably expected by most users
 * Cons: small amount of extra maintenance; slight loss of historical information although it's reasonable to suggest that the bio article should be consulted for detail on name changes
 * Use name and transliteration at the time the title was earned
 * Pros: less maintenance; possibly slightly easier to verify with FIDE records (although this is far from certain); retains a small amount of extra historical information
 * Cons: confusing when name is changed for any reason

Recent sources
FIDE is a reliable source for most of the data we need. FIDE provides ratings lists for download at https://ratings.fide.com/download.phtml. The ratings lists have been provided monthly since August 2012 in an easy-to-process XML format; older lists are also available. FIDE ratings data is rather poorly maintained but we are probably OK since we care only about a small subset of the ratings list, the GMs.
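
Extracting just the GMs from an XML ratings list could look roughly like the sketch below. The element names (playerslist, player, fideid, name, country, sex, title, birthday) are my assumption about the file layout and should be checked against a real download; the sample players are invented.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a FIDE ratings file. The element names below are an
# assumption about the download format and should be checked against a real file.
SAMPLE_XML = """\
<playerslist>
  <player>
    <fideid>1000001</fideid>
    <name>Example, Alice</name>
    <country>USA</country>
    <sex>F</sex>
    <title>GM</title>
    <birthday>1999</birthday>
  </player>
  <player>
    <fideid>1000003</fideid>
    <name>Other, Carl</name>
    <country>SWE</country>
    <sex>M</sex>
    <title>IM</title>
    <birthday>1985</birthday>
  </player>
</playerslist>
"""


def grandmasters(xml_text):
    """Return a list of dicts, one per player whose title is exactly "GM"."""
    root = ET.fromstring(xml_text)
    gms = []
    for player in root.iter("player"):
        if player.findtext("title", default="").strip() == "GM":
            # Copy every child element into a plain dict keyed by tag name.
            gms.append({child.tag: (child.text or "").strip() for child in player})
    return gms


print(grandmasters(SAMPLE_XML))
```

Filtering on the exact string "GM" keeps WGM and other titles out, assuming the title field holds one code per player.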

The FIDE Chess Profiles on the https://ratings.fide.com site can also be web scraped to get some data not published in the ratings lists such as title year. The Chess Profiles also contain links to the title applications which can give full birth dates (the ratings lists give only birth year).

Wikipedia is not a WP:RS, but it might provide a machine-readable source for some table maintenance. For a time the German-language page Liste der Schachgroßmeister was more complete than the en-wiki page, and it helped identify several errors and omissions in our table. It is difficult to compare the en-wiki page to the de-wiki page since FIDE ID is not included in either table. Matching on names is hard because the German transliterations vary from both the English wiki spellings and from the spellings used by FIDE. I hope to correct the omission of FIDE ID in the en-wiki page, and maybe de-wiki will follow suit.
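
Until both tables carry FIDE IDs, name matching has to be approximate. One possible approach, sketched here with the standard library only, is to strip diacritics and then ask difflib for near matches. The function names and the 0.8 cutoff are arbitrary choices for the example, and any match found this way would still need a human check.

```python
import difflib
import unicodedata


def normalize(name):
    """Lower-case a name and strip diacritics so spelling variants compare more easily."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.casefold().split())


def candidate_matches(name, others, cutoff=0.8):
    """Return names from `others` that resemble `name`, best first."""
    by_norm = {normalize(other): other for other in others}
    close = difflib.get_close_matches(normalize(name), list(by_norm), n=3, cutoff=cutoff)
    return [by_norm[c] for c in close]


en_names = ["Polgár, Judit", "Fischer, Robert"]
print(candidate_matches("Polgar, Judit", en_names))
```

Stripping combining marks handles accents, but letters that do not decompose (for example "ł" or "ø") pass through unchanged, and larger transliteration differences will need a lower cutoff or manual review.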

It could be possible to mechanically scan Wikipedia chess player bios for birth and death dates. As things stand now, DOD is only entered if an editor notes that a GM has died and thinks to update the GMs table. A tool could be written to try to extract that information automatically. The German wiki may have GM bios lacking in the English wiki, so it's worth considering it as a source as well if the matching problem can be solved. (Perhaps the pages include the FIDE ID somewhere that can be used to match, but a quick look at de-wiki didn't suggest that this is the case.)

Older sources
Chess Informant compiled historical ratings list files that were scanned and OCR'ed from magazines from 1971 to 2000. Mark Weeks obtained permission from FIDE to make the files available at https://www.mark-weeks.com/chess/ratings/, but they aren't officially endorsed. It scarcely matters for the problem of identifying GMs since the ratings lists from 1975 to 1998 do not include player titles, but the old ratings lists can be helpful to determine player federations. Arpad Elo's book The Rating of Chess Players includes all GM title awards up to January 1978, leaving a 20-year gap.

I think that exhausts the sources suitable for automatic processing by computer program. Online and print sources will be needed for Date of Death. There are a few print sources that are definitive and all-inclusive at the time they were published:

In most ways Gaige 1987 subsumes Elo 1978 since it has about 9 more years of player titles. But Elo has one big advantage over Gaige for our purposes, namely that Elo gives tables of all GM and IM titles issued by FIDE from 1950 to 1978. Gaige has title information scattered across 14,000 individual player entries. Once a GM has been identified Gaige is invaluable to verify title year, DOB and DOD, but it is less helpful in compiling a complete list of all GMs up to 1987 from scratch. Di Felice is the same, although it is a bit shorter.

Citing sources
Unclear sourcing is one of the top two most important issues with the current GM list. (The other is difficulty using mechanical assistance for updates and verification.) There are over 1800 GMs with the number growing every month, and every entry needs one or more sources. There are several possible ways to improve, although it isn't clear which ones might be practical and effective. Concerns include impairing table readability, hindering machine processing of table data, greatly increasing table size which would slow page loading and rendering times, and difficulty of maintenance.

Verdict: not sure. Have to try some of these possibilities to find something workable. Initially might try putting coded cites in a Comments column. Some possibilities, not all mutually exclusive:
 * Include a FIDE ID column linked to the FIDE Chess Profile page to cite title year, birth year and federation. This could work well for living players, although DOB (month and day) would need a cited source as would DOD for deceased GMs. If federation at time of title is used instead of current federation then that would need cites as well.
 * Would links to WP chess bio pages satisfy the need to cite DOB and DOD? Generally this is not sufficient, but perhaps it would be satisfactory for a list to avoid the need to add perhaps 1000 or more citations.
 * Could the list have a blanket citation explanation such as "Except where noted, all title years, DOB and DOD through 1985 are given by Gaige 1987", or "... through 2016 by Di Felice"? Normally this might also be unsatisfactory, perhaps because page numbers are not specified for each bit of data.  In this case it seems maybe this could be excused because Gaige is organized in the manner of a biographical dictionary with individual entries for each player in alphabetical order.  We would have to watch for cases where we use a spelling different than Gaige, and a few rare cases where Gaige contains inaccurate information.  These might be noted in a Comments column.
 * Use inline cites in each cell or add one or more columns to hold cites for the entry? I don't favor inline cites in the data cells mostly because they can greatly complicate machine processing of the table data. Some possibilities to separate the cites include using a single column, perhaps in Comments, or to create separate cite columns for Title Year, DOB, and DOD.
 * If blanket cites covering large parts of the table aren't sufficient, perhaps an abbreviated cite code could be used in the comments column or separate columns. For example, simply 'F' if cited to the FIDE Chess Profile, 'G' if cited to Gaige 1987 and 'DF' if cited to Di Felice.  Not sure if something such as 'TA' for title application might be useful (title applications can help with DOB and federation).  TA could be linked to the FIDE title application, and if needed page numbers could be added to the print sources G and DF.
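
If abbreviated cite codes were adopted, a tool could expand them mechanically. The sketch below uses the codes suggested above (F, G, DF, TA). The cell syntax, comma-separated codes with an optional page detail after each code, is purely an assumption for the example.

```python
# Hypothetical sketch: expand abbreviated cite codes from a Comments cell.
# The cell syntax (comma-separated codes, optional page detail) is an assumption.

CITE_SOURCES = {
    "F": "FIDE Chess Profile",
    "G": "Gaige 1987",
    "DF": "Di Felice",
    "TA": "FIDE title application",
}


def expand_cites(cell):
    """Turn a string such as "G p.123, F" into a list of readable citations."""
    cites = []
    for part in cell.split(","):
        tokens = part.split()
        if not tokens:
            continue  # ignore empty segments such as a trailing comma
        code, detail = tokens[0], " ".join(tokens[1:])
        if code not in CITE_SOURCES:
            raise ValueError(f"unknown cite code: {code!r}")
        source = CITE_SOURCES[code]
        cites.append(f"{source}, {detail}" if detail else source)
    return cites


print(expand_cites("G p.123, F"))
```

Rejecting unknown codes outright makes typos in the table visible during a verification run instead of silently dropping a citation.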

Proposals
Proposals concerning the GMs table columns and data.
 * Add a FIDE ID column. For living players this should be a link to the FIDE Chess Profile page for that player, providing a source for the entry. It also makes matching against FIDE data easy. FIDE has changed the ID assignment scheme at least once and maybe twice, and in the past sometimes reassigned ID numbers if a player changed federations, so some GMs who earned their titles long ago may have had two or three different FIDE IDs. For living GMs it makes sense to use the most recent FIDE ID since this can link to the Chess Profile. For deceased GMs who may have had multiple FIDE IDs it isn't as clear cut what to do. But once we've used the FIDE Chess Profile as a source or to verify some information we probably should stick with the most recent ID.
 * Add a Sex column. There is interest in finding female GMs and FIDE provides this information.
 * Put honorary GMs in a separate section instead of in the main table. There are only a small number of these titles and FIDE has stopped issuing new ones.
 * Put the two players who had their GM titles revoked in a separate section instead of in the main table. These require more explanation than is convenient in a table row and they don't belong with the GMs anyway.  (Especially consider mechanical processing of the GMs table, e.g. in CSV form when these entries are probably not desired.)
 * Consider adding a column for Birthplace and possibly Place of Death. This extra information would be of interest to some readers. On the negative side, it would be extra maintenance: as with DOD we don't have a mechanical way to collect the information so it would require manual research. It would also make the table wider (which is especially bad when viewed on small screens and mobile devices) and increase page size and loading times.
 * (Controversial) "Most recent federation" should be replaced by "Federation" with the meaning of the federation affiliation at the time the title was awarded. If the reader wants to know the current federation of a player they can check the WP player bio (if there is one) or look up the player's Chess Profile on http://ratings.fide.com.
 * (Maybe controversial) Use "Last, First" formatting for the names so the natural sort is more apparent. I'm not sure whether this would obviate the need for data-sort-value or if it would still be required to correctly sort Unicode or for other reasons. If piped links don't sort as desired, we could consider putting the links in a separate column called Bio. This might also make it easier to reasonably web scrape the table for other uses, but it would need testing.

Tools implementation
These suggestions about the tooling are largely arbitrary. They reflect my own biases and are tilted heavily in favor of tools and techniques that I have experience with. There are other choices that would be just as good and possibly better. Development of one set of tools does not preclude other implementations that make different choices in technology or goals.
 * Implement in Python. Other languages would also be great choices, especially R, Ruby and Perl.  I find Python easier to use than those alternatives and I think it is also the most popular of the reasonable choices.
 * Use Jupyter notebooks for the code. I think they are a remarkable way to share code, and can make the tools easier to deploy and use as well.
 * Publish the notebooks on mybinder.org or colab.research.google.com.
 * Source repo on GitLab. Another reasonable choice is GitHub. In fact GitHub is more popular, but I think there are reasons to prefer GitLab. Anyone is welcome to fork the code and host it wherever they like, or to develop entirely independent tools hosted anywhere.
 * Work plan – there are two mostly separate work efforts:
 * Get all historical GMs into the table and prepare the list for program-assisted updates by including FIDE ID. This especially means making sure that all non-living GMs are in the list since the monthly update workflow won't identify them. It would be best if this work is recorded in a reproducible form, but since it only needs to be done once it's OK if the process requires quite a bit of manual work. This work would rely on historical FIDE ratings lists and several print sources (Elo 1978, Gaige 1987, etc.). FIDE provides official ratings lists for download only since January 2001, so the best source of machine-readable data for earlier years is the historical ratings lists assembled by Mark Weeks: http://www.mark-weeks.com/chess/ratings/. Unfortunately FIDE ratings lists from 1975 to 1997 did not include titles, so they can't be used to find GMs.
 * Develop a simple workflow to identify and add new GM awards to the table. This should be mostly automated in an easy to use way and published to the public so that anyone can use it.  Input data could be either the Chess Profile pages from http://ratings.fide.com or the monthly ratings zip files from https://ratings.fide.com/download.phtml.  The Chess Profiles contain information such as title year that is not in the ratings files.  Most likely both sources will be used.
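
For the monthly workflow, the ratings file could be read straight from the downloaded zip without unpacking it. The sketch below streams the XML so the whole ratings list is never loaded at once. As before, the element names are assumptions about the file layout, and the archive here is built in memory with invented players as a stand-in for a real download.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def iter_gms_from_zip(zip_source):
    """Yield (fideid, name) for each GM in the first XML member of a zip archive.

    `zip_source` may be a file path or a binary file-like object. The element
    names (player, fideid, name, title) are assumptions about the file layout.
    """
    with zipfile.ZipFile(zip_source) as archive:
        xml_names = [n for n in archive.namelist() if n.lower().endswith(".xml")]
        if not xml_names:
            raise ValueError("no XML file found in archive")
        with archive.open(xml_names[0]) as xml_file:
            # iterparse streams the file, so the full list is never held in memory.
            for _event, elem in ET.iterparse(xml_file, events=("end",)):
                if elem.tag != "player":
                    continue
                if (elem.findtext("title") or "").strip() == "GM":
                    yield elem.findtext("fideid"), elem.findtext("name")
                elem.clear()  # drop the finished <player> element's children


# Build a small archive in memory to stand in for a downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "sample_rating_list.xml",
        "<playerslist>"
        "<player><fideid>1000001</fideid><name>Example, Alice</name><title>GM</title></player>"
        "<player><fideid>1000003</fideid><name>Other, Carl</name><title>IM</title></player>"
        "</playerslist>",
    )
buffer.seek(0)

print(list(iter_gms_from_zip(buffer)))
```

The IDs produced this way could then be compared against the FIDE IDs in the article table, with the Chess Profile pages consulted only for the handful of new GMs to fill in title year.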

Wikidata List
∑ 2038 items.