User:Dudemanfellabra/UpdateNRHPProgress

This script is used to update the statistics on the WP:NRHP Progress page. Any questions about its output or inner workings should be addressed to User:Dudemanfellabra.

How to use it
Anyone can use this script by importing it at the bottom of their personal JavaScript page. The script generates a button labeled "Update Statistics" at the top of the Progress page. Clicking that button starts the script, which then checks all ~4000 lists under the scope of WP:NRHP for statistics on the total number of listings, the total number of pictures uploaded, the total number of articles created, and article quality (e.g. the number of stubs, Start+ articles, and untagged articles). The script then automatically adds the updated statistics to the Progress page and saves the output.

How it works
When the "Update Statistics" button is clicked, the script performs the following steps:
 * 1) Extracts the wikitext of the Progress page using the MediaWiki API. If an error is encountered at this stage, the script aborts.
 * 2) After extracting the wikitext of the Progress page, the script fires off asynchronous API queries, in rapid succession, for the wikitext of each county list.
 * 3) As each wikitext query completes, the script extracts the instances of the NRHP row template to obtain the total number of listings for that county, and records whether each listing is illustrated (i.e. whether its image parameter is non-blank). If any error is encountered at this stage, a fatal error is triggered (explained below), and the user is asked either to skip the county that produced the error or to query it again.
 * 4) Next, the script fires off asynchronous queries to find out whether each article is a bluelink (links to disambiguation pages are counted as redlinks); the articles are grouped into batches of 50 to reduce the number of API calls. At the same time, the script checks whether each page is an NRIS-only article and records that information.
 * 5) After all queries for a batch of articles are complete, the script queries the corresponding talk pages to obtain quality statistics (Stub-class, Start+, etc.); these queries are again grouped into batches of 50 to reduce strain on the API. (Note: If a listing links to another list of NRHP properties (e.g. an MPS), the listing is counted as unarticled. If a listing links to some other type of list (e.g. a list of contributing properties to a historic district), the link is counted as Stub-class.)
 * 6) After the above steps are completed for all ~4000 lists, the script totals the statistics for each county with sublists, each state, and the entire nation, taking into account the duplicates recorded at WP:NRHPPROGRESS/Duplicates. It then inserts these totals into the previously fetched wikitext and saves the Progress page with the newly generated wikitext. Once the edit is saved, a diff link is generated and the script exits.
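The per-list counting in step 3 and the batching in step 4 can be sketched as follows. This is a minimal Python sketch, not the script's actual JavaScript, and every helper name in it (`iter_rows`, `split_params`, `count_listings`, `batches`, `link_check_params`) is hypothetical. It assumes that each listing is an `{{NRHP row}}` template whose picture is in a named `image=` parameter and whose linked title is in a named `article=` parameter. The query parameters use the MediaWiki Action API's `prop=pageprops` with `ppprop=disambiguation`; this is one possible way to flag disambiguation pages, assumed here rather than confirmed as the script's approach.

```python
import re
from typing import Iterator

# Opening of an {{NRHP row ...}} template (assumed template name, case-insensitive).
ROW_START = re.compile(r"\{\{\s*NRHP row\b", re.IGNORECASE)


def iter_rows(wikitext: str) -> Iterator[str]:
    """Yield the body of each {{NRHP row}} template, skipping nested {{...}}."""
    for match in ROW_START.finditer(wikitext):
        depth, i = 0, match.start()
        while i < len(wikitext):
            pair = wikitext[i:i + 2]
            if pair == "{{":
                depth, i = depth + 1, i + 2
            elif pair == "}}":
                depth, i = depth - 1, i + 2
                if depth == 0:  # closing braces of the row template itself
                    yield wikitext[match.start() + 2:i - 2]
                    break
            else:
                i += 1


def split_params(body: str) -> dict[str, str]:
    """Split a template body on top-level '|' and return its named parameters."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[[", "}}", "]]"):
            depth += 1 if pair in ("{{", "[[") else -1
            current.append(pair)
            i += 2
            continue
        if body[i] == "|" and depth == 0:
            parts.append("".join(current))  # a top-level pipe ends a parameter
            current = []
        else:
            current.append(body[i])
        i += 1
    parts.append("".join(current))
    params = {}
    for part in parts[1:]:  # parts[0] is the template name itself
        key, sep, value = part.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params


def count_listings(wikitext: str) -> tuple[int, int, list[str]]:
    """Return (listings, illustrated listings, linked article titles) for one list."""
    total, illustrated, articles = 0, 0, []
    for body in iter_rows(wikitext):
        params = split_params(body)
        total += 1
        if params.get("image"):  # a non-blank image parameter counts as illustrated
            illustrated += 1
        if params.get("article"):
            articles.append(params["article"])
    return total, illustrated, articles


def batches(titles: list[str], size: int = 50) -> Iterator[list[str]]:
    """Group titles into batches of at most `size` titles per API query."""
    for start in range(0, len(titles), size):
        yield titles[start:start + size]


def link_check_params(batch: list[str]) -> dict[str, str]:
    """MediaWiki Action API parameters asking, for each title, whether the page
    exists and whether it carries the 'disambiguation' page property."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "redirects": "1",
        "prop": "pageprops",
        "ppprop": "disambiguation",
        "titles": "|".join(batch),
    }
```

Splitting on `|` only at nesting depth zero matters because row parameters often contain nested templates or piped links, such as `{{coord|...}}` or `[[Target|label]]`, whose own pipes must not start a new parameter. In the response to such a query, pages flagged as missing or carrying the `disambiguation` page property would then be counted as redlinks.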

Explanation of error messages
To prevent the script from writing gibberish to the Progress page, if the script encounters an error at any point while parsing the wikitext of a given list, it asks the user how to proceed. The script identifies the problematic list, and the user can either make the script retry that county (in case of connectivity or unknown issues) or skip the problematic county and update it manually later. The following error messages may be encountered:


 * Error: No county section found for LISTNAME! – This error is only triggered for links on the Progress page that are redirects. It means that the script was unable to find the section that the redirect points to. The most likely cause is that the redirect does not point to an existing section on the target page. For example, the link for National Register of Historic Places listings in Autauga County, Alabama (a redirect) points to the "Autauga County" section of National Register of Historic Places listings in Alabama. If someone has renamed that section in the state list (e.g. to just "Autauga"), the script will produce this error. To fix it, either change the redirect to point to the renamed section (in this example, the "Autauga" section of National Register of Historic Places listings in Alabama) or change the section title back to match the redirect (here, "Autauga County"). Changing the redirect is usually preferable to changing section titles, which may lead to edit warring.
 * Error: No table found for LISTNAME! – This error means that no table was found at the target of the specified link. It is the least common error and should occur only if someone has vandalized a county list by blanking it or removing a large chunk of its code, or has vandalized the Progress page itself so that the link there no longer points to the correct page. To fix it, revert the vandalism and retry the county.
 * Error: Incorrectly formatted table for LISTNAME! – This error means that a table was found at the target of the link, but it does not appear to be a list of sites on the National Register; that is, it does not use the NRHP row template. Like the previous error, this one should only be triggered if someone has vandalized the county list or the Progress page itself. To fix it, revert the vandalism and retry the county.
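The three checks above can be sketched as a single validation step that runs before any rows are counted. This is a hypothetical Python sketch, not the script's code: `ListError`, `find_section`, and `extract_county_table` are invented names, and it assumes that sections use standard `== Heading ==` markup, that a list table opens with a literal `{|`, and that a correctly formatted table contains at least one `{{NRHP row` template.

```python
import re
from typing import Optional

# A wikitext heading such as "== Autauga County ==" (level = number of '=').
HEADING = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$", re.MULTILINE)
ROW_START = re.compile(r"\{\{\s*NRHP row\b", re.IGNORECASE)


class ListError(Exception):
    """Raised with one of the three messages shown to the user for a list."""


def find_section(wikitext: str, title: str) -> Optional[str]:
    """Return the text under the heading named `title`, up to the next heading
    of the same or a higher level, or None if there is no such heading."""
    headings = list(HEADING.finditer(wikitext))
    for n, match in enumerate(headings):
        if match.group(2) != title:
            continue
        level, end = len(match.group(1)), len(wikitext)
        for later in headings[n + 1:]:
            if len(later.group(1)) <= level:
                end = later.start()
                break
        return wikitext[match.end():end]
    return None


def extract_county_table(wikitext: str, list_name: str,
                         section: Optional[str] = None) -> str:
    """Return the list's table wikitext or raise ListError.

    `section` is the anchor of a redirect link (None for ordinary links)."""
    text = wikitext
    if section is not None:
        found = find_section(wikitext, section)
        if found is None:
            raise ListError(f"No county section found for {list_name}!")
        text = found
    start = text.find("{|")  # assumption: the table opens with a literal {|
    if start == -1:
        raise ListError(f"No table found for {list_name}!")
    table = text[start:]  # from the table opening to the end of the section
    if not ROW_START.search(table):
        raise ListError(f"Incorrectly formatted table for {list_name}!")
    return table
```

Raising one exception type that carries the list name mirrors the behavior described above: the caller can catch it, show the message, and let the user choose between retrying and skipping the county.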

Other errors
If any error other than those above is encountered, the script simply retries the query after a brief pause. The most common such error is the API warning that the rate limit (the number of API queries allowed per unit time) has been exceeded, meaning the script should slow down. If too many of these errors are encountered, the script throttles itself by increasing the gap between subsequent API queries. At the end of execution, these less serious errors are written to the JavaScript console for the user to examine.
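The retry-and-throttle behavior can be sketched as a small wrapper around each API query. This is a minimal Python sketch under stated assumptions: `TransientApiError` is a hypothetical stand-in for whatever non-fatal error the API returns (such as a rate-limit warning), and the starting gap, retry pause, doubling rule, and threshold of three errors per slowdown are illustrative values, not the script's actual settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientApiError(Exception):
    """Hypothetical stand-in for a non-fatal API error, e.g. a rate-limit warning."""


class Throttle:
    """Spaces out API queries, retries failed ones, and widens the gap
    between queries after every `errors_per_slowdown` errors."""

    def __init__(self, gap: float = 0.05, retry_pause: float = 1.0,
                 max_gap: float = 5.0, errors_per_slowdown: int = 3,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.gap = gap
        self.retry_pause = retry_pause
        self.max_gap = max_gap
        self.errors_per_slowdown = errors_per_slowdown
        self.sleep = sleep
        self.errors = 0
        self.log: list[str] = []  # less serious errors, reported at the end

    def call(self, query: Callable[[], T], retries: int = 10) -> T:
        for _ in range(retries):
            self.sleep(self.gap)  # gap before each query
            try:
                return query()
            except TransientApiError as err:
                self.log.append(str(err))
                self.errors += 1
                if self.errors % self.errors_per_slowdown == 0:
                    self.gap = min(self.gap * 2, self.max_gap)  # throttle
                self.sleep(self.retry_pause)  # brief pause before retrying
        raise RuntimeError(f"query still failing after {retries} attempts")
```

Injecting the `sleep` function keeps the sketch testable, and the collected `log` corresponds to the less serious errors that are written to the console at the end of the run.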