User:Monkbot/task 19: cite iucn update

Task 19 was originally conceived to update, from the IUCN Red List API, the 13,000 or so articles that use a citation template whose url parameter holds an old-form IUCN url. These articles are listed in.

Old-form urls are considered 'old-form' because (when they work) they always point to the current assessment. There are several old-form urls (not all of these work):
 * http://www.iucnredlist.org/details/22718564/all
 * http://www.iucnredlist.org/details/22718564/full
 * http://www.iucnredlist.org/details/full/22718564/0
 * http://www.iucnredlist.org/details/22718564/0
 * http://www.iucnredlist.org/details/22718564/
 * http://www.iucnredlist.org/details/22718564
 * http://www.iucnredlist.org/details/summary/22718564
 * http://www.iucnredlist.org/search/details.php/22718564/all
 * http://www.iucnredlist.org/search/details.php/22718564/summ
 * http://oldredlist.iucnredlist.org/details/22718564/0

Most of these old-form urls are used in citation templates found in the status_ref parameter of taxobox and speciesbox templates (collectively hereafter 'taxobox') to support the values in the taxobox status and status_system parameters. Because values for status (IUCN uses the term 'category') and for status_system can be extracted or derived from the results of an additional IUCN API call, task 19 was expanded to support updating these taxobox parameters.

IUCN API
This task is generally slow. IUCN do not want anyone or anything hammering away at their API as fast as possible so task 19's calls to the IUCN API are spaced about 3 seconds apart. To accomplish this, the AWB Bots→Auto save→Delay setting is 3 seconds. This prevents task 19 from making edits that require only a single IUCN API call too quickly. For edits that require multiple IUCN API calls, task 19 imposes a 3-second pause before executing each IUCN API call after the first one.
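The spacing rule above can be sketched as follows. This is a minimal illustration, not the task's actual code: the class and method names (ApiThrottle, PauseBeforeCall, WaitBeforeCall) are hypothetical, and it assumes the first call of an edit is already spaced by the AWB auto-save delay.

```csharp
using System;
using System.Threading;

public static class ApiThrottle
{
    // Hypothetical helper: how long to pause before the n-th (1-based)
    // IUCN API call made while processing a single article.
    public static TimeSpan PauseBeforeCall(int callNumber, TimeSpan spacing)
    {
        // Assumption: the first call is already spaced by AWB's auto-save delay.
        return callNumber <= 1 ? TimeSpan.Zero : spacing;
    }

    // Blocks for the required pause (3 seconds, per the text above).
    public static void WaitBeforeCall(int callNumber)
    {
        TimeSpan pause = PauseBeforeCall(callNumber, TimeSpan.FromSeconds(3));
        if (pause > TimeSpan.Zero)
            Thread.Sleep(pause);
    }
}
```

An edit that needs three API calls would call WaitBeforeCall(1), WaitBeforeCall(2), and WaitBeforeCall(3) in turn, and only the second and third would sleep.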

IUCN API calls require a token. While the code for this task is published, the task's token is not. Anyone considering reuse of this code must obtain their own token; do not use the publicly available demo token.

Task 19 fetches data from the IUCN API in four forms: two of species data and two of species citations. These examples are for Anthus roseatus (the name) and 22718564 (the taxon id). The species data returns are:
 * name:
 * taxon id:
 * taxon id:

The citation data returns are:
 * name:
 * taxon id:
 * taxon id:
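The four request forms can be sketched from the base urls in the script below. The class and method names here are hypothetical, and appending the token as a ?token= query parameter is an assumption about the version 3 API rather than something stated on this page.

```csharp
using System;

public static class IucnApiUrls
{
    // Base url as given by api_species_url in the script.
    const string SpeciesUrl = "http://apiv3.iucnredlist.org/api/v3/species/";

    // Species data by binomial name; the name is percent-encoded.
    public static string SpeciesByName(string binomial, string token)
    {
        return SpeciesUrl + Uri.EscapeDataString(binomial) + "?token=" + token;
    }

    // Species data by taxon id.
    public static string SpeciesById(string taxonId, string token)
    {
        return SpeciesUrl + "id/" + taxonId + "?token=" + token;
    }

    // Citation data by binomial name.
    public static string CitationByName(string binomial, string token)
    {
        return SpeciesUrl + "citation/" + Uri.EscapeDataString(binomial) + "?token=" + token;
    }

    // Citation data by taxon id.
    public static string CitationById(string taxonId, string token)
    {
        return SpeciesUrl + "citation/id/" + taxonId + "?token=" + token;
    }
}
```

For example, IucnApiUrls.CitationById("22718564", token) builds the citation request for the taxon id used in the examples above; the token must be your own.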

taxobox updates
Task 19 confirms, updates, or adds taxobox parameters status, status_system, and status_ref using data extracted from the IUCN API. The IUCN API data are fetched using a binomial species name; task 19 does not attempt to fetch IUCN API data using the taxon id found in any existing IUCN references in the taxobox. For taxobox updates, task 19 attempts to get the binomial from various taxobox parameters:
 * speciesbox parameters
 ** taxon
 ** genus + species
 ** name
 * taxobox parameters
 ** binomial
 ** name
When the taxobox has none of the above parameters, task 19 will use the article title in the IUCN API call.

Task 19 does not confirm, update, or add status, status_system, and status_ref when:
 * the binomial is not a binomial; usually because the taxobox or article title uses only the genus portion of the binomial
 * the IUCN API does not recognize the binomial as a valid name. When this happens task 19 adds  and a hidden comment with the unrecognized binomial.  Reasons that the IUCN API might not recognize the binomial are:
 ** misspellings
 ** typos
 ** extraneous text
 ** species name might not be 'globally assessed' but instead be 'regionally assessed' – the taxobox does not specify the region of an assessment so task 19 cannot use the regional form of the citation API call
 ** IUCN API does not support the redirect-like behavior for binomials as the search box at https://www.iucnredlist.org/ does

Parameters status2, status2_system, and status2_ref are not handled in the same way as their non-enumerated counterparts because there are relatively few instances of the enumerated forms (~25 according to a search on 2021-09-20). status2_ref may be updated by subsequent task 19 processes but status2 and status2_system will not be.

and support status, status_system, and status_ref but task 19 does not attempt to update these parameters as a group because the use of these parameters in those templates is comparatively rare and because species names upon which task 19 depends are inconsistent in comparison to  and. Task 19 may choose to update the content of status_ref in these templates if the parameter uses an old-form url or is a plain-text citation but will not attempt to update status and status_system nor will it remove duplicate status_ref references.

IUCN status
From the IUCN API call for species data using the binomial, task 19 extracts the  value and the   value. The species IUCN status is confirmed when status has the same value as the category returned from the IUCN API. When they are different, task 19 updates status to the value from the IUCN API. When status is missing (because it was never there or because an empty parameter was deleted) task 19 updates status or adds a new status at the end of the taxobox. Updates, confirmation, and additions are noted in the edit summary.
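A minimal sketch of the confirm / update / add decision described above. The regular expression is the script's status_from_api_pattern; the class, the method names, and the exact-match comparison are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IucnStatusCheck
{
    // Same expression as status_from_api_pattern in the script.
    static readonly Regex CategoryPattern = new Regex(@"""category"":""([^""]+)""");

    // Returns the category from a raw API return, or null when none is present.
    public static string CategoryFromApi(string apiReturn)
    {
        Match m = CategoryPattern.Match(apiReturn);
        return m.Success ? m.Groups[1].Value : null;
    }

    // taxoboxStatus is null when |status= is absent and "" when it is present but empty.
    // Assumption: values are compared exactly after trimming whitespace.
    public static string Classify(string taxoboxStatus, string apiCategory)
    {
        if (taxoboxStatus == null)
            return "added";
        return string.Equals(taxoboxStatus.Trim(), apiCategory, StringComparison.Ordinal)
            ? "confirmed"
            : "updated";
    }
}
```

The returned word corresponds to the 'IUCN status confirmed', 'IUCN status updated', and 'IUCN status added' edit summary fragments.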

IUCN status displayed on an IUCNredlist web page may be different from the category returned from the IUCN API – task 19 uses the IUCN API's category; cf. (as of 2021-09-22):
 * NT (from the Zenia insignis web page)
 * LR/nt (from the IUCN API):

IUCN status system
To update or add a taxobox status_system parameter, task 19 extracts the year portion from the IUCN API's assessment_date value. If the assessment year is 2000 or earlier, task 19 sets status_system to IUCN2.3; otherwise it sets IUCN3.1. The threshold date is taken from Conservation status. When status_system is missing, task 19 adds a new parameter at the end of the taxobox. Updates and additions are noted in the edit summary; confirmations are not.
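The year threshold can be sketched like this. The regular expression is the script's status_system_from_api_pattern; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IucnStatusSystem
{
    // Same expression as status_system_from_api_pattern in the script;
    // \d+ stops at the first hyphen, so only the year is captured.
    static readonly Regex AssessmentYear = new Regex(@"""assessment_date"":""(\d+)");

    // Returns "IUCN2.3" for assessments dated 2000 or earlier, "IUCN3.1" otherwise,
    // or null when the API return holds no assessment date.
    public static string FromApi(string apiReturn)
    {
        Match m = AssessmentYear.Match(apiReturn);
        if (!m.Success)
            return null;
        int year = int.Parse(m.Groups[1].Value);
        return year <= 2000 ? "IUCN2.3" : "IUCN3.1";
    }
}
```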

IUCN status reference
To update or add status_ref, task 19 inspects the parameter value for a date that task 19 would have written (in the reference name, iucn status <date>) or the existing citation's access-date (in that order). When a date can be extracted from one of these, it is compared to the current date. Task 19 will attempt to update status_ref only when the difference between the current date and the reference date is greater than six months or when no date can be extracted. This six-month limit was arbitrarily chosen on the presumption that IUCN updates their database twice a year.
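The six-month test might look like the sketch below. The class and method names are hypothetical, and the three accepted date layouts (dmy, mdy, ymd) are an assumption based on the script's date_patterns; the script itself works from regular expressions and a month-name table rather than DateTime parsing.

```csharp
using System;
using System.Globalization;

public static class StatusRefAge
{
    // Assumed layouts: "1 October 2021", "October 1, 2021", "2021-10-01".
    static readonly string[] Formats = { "d MMMM yyyy", "MMMM d, yyyy", "yyyy-MM-dd" };

    // True when the reference should be refreshed: either no date could be
    // extracted, or the date is more than six months before today.
    public static bool NeedsUpdate(string referenceDate, DateTime today)
    {
        DateTime parsed;
        if (!DateTime.TryParseExact(referenceDate, Formats, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out parsed))
            return true;
        return parsed.AddMonths(6) < today.Date;
    }
}
```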

Task 19 will not update templated citations in status_ref if the citation has one of:
 * <year>
 * <year>
Similarly, task 19 will not update plain-text citations in status_ref if the citation has one of:
 * (amended version of <year> assessment)
 * (errata version published in <year>)
This is because the IUCN API does not provide the <year> of amendment or errata.

When the six-month limit is met, and when the citation in status_ref does not hold the amended or errata parameters or strings, task 19 then inspects the associated reference tag:
 * 1) unnamed reference:
 ** replaces the value assigned to status_ref with <new from IUCN API>
 ** where the date in the new reference name is a copy of the value assigned to the new template's access-date parameter
 * 2) named reference:
 ** replaces that reference with <new from IUCN API>
 ** replaces all instances of the old reference name with the new reference name
 ** where the date in the new reference name is a copy of the value assigned to the new template's access-date parameter
 * 3) named self-closed reference:
 ** swaps the self-closed reference tag with the reference definition
 ** replaces the citation as described in 2
 ** if the definition was (and now the self-closed ref tag is) inside a reflist refs parameter then the self-closed ref tag is deleted

template updates
For templates that have old-form urls, task 19 extracts the taxon id from the url and attempts to fetch citation data from the IUCN API using the taxon id. If the IUCN API does not recognize the taxon id, task 19 will attempt to get a citation from the API by using the value assigned to title in the template. When successful, task 19 replaces the old template with a new cite iucn template that has parameter values from the IUCN API citation.
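Extracting the taxon id from an old-form url can be sketched with the expressions from the script's url_patterns array, tried in order. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OldFormUrl
{
    // Copied from url_patterns in the script; capture group 1 is the taxon id.
    static readonly string[] UrlPatterns =
    {
        @"https?://www\.iucnredlist\.org/details/(\d+)/\b(?:all|full)",
        @"https?://www\.iucnredlist\.org/details/full/(\d+)/\d+",
        @"https?://www\.iucnredlist\.org/details/(\d+)/\d+",
        @"https?://www\.iucnredlist\.org/details/(\d+)/?",
        @"https?://www\.iucnredlist\.org/details/summary/(\d+)",
        @"https?://www\.iucnredlist\.org/search/details\.php/(\d+)/(?:all|summ)",
        @"https?://oldredlist\.iucnredlist.org/details/(\d+)/\d+",
    };

    // Returns the taxon id from the first pattern that matches, or null.
    public static string TaxonId(string url)
    {
        foreach (string pattern in UrlPatterns)
        {
            Match m = Regex.Match(url, pattern);
            if (m.Success)
                return m.Groups[1].Value;
        }
        return null;
    }
}
```

Each of the ten old-form urls listed at the top of this page yields 22718564; a new-form url yields null and is left alone.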

When the taxon/assessment ids in a new template's page and doi parameters are not the same, the citation is not updated because cite iucn will emit a doi / url mismatch error message. The mismatch is usually an indication that the assessment has errata. The citation rendered on an IUCN species web page indicates the errata year but, at the time of this writing, that value is not available in the citation returned from the IUCN API. IUCN have been notified of this discrepancy.

plain-text citation updates
For the purposes of this task, plain-text references are untemplated IUCN references inside named or unnamed ref tags or IUCN references as a line item in an unordered list (* markup). Task 19 will update plain-text references when it can extract a taxon id from an IUCN page identifier, from an IUCN doi (in either doi or url form), or from an IUCN url.

duplicate citations
Task 19 will replace named and unnamed references that hold templates that match  in status_ref with  tags. associated with named references that hold templates that match  in status_ref are replaced with  tags.

Duplicate references that wholly make up an entry in an unordered list are deleted as redundant.

Task 19 does not remove any other references.

ancillary tasks
Task 19 may update an IUCN status template's status value in its first positional parameter from the IUCN API when the template has a valid taxon id as its second positional parameter.

As with all other Monkbot tasks, task 19 does not run with AWB general fixes turned on.

abandoned edits
Task 19 will abandon edits when:
 * the article uses
 * the article uses  parser functions
 * the number of templates evaluated is equal to the number of IUCN API calls that returned nil values
 * the article contains

edit summaries
Task 19 emits terse edit summaries. An edit summary is a concatenation of one or more of these message fragments:
 * IUCN status confirmed (n×) – number of taxobox status and values that were confirmed to match the IUCN API returned value; when there is only one confirmation (the most common case), the parenthetical count is omitted
 * IUCN status updated (n×) – number of taxobox status and values that were updated to match the IUCN API returned value; when there is only one update, the parenthetical count is omitted
 * IUCN status added – a taxobox status parameter was added using the IUCN API returned value
 * IUCN status system updated – a taxobox status_system parameter was updated to match the IUCN API returned value
 * IUCN status system added – a taxobox status_system parameter was added using the IUCN API returned value
 * IUCN status ref updated – a taxobox status_ref parameter was updated to match the IUCN API returned value
 * IUCN status ref added – a taxobox status_ref parameter was added using the IUCN API returned value
 * [duplicate removed] or [duplicates removed (n×)] – suffix added to 'IUCN status ref updated' or 'IUCN status ref added' messages when duplicate reference(s) have been removed
 * IUCN status ref current – the citation in status_ref is not older than six months
 * evaluated n template(s) – the number of templates that task 19 inspected for use of old-form urls
 * n template(s) modified – the number of templates with old-form urls that task 19 updated
 * evaluated n reference(s) – the number of plain-text references that task 19 inspected
 * n reference(s) modified – the number of plain-text references that task 19 updated
 * API species nil return (id) (n×) – emitted when IUCN API did not return species data for a given taxon id
 * API species nil return (name) (n×) – emitted when IUCN API did not return species data for a given species name
 * API cite nil return (n×) – emitted when IUCN API did not return citation data (species name or taxon id)
 * unrecognized binomial: binomial – the binomial that task 19 used to fetch data from the IUCN API for the taxobox parameter
 * (n/mm:ss.ms) – n is the number of IUCN API calls; mm:ss.ms – minutes, seconds and milliseconds required to process the article
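The counting convention in these fragments (the parenthetical count is omitted when it is one) can be sketched as below. The class and method names are hypothetical, and the separator between fragments is left as a parameter because this page does not state it.

```csharp
using System;
using System.Collections.Generic;

public static class EditSummary
{
    // "IUCN status confirmed" for one, "IUCN status confirmed (3×)" for three,
    // and an empty string when nothing happened.
    public static string Counted(string message, int count)
    {
        if (count <= 0)
            return "";
        return count == 1 ? message : message + " (" + count + "×)";
    }

    // Concatenates the non-empty fragments; the separator is the caller's choice.
    public static string Join(IEnumerable<string> fragments, string separator)
    {
        List<string> kept = new List<string>();
        foreach (string fragment in fragments)
            if (!string.IsNullOrEmpty(fragment))
                kept.Add(fragment);
        return string.Join(separator, kept);
    }
}
```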

script
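//---< C O D E _ N O W I K I >
//
// wraps text in <code><nowiki>...</nowiki></code> tags for error log
//

private string code_nowiki (string text)
{
    return "<code><nowiki>" + text + "</nowiki></code>";
}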

//---< E R R O R _ L O G _ A D D >
//
// adds an error message to the error log list. Probably superfluous.
//

private void error_log_add (string message)
{
    error_log_list.Add (message);
}

//---< L O G _ E R R O R S >
//
// writes the content of the error log list to the log file, prettified with wiki markup.
//

private void log_errors (string article_title, List<string> error_log_list)
{
    System.IO.StreamWriter sw;
    string time = DateTimeOffset.Now.ToString("u").Substring (11, 9);
    string date = DateTimeOffset.Now.ToString("u").Substring (0, 10);

    string log_file = @"Z:\Wikipedia\AWB\Monkbot_tasks\Monkbot_task_19_cite_iucn_update\logs\" + date + ".txt";

    int seconds = DateTimeOffset.Now.Second;
    int minutes = DateTimeOffset.Now.Minute;
    int hours = DateTimeOffset.Now.Hour;

    sw = System.IO.File.AppendText (log_file);
    sw.WriteLine ("*" + article_title + " (" + time + "):");

    foreach (string list_item in error_log_list)
        sw.WriteLine ("*:" + list_item);

    error_log_list.Clear ();

    sw.Close ();
}

//---< C O U N T E D _ R E P L A C E >
//
// common function to replace <pattern> with <replace> and bump <count> until no more <pattern>
//

private string counted_replace (string template, string pattern, string replace, ref int count)
{
    Regex rgx = new Regex (pattern);                        // make a new regex from <pattern>

    while (Regex.Match (template, pattern).Success)         // look for <pattern> in <template>
    {
        template = rgx.Replace (template, replace, 1);      // replace one copy of <pattern> with <replace>
        count++;                                            // bump the counter
    }

    return template;
}

//===========================<< S T A T I C  D A T A >>======================================================

static bool		status_added = false;					// set to true when |status= created in taxobox

static int plain_text_modified_count = 0;           // number of plain-text citations that were modified from the iucn api
static int plain_text_count = 0;                    // total number of plain-text iucn references

static int api_call_count = 0;                      // number of api calls made; this value not reported in edit summary
static int api_fetch_fail_count = 0;                // number of api fetches that failed
static int api_no_cite_return_count = 0;            // number of times that the api returned a non-citation value like: {"value":"0","species":"202965"}
static int parse_fail_count = 0;                    // number of times that we couldn't parse the api return
static int page_doi_skip_count = 0;                 // number of templates or plain-text references skipped because page and doi assessment ID mismatch (could be errata but since no errata date ...)
static int api_no_species_return_name_count = 0;    // number of times that the api returned a non-species value (species name)
static int api_no_species_return_id_count = 0;      // number of times that the api returned a non-species value (species id for )
static int iucn_status_updated_count = 0;           // number of times that we updated the iucn status in taxobox-like templates
static int iucn_status_confirmed_count = 0;         // number of times that we confirmed the iucn status in taxobox-like templates
static int iucn_status_system_updated_count = 0;    // number of times that we updated the iucn status system in taxobox-like templates

static string taxobox_blank = null;                 // gets blank taxobox as flag
static bool status_ref_added = false;               // set to true when |status_ref= created
static bool status_system_added = false;            // set to true when |status_system= created
static bool status_ref_updated = false;             // set to true when |status_ref= updated
static bool status_ref_current = false;             // set to true when |status_ref= less than 6 months old
static int duplicates_removed_count = 0;            // number of duplicate status references removed

static string sc_ref_tag_begin = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?";    // these for taxobox |status_ref= handling
static string sc_ref_tag_end = @"""?\s*/\>";

static string ref_def_begin = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?";       // these for taxobox |status_ref= handling to locate the matching definition
static string ref_def_end = @"""?\s*\>[^\<]*\</[Rr][Ee][Ff]\>";

static string	reflist_cleanup = @"(\{\{\s*[Rr]eflist[^\}]*\|\s*refs\s*=[^\}]*)\<\s*[Rr][Ee][Ff][^\>]*/\>";

static string hide_non_ref_tag_pattern = @"\<((?!/[Rr][Ee][Ff]|[Rr][Ee][Ff])[^\>]*)\>";
static string angle_open = "__4ng13_0__";
static string angle_close = "__4ng13_C__";
static string hide_non_ref_replace_val = angle_open + "$1" + angle_close;

static int iucn_template_count = 0;                 // total number of cite IUCN templates
static int other_template_count = 0;                // total number of cite journal/web templates

//---< A P I >

static string api_species_url = "http://apiv3.iucnredlist.org/api/v3/species/";    // for fetching species data from the api by name
static string api_species_id_url = api_species_url + "id/";                        // for fetching species data from the api by taxon id (for )
static string api_id_url = api_species_url + "citation/id/";                       // for fetching citation data from the api using taxon id
static string api_name_url = api_species_url + "citation/";                        // for fetching citation data from the api using binomial

static string iucn_api_token_file = @"Z:\Wikipedia\AWB\Monkbot_tasks\Monkbot_task_19_cite_iucn_update\iucn_api_token";    // token required to be private; stored locally here
static string api_token = null;                                                    // stored at iucn_api_token_file

//---< C I T E  I U C N >

static string IS_CITE_IUCN = @"(?:[Cc]ite iucn|[Cc]ite IUCN)";
static string iucn_template_pattern = @"\{\{\s*" + IS_CITE_IUCN + @"[^\}]+\}\}";    // basic cite IUCN template pattern
static string iucn_title = @"\|\s*title\s*=([^\|\}]*)";                             // everything in cite IUCN |title= for api calls

static string[] url_patterns = new string[] {
    @"https?://www\.iucnredlist\.org/details/(\d+)/\b(?:all|full)",
    @"https?://www\.iucnredlist\.org/details/full/(\d+)/\d+",
    @"https?://www\.iucnredlist\.org/details/(\d+)/\d+",
    @"https?://www\.iucnredlist\.org/details/(\d+)/?",
    @"https?://www\.iucnredlist\.org/details/summary/(\d+)",
    @"https?://www\.iucnredlist\.org/search/details\.php/(\d+)/(?:all|summ)",
    @"https?://oldredlist\.iucnredlist.org/details/(\d+)/\d+",
};

static string ref_param_empty = @"\|\s*ref\s*=\s*([\|\}])";
static string ref_param_not_empty = @"\|\s*ref\s*=\s*([^\|\}]+)";

//---< C I T E  J O U R N A L / W E B >--

static string IS_CITE_OTHER = @"(?:[Cc]ite journal|[Cc]ite web)";                   // TODO: expand this to include more redirects?
static string other_template_pattern = @"\{\{\s*" + IS_CITE_OTHER + @"[^\}]+\}\}";  // basic cite journal/web template pattern

//---< N E W  C I T E   I U C N >
//
// parse_pattern doesn't work for citations like this (from Cantleya) because of the 'extra' year ahead of
// the binomial:
//    Asian Regional Workshop (Conservation & Sustainable Management of Trees, Viet Nam, August 1996) 1998. Cantleya corniculata. The IUCN Red List of Threatened Species 1998: e.T33197A9760751. https://dx.doi.org/10.2305/IUCN.UK.1998.RLTS.T33197A9760751.en .Downloaded on 1 October 2021
//
// Haven't seen enough of these to attempt a second parse pattern
//

//static string citation_from_api_pattern = @"\[\{""citation"":""([^""]*)""\}\]";
static string citation_from_api_pattern = @"\[\{""citation"":""([^\}]*)""\}\]";
static string parse_pattern = @"(^\D+)(\d{4})\.(\D+)\. The IUCN Red List of Threatened Species (\d{4}): (e\.T\d+A(\d+))\.\D+(10\.2305\/IUCN\.UK\.[\d\-]+\.RLTS\.T\d+A(\d+)\S+)\D+(\d{1,2} [A-Za-z]+ \d{4})";

static string[][] search_and_replaces = {
    new string[] {@"(.+?)\sssp\.\s+(.+?)\s(\([^\)]+\))$",    @"$1 ssp. $2 $3"},      // binomen ssp. subspecies (zoology) with errata or amended text
    new string[] {@"(.+?)\sssp\.\s+(.+)",                    @"$1 ssp. $2"},         // binomen ssp. subspecies (zoology)
    new string[] {@"(.+?)\ssubsp\.\s+(.+?)\s(\([^\)]+\))$",  @"$1 subsp. $2 $3"},    // binomen subsp. subspecies (botany) with errata or amended text
    new string[] {@"(.+?)\ssubsp\.\s+(.+)",                  @"$1 subsp. $2"},       // binomen subsp. subspecies (botany)
    new string[] {@"(.+?)\svar\.\s+(.+?)\s+(\([^\)]+\))$",   @"$1 var. $2 $3"},      // binomen var. variety (botany) with errata or amended text
    new string[] {@"(.+?)\svar\.\s+(.+)",                    @"$1 var. $2"},         // binomen var. variety (botany)
    new string[] {@"(.+?)\ssubvar\.\s+(.+?)\s(\([^\)]+\))$", @"$1 subvar. $2 $3"},   // binomen subvar. subvariety (botany) with errata or amended text
    new string[] {@"(.+?)\ssubvar\.\s+(.+)",                 @"$1 subvar. $2"},      // binomen subvar. subvariety (botany)
    new string[] {@"(.+?)\s*(\([^\)]+\))$",                  @"$1 $2"}               // binomen with errata or amended text
};

static string errata_text = @"\(errata version published in (\d{4})\)";
static string amended_text = @"\(amended version of (\d{4}) assessment\)";

//---< T A X O B O X >

static string HIDE_ALL_BUT_TAXOBOX = @"(?:[Tt]axobox\s*\||[Ss]peciesbox\s*\|)";                     // this to prevent confusion with when hiding
static string IS_TAXOBOX = @"(?:[Tt]axobox|[Ss]peciesbox)";                                         // for hiding all non-taxobox-like templates
static string taxobox_template_pattern = @"(\{\{\s*(" + IS_TAXOBOX + @"))[^\}]+(\}\})";             // basic taxobox-like template pattern; TODO: ?
static string taxobox_blank_pattern = @"\{\{\s*" + IS_TAXOBOX + @"\}\}";

static string taxobox_new_stat_sys_ref_pattern = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]+?)(\s*)(\}\})";                                  // used to create new |status=, |status_system=, and |status_ref= params in taxobox
static string taxobox_status_ref_pattern = @"(\|\s*status_ref\s*=\s*)(\<[Rr][Ee][Ff][^\>]*\>)[^\<]*(\</[Rr][Ee][Ff]\>)";             // used to replace |status_ref= param in taxobox
static string taxobox_status_ref_empty_pattern = @"(\|\s*status_ref\s*=[ \t]*)([\r\n]*[\|\}])";                                      // used to add reference to |status_ref= param in taxobox

static string	taxobox_status_sc_ref_pattern = @"(\|\s*status_ref\s*=\s*)(\<[Rr][Ee][Ff][^\>]+/\>)";		// used to replace |status_ref= param in taxobox

static string taxobox_status_ref = null;             // the 'new' value for |status_ref=
static string taxobox_status_ref_open_tag = null;    // its matching ref open tag
static string taxobox_status_ref_sc_tag = null;      // and its matching self-closed tag

static string stray_dot = @"(\|\s*status_ref\s*=\s*)\.";                           // delete stray dot; because I found one such (Astroblepus pholeter)
static string stray_splat = @"(\|\s*status_ref\s*=\s*)\*";                         // delete stray splat; because I found one such (Gray short-tailed bat)
static string stray_equal = @"(\|\s*status_ref\s*=\s*)=";                          // delete stray equal; because I found one such (Cyprinus hieni)
static string stray_nbsp = @"(\|\s*status_ref\s*=\s*)&nbsp;";                      // delete stray &nbsp; because I found one such (Euconocephalus remotus)
static string html_comment = @"(\|\s*status_ref\s*=[^\|\}]*)\<!\-\-[^\>]*\-\-\>";  // and html comments
static string unrecognized_species_name = null;                                    // gets taxobox species name that IUCN doesn't recognize

//---< T A X O B O X _ S T A T U S >--

static string	IS_IUCN_STATUS = @"(\b(?:LC|LR/lc|NT|LR/nt|LR/cd|VU|EN|CR|PE|PEW|EW|EX|DD|NE)\b)";			// also used with

static string taxobox_status_missing = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status\s*=";
static string taxobox_status_empty = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status\s*=\s*([\|\}])";
static string taxobox_status_value = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status\s*=\s*([^\|\}]+)";

static string	taxobox_status_pattern = @"(\|\s*status\s*=\s*)[^\|\}]*?(\s*[\|\}])";

static string	status_from_api_pattern = @"""category"":""([^""]+)""";				// for |status=

//---< T A X O B O X _ S Y S T E M >--

static string	IS_IUCN_SYSTEM = @"(\b(?:IUCN2.3|IUCN3.1)\b)";

static string taxobox_system_missing = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_system\s*=";
static string taxobox_system_empty = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_system\s*=\s*([\|\}])";
static string taxobox_system_value = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_system\s*=\s*([^\|\}]+)";

static string	taxobox_system_pattern = @"(\|\s*status_system\s*=\s*)[^\|\}]*([^\|\}])";

static string	status_system_from_api_pattern = @"""assessment_date"":""(\d+)";	// for |status_system=

//---< T A X O B O X _ S T A T U S _ R E F >--

static string taxobox_status_ref_missing = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_ref\s*=";
static string taxobox_status_ref_empty = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_ref\s*=\s*([\|\}])";
static string taxobox_status_ref_value = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_ref\s*=\s*([^\|\}]+)";

static string ref_tag_named_pattern = @"(\<[Rr][Ee][Ff][^\>]*name\s*=\s*""?([^""\>]*)""?\s*\>)";
static string ref_tag_named_sc_pattern = @"(\<[Rr][Ee][Ff][^\>]*name\s*=\s*""?([^""/]*)""?\s*/\s*\>)";
static string ref_tag_unnamed_pattern = @"(\<[Rr][Ee][Ff]\>)";

//---< T A X O B O X _ S P E C I E S _ N A M E >--

static string	binomial_pattern = @"\|\s*binomial\s*=\s*([^\|\}]*)";				// taxobox

static string taxon_pattern = @"\|\s*taxon\s*=\s*([^\|\}]*)";        // speciesbox
static string genus_pattern = @"\|\s*genus\s*=\s*([^\|\}]*)";        // these two combined to make binomial name
static string species_pattern = @"\|\s*species\s*=\s*([^\|\}]*)";

static string	name_pattern = @"\|\s*name\s*=\s*([^\|\}]*)";						// taxobox and speciesbox

//---< D A T E S >

static Dictionary<string, string> date_patterns = new Dictionary<string, string> {
    {"dmy", @"\d{1,2}\s+([JFMASOND][a-z]+)\s+(\d{4})"},        // dmy
    {"mdy", @"([JFMASOND][a-z]+)\s+\d{1,2}\s*,\s+(\d{4})"},    // mdy
    {"ymd", @"(\d{4})\-(\d{2})\-\d{2}"}                        // ymd
};

static string preferred_status_ref_tag_name = @"iucn status (\d{1,2}\s+([JFMASOND][a-z]+)\s+(\d{4}))";
static string access_date = @"\|access\-?date=([^\|\}]+)";

static Dictionary<string, int> months = new Dictionary<string, int> {
    {"january", 1}, {"february", 2}, {"march", 3}, {"april", 4},             // these for dmy and mdy
    {"may", 5}, {"june", 6}, {"july", 7}, {"august", 8},
    {"september", 9}, {"october", 10}, {"november", 11}, {"december", 12},
    {"jan", 1}, {"feb", 2}, {"mar", 3}, {"apr", 4},                          // these for dmy and mdy
//  {"may", 5},                                                              // same as whole month name; can't have two with the same key
    {"jun", 6}, {"jul", 7}, {"aug", 8}, {"sep", 9},
    {"oct", 10}, {"nov", 11}, {"dec", 12},
    {"01", 1}, {"02", 2}, {"03", 3}, {"04", 4}, {"05", 5}, {"06", 6},        // these for ymd
    {"07", 7}, {"08", 8}, {"09", 9}, {"10", 10}, {"11", 11}, {"12", 12},
};

//--- R E M O V E  D U P L I C A T E   S T A T U S   R E F >-

static string[] symbols = new string[] {
    @"\{",
    @"\(",
    @"\|",
    @"\.",
    @"\-",
    @"\)",
    @"\}",
};

static string ref_open_tag_unnamed = @"\<[Rr][Ee][Ff]\>";
static string ref_open_tag_named = @"\<[Rr][Ee][Ff][^\>]*\>";
static string ref_close_tag = @"\</[Rr][Ee][Ff]\>";
static string bib_open_ul = @"[\r\n]+\*\s*";
static string bib_close_ul = @"([\r\n]+)";

//---< S P E C I E S _ N A M E _ C L E A N U P >
//
// these things must be removed from binomial before calling the api with the binomial
//

static string[][] cleanup_patterns = {
    new string[] {ref_open_tag_named + @"[^\<]*" + ref_close_tag, ""},    // references; Lampadioteuthis caused api fetch exception
    new string[] {@"\<[Rr][Ee][Ff][^\>]+/\>", ""},                        // self-closed references; Sand cat
    new string[] {@"\<!\-\-[^\>]*\-\-\>", ""},                            // html comment
    new string[] {@"[\.;:]+$", ""},                                       // trailing punctuation
    new string[] {"'''(.+)'''", "$1"},                                    // bold wiki markup
    new string[] {"''(.+)''$", "$1"},                                     // italic wiki markup
    new string[] {@"""", ""},                                             // double quote marks
    new string[] {"†", ""},                                               // extinction markers
    new string[] {@"\[\[", ""},                                           // opening wikilink markup
    new string[] {@"\]\]", ""},                                           // closing wikilink markup
    new string[] {@"\s*\([^\)]+\)", ""},                                  // disambiguation
    new string[] {@"[\.;:]+$", ""},                                       // trailing punctuation (again)
    new string[] {@"\<nowiki/\>", ""},                                    // self-closed tag
    new string[] {@"\<nowiki\>", ""},                                     // opening tag
    new string[] {@"\</nowiki\>", ""},                                    // closing tag
};

//---< P L A I N _ T E X T >
//
// for plaintext references wrapped in ref tags or in unordered list markup (bibliography); must have a
// recognizable page identifier or doi or a url from which a taxon id can be extracted
//

static string plain_text_ref_pattern = @"(\< *ref[^\>]*\>)([^\<]*)(\</ *ref *\>)";                      // ref tags and reference are captured
static string plain_text_bib_pattern = @"([\r\n]+\*)([^\r\n]*iucnredlist\.org[^\r\n]*)([\r\n]+)";       // some sort of iucn ref in unordered list

static string plain_text_page_taxon_id = @"\be\.T(\d+)A\d+";                                                // get taxon id from page
static string plain_text_doi_taxon_id = @"\bRLTS\.T(\d+)A\d+";                                              // get taxon id from doi
static string plain_text_taxon_id_url = @"https?://(?:www|oldredlist)\.iucnredlist\.org/\S+?/(\d+)\S+";     // get taxon id from url

//---< I U C N  S T A T U S >

static string iucn_status_template_pattern = @"(\{\{\s*IUCN status[^\}]+\})";
static string iucn_status_lead = @"(\{\{\s*IUCN status\s*\|\s*)";
static string iucn_status_status = iucn_status_lead + IS_IUCN_STATUS;
static string iucn_status_id = @"(\{\{\s*IUCN status\s*\|[^\|]+\|\s*)(\d+)";

// Monkbot_task_19_cite_iucn_update.cs