User:Tom.Reding/JPL to Infobox planet

What this code does
This C# AWB module is meant to ease the tedious transcription of data from JPL's Small-Body Database into most of WP's ~3,200 minor planet articles, and to reduce the errors that come with it.
 * 1) General conformity: Models its output on the lowest-numbered MPs (based on a sample of 15-25 of them), which are popular enough for their infoboxes to reflect the consensus on what to display and how. If I have mischaracterized something, please bring it up on this page's talk page.
 * 2) Uses  with aphelion, semimajor, perihelion, moid, & jupiter_moid.
 * 3) Uses  when the error is known for rotation, albedo, mean_radius, & abs_magnitude; otherwise uses the bare value.
 * 4) If background is missing or is not a 3-6-digit hexadecimal value, sets/adds the default #FFFFC0 (see User:Rfassbind/sandbox/color-scheme).
 * 5) Spacing conformity & standardization:
 * 6) Maintains the existing spacing format of the infobox before and after   and   characters, based on most existing, non-empty parameters (up to 108 out of 114, and the most frequent usage wins).
 * 7) If 3 or more spaces exist before the , it's assumed that some form of h-alignment is being used, so h-alignment is maintained.
 * 8) 2 or more spaces after the   are assumed to be typos and are truncated to 1.
 * 9) If one of the 3   parameters starts with , adds   to the other 2, if they're not empty.
 * 10) If   is used > 40% of the time that it could have been used between values and  /s, then it is applied 100% of the time, including headings (*_refs).
 * 11) Unicode tabs are replaced with a single space prior to whitespace standardization.
 * 12) Standardization/Formatting/Cleanup:
 * 13) Appends/adds a JPL ref to orbit_ref, if no ref names in that parameter contain "jpl|sbdb|orbit" (case insensitive) nor possess a JPL URL. If a "jpl" ref name doesn't exist in the article, nor a named ref with a JPL URL, adds a full ref named "jpldata" to orbit_ref; otherwise, the 1st ref containing "jpl|sbdb|orbit" or JPL URL is used, in that order.
 * 14) Fills in "bare" JPL & Lowell FTP refs anywhere.
 * 15) Replaces unnamed Cite SBDB refs with master , which is modeled after Cite SBDB.
 * 16) Ref-wrap any bare URL in orbit_ref.
 * 17) Removes deprecated parameters (listed below).
 * 18) Corrects Infobox planet aliases to {{tl|Infobox planet}}.
 * 19) Moves end-of-line   to the next line.
 * 20) Moves parameters on the " {{Infobox planet " line to the next line.
 * 21) Moves multiple infobox-parameters (in-line citation-parameters are untouched) on the same line to multiple lines.
 * 22) If 1 OE parameter contains a symbol (,  , etc.) symbols will be added to/maintained on all other OE params.
 * 23) Moves mean_radius under {{para|dimensions}} for easier checking.
 * 24) Moves text between last  //{{tlx|Cite SBDB}} and EndOfLine to before.
 * 25) Moves lines starting with   or   up to the end of the previous line.
 * 26) Removes lines that don't start with ,  , or  , with/out leading whitespace.
 * 27) Removes parameter-description comments from non-empty parameters.
 * 28) Removes redundant JPL link in ==External links==.
 * 29) Fixes mislinks to List of minor planets: 1001–2000, etc.
 * 30) Removes text after last   if a reference exists before it, but not after it.
 * 31) Removes text after  no matter what, for {{para|period}}.
 * 32) Precision: All modified values are also rounded. Mimics the existing infobox precision if >= 5 digits; otherwise it defaults to 5-digit precision (similar to JPL's error-precision) or the average precision of existing parameters, whichever is higher, if possible. Exceptions:
 * 33) {{para|period}}: The year value's precision is unmodified because it is rarely (if ever) more precise than 0.01 years (3.6525 days). The day value, which seems to be derived from the year value rather than vice versa, is truncated to 0-2 decimal places, with fewer decimals the longer the period (i.e. a period of several hundred years or more is given an integer day value). It's tempting to ascribe 4-decimal precision to days, but multiplying an uncertain value by a very certain (or exact) value does not bestow that certainty on the result. E.g. a day value quoted to 0.0001 days leads the reader to assume 8.64-second precision, when the value is actually swimming in a 3.65-day uncertainty.
 * 34) JPL's MOID values are unmodified, due to not having any listed error values.
 * 35) {{para|mean_motion}} (new) uses {{tl|Deg2DMS}} if < 1°.
 * 36) Uncertainty:
 * 37) Uncertainty  is not displayed for Orbital Elements (OE), because 1) the uncertainty is generally extremely small, and 2) OE uncertainties generally don't appear in the lowest-numbered MPs (see 1. General Conformity, above). Still undecided, and ideally needing consensus: MPs with a low {{para|observation_arc}} might be skipped in the future, or uncertainties >= 10% might be shown.
 * 38) When available, uncertainties are used for abs_magnitude & albedo, since MP size is very sensitive to these values, and for {{para|mean_radius}}, {{para|dimensions}}, and rotation.
 * 39) Displays the uncertainty for an OE if it is large (>= 1, or a {{para|mean_motion}} error > 1/2 the value).
 * 40) Updating access-date:
 * 41) Appends or replaces the {{para|access-date}} value with   in the jpl/sbdb/orbit 'master' ref (excludes {{tl|JPL small body}} & {{tl|MPCit JPL}}).
 * 42) Adopts the local spacing convention when appending access-date.
 * 43) Accommodating existing values:
 * 44) Prepends JPL's value + JPL ref for {{para|abs_magnitude}}, {{para|rotation}}, {{para|albedo}}, {{para|mean_radius}}, {{para|dimensions}} if non-JPL refs exist in that parameter; otherwise updates the JPL value (assuming the ref name contains "jpl", "sbdb", or JPL URL, case insensitive).
 * 45) Prefers to  -separate different values, if  -separation doesn't exist.
 * 46) Prefers  -separation for {{para|abs_magnitude}}, since multiple values can easily render on 1 line.
 * 47) Skipping:
 * 48) Skips pages without an Infobox planet or alias (to do, low priority: allow code to build an infobox if none exists).
 * 49) Skips infoboxes that don't end with   or   on a separate line (~8% of MP infoboxes prior to fixing).
 * 50) Skips pages with recently updated JPL references, based on {{para|access-date}} and user-defined month & year lists.
 * 51) New parameters: Appends new parameters to the bottom of the infobox, except {{para|minorplanet|yes}} and {{para|background|#FFFFC0}}, which are inserted at the top, and {{para|mean_radius}}, which is inserted/moved to below {{para|dimensions}}.
 * 52) Scope: The following 41 {{tl|Infobox planet}} parameters are accounted for/operated on in some way (alphabetized; parentheses = deprecated; * = fix spacing only):
 * {{para|abs_magnitude}}
 * {{para|albedo}}
 * {{para|aphelion}}
 * {{para|arg_peri}}
 * {{para|asc_node}}
 * {{para|atmosphere_ref}}
 * {{para|background}}
 * ({{para|bgcolour}})
 * {{para|caption}}*
 * ({{para|designations}})
 * ({{para|diameter}}){{sup|never used}}
 * {{para|dimensions}}
 * {{para|discovered}}
 * ({{para|discovery}})
 * {{para|discovery_ref}}
 * {{para|eccentricity}}
 * {{para|epoch}}
 * {{para|inclination}}
 * {{para|jupiter_moid}}
 * {{para|label_width}}
 * {{para|mass}}
 * {{para|mean_anomaly}}
 * {{para|mean_motion}}
 * {{para|mean_radius}}
 * {{para|minorplanet}}
 * {{para|moid}}
 * {{para|mp_name}}*
 * {{para|name}}*
 * {{para|named_after}}*
 * {{para|observation_arc}}
 * {{para|orbit_ref}}
 * ({{para|orbital_characteristics}})
 * {{para|p_orbit_ref}}
 * {{para|perihelion}}
 * {{para|period}}
 * ({{para|physical_characteristics}})
 * {{para|rotation}}
 * {{para|semimajor}}
 * {{para|tisserand}}
 * {{para|uncertainty}}
 * ({{para|width}})
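
The {{para|period}} precision rule in the list above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the class and method names are made up, and the 10-year and 300-year cut-offs are assumptions, since the list only says that the day value gets 0-2 decimals and that periods of several hundred years get an integer. <code>decimal</code> is used so that truncation is not disturbed by binary floating-point error.

```csharp
using System;
using System.Globalization;

static class PeriodPrecisionSketch
{
    // Julian year, consistent with "0.01 years (3.6525 days)" in the list above.
    const decimal DaysPerJulianYear = 365.25m;

    // Hypothetical cut-offs (assumption): fewer day-decimals for longer periods.
    static int DayDecimals(decimal periodYears)
    {
        if (periodYears < 10m) return 2;
        if (periodYears < 300m) return 1;
        return 0;
    }

    // Converts a period in years to days, truncated (not rounded) to 0-2 decimals.
    static string FormatDays(decimal periodYears)
    {
        int decimals = DayDecimals(periodYears);
        decimal scale = decimals == 2 ? 100m : decimals == 1 ? 10m : 1m;
        decimal days = periodYears * DaysPerJulianYear;
        decimal truncated = decimal.Truncate(days * scale) / scale;
        return truncated.ToString("F" + decimals, CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        // Why 4 decimals on the day value would mislead:
        // 0.01 yr of uncertainty expressed in days, and 0.0001 d expressed in seconds.
        Console.WriteLine((0.01m * DaysPerJulianYear).ToString("0.####", CultureInfo.InvariantCulture)); // 3.6525
        Console.WriteLine((0.0001m * 86400m).ToString("0.##", CultureInfo.InvariantCulture));            // 8.64

        Console.WriteLine(FormatDays(4.60m));  // 1680.15
        Console.WriteLine(FormatDays(557.4m)); // 203590
    }
}
```

The two example periods (4.60 and 557.4 years) are arbitrary inputs chosen to exercise the shortest and longest branches.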

The settings file is essentially empty. The only boxes checked are Options > Apply general fixes; Skip > No changes made and Page is redirect; and Start > Minor edit.
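
As an illustration of the orbit_ref rule in the list above (a JPL ref is added only if no ref name contains "jpl|sbdb|orbit", case insensitive, and no JPL URL is present), here is a minimal sketch. It is not the module's actual code: the names are hypothetical, the <code>ssd.jpl.nasa.gov</code> host is assumed to be what counts as a "JPL URL", and the ref-tag pattern only covers simple <code>&lt;ref name=...&gt;</code> forms.

```csharp
using System;
using System.Text.RegularExpressions;

static class JplRefSketch
{
    // Ref names treated as JPL-related, per the list above (case insensitive).
    static readonly Regex JplName = new Regex("jpl|sbdb|orbit", RegexOptions.IgnoreCase);

    // Assumption: any URL on ssd.jpl.nasa.gov counts as a "JPL URL".
    static readonly Regex JplUrl = new Regex(@"https?://ssd\.jpl\.nasa\.gov/", RegexOptions.IgnoreCase);

    // Opening <ref name=...> tags, quoted or unquoted, including self-closed ones.
    static readonly Regex NamedRef = new Regex(
        @"<ref\s+name\s*=\s*(?:""(?<n>[^""]+)""|(?<n>[^\s/>]+))\s*/?>",
        RegexOptions.IgnoreCase);

    // True if the orbit_ref value has neither a JPL-like ref name nor a JPL URL,
    // i.e. a JPL ref would have to be appended.
    static bool NeedsJplRef(string orbitRefValue)
    {
        if (JplUrl.IsMatch(orbitRefValue)) return false;
        foreach (Match m in NamedRef.Matches(orbitRefValue))
        {
            if (JplName.IsMatch(m.Groups["n"].Value)) return false;
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine(NeedsJplRef("<ref name=\"JPLdata\" />")); // False
        Console.WriteLine(NeedsJplRef("<ref name=MPC>{{cite web |url=http://example.org/}}</ref>")); // True
    }
}
```

Deciding which existing ref to reuse, or whether a full "jpldata" ref has to be written out, would need the same check run against the whole article text rather than just the parameter value.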

What this code does not do
This is not a "set it and forget it" script.
 * 1) It must be babysat.
 * 2) It's only meant to keep obscure, infrequently edited minor planets up-to-date (i.e. best for numbered MPs > 500-1000, and more care required for those < 500).
 * 3) Frequently edited pages with multiple-ref infoboxes might not play well with this script (but many do).
 * 4) Each modified parameter-line in the infobox must be examined so that no useful information is lost nor wikisyntax broken. This is due to the large number of display-variants, which the script attempts to account for, but exceptions will always exist and they need to be spotted.
 * 5) Most display-variants (multiple  s (-delimited or  -delimited), multiple values with or without their measurement error, efn, small, val, convert) are accounted for, but only for these parameters (the most common offenders, and because you're not allowed to make custom classes in AWB): abs_magnitude, rotation, albedo, mean_radius, dimensions.

~ Tom.Reding (talk ⋅dgaf) 17:22, 18 March 2016 (UTC)

AWB custom module
(Syntax highlighting is broken because code is too long)