User:Ohconfucius/script/Sources

Objectives
Main objectives, as applied to reference sections or otherwise within citation templates, are as follows:
 * 1) make each source name congruent with the name of the corresponding WP article
 * 2) apply italicisation in accordance with WP:ITALICS
 * 3) Wiki-link neutral: links are usually not removed, although they may be piped in certain cases where necessary
 * 4) Space neutral: there should be no impact on the disposition of spaces before or after parameters in edit mode
 * 5) clean up superfluous data, parameter miscategorisations, etc. left by data trawling with Reflinks
 * 6) retarget redirecting (indirect) piped links where these affect the working of the script
 * 7) remove unpopulated parameters within citation templates
 * 8) remove hyperlinks within the journal, website, work and publisher fields (CS1 errors)
 * 9) where the contents of work and publisher are identical, merge the two (i.e. discard one of them)
 * 10) unification: ensure that each of work, publisher and location appears only once; please check that the desired one is retained
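
Objectives 7 and 9 can be illustrated with a minimal sketch. The function names and regular expressions below are hypothetical and are not taken from Sources.js; the sketch assumes a single citation template whose values contain no piped wiki-links, and it arbitrarily keeps work when the two fields are identical.

```javascript
// Hypothetical helpers for illustration only; not the code used in Sources.js.

// Objective 7: drop parameters that have a name but no value.
function removeEmptyParams(template) {
  // "|name=" followed only by whitespace up to the next "|" or the closing "}}".
  return template.replace(/\|\s*[\w-]+\s*=\s*(?=\||\}\})/g, "");
}

// Objective 9: when work and publisher hold the same text, keep only one.
// Assumption: this sketch keeps work and discards publisher.
function mergeDuplicateWorkPublisher(template) {
  const work = template.match(/\|\s*work\s*=\s*([^|}]*)/);
  const publisher = template.match(/\|\s*publisher\s*=\s*([^|}]*)/);
  if (
    work &&
    publisher &&
    work[1].trim() !== "" &&
    work[1].trim() === publisher[1].trim()
  ) {
    // Remove the whole "|publisher=..." segment.
    return template.replace(publisher[0], "");
  }
  return template;
}
```

Because one of the two fields is discarded, the result should still be checked by hand, as objective 10 advises.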

General principles
The rationale and principles applied are as follows:
 * URLs within the url parameter are protected; this protection extends to any linking text (e.g. whether "http://time.com" or "Siklos, Richard. “Made to Measure” Fortune Magazine, February 20, 2008");
 * cited sources are retargeted to the traditional title where a journal is traditional media (e.g. The Times) and its online version (e.g. Times Online or times.co.uk) is cited
 * the terms 'online', 'magazine' and 'newspaper' are dropped unless their use conforms with the Wiki naming conventions of the traditional source (e.g. Time and not Time Magazine; The Guardian and not Guardian Unlimited)
 * the traditional journal name (e.g. The New York Times) should reflect the article name, with attention paid to the definite article in the name (e.g. The New York Times); similarly, consistent stylisation should be ensured (e.g. The Globe and Mail, without the ampersand); 'AFP' will be expanded to 'Agence France-Presse'
 * italicisation will be done on an 'opt-in' basis, although an 'intuitive basis' will also be applied
 * sites with names sounding like traditional media or that contain words like 'Daily', 'Weekly', 'Monthly', 'Magazine', 'Times', 'Observer' are italicised.
 * new media sources are non-italicised by default; names carrying certain suffixes are classed as 'publisher' and left unitalicised
 * In line with convention, television channels (e.g. BBC1, Fox News) and networks (particularly US TV and radio stations that use 4-lettered call signs beginning with a "K" or "W") remain unitalicised, whilst only programmes (e.g. Newsnight or Today) are considered 'works'
 * Portals (e.g., Yahoo!, Google, ESPN, etc), as well as their individual channels (e.g., Yahoo! Music, Google News, ESPNcricinfo, etc), are unitalicised
 * news agencies (e.g. Reuters, AFP, etc.) are classed as 'agencies' within citation templates even though they may also act as publishers in certain cases. They remain unitalicised.
 * via is used for self-published sources such as YouTube or Vimeo
 * functionally, correct italicisation is performed by switching to an appropriate parameter (to or from work, newspaper or journal <–> publisher); 'work' is used to achieve italicisation when switching from publisher, as the script cannot tailor the parameter to the citation template being used.
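
The parameter switch described in the last point can be sketched as follows. This is an illustration under stated assumptions, not the script's own logic: the word list is the one given above for traditional-sounding names, the function name is hypothetical, and the real script relies on a much larger dictionary of journals and periodicals.

```javascript
// Hypothetical sketch; Sources.js uses its own dictionary of source names.
const TRADITIONAL_WORDS = /\b(Daily|Weekly|Monthly|Magazine|Times|Observer)\b/;

// Move a traditional-sounding name from publisher to work so that the
// citation template renders it in italics.
function switchPublisherToWork(template) {
  // Leave the template alone if work is already present, to avoid duplicates.
  if (/\|\s*work\s*=/.test(template)) return template;
  return template.replace(
    /(\|\s*)publisher(\s*=\s*)([^|}]*)/,
    (match, pre, eq, value) =>
      TRADITIONAL_WORDS.test(value) ? `${pre}work${eq}${value}` : match
  );
}
```

Only the parameter name changes; the spacing around it is carried over unchanged, in keeping with the space-neutral objective.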


 * Citations to primary sources (social media sites such as Twitter, Facebook) are tagged Primary source inline
 * as title renders its value within double quote marks, any extra double quote marks bounding the title are removed.
 * journal, work, newspaper and periodical, where correctly used to denote journals or other works that ought to render as italicised (per WP:ITALIC), will not be disturbed.
 * publication locations:
   * are not given for e-sources, but they are generally not removed either
   * are unlinked
   * may be used to disambiguate names that are used for publications of different places (e.g. The Sun may refer to unrelated publications in Hong Kong, Malaysia, Nigeria and the United Kingdom)


 * In general, linking status will be respected by the main function unless such preservation involves complex piping that cannot be easily scripted for; a separate button is provided for unlinking all sources.
 * Where sources are news reports, the publisher name is unnecessary: per documentation at citation, the cited publications themselves are often better known than their publishers. Thus some publisher fields and publisher names are removed outright to reduce template clutter (e.g. "The New York Times Company" is removed for The New York Times, "Time Inc." is removed for Time).
 * as indicated in the documentation for the citation templates, publication locations are given only where the source is not well known (i.e. not BBC or CNN) or the location isn't obvious from the journal name (San Francisco Chronicle vs The Telegraph);
 * Citations to internal articles (even in other non-English language WPs) and certain deprecated sources may be removed. Care should therefore be exercised when the script is used on the articles for The Epoch Times and Daily Mail, where use of those sources is permitted under WP:SELF.
 * some unpopulated fields within citation templates may be removed


 * Correction of CS1 errors:
   * Removal of external links in any of the CS1 or CS2 citation title-holding parameters;
   * Where the "title" mistakenly contains a URL, it will be blanked with a commented ;
   * Where parameters other than url (e.g. chapter, journal, magazine, newspaper, publisher, title, work, via) contain hyperlinked text, the URL part is removed, leaving only the text; the strings  and   are systematically removed in any event;
   * Removal of italic or bold wikimarkup in publisher and periodical parameters.
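
Two of the CS1 corrections above lend themselves to a short sketch: stripping the URL from hyperlinked text in non-url parameters, and removing italic or bold wikimarkup. The function names and patterns are hypothetical, written for illustration rather than copied from Sources.js, and the second function covers only the publisher parameter for brevity.

```javascript
// Hypothetical sketches; not the regexes used in Sources.js.

// "[http://example.com Some text]" inside a non-url parameter becomes "Some text".
function stripExternalLinks(template) {
  const params = "chapter|journal|magazine|newspaper|publisher|title|work|via";
  const re = new RegExp(
    `(\\|\\s*(?:${params})\\s*=\\s*)\\[https?:\\/\\/[^\\s\\]]+\\s+([^\\]]+)\\]`,
    "g"
  );
  // Keep the parameter prefix ($1) and the link text ($2); drop the URL.
  return template.replace(re, "$1$2");
}

// Remove italic ('') or bold (''') wikimarkup wrapping a publisher value.
function stripQuoteMarkup(template) {
  return template.replace(
    /(\|\s*publisher\s*=\s*)'{2,5}([^'|}]+)'{2,5}/g,
    "$1$2"
  );
}
```

The url parameter is deliberately absent from the parameter list, so the citation's actual link is never touched.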

CITE name function
This function attempts to generate unique names for citations and adds "name= " to the ref tag. The unique name is generated in one of two ways, tried in the following order:
 * 1) The regex searches the url of the citation for the first numerical string of 6 digits or more, and suffixes it with the domain name.
 * 2) The regex looks up the date within the url of the citation and suffixes it to the domain name in the format; it further appends the first "word" (alphabetical string) found after the date string such that the string is.

It will therefore not work if no unique identifier strings or dates can be found.

When faced with citations without names where the date is populated, the script will prefix the domain name with the date.
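
The two URL-based strategies can be sketched roughly as below. The exact name format is not stated here, so the concatenation order, the absence of separators, the date pattern (year/month/day path segments) and the function name are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the two naming strategies; output format is assumed.
function citeName(url) {
  const { hostname, pathname, search } = new URL(url);
  const domain = hostname.replace(/^www\./, "");
  const rest = pathname + search;

  // Strategy 1: first run of six or more digits, followed by the domain name.
  const id = rest.match(/\d{6,}/);
  if (id) return `${id[0]}${domain}`;

  // Strategy 2: a yyyy/mm/dd date in the path, appended to the domain name,
  // then the first alphabetical "word" found after the date.
  const dated = rest.match(
    /(\d{4})\/(\d{1,2})\/(\d{1,2})\/?[^a-zA-Z]*([a-zA-Z]+)?/
  );
  if (dated) {
    const [, y, m, d, word] = dated;
    const date = `${y}${m.padStart(2, "0")}${d.padStart(2, "0")}`;
    return `${domain}${date}${word || ""}`;
  }

  // Neither an identifier nor a date was found: no name can be generated.
  return null;
}

console.log(citeName("https://www.example.com/news/article-1234567.html"));
console.log(citeName("https://www.example.co.uk/2012/05/17/budget-vote"));
```

A URL such as https://example.com/about yields null in this sketch, which corresponds to the case where the function cannot work.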

Fill DOMAIN_NAME function

 * The regex looks at the url, extracts the domain name and populates the publisher field.
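
A minimal sketch of this behaviour follows, assuming a single citation template that has a populated url and an empty publisher field. The function name and regular expressions are hypothetical, and stripping a leading "www." is an assumption of the sketch.

```javascript
// Hypothetical sketch; not the regex used in Sources.js.
function fillDomainName(template) {
  const url = template.match(/\|\s*url\s*=\s*(https?:\/\/[^\s|}]+)/);
  if (!url) return template;

  let domain;
  try {
    domain = new URL(url[1]).hostname.replace(/^www\./, "");
  } catch (e) {
    return template; // malformed URL: leave the citation untouched
  }

  // Fill publisher only when it is present and empty; existing whitespace
  // after the "=" is kept so the edit stays space neutral.
  return template.replace(
    /(\|\s*publisher\s*=)(\s*)(?=\||\}\})/,
    (match, head, space) => head + domain + space
  );
}
```

A replacer function is used instead of a "$1..." replacement string so that a domain beginning with a digit cannot be misread as part of a group reference.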

Actions and test
Link to script code: User:Ohconfucius/script/Sources.js

Speed of script execution may vary depending on browser.

Should the script stall when working on large articles, press  on the pop-up menu – once is usually sufficient.

Once you are in edit mode, there are four buttons from this script in the toolbox in the left margin:
 * 1) 'Fix SOURCES' ('New source module' in the current version);
 * 2) 'Add REFTAGS' (Insert missing ref tags – use when the article contains bare urls);
 * 3) 'CITE name' (gives names to all citations)
 * 4) 'Fill DOMAIN_NAME' (imports domain names to publisher field; requires the existence of an empty publisher)

Known limitations or contraindications

 * 1) The script renames certain parameters, so duplicate parameters may occur, for example where an alias of the renamed parameter is already present.
 * 2) Journals with similar or shared names may cause false negatives: for example, where journals differ only in the definite article in the name, the script may fail to detect and correct them (e.g. The Daily Star vs Daily Star).
 * 3) A publication (using publisher) which was italicised may lose its italicisation, owing to automatic removal of the italic markup, if it is not included in the dictionary of journals and periodicals within the script.

Disclaimer
Users are expected to exercise careful judgement in the context of each article in which they run this script. Use at your own risk and make sure you check the edit changes before you save. It's not my fault if someone misuses this script.

Test page

 * User:Ohconfucius/script/Sources/test (Year-2020 version).
 * User:Ohconfucius/test/Sourcestest