User:BrownHairedGirl/No-reflinks websites

This page is for my work on bare URL references to websites where WP:REFLINKS never finds an article title. Where testing here and on sample URLs confirms that WP:REFLINKS consistently fails, I have been using WP:AWB to add the Bare URL inline tag to all WP:Bare URLs refs to that website.

As of December 2021, the lists included over 1,400 websites.

After that, I changed my workflow, and just added new websites as I found them instead of making batches. As of September 2022, the list probably includes over 2,000 websites, maybe over 3,000.

Purpose
'''This tagging applies ''only'' to the URL which is tagged. In most cases WP:REFLINKS will be able to fill other bare URLs on an article which has been tagged in this way.'''

In all cases, other tools are available.

Reference-filling tools
 * WP:REFLINKS: will handle only bare links, which means that it doesn't spend time checking refs which have already been filled. If there are only a few bare URLs in an article with many refs, that can be a big timesaver. Unfortunately, Reflinks has been unmaintained for several years, so there is no prospect of a fix to its flaws, which include the next four points.
 * Reflinks cannot connect to many thousands of live websites, so it cannot get any info about those webpages. That is why I have taken to tagging the pages where it fails.
 * It doesn't recognise the Bare URL inline tag, and skips any bare URLs which have that tag. That's why I apply the inline tag only to links to websites which Reflinks can never handle anyway.
 * Reflinks uses cite web's publisher parameter when it should use website.
 * Reflinks often puts junk in the author parameter.
 * ReferenceExpander: excellent on bare links, but can go bonkers in some cases, with disastrous results. If you use ReferenceExpander, please note that it saves changes without preview. So check the diff of its edit very carefully, and be ready to revert its edit. Note that ReferenceExpander tries to rewrite every ref which doesn't use a cite template. In simple cases it usually works fine, but since it strips all existing info, it can lose a lot of detail on a more complex ref. For example, in one (reverted) edit it mangled a lot of refs, losing the author and date. If a ref has been archived using webarchive, it will strip that too.
 * WP:REFILL: powerful but buggy. Use with care! It fixes lots of refs which Reflinks cannot fix, but it can also go spectacularly wrong on some refs. Preview its output very carefully, by checking the diff before saving. It has a particularly nasty habit of trimming the URL where the website doesn't issue a proper 404 error, but redirects to the homepage. For example, if there is a bare ref to http://example2.com/somethingaboutnothing (or a formatted ref to that URL), and the page http://example2.com/somethingaboutnothing is redirected to http://example2.com, then the tool will change the URL to http://example2.com, and change the page title. Nasty; please beware.
 * Citation bot: unlike the tools above, it works only on completely bare refs, or on refs which already have a cite template; it will not apply a cite template to a ref which has other formatting. However it is very accurate, and actively maintained by a very conscientious bot owner, who rapidly fixes any bugs. The major weakness of Citation bot is that it uses an external tool (the "zotero") to get the title of a webpage. There are many sites for which the zotero never returns a title; additionally, the zotero is frequently overloaded and requests to it time out, so a failure to get a title cannot be counted as definitive.
 * Of course, references may always be filled manually.
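The URL-trimming hazard described above can be checked mechanically when reviewing a tool's diff. The following is a minimal sketch, not part of any of the tools named here: the function name `looks_like_soft_404` is hypothetical, and it assumes you already know both the URL in the original ref and the URL that the tool wrote in its place.

```python
from urllib.parse import urlsplit


def _host(parts):
    """Lower-case hostname with a leading 'www.' removed (illustrative normalisation)."""
    host = (parts.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def looks_like_soft_404(original_url, final_url):
    """Return True when a deep link has been replaced by the same site's homepage.

    This is the pattern to distrust: the site redirected a missing page to its
    homepage instead of returning a proper 404 error.
    """
    orig = urlsplit(original_url)
    final = urlsplit(final_url)
    orig_is_deep = orig.path not in ("", "/") or bool(orig.query)
    final_is_home = final.path in ("", "/") and not final.query
    return _host(orig) == _host(final) and orig_is_deep and final_is_home
```

If the check returns True for a ref in the diff, the original URL should be restored by hand and treated as a possible dead link.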

Secondary tasks
If a page is edited to add one or more Bare URL inline tags, the following secondary tasks may be performed:

General fixes
General fixes (see WP:GENFIXES) are a set of semi-automated edits that are enabled by default in AutoWikiBrowser. They are intended to be uncontroversial and require minimal human oversight; many are cosmetic and improve wikitext readability but do not affect display to readers.

Fixing <br>
Per H:BR, a line break formatted as an unclosed tag <br> breaks some syntax highlighters. This AWB job will convert such uses to <br />.

This is a cosmetic change which makes no difference to how the page is displayed, but it assists those editors who use some syntax highlighters.
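As an illustration only, the conversion can be sketched as a single regular-expression substitution. This is not the actual AWB rule; the function name `close_br_tags` is hypothetical, and the sketch assumes the target form is `<br />`.

```python
import re

# Matches unclosed forms such as <br>, <BR> and <br >.
# Already-closed forms such as <br /> and <br/> do not match, because
# the slash is neither whitespace nor the closing angle bracket.
UNCLOSED_BR = re.compile(r"<br\s*>", re.IGNORECASE)


def close_br_tags(wikitext):
    """Replace every unclosed line-break tag with the self-closing form."""
    return UNCLOSED_BR.sub("<br />", wikitext)
```

Because closed tags never match, running the function twice gives the same result as running it once.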

Unhiding bare URLs
Note that since 21 November 2011, these edits have also been making the text of bare URL and dead link refs visible. A ref text "[1]" or "[3]" is useless to the reader.

For example, a ref of the form <ref>[https://example.com/foo]</ref> will be displayed in the reference list simply as a number: 5

That bare number tells the reader nothing about what the link is, so this task strips the square brackets, making the ref render as https://example.com/foo

See for example this edit to Ruben Katoatau.
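The bracket-stripping step can be sketched as follows. This is a rough illustration rather than the actual AWB rule: the pattern and the function name `unhide_bare_urls` are assumptions, and the sketch deliberately leaves alone any bracketed link that already has a title.

```python
import re

# A ref whose entire content is one URL in single square brackets,
# for example <ref>[https://example.com/foo]</ref>.
# A link with a title, such as [https://example.com/foo Some title],
# does not match, because the URL group cannot contain whitespace.
BRACKETED_BARE_REF = re.compile(
    r"(<ref(?:\s[^>/]*)?>)\s*\[\s*(https?://[^\s\]]+)\s*\]\s*(</ref>)",
    re.IGNORECASE,
)


def unhide_bare_urls(wikitext):
    """Strip the square brackets so that the URL itself is shown to the reader."""
    return BRACKETED_BARE_REF.sub(r"\1\2\3", wikitext)
```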

T291704

 * Function disabled as of 15 December 2021. T291704 has been fully fixed in v2.0.8.5 of 

Adds dashes to undashed uses of the cite parameters access-date, archive-date, and archive-url. For example, accessdate → access-date.

This is a work-around for the bug T291704. The affected bot currently doesn't recognise the undashed form, and may create a duplicate parameter, which is an error tracked in Category:CS1 errors: redundant parameter. The bot owners are working on a fix, but have not yet got a solution. So until a fix to the bot is implemented, adding the dashes helps reduce errors.
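The parameter renaming can be sketched like this. It is a simplified illustration, not the AWB rule itself: the names `UNDASHED_TO_DASHED` and `dash_cite_params` are hypothetical, and the sketch renames the parameter in any template on the page rather than checking that it belongs to a cite template.

```python
import re

# The three undashed parameter names and their dashed equivalents.
UNDASHED_TO_DASHED = {
    "accessdate": "access-date",
    "archivedate": "archive-date",
    "archiveurl": "archive-url",
}

# A pipe, optional whitespace, one of the undashed names, then "=".
PARAM = re.compile(r"(\|\s*)(accessdate|archivedate|archiveurl)(\s*=)")


def dash_cite_params(wikitext):
    """Rename the undashed parameters, leaving spacing and values untouched."""
    return PARAM.sub(
        lambda m: m.group(1) + UNDASHED_TO_DASHED[m.group(2)] + m.group(3),
        wikitext,
    )
```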

Failure types
In most of these cases, WP:REFLINKS consistently fails at the initial stage of setting up an HTTPS connection. For example, Reflinks always fails to get a title for links to the Wall Street Journal website https://www.wsj.com/, and reports an error instead.

In some cases, such as most major Australian newspapers, REFLINKS successfully connects, but consistently returns a useless generic title such as "No Cookies" or "Loading 3rd party ad content". One of the most common bare URL refs is to Twitter, where Reflinks returns the title "JavaScript is not available".
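When testing a website, a returned title can be compared against known generic titles. The following is a minimal sketch under stated assumptions: the set contains only the three titles quoted above, and the names `GENERIC_TITLES` and `is_generic_title` are hypothetical rather than taken from any of the tools.

```python
# Generic titles quoted on this page, stored in lower case.
GENERIC_TITLES = {
    "no cookies",
    "loading 3rd party ad content",
    "javascript is not available",
}


def is_generic_title(title):
    """Return True if the title is a known placeholder, not a real page title."""
    # Collapse runs of whitespace, then drop surrounding spaces and full stops.
    normalized = " ".join(title.split()).strip(" .").lower()
    return normalized in GENERIC_TITLES
```

A website which returns a generic title on every test URL is a candidate for the lists below, just like one where the connection always fails.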

Lists of websites to be tagged
The list of websites has grown very large, so it has been split into sub-pages:
 * 1) User:BrownHairedGirl/No-reflinks websites/Set 1
 * 2) User:BrownHairedGirl/No-reflinks websites/Set 2
 * 3) User:BrownHairedGirl/No-reflinks websites/Set 3
 * 4) User:BrownHairedGirl/No-reflinks websites/Set 4
 * 5) User:BrownHairedGirl/No-reflinks websites/Set 5: timeouts
 * 6) User:BrownHairedGirl/No-reflinks websites/Set 6

Websites which are still being tested, and are not yet being processed by AWB, are listed at User:BrownHairedGirl/No-reflinks websites/sandbox.
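For illustration, the tagging step for a list of websites might look like the sketch below. It is not the actual AWB configuration. The function name `tag_bare_refs`, the placement of the tag inside the ref after the URL, and the use of a `date` parameter on the tag are all assumptions.

```python
import re
from urllib.parse import urlsplit

# A ref whose entire content is a single bare URL.
# Refs which already carry a tag, a title or a template do not match.
BARE_REF = re.compile(
    r"(<ref(?:\s[^>/]*)?>)\s*(https?://[^\s<\[\]{}|]+)\s*(</ref>)",
    re.IGNORECASE,
)


def tag_bare_refs(wikitext, no_reflinks_hosts, date):
    """Append a Bare URL inline tag to bare refs whose host is in the given set."""

    def repl(match):
        host = (urlsplit(match.group(2)).hostname or "").lower()
        if host not in no_reflinks_hosts:
            return match.group(0)  # leave refs to other websites untouched
        tag = "{{Bare URL inline|date=" + date + "}}"
        return f"{match.group(1)}{match.group(2)} {tag}{match.group(3)}"

    return BARE_REF.sub(repl, wikitext)
```

Because a tagged ref no longer consists of a URL alone, a second run leaves it unchanged, and bare refs to websites outside the set remain available for Reflinks to fill.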

Testing the list of websites
All of the refs tagged by this AWB job have been repeatedly tested with WP:REFLINKS.

Feel free to run the tests yourself, but ... '''please note that the test sets are very large. They take many minutes to process, and impose a heavy load on the server which hosts Reflinks'''.


 * 1) test Set 1
 * 2) test Set 2
 * 3) test Set 3
 * 4) test Set 4
 * 5) test Set 5: timeouts
 * 6) test Set 6
 * 7) test the sandbox

False positives
The following websites consistently failed to give titles when tested, but after the refs had been tagged, they started giving titles:
 * http://allafrica.com
 * https://www.kttc.com
 * http://articles.orlandosentinel.com/
 * http://www.orlandosentinel.com/
 * https://www.chicagotribune.com
 * https://english.elpais.com
 * http://hamptonroads.com
 * https://www.sun-sentinel.com
 * http://articles.sun-sentinel.com