MediaWiki talk:Captcha-addurl-whitelist

more links
Hi, can we add the BBC, the Guardian and the Independent?
 * https://www.bbc.co.uk/
 * https://www.theguardian.com
 * https://www.independent.co.uk/
 *  Ϣere Spiel  Chequers  17:58, 21 March 2018 (UTC)
 * Holding for at least one day; I'm not sure whether this should have wider review (perhaps at WP:RSN). — xaosflux  Talk 12:22, 26 March 2018 (UTC)
 * I think this needs wider review as it has a broad impact. Please post at least a discussion at WP:RS about the specific additions you would like. If there is no objection after a week (or if consensus forms in support), please include the link to that discussion here and reactivate the edit request. —  xaosflux  Talk 14:13, 28 March 2018 (UTC)
 * We have the New York Times, which is unfortunately paywalled, and it seems to me that dozens of other sources could safely be listed. In the UK, the BBC and Guardian are obviously of a similar calibre.  The Independent is not impartial but I think it still qualifies along with the Times (of London), Financial Times, Telegraph and some of the tabloids.  I expect most other countries with free speech could provide a similar list. Certes (talk) 14:39, 28 March 2018 (UTC)
 * This page has very few watchers; I suggest you bring this up at WP:RSN or another large forum. Please include specific URLs/domain names in discussions for review. This certainly CAN be expanded easily at a technical level. —  xaosflux  Talk 14:46, 28 March 2018 (UTC)
 * Thanks. I certainly didn't know about this page until you kindly pointed me at a link to it.  My reply was more aimed at WereSpielChequers or anyone else bringing the topic up at WP:RSN.  It's all too easy for us to take easy editing for granted and to overlook the obstacles which (perhaps for good reasons) lie in the way of newcomers. Certes (talk) 15:35, 28 March 2018 (UTC)
 * If you bring this up at a larger venue like RSN and it's OK, feel free to reactivate the edit request and add the discussion link below. — xaosflux  Talk 00:29, 19 April 2018 (UTC)

From the Wikipedia Library
Hi,

Sam Walton provided this list of websites from the Wikipedia Library partners. Clayoquot (talk | contribs) 23:13, 29 March 2018 (UTC)


 * I posted at Reliable sources/Noticeboard for review; if there are no issues within a week, please activate the edit request tag at the top of this section. Thanks, —  xaosflux  Talk 01:51, 30 March 2018 (UTC)
 * pending RSN or time. — xaosflux  Talk 14:51, 30 March 2018 (UTC)
 * Thanks. The relevant discussion is now archived and there were no objections. Cheers, Clayoquot (talk | contribs) 22:14, 18 April 2018 (UTC)


 * — xaosflux  Talk 23:09, 18 April 2018 (UTC)
 * ✅ these have been added, let me know if you see any trouble. —  xaosflux  Talk 23:14, 18 April 2018 (UTC)
 * Excellent! I'm glad I mentioned it, which I think is what led to all this activity. Thanks for getting some sensible updates through, all. :) Quiddity (WMF) (talk) 23:41, 18 April 2018 (UTC)
 * Thanks to as well. —  xaosflux  Talk 00:27, 19 April 2018 (UTC)
 * No problem! There are definitely many more sites that could be added here, but that's a good start :) Samwalton9 (WMF) (talk) 09:50, 19 April 2018 (UTC)

Proposal to add major newspapers etc.
A short RSN discussion showed some support for the principle of adding major newspapers to this list, and I think we can extend that to some other media such as the BBC. Should we produce a full list for approval? Please can non-UK editors add respected journals from their own countries? The Washington Post, The Globe and Mail and The Hindu have been suggested. I've left off tabloids such as The Sun and the Daily Mirror to maximise the chance of approval. I hope we can leave the initial www off the URL pattern, to allow variants such as news.bbc.co.uk. The Times has a paywall; is it worth including such sources?

Someone recently posted a link to a useful article with a Venn diagram classifying news sources by political bias and level of detail, but I've lost it. Please can someone point us at that again? Thanks, Certes (talk) 10:44, 19 April 2018 (UTC)
 * Activated an edit request to see whether any patrolling admins want to comment before processing. — xaosflux  Talk 12:03, 19 April 2018 (UTC)
 * Would it be better to start this discussion somewhere else, returning if and when it has enough detail and support to qualify as an edit request? If so, is WP:RSN the right forum?  I don't think anyone doubts that these are reliable sources; the question is whether they should be added to this whitelist. Certes (talk) 12:19, 19 April 2018 (UTC)
 * RSN is the best forum I can think of for these; you can move it there, or just link to this from there with a summary. Basically, if domains are representative of reliable sources, are useful for new users, and are not being abused (such as for spam, advertising, or selling subscriptions), they are OK to be on this list as far as I'm concerned. —  xaosflux  Talk 12:27, 19 April 2018 (UTC)
 * A notice was posted at WP:RSN on 19 April asking that people come here to comment. EdJohnston (talk) 14:39, 22 April 2018 (UTC)
 * FWIW, I fully support this. Ed [talk] [majestic titan] 19:57, 22 April 2018 (UTC)
 * — xaosflux  Talk 20:08, 22 April 2018 (UTC)
 * ✅ — xaosflux  Talk 20:12, 22 April 2018 (UTC)
 * Thank you! I still hope editors from beyond the UK will contribute similar lists for their countries. Certes (talk) 22:48, 22 April 2018 (UTC)
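Why leaving the initial www off helps can be seen with a short Python simulation. This is a sketch under an assumption, not MediaWiki's code: it supposes each whitelist line is tried after a host prefix like ^https?://+[a-z0-9_\-.]*, so any run of subdomain characters may come before the fragment. The prefix and the is_whitelisted helper are hypothetical.

```python
import re

# Assumed host prefix (hypothetical): scheme, then any run of host characters,
# then the whitelist fragment. The character class cannot cross a "/".
HOST_PREFIX = r"^https?://+[a-z0-9_\-.]*"

def is_whitelisted(url, fragment):
    """True if `url` would skip the CAPTCHA under this single fragment."""
    return re.match(HOST_PREFIX + "(?:" + fragment + ")", url, re.IGNORECASE) is not None

with_www = r"\bwww\.bbc\.co\.uk"
without_www = r"\bbbc\.co\.uk"

for url in ["https://www.bbc.co.uk/news", "http://news.bbc.co.uk/1/hi/uk/"]:
    # columns: URL, matched by the www pattern, matched by the www-less pattern
    print(url, is_whitelisted(url, with_www), is_whitelisted(url, without_www))
```

Under that assumption, \bbbc\.co\.uk covers both www.bbc.co.uk and news.bbc.co.uk, while the www-prefixed pattern covers only the former.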

What exactly is this?
I wonder what exactly this is. Is it just a list of URLs that don't require a CAPTCHA for unregistered users? If so, should we add all low-risk but popular URLs? --Emir of Wikipedia (talk) 20:49, 22 April 2018 (UTC)
 * Yes, normally unregistered and new editors have to solve a CAPTCHA to add links; these specific domains are exempt from that. There is some performance cost to consider, so being "popular", in the sense of links that are actually being appropriately added to pages, is a factor. In general this means the links should be to reliable sources. It is important that the exemptions are not also useful for disruptive editing. We have only recently begun using this, and this page is not well watched; I suggest discussing additions at WP:RSN first. —  xaosflux  Talk 21:39, 22 April 2018 (UTC)
 * Thanks for the information. I have seen the discussions at RSN and came here for clarification. --Emir of Wikipedia (talk) 20:01, 23 April 2018 (UTC)
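The behaviour described in the reply above can be sketched in a few lines. All of this is illustrative: it assumes the non-blank lines of the page are OR-ed into one case-insensitive regex behind a hypothetical host prefix (^https?://+[a-z0-9_\-.]*), and that a new or unregistered editor gets a CAPTCHA if any newly added URL fails to match. build_whitelist and needs_captcha are made-up names.

```python
import re

# Assumed host prefix (hypothetical, not copied from MediaWiki).
HOST_PREFIX = r"^https?://+[a-z0-9_\-.]*"

def build_whitelist(lines):
    """OR together every non-blank line of the page into one regex."""
    fragments = [line.strip() for line in lines if line.strip()]
    return re.compile(HOST_PREFIX + "(?:" + "|".join(fragments) + ")", re.IGNORECASE)

def needs_captcha(added_urls, whitelist, is_new_or_unregistered):
    """True if a new/unregistered editor adds at least one non-whitelisted URL."""
    if not is_new_or_unregistered:
        return False
    return any(whitelist.match(url) is None for url in added_urls)

WHITELIST = build_whitelist([r"\bbbc\.co\.uk", "", r"\bnytimes\.com"])

print(needs_captcha(["https://www.bbc.co.uk/news"], WHITELIST, True))   # False
print(needs_captcha(["https://www.bbc.co.uk/news", "https://example.com/"], WHITELIST, True))  # True
print(needs_captcha(["https://example.com/"], WHITELIST, False))        # False
```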

Please add IPCC and National Academies domains
Could you please add:


 * ipcc.ch (Intergovernmental Panel on Climate Change)
 * nap.edu (National Academies of Sciences, Engineering, and Medicine)

? Clayoquot (talk | contribs) 22:52, 22 February 2020 (UTC)
 * ❌ (not yet). Following the directions, please link to where these additions were discussed publicly, such as at the Reliable sources/Noticeboard. — xaosflux  Talk 14:02, 23 February 2020 (UTC)
 * , it's pretty inconceivable that a discussion at RSN would yield a result other than "yes, those are reliable sources". Would you consider pulling an IAR to add these two without going through a community process? Best, Clayoquot (talk | contribs) 17:57, 23 February 2020 (UTC)
 * I'll leave this open for at least a day in case anyone else wants to skip the discussion (which for these is usually of the 'no objections, go ahead' type). I've never heard of ipcc.ch (it appears to have only 5 article usages). nap.edu appears to have only 4 article usages as well, so at the very least these don't seem to be popular sources. —  xaosflux  Talk 19:00, 23 February 2020 (UTC)
 * , For www.nap.edu, I'm seeing usage in 957 pages, and www.ipcc.ch appears to be referenced in 736 pages. Clayoquot (talk | contribs) 17:42, 24 February 2020 (UTC)
 * Looks like I had my wildcard wrong, more popular than my first count indeed :) — xaosflux  Talk 18:14, 24 February 2020 (UTC)
 * , We've all done that :) Clayoquot (talk | contribs) 02:54, 25 February 2020 (UTC)
 * Please post at WP:RSN; if you are ignored for a week, reactivate the request and I'll add them here. — xaosflux  Talk 15:26, 27 February 2020 (UTC)
 * Posted there. Thanks. Clayoquot (talk | contribs) 18:20, 27 February 2020 (UTC)
 * Done. There were no objections: https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Noticeboard/Archive_286#CAPTCHA_exemption_for_reliable_domains Clayoquot (talk | contribs) 22:00, 7 March 2020 (UTC)
 * Could someone make this change, please? Clayoquot (talk | contribs) 17:30, 11 March 2020 (UTC)
 * ✅ as there were no objections, I've added. —  xaosflux  Talk 17:38, 11 March 2020 (UTC)

RfC on adding generally reliable sources to the CAPTCHA whitelist
There is a request for comment on adding generally reliable sources from the perennial sources list to the CAPTCHA whitelist, which allows new and anonymous users to cite them in articles without needing to solve a CAPTCHA. If you are interested, please participate at. —  Newslinger  talk   19:42, 7 March 2020 (UTC)
 * The discussion has passed with "near-unanimous" consensus in favour of the proposal and should be implemented. For future reference, it is now archived at Reliable_sources/Noticeboard/Archive_291. 107.190.33.254 (talk) 17:01, 7 May 2020 (UTC)
 * Would someone please regex this up into a ready-to-go addition, then activate the edit request here? — xaosflux  Talk 00:57, 8 May 2020 (UTC)

Not sure why this discussion died out, but on WP:RSNP, this did the trick:

\babcnews\.com \babcnews\.go\.com \btheage\.com\.au \bafp\.com \baljazeera\.com \baljazeera\.net \bamnesty\.org \badl\.org \baon\.com \barstechnica\.com \barstechnica\.co\.uk \bap\.org \bapnews\.com \btheatlantic\.com \btheaustralian\.com\.au \bavclub\.com \bavn\.com \baxios\.com \bbbc\.co\.uk \bbbc\.com \bbehindthevoiceactors\.com \bbellingcat\.com \bbloomberg\.com \bbusinessweek\.com \bburkespeerage\.com \bbuzzfeednews\.com \bbuzzfeed\.com \bcsmonitor\.com \bclimatefeedback\.org \bcnet\.com \bcnn\.com \bcodastory\.com \bcommonsensemedia\.org \btheconversation\.com \btelegraph\.co\.uk \bdeadline\.com \bdeadlinehollywooddaily\.com \bdebretts\.com \bdeseretnews\.com \bdw\.com/en \bdigitalspy\.co\.uk \bdigitalspy\.com \bthediplomat\.com \beconomist\.com \biranicaonline\.org \bengadget\.com \bew\.com \bft\.com \bforbes\.com \bfoxnews\.com \bfoxbusiness\.com \bgamedeveloper\.com \bgamasutra\.com \bgameinformer\.com \bwyborcza\.pl \bgeonames\.usgs\.gov \bgizmodo\.com \btheglobeandmail\.com \btheguardian\.com \bguardian\.co\.uk \btheguardian\.co\.uk \bhaaretz\.com \bhaaretz\.co\.il \bthehill\.com \bthehindu\.com \bhollywoodreporter\.com \bhuffpost\.com \bhuffingtonpost\.com \bhuffingtonpost\.co\.uk \bhuffingtonpost\.ca \bhuffingtonpost\.com\.au \bhuffpostbrasil\.com \bhuffingtonpost\.de \bhuffingtonpost\.es \bhuffingtonpost\.fr \bhuffingtonpost\.gr \bhuffingtonpost\.in \bhuffingtonpost\.it \bhuffingtonpost\.jp \bhuffingtonpost\.kr \bhuffpostmaghreb\.com \bhuffingtonpost\.com\.mx \bidolator\.com \bign\.com \bindependent\.co\.uk \bindianexpress\.com \binsider\.com \bthisisinsider\.com \bipsnews\.net \bipsnoticias\.net \bipscuba\.net \btheintercept\.com \bifcncodeofprinciples\.poynter\.org \bjacobinmag\.com \bcatalyst-journal\.com \bjamanetwork\.com \bthejc\.com \bkirkusreviews\.com \bkommersant\.ru \bkommersant\.com \bkommersant\.uk \blatimes\.com \bmg\.co\.za \bthemarysue\.com \bmetacritic\.com \bgamerankings\.com \bmonde-diplomatique\.fr \bmondediplo\.com 
\bmotherjones\.com \bmsnbc\.com \bthenation\.com \bnationalgeographic\.com \bnbcnews\.com \bnewrepublic\.com \bnymag\.com \bvulture\.com \bthecut\.com \bgrubstreet\.com \bnydailynews\.com \bnytimes\.com \bnewyorker\.com \bnzherald\.co\.nz \bnewslaundry\.com \bnewsweek\.com \bnpr\.org \bpeople\.com \bpewresearch\.org \bpeople-press\.org \bjournalism\.org \bpewsocialtrends\.org \bpewforum\.org \bpewinternet\.org \bpewhispanic\.org \bpewglobal\.org \bpinknews\.co\.uk \bplayboy\.com \bpolitico\.com \bpolitifact\.com \bpolygon\.com \bpropublica\.org \bqz\.com \brfa\.org \brappler\.com \breason\.com \btheregister\.co\.uk \breligionnews\.com \breuters\.com \brollingstone\.com \brottentomatoes\.com \bsciencebasedmedicine\.org \bscientificamerican\.com \bscotusblog\.com \bnews\.sky\.com \bsnopes\.com \bscmp\.com \bsplcenter\.org \bspace\.com \bspiegel\.de \bsmh\.com\.au \bthewrap\.com \btime\.com \bthetimes\.co\.uk \bthesundaytimes\.co\.uk \btimesonline\.co\.uk \btorrentfreak\.com \btvguide\.com \btvguidemagazine\.com \busnews\.com \busatoday\.com \bvanityfair\.com \bvariety\.com \bventurebeat\.com \btheverge\.com \bvogue\.com \bvoanews\.com \bvox\.com \bwsj\.com \bwashingtonpost\.com \bweeklystandard\.com \bthewire\.in \bthewirehindi\.com \bthewireurdu\.com \bwired\.com \bwired\.co\.uk \bnews\.yahoo\.com \bzdnet\.com

\bbbc\.com \bbbc\.co\.uk \bft\.com \bindependent\.co\.uk \bjamanetwork\.com \btelegraph\.co\.uk \btheguardian\.com \bthetimes\.co\.uk

I participated in that discussion, but see no reason to think the consensus isn't still valid. Suffusion of Yellow (talk) 19:19, 20 May 2023 (UTC)


 * @Suffusion of Yellow I was only here as an edit-request-patrolling admin, and the ER wasn't ready; if it's ready now, please reactivate the request to enqueue this again. — xaosflux  Talk 19:50, 20 May 2023 (UTC)
 * Well, I don't see any problems, but can't hurt to ask who probably has RSNP memorized. Does it look like I generated that list properly? Suffusion of Yellow (talk) 23:21, 20 May 2023 (UTC)
 * Minor quibble: does the /en after \bdw\.com actually work? I'm not sure exactly how the check works with the whitelist, but I imagine it works only on the domain (not the path within the host), to prevent citations such as wikipedia.org.spamsite.tld/spamspamspam.html. Certes (talk) 11:10, 21 May 2023 (UTC)
 * Oops, it doesn't: see (updated today) below. Certes (talk) 22:34, 21 May 2023 (UTC)
 * I've reactivated the request, per lack of objection. Please:
 * Add all lines from the "RSNP" list above
 * Remove all lines from the "Duplicates" list
 * Thanks. Suffusion of Yellow (talk) 20:54, 23 May 2023 (UTC)
 * ✅ Izno (talk) 23:09, 24 May 2023 (UTC)
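Turning a list of domains into ready-to-go lines, and skipping any already on the page (the "remove the duplicates" step above), is mechanical. A minimal sketch, assuming every entry should take the \bdomain\.tld form this page uses; to_fragment and new_lines are hypothetical helpers, and the sample domains come from the lists above.

```python
def to_fragment(domain):
    """Turn a bare domain (e.g. bbc.co.uk) into a whitelist line in the page's style."""
    # Only "." needs escaping in a plain ASCII domain; a path such as dw.com/en passes through.
    return r"\b" + domain.strip().lower().replace(".", r"\.")

def new_lines(domains, existing_lines):
    """Fragments for `domains` that are not already on the page, in first-seen order."""
    existing = {line.strip() for line in existing_lines}
    result = []
    for domain in domains:
        fragment = to_fragment(domain)
        if fragment not in existing and fragment not in result:
            result.append(fragment)
    return result

existing = [r"\bbbc\.com", r"\bft\.com"]  # already whitelisted
proposed = ["abcnews.com", "bbc.com", "catalyst-journal.com", "ft.com", "abcnews.com"]
for line in new_lines(proposed, existing):
    print(line)
```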

Adding NCBI to the list
NCBI is undeniably a source of reliable peer-reviewed journal articles and is often used in citations (e.g. WP:PUBMED), the same as jstor.org, which is already on the list. 107.190.33.254 (talk) 17:08, 7 May 2020 (UTC)
 * www.ncbi.nlm.nih.gov
 * The entire nih.gov domain is already on the list - is it not working? — xaosflux  Talk 17:48, 7 May 2020 (UTC)
 * My bad, then; I only searched for "ncbi" using Ctrl+F and couldn't find it. Though I could have sworn it didn't always work; maybe it was some other website added by citation templates, or maybe I was adding multiple sources. Anyway, it now works without a doubt; case closed. Thanks, 107.190.33.254 (talk) 18:19, 7 May 2020 (UTC)
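A Ctrl+F search for "ncbi" comes up empty even though NCBI links are exempt, because the nih.gov line covers every subdomain. A small check, assuming a hypothetical host prefix ^https?://+[a-z0-9_\-.]* in front of each line and assuming the entry reads \bnih\.gov (its exact form isn't quoted in this thread).

```python
import re

# Hypothetical host prefix and an assumed form of the nih.gov entry.
HOST_PREFIX = r"^https?://+[a-z0-9_\-.]*"
NIH_ENTRY = r"\bnih\.gov"
nih = re.compile(HOST_PREFIX + "(?:" + NIH_ENTRY + ")", re.IGNORECASE)

url = "https://www.ncbi.nlm.nih.gov/"
print("ncbi" in NIH_ENTRY)         # False: the page text never mentions "ncbi"
print(nih.match(url) is not None)  # True: the subdomain is still covered
```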

Protected edit request on 14 May 2020
Remove "such as those used in cite doi." from the header and "and in Template:Cite doi" from the comment after doi.org, since Template:Cite doi was deprecated. * Pppery * it has begun... 19:35, 14 May 2020 (UTC)
 * ✅. Thanks for submitting this! —  Newslinger  talk   21:46, 14 May 2020 (UTC)

Protected edit request on 11 April 2021

 * Change every single regex entry to have  at the end. Two example lines:

I've indicated with  and   what the respective changes for these lines should be, but I think the changes should be self-explanatory.

This change is necessary because the whitelist currently also whitelists URLs such as http://wikipedia.org.phishing.site.example.org/my_virus_url, to give a blatant example of a bad URL. Please do test this yourself, but in my testing on another wiki, those URLs were accepted as long as the regular expressions are not finished with a. As the page states: "Every non-blank line is a regex fragment which will only match hosts inside URLs". This means that the end of the domain name can safely be finished with a  marker, since the text that will be matched against will never contain anything after the last character in the domain name.

I'm not sure if this should be communicated to other international versions of wikipedia, but it seems relevant for you guys to change this since you are the first hit on Google when I search for the system message name ("MediaWiki:Captcha-addurl-whitelist"). Joeytje50 (talk) 17:43, 11 April 2021 (UTC)
 * I'm pretty sure this would break it to only allow, and not say  . If I'm right, what you actually want is to add a   to the end. Anomie⚔ 01:03, 12 April 2021 (UTC)
 * If the trailing slash is optional then we need something like, though I think this still allows  . Certes (talk) 10:14, 12 April 2021 (UTC)
 * The \b boundaries aren't stopping that? — xaosflux  Talk 17:58, 16 April 2021 (UTC)
 * ❌ this needs more review and testing before bulk changes are made. — xaosflux  Talk 17:58, 16 April 2021 (UTC)
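The concern raised in this request can be reproduced with a simulation. It assumes a hypothetical host prefix (^https?://+[a-z0-9_\-.]*) in front of a plain \bacm\.org entry; the expected values mirror the acm.org rows of the test2wiki results that follow in this thread, but this is a model, not the live extension.

```python
import re

# Hypothetical host prefix assumed to sit in front of every whitelist line.
HOST_PREFIX = r"^https?://+[a-z0-9_\-.]*"
acm = re.compile(HOST_PREFIX + r"(?:\bacm\.org)", re.IGNORECASE)

# URL -> True if the simulated check would skip the CAPTCHA.
cases = {
    "https://acm.org": True,
    "https://acm.org/index.html": True,
    "https://acm.org.spam.site": True,    # the problem: a foreign host slips through
    "https://acm.orgg.spam.site": True,   # nothing forces the domain to end here
    "https://aacm.org": False,            # \b rejects a letter directly in front
    "https://spam.site/acm.org": False,   # the host character class cannot cross "/"
}
for url, expected in cases.items():
    skipped = acm.match(url) is not None
    print(f"{url:28} {'no captcha' if skipped else 'captcha'}")
    assert skipped == expected
```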

Some tests at test2wiki (testwiki's link handling is broken). Anything not marked (captcha) didn't get a CAPTCHA:


 * https://acm.org
 * https://acm.org.spam.site
 * https://acm.orgg.spam.site
 * https://aacm.org (captcha)
 * https://spam.site/acm.org (captcha)
 * https://acm.org/index.html
 * https://acs.org
 * https://acs.org/ (captcha)
 * https://acs.org.spam.site (captcha)
 * https://anb.org (captcha)
 * https://apa.org
 * https://apa.org/
 * https://apa.org/index.html
 * https://apa.org.spam.site (captcha)
 * https://foo.apa.org/
 * https://foo-apa.org/
 * https://bbc.com/
 * https://foo.bbc.com
 * https://foo-bbc.com/ (captcha)
 * https://bbc.com.spam.site (captcha)
 * https://dw.com/en
 * https://dw.com/spam (captcha)
 * https://dw.com (captcha)

So yes, the problem is real. It looks like the right format is  Not sure what to do here. Adding all those  seems cheap enough. But what about all those  lookbehinds? Could that cause a performance hit? Suffusion of Yellow (talk) 21:54, 21 May 2023 (UTC)
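One way to express the lookbehind/lookahead idea, sketched in Python with a hypothetical host prefix ^https?://+[a-z0-9_\-.]*: the lookbehind (?<![\w-]) rejects foo-acm.org and aacm.org while still allowing foo.acm.org, and the lookahead (?=[/:?#]|$) requires the domain to end the host. These particular assertions are an illustration and may differ from the format proposed on this page; the leading \b is dropped because the lookbehind already implies a boundary before a letter or digit. Python's behaviour says nothing about PCRE performance on the live site.

```python
import re

HOST_PREFIX = r"^https?://+[a-z0-9_\-.]*"  # hypothetical host prefix (an assumption)

def harden(domain_regex):
    """Wrap an escaped domain in illustrative start/end assertions."""
    # (?<![\w-])   : not preceded by a letter, digit, underscore or hyphen
    # (?=[/:?#]|$) : followed by a path, port, query, fragment or the end of the URL
    return r"(?<![\w-])" + domain_regex + r"(?=[/:?#]|$)"

acm = re.compile(HOST_PREFIX + "(?:" + harden(r"acm\.org") + ")", re.IGNORECASE)

for url in ["https://acm.org", "https://foo.acm.org/", "https://acm.org:443/x",
            "https://acm.org.spam.site", "https://acm.orgg.spam.site",
            "https://foo-acm.org/", "https://aacm.org"]:
    print(url, "no captcha" if acm.match(url) else "captcha")
```

In this model the first three URLs skip the CAPTCHA and the last four do not.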


 * Even that will match https: //malicious.domain/pretending.to.be.some.good.site/virus.exe, though not https: //some.good.site:80/innocent.html. Is the whole URL matched against the pattern?  If so, we may need to parse the whole URL, starting the regexp with ^.  There's at least one whole website devoted to how to do that properly, or see page 50 of https://www.ietf.org/rfc/rfc3986.txt. Certes (talk) 23:03, 21 May 2023 (UTC)
 * No, see the https://spam.site/acm.org example above. Assuming this is the right place, the regexes are bundled together, then prefixed with . We could use the   option and supply the prefixes ourselves, but would that be even slower? Or we could do the bundling ourselves, but that would make this page as unreadable as some edit filters. Suffusion of Yellow (talk) 23:47, 21 May 2023 (UTC)
 * If the prefix  is added, then that would be an issue in MediaWiki itself, right? You would expect the prefix to require a period at the end, if there is any subdomain preceding the whitelisted domain. Otherwise I'm pretty sure almost every single wiki that has a whitelist is vulnerable to adding a link to   (demo). A simple   is not sufficient, due to the existence of the dash in domain names.
 * So regardless of this protected edit request, I'd say MediaWiki should change the prefix to  to enforce the period at the end. Let me know what you guys think about that.
 * Regarding this edit request, I'd say the testing done by Suffusion of Yellow is pretty conclusive that some changes are needed. The lookbehind is required because of the aforementioned issue with hyphens (simple  is insufficient), and the lookahead for the trailing slash or string terminator is required because otherwise   would be whitelisted as well. I haven't re-enabled the edit request template at the top, but if anyone knows what the impact would be on performance, I think this request can be re-enabled. If performance is impacted significantly, I think the aforementioned change to MediaWiki software is even more important, and if lookbehinds are impacting performance, I'd assume changing the lookbehind to   as a regular capturing group would work as well.
 * The updated edit request is now:
 * At the start of every line: →
 * At the end of every line:
 * Joeytje50 (talk) 11:49, 29 January 2024 (UTC)
 * Thanks, that looks good to me. It's hard to be sure without analysing the code which will apply the regexp, but I am hopeful that it will work without side effects. Certes (talk) 13:48, 29 January 2024 (UTC)
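Applying a start-of-line and end-of-line change to every entry can be scripted rather than done by hand. The START and END strings below are hypothetical stand-ins, not the exact text of the updated request; compiling each rewritten line catches typos before they reach the protected page.

```python
import re

# Hypothetical stand-ins for the requested start-of-line and end-of-line changes.
START = r"(?<![\w-])"   # not preceded by a letter, digit, underscore or hyphen
END = r"(?=[/:?#]|$)"   # followed by a path, port, query, fragment or the end

def rewrite_line(line):
    """Rewrite one whitelist line; blank lines are returned unchanged."""
    body = line.strip()
    if not body:
        return line
    if body.startswith(r"\b"):
        body = body[2:]  # redundant once the lookbehind is in place (domains start with a letter/digit)
    return START + body + END

lines = [r"\bbbc\.co\.uk", "", r"\btoolforge\.org"]
rewritten = [rewrite_line(line) for line in lines]
for line in rewritten:
    if line:
        re.compile(line)  # raises re.error if a rewritten line is not a valid regex
    print(line)
```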

Protected edit request on 20 May 2023
Please add:

\btoolforge\.org

I assume this will be uncontroversial; wmflabs is already there. Suffusion of Yellow (talk) 00:22, 20 May 2023 (UTC)
 * ✅ — xaosflux  Talk 01:07, 20 May 2023 (UTC)

Protected edit request on 1 June 2023
Please add the following URLs. Except for books.google.com and cnbc.com, they are auto-generated by various CS1 templates when the required IDs are passed to them; see Template:Citation Style documentation/id2:

\bapi\.semanticscholar\.org \barxiv\.org \bbiorxiv\.org \bbooks\.google\.com \bciteseerx\.ist\.psu\.edu \bcnbc\.com \bhdl\.handle\.net \blccn\.loc\.gov \bmathscinet\.ams\.org \bopenlibrary\.org \bosti\.gov \bpapers\.ssrn\.com \btools\.ietf\.org \bui\.adsabs\.harvard\.edu \bzbmath\.org 93.72.49.123 (talk) 14:50, 1 June 2023 (UTC)
 * ✅ &mdash; Martin (MSGJ · talk) 12:28, 13 June 2023 (UTC)

Protected edit request on 8 June 2024
Please add Google and Bing to the list: \bgoogle\.com \bbing\.com

Since the AfC submission/pending template includes links to the search engines through its find sources invocation, unconfirmed users are forced to enter CAPTCHAs when submitting drafts. Unconfirmed users have a rate limit of 8 edit attempts per minute, which is not much. The counter is incremented every time an edit is interrupted by a CAPTCHA requirement, and also every time an incorrect CAPTCHA is entered. According to the metrics collected from the submission wizard, 10% of all submissions fail with a rate-limit error. Users have also reported the issue: Wikipedia talk:WikiProject Articles for creation, Wikipedia talk:WikiProject Articles for creation/Submission wizard.

Links to search results don't help with SEO or otherwise have much spam potential. – SD0001  (talk) 06:39, 8 June 2024 (UTC)
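The rate-limit arithmetic can be made concrete with a toy model. Everything here is an assumption for illustration: a sliding window of 8 attempts per 60 seconds, and a submission that hits a CAPTCHA costing one attempt for the interruption, one per wrong answer, and one for the final save. AttemptLimiter and attempts_for_submit are made-up names, not MediaWiki's implementation.

```python
from collections import deque

class AttemptLimiter:
    """Toy sliding-window limiter: at most `limit` attempts per `window` seconds."""

    def __init__(self, limit=8, window=60.0):
        self.limit = limit
        self.window = window
        self.times = deque()

    def try_attempt(self, now):
        # Forget attempts that have left the window.
        while self.times and now - self.times[0] >= self.window:
            self.times.popleft()
        if len(self.times) >= self.limit:
            return False  # this attempt is rejected as rate-limited
        self.times.append(now)
        return True

def attempts_for_submit(has_captcha, wrong_answers):
    """Assumed cost: the save, plus the CAPTCHA interruption, plus each wrong answer."""
    return 1 + ((1 + wrong_answers) if has_captcha else 0)

limiter = AttemptLimiter()
outcomes = []
for n in range(4):  # four submissions, 10 seconds apart
    start = 10.0 * n
    cost = attempts_for_submit(has_captcha=True, wrong_answers=1)
    ok = all(limiter.try_attempt(start + i) for i in range(cost))
    outcomes.append(ok)
    print(f"submission {n + 1}: {'saved' if ok else 'rate-limited'}")
```

In this model, four quick submissions that each involve one wrong CAPTCHA answer exhaust the budget by the third; without CAPTCHAs the same four would cost only four attempts.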


 * This seems like a bad idea. General-purpose commercial search engines like Google and Bing are certainly not reliable sources and shouldn't be getting linked to; change the template to fix the problem with this one use case. — xaosflux  Talk 09:26, 8 June 2024 (UTC)
 * Are you saying that find sources should not link to Google or Bing? – SD0001  (talk) 11:05, 8 June 2024 (UTC)
 * Or AfC submission/pending could not transclude find sources… jlwoodwa (talk) 04:26, 9 June 2024 (UTC)
 * Yup, more along those lines. Improving that workflow seems like a better idea. — xaosflux  Talk 12:39, 9 June 2024 (UTC)
 * So the idea is to use the CAPTCHA system to create friction for editors trying to add search-engine links? That sounds too broad if it also creates friction when using official templates such as AfC submission/pending. I'm not sure what the performance cost would be, but an edit filter could potentially warn against this with a better warning message and fewer false positives. The regex would be something like . I suppose this would only catch refs and not external links, though. Hmm. – Novem Linguae  (talk) 08:48, 9 June 2024 (UTC)
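An edit filter along the lines suggested might look for search-result URLs inside refs. The pattern below is a hypothetical Python sketch, not the regex from the discussion and not AbuseFilter syntax; as noted, it only inspects ref contents, not bare external links.

```python
import re

# Hypothetical pattern: a Google or Bing /search? URL inside a single <ref>...</ref>.
SEARCH_REF = re.compile(
    r"<ref[^>]*>[^<]*https?://(?:www\.)?(?:google\.[a-z.]+|bing\.com)/search\?[^<]*</ref>",
    re.IGNORECASE,
)

wikitext = ("Foo.<ref>https://www.google.com/search?q=foo</ref> "
            "Bar.<ref>{{cite web |url=https://www.bbc.co.uk/news}}</ref>")
print(len(SEARCH_REF.findall(wikitext)))  # 1: only the Google search ref is flagged
```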


 * ❌ these are certainly not reliable sources; additional discussion needed. Reliable sources/Noticeboard is the standard venue for such discussion. — xaosflux  Talk 12:41, 9 June 2024 (UTC)