User talk:The Earwig/Archive 14

Signpost issue 4 – 29 March 2018
 * Read this Signpost in full * Single-page * Unsubscribe * MediaWiki message delivery (talk) 18:49, 29 March 2018 (UTC)

Presidents and Vice Presidents of Palau
In my opinion, it would be a good idea to combine the articles President of Palau and Vice President of Palau. Both are short articles. We could redirect Vice Presidents of Palau to President of Palau. I would like your thoughts on this.Векочел (talk) 01:58, 4 April 2018 (UTC)
 * Векочел, I don't know enough about the politics or history of Palau to give an informed response. However, the subjects of the two articles are quite distinct and each one could be expanded to have unique content that wouldn't fit in the other article. Looking around, Category:Vice presidents by country is well populated and I don't see any other countries that have chosen to merge them. —  Earwig   talk 02:23, 4 April 2018 (UTC)

Bot request
Erasing copyvio detector bot is one of the best things I have seen. Can this bot be used on mrwp? -- ✝iѵ ɛɳ  २२४० †ลℓк †๏ мэ 07:32, 10 April 2018 (UTC)
 * Tiven2240, thank you. You can use the tool anywhere, ideally, though I don't guarantee it handles other languages as well as English. However, I don't run a bot that detects copyvios and removes them automatically. There are too many incorrect identifications (the results are not reliable enough) for this to be a good idea; humans should always have the final say. —  Earwig   talk 02:05, 11 April 2018 (UTC)

EarwigBot on Template:AFC_statistics
Template:AFC_statistics hasn't been updated since yesterday. Is there something wrong with the bot? -- » Shadowowl  |  talk  13:33, 18 April 2018 (UTC)
 * Shadowowl, there is an issue with lagging databases on Wikimedia Cloud Services; the data is about a day and a half old so the chart can't update. You can check on that here (that page seems to be misbehaving as well, but you can still see the lag). Anyway, it looks to be going down and should resolve itself soon. —  Earwig   talk 01:57, 19 April 2018 (UTC)

The Signpost: 26 April 2018
 * Read this Signpost in full * Single-page * Unsubscribe * MediaWiki message delivery (talk) 01:51, 26 April 2018 (UTC)

Copyvio detector not working
Hello Earwig, I have a problem with the copyvio detector today: It's returning an error "An error occurred while using the search engine (Google Error: HTTP Error 403: Forbidden)." Any help would be appreciated. Thanks! — Diannaa 🍁 (talk) 12:27, 5 April 2018 (UTC)
 * Kaldari, do you have any idea? Unfortunately, I'm not seeing anything in the logs that could help diagnose—just the 403 error. — Earwig   talk  03:51, 6 April 2018 (UTC)
 * It looks like we hit the daily query limit (10,000 queries per day). Any idea why there was such a big spike today? Usually, we only get to about 5,000 queries a day. Kaldari (talk) 04:41, 6 April 2018 (UTC)
 * No idea why that would happen. It's working again today. Thanks for looking into this. — Diannaa 🍁 (talk) 10:26, 6 April 2018 (UTC)
 * This is likely related to SQLBot's AFC-Ores reports, which are using the tool. — JJMC89 (T·C) 05:14, 7 April 2018 (UTC)
 * Nope, explicitly didn't use the google search functionality (ever), and in the last 24 hours rewrote to cut the amount of api pulls by 85%. SQL Query me!  05:25, 7 April 2018 (UTC)
 * Query levels seem to be back to normal today. Kaldari (talk) 05:36, 7 April 2018 (UTC)
 * So then your bot is basically just comparing articles against the external links included in the page? How useful is this? —  Earwig   talk 18:14, 7 April 2018 (UTC)
 * Seems to be pretty helpful so far. It would probably be better with google on - for sure, but I was trying to follow the 'Etiquette' section (I also use a sleep in between queries), and not consume more than my share of finite resources. And, looking at today's high score, Draft:Asli_Demirguc-Kunt - my query shows 90.2% confidence, while bypassing the cache and using google shows 89.6%. I've spot checked a lot of them, and most seem to have a similarly negligible difference. That mainly leaves articles with no links. I'm not 100% sure how I should proceed on those ones yet. SQL Query me!  01:29, 8 April 2018 (UTC)
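As an illustration of the approach described above (checking only the links already in the page, with a pause between queries), here is a minimal sketch. The endpoint and parameter names (`version`, `action`, `use_engine`, `use_links`) are recalled from the tool's API documentation and should be treated as assumptions to verify there; `build_check_url` and `throttled` are hypothetical helper names, and the five-second delay is an arbitrary placeholder.

```python
import time
from urllib.parse import urlencode

# Assumed endpoint: the tool was hosted under tools.wmflabs.org at the time.
API_BASE = "https://tools.wmflabs.org/copyvios/api.json"


def build_check_url(title, lang="en", project="wikipedia", use_engine=False):
    """Return a copyvio-check URL; use_engine=False skips the Google search."""
    params = {
        "version": 1,
        "action": "search",
        "project": project,
        "lang": lang,
        "title": title,
        "use_engine": int(use_engine),  # 0 = compare against page links only
        "use_links": 1,
    }
    return API_BASE + "?" + urlencode(params)


def throttled(titles, delay_seconds=5.0, sleep=time.sleep):
    """Yield one URL per title, sleeping between consecutive titles."""
    for index, title in enumerate(titles):
        if index:
            sleep(delay_seconds)
        yield build_check_url(title)
```

The sketch only builds URLs; fetching them and parsing the response is left out so that nothing here depends on the live service.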

I got the same error again late yesterday (circa 22:00 UTC) and the tool is functioning normally again this morning. Posting as information. — Diannaa 🍁 (talk) 11:49, 2 May 2018 (UTC)
 * Yes, we’ve been discussing this one over at T193559. —  Earwig   [alt]   talk 15:38, 2 May 2018 (UTC)
 * It looks like we're hitting the daily quota every 5 days exactly due to a regularly timed spike. On April 26, May 1, May 6, and May 11, there were huge spikes in Google Search API usage from Tool Forge resulting in hitting the quota and then being denied service for the rest of the day. I'm going to file a Phabricator task to investigate further. Ryan Kaldari (WMF) (talk) 20:11, 11 May 2018 (UTC)
 * Thanks Ryan. — Diannaa 🍁 (talk) 20:43, 11 May 2018 (UTC)
 * From looking at the proxy logs we were able to confirm that the traffic spike is coming from Earwig's Copyvio Detector. Earwig, could you look at the logs on your end and see if there's anything there that could be helpful in tracking it down. As I mentioned, the last spike was between 1 and 2am PST this morning. Ryan Kaldari (WMF) (talk) 22:34, 11 May 2018 (UTC)
 * Thanks for investigating. Sure, I'll see what I can find in the logs tomorrow morning (just got home, a bit tired). —  Earwig   talk 02:19, 12 May 2018 (UTC)
 * Replied at T194541. —  Earwig   talk 21:13, 12 May 2018 (UTC)
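The "every 5 days exactly" observation can be checked with a little date arithmetic. This is only a sketch using the four spike dates quoted in the thread; `gaps_in_days` and `is_regular` are hypothetical helper names.

```python
from datetime import date

# Spike dates as reported in the thread above.
SPIKE_DAYS = [
    date(2018, 4, 26),
    date(2018, 5, 1),
    date(2018, 5, 6),
    date(2018, 5, 11),
]


def gaps_in_days(days):
    """Return the number of days between each consecutive pair of dates."""
    return [(later - earlier).days for earlier, later in zip(days, days[1:])]


def is_regular(days):
    """True when every gap between consecutive dates is the same length."""
    return len(set(gaps_in_days(days))) == 1
```

For the dates above, `gaps_in_days(SPIKE_DAYS)` gives three gaps of five days each, which is what points to a scheduled job rather than organic traffic.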

The Signpost: 24 May 2018
 * Read this Signpost in full * Single-page * Unsubscribe * MediaWiki message delivery (talk) 15:17, 24 May 2018 (UTC)

Copyright detector
Tool did not detect this case per here Wikipedia_talk:WikiProject_Medicine

The page Agenesis of superior vena cava was entirely copied from here https://journals.lww.com/md-journal/Fulltext/2018/06010/The_first_reported_case_of_factor_V_Leiden.1.aspx yet it missed it.

Best Doc James (talk · contribs · email) 20:19, 4 June 2018 (UTC)


 * Doc James, I took a look. In this case, the tool searches Google for the right phrases, but Google does not return that page as a result. Sometimes it seems their API is not as accurate as the regular web search us humans have access to. My general advice is that the tool can't detect everything: while a hit is a good sign that a copyvio might be present, the absence of a hit certainly does not mean an article is copyvio-free. — Earwig   talk  02:14, 5 June 2018 (UTC)
 * Interesting. Thanks for the follow up. Doc James  (talk · contribs · email) 08:51, 5 June 2018 (UTC)

Women in Red tools and technical support
We are preparing a list of tools and technical support for Women in Red. I have tentatively added your name as you have provided general technical support, including tool developments. Please let me know whether you agree to be listed. You are of course welcome to make any additions or corrections.--Ipigott (talk) 07:29, 8 June 2018 (UTC)
 * Sure Ipigott, I'm happy to help and to continue maintaining things as necessary. (Though I can't promise significant new features.) —  Earwig   talk 02:18, 9 June 2018 (UTC)

Module:AfC
Notifying you of the requested move on this module, because it would affect one of 's tasks. {{3x|p}}ery (talk) 21:54, 26 June 2018 (UTC)
 * Thanks, I will comment there. —  Earwig   talk 02:42, 27 June 2018 (UTC)

The Signpost: 29 June 2018
 * Read this Signpost in full * Single-page * Unsubscribe * MediaWiki message delivery (talk) 01:29, 30 June 2018 (UTC)

Copyvio Bot on Punjabi Wikipedia
Hi, I am a Punjabi Wikipedia admin and I think the Copyvio Bot will be a great addition to Punjabi Wikipedia. Besides running it on new articles from now on, can we also run the bot on existing articles on Punjabi Wikipedia? Let me know if anything else is required. --Satdeep Gill (talk • contribs 07:29, 30 June 2018 (UTC)
 * Hi Satdeep Gill. While I do have a tool to check for copyvios, I don't have a bot that does it automatically. The main reason is that checking for copyvios is slow and expensive (there is a daily limit of about 1,000 checks due to the data source we use), and there are enough false positives that I think humans should always review the results before they get shown to other people (like the article creator). See my response to a similar question here. —  Earwig   talk 14:36, 30 June 2018 (UTC)
 * I totally agree that humans should check it. What we are looking for is to have it enabled and that the tool adds a template to articles that might have copyvio. --Satdeep Gill (talk • contribs 07:43, 1 July 2018 (UTC)

Thursday July 12: Wiki Loves Pride Edit-a-thon @ Jefferson Market Library
(You can subscribe/unsubscribe from future notifications for NYC-area events by adding or removing your name from this list.)

Sunday July 29: Annual Wiki-Picnic @ Prospect Park

The Signpost: 31 July 2018
 * Read this Signpost in full * Single-page * Unsubscribe * MediaWiki message delivery (talk) 23:51, 31 July 2018 (UTC)

August 29: WikiWednesday Salon and Skill-Share NYC

The Signpost: 30 August 2018
 * Read this Signpost in full * Single-page * Unsubscribe * MediaWiki message delivery (talk) 02:04, 30 August 2018 (UTC)

Earwig Bot!
Heya, thanks for all the things ya do! I noticed the AfC bot is on strike. Hopefully y'all can settle this labor dispute :D I was gonna tinker with the bot run setting thingy, but didn't wanna bork it. Anywho, thanks in advance! Drewmutt ( ^ᴥ^ ) talk  17:27, 7 September 2018 (UTC)
 * Thanks for letting me know, Drewmutt. I restarted him and he should be back to working now after a short delay. —  Earwig   talk 00:03, 8 September 2018 (UTC)
 * Seems it is doing something unusual at Template:AFC statistics. Curb Safe Charmer (talk) 17:01, 10 September 2018 (UTC)
 * what do you mean? —  Earwig   [alt]   talk 18:09, 10 September 2018 (UTC)
 * Yes, quite odd indeed.. here's how it looks to me.. Drewmutt ( ^ᴥ^ ) talk  19:10, 10 September 2018 (UTC)
 * That is, unfortunately, expected behavior. The backlog is large enough that the status page is too long for MediaWiki to render all of it. We need more reviewers! —  Earwig   [alt]   talk 23:15, 10 September 2018 (UTC)
 * Dang. Well, until backlog drive season, can we make it simply link to the draft as opposed to having a somewhat useless invoke tag? Not sure if this helps the issue, or if that's even feasible. Drewmutt ( ^ᴥ^ ) talk  00:01, 13 September 2018 (UTC)
 * I don't recommend that. It's not easy to tell in advance where the cutoff point is. For what it's worth, we're only losing about 15% of the page, and probably a fair bit of that are drafts that have already been declined/accepted. If you really want a list of every draft, there's always CAT:PEND. By the way, I've wanted to move the status page to Labs for a while so we don't need to deal with rendering it on-wiki, but I haven't had the time/desire to make that change yet. —  Earwig   talk 00:29, 13 September 2018 (UTC)

September 26: WikiWednesday Salon / Wikimedia NYC Annual Meeting

Copyvio Detector
Hi Ben; it seems that people using your Copyvio Detector are occasionally too quickly jumping to the conclusion that a Wikipedia article must have been taken from some external site when it's in fact the other way round. The text of Wikipedia articles that have been around for some time might appear on many websites, sometimes lacking appropriate attribution. So I wonder whether you might consider adding a caveat to the page of your tool - something like: "If the Wikipedia article was created some time ago, please check whether similar content on other websites might be based on the Wikipedia article before assuming a copyright violation on Wikipedia's side"? Gestumblindi (talk) 11:48, 29 September 2018 (UTC)
 * That's reasonable, Gestumblindi, I'll add something similar. —  Earwig   talk 17:37, 29 September 2018 (UTC)

The Signpost: 1 October 2018
 * Read this Signpost in full * Single-page * Unsubscribe * MediaWiki message delivery (talk) 00:45, 1 October 2018 (UTC)

The Signpost: 28 October 2018
 * Read this Signpost in full * Single-page * Unsubscribe * MediaWiki message delivery (talk) 19:10, 28 October 2018 (UTC)

Copyvio tool downtime
Hey Earwig, just wanted to let you know that Earwig's Copyvio Detector wasn't working for about half a day due to an issue with Google. It has been resolved and is working again. Sorry for the inconvenience. Kaldari (talk) 19:19, 31 October 2018 (UTC)
 * Got it, thanks for letting me know. —  Earwig   [alt]   talk 21:16, 31 October 2018 (UTC)

ZackBot 12
Regarding ZackBot 12, and, how do I go about getting the bot flag on that account? -- Zack mann  (Talk to me/What I been doing) 18:58, 19 November 2018 (UTC)
 * You should already have a bot flag on that account? It's been flagged since 2016. —  Earwig   talk 01:54, 20 November 2018 (UTC)
 * Hmm... How do I get my edits tagged with the bot flag then? -- Zack mann  (Talk to me/What I been doing) 01:57, 20 November 2018 (UTC)
 * Oh. You need to send a special parameter with each edit for the flag to be used. Your bot framework should have an option for it (if you’re using one). The raw API parameter is just “&bot=true” I think. —  Earwig   [alt]   talk 17:06, 20 November 2018 (UTC)
 * I tried that a while ago and got an error message that I needed to have the param assigned to my account. I'll re-investigate. :-) Thanks! -- Zack mann  (Talk to me/What I been doing) 17:40, 20 November 2018 (UTC)
 * Also, when you get a chance, would love input on Bots/Requests for approval/ZackBot 13. :-) -- Zack mann  (Talk to me/What I been doing) 20:09, 20 November 2018 (UTC)
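To illustrate the "special parameter" mentioned above: in the MediaWiki action API, `bot` is a boolean parameter on `action=edit`, so it counts as set whenever it is present in the request. The sketch below only assembles the POST fields; `build_edit_params` is a hypothetical helper, and the token value is assumed to come from a separate, already authenticated token request. The account also needs the `bot` user right for the flag to be honoured, which may be what the error message in the thread refers to.

```python
def build_edit_params(title, text, summary, csrf_token, mark_as_bot=True):
    """Assemble POST fields for an action=edit request.

    Boolean API parameters are true when present, so the "bot" key is
    simply omitted when the edit should not be flagged.
    """
    params = {
        "action": "edit",
        "format": "json",
        "title": title,
        "text": text,
        "summary": summary,
        "token": csrf_token,
    }
    if mark_as_bot:
        params["bot"] = "1"
    return params
```

Most bot frameworks expose this as an option on their edit or save call, so building the request by hand is only needed when talking to the API directly.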

Template:Lc and Template:Lc1 merge
I'm wondering if you can provide some background on Template:Cfd2/sandbox? CfD is now bizarrely using a monospaced version at 110% size with a hyphen instead of the normal Template:Lc. The change proposed at the sandbox seems a great idea. --Bsherr (talk) 19:25, 26 September 2018 (UTC)
 * Hi Bsherr, unfortunately, I have no recollection of that edit! It seems the change to make the text larger was done here, so you would probably want to ask Redrose64 before undoing that, but the hardcoding of monospace instead of the normal font has been in place for a long time. I'm not sure why, nor do I have a strong preference either way. —  Earwig   talk 02:15, 27 September 2018 (UTC)
 * Thanks for the advice. I'm going to propose a change to just use Template:Lc or, in the alternative, to eliminate the monospaced font in favor of increasing the kerning. I'll let you know when I post should you like to comment. --Bsherr (talk) 21:54, 28 September 2018 (UTC)
 * Done. The discussion is at Templates for discussion/Log/2018 November 23. --Bsherr (talk) 21:50, 23 November 2018 (UTC)

The Signpost: 1 December 2018
 * Read this Signpost in full * Single-page * Unsubscribe * MediaWiki message delivery (talk) 04:48, 1 December 2018 (UTC)

December 19: WikiWednesday Salon and Skill-Share NYC

The Signpost: 24 December 2018
 * Read this Signpost in full * Single-page * Unsubscribe * MediaWiki message delivery (talk) 13:36, 24 December 2018 (UTC)

An email I sent.....
— fr ❄  18:02, 2 January 2019 (UTC)
 * Replied. —  Earwig   talk 06:58, 3 January 2019 (UTC)

Copyvios
Copyvios is currently down, the connection times out. Is this related to the new workers? Best regards, Luke081515 01:19, 19 January 2019 (UTC)

Copyvio tool
Hi Earwig, your API documentation for the tool mentions that there is a global limit of 1000 requests using the search engine. I want to continue the task merlbot did until 2016, checking all new articles in dewiki for copyvios. From the statistics I calculated that these are around 300 articles per day, which is quite a lot. That's why I currently implemented the function without using the search engine (I don't want to consume so much of the limit, which would be bad for other users); however, the tool is much more effective with the search engine. Is there a way to extend the global limit? And is there a way to include Turnitin in the API request as well? I have not found anything in the API documentation about it. P.S.: Please ping me when you reply, I mostly do not look at enwiki. Best regards, Luke081515 02:03, 13 January 2019 (UTC)
 * Unfortunately I do not control the global limit, that's set by Google. However, I think it's fine if you enable the search engine for a while as a test. We can see whether it ends up making too many requests and disable it later if so. I planned to add Turnitin to the API, but haven't gotten around to it. You can access it separately, though; the URL should look like https://tools.wmflabs.org/eranbot/plagiabot/api.py?action=suspected_diffs&page_title=PAGE_TITLE&lang=de&report=1 I think. —  Earwig   talk 06:06, 13 January 2019 (UTC)
 * Ok, thank you. I've now set  to  . The bot will check any new articles that are not disambig pages or redirects, and runs every 30 minutes. If it's too much, please ping me and I will disable it again. Best regards, Luke081515 14:49, 13 January 2019 (UTC)
 * Is there a way to extend the limit? I'm also planning to check big insertions into dewiki, not only page creations. I know that the limit is on Google's side, and I guess raising it would cost a bit of money. I can imagine that WMF or WMDE would support this; can you tell me who your current contact at WMF is concerning the Google API? Best regards, Luke081515 00:04, 20 January 2019 (UTC)
 * That would be User:Kaldari. I’m fairly certain that there is no way to raise the limit, based on previous attempts to do so. You should try to tune down the request rate if we’re hitting it too frequently. Maybe there are some simple heuristics you can apply to ignore certain pages? —  Earwig   [alt]   talk 00:17, 20 January 2019 (UTC)
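The Turnitin lookup URL quoted in the thread can be assembled programmatically. This is a small sketch that reproduces exactly that URL pattern and nothing more; `suspected_diffs_url` is a hypothetical helper name, and the response format is not covered because the thread does not describe it.

```python
from urllib.parse import urlencode

# Base URL as given in the reply above.
PLAGIABOT_API = "https://tools.wmflabs.org/eranbot/plagiabot/api.py"


def suspected_diffs_url(page_title, lang="de"):
    """Build the suspected-diffs report URL for one page title."""
    params = {
        "action": "suspected_diffs",
        "page_title": page_title,
        "lang": lang,
        "report": 1,
    }
    return PLAGIABOT_API + "?" + urlencode(params)
```

Using `urlencode` instead of string concatenation means titles containing spaces, umlauts, or ampersands are escaped correctly.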

Possible copyvio tool bug report
I tried to check the page Dorothy Misener Jurney using Earwig's Copyvio Detector with its default settings, to examine URLs listed in the article. It reported about 2.0% violations. HOWEVER it didn't actually check one of the sources cited, https://shsmo.org/manuscripts/descriptions/womenmedia/essays/names/j/jurney/ If I tell it explicitly to do a URL comparison to that citation, I get a > 64% violation rate. I'm working on cleaning up the article, but I'm concerned that the URL didn't get checked initially. Mary Mark Ockerbloom (talk) 02:08, 21 January 2019 (UTC)
 * Thanks for the bug report, Mary Mark Ockerbloom. It looks like that URL is causing the tool some trouble. The first time you ran the check, that page timed out before it could return any data, which gets shown as "0%". But when you did the direct comparison, it loaded fine, showing the potential match. Unfortunately there's not much we can do about this kind of situation, though I suppose the tool could indicate that error more clearly. —  Earwig   talk 03:23, 21 January 2019 (UTC)
 * I would strongly encourage a clear and visible distinction between "0%" meaning "No copyvios found" and some other marker to indicate the page could not be examined... Thanks for your work on this tool. I've found it really useful & keep it bookmarked :-) Mary Mark Ockerbloom (talk) 03:38, 21 January 2019 (UTC)
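One way to make the distinction requested above is to keep "no similarity found" and "source could not be fetched" as separate states instead of collapsing both into a zero. The sketch below is not the tool's actual data model; `SourceResult` and `describe` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceResult:
    url: str
    confidence: Optional[float]  # None means the source was never compared
    error: Optional[str] = None  # e.g. "timeout" when the fetch failed


def describe(result):
    """Render a result so that an unchecked source never looks like 0%."""
    if result.confidence is None:
        reason = result.error or "unknown error"
        return f"{result.url}: not checked ({reason})"
    return f"{result.url}: {result.confidence:.1%} similarity"
```

With this shape, a timed-out source renders as "not checked (timeout)" while a genuinely clean comparison still renders as "0.0% similarity".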

Problem in City of Stonnington and probably other locations
You appear to have past connection with template MetlinkBus which appears to have been renamed PTVBus in 2015 by yourself - not a problem in itself. In City of Stonnington there are six bus routes which use this template, with route 734 still working OK but the other 5 routes 624, 612, 623, 767 and 822 no longer working. Earlier this week the PTV put up a new version of their website where a lot of earlier links are no longer working. The reason these routes are not working may have been caused by this or possibly the data has changed earlier as I had not looked at this article before today. Can you be of any assistance in this area? Fleet Lists (talk) 07:18, 25 January 2019 (UTC)
 * I think I have solved the problem. I will try and fix it and let you know how I go. Fleet Lists (talk) 07:54, 25 January 2019 (UTC)
 * I have made some changes to Module:PTVBus/data‎ which seem to have solved the problem. I found another article which has a large number of this type of error but that will need to wait until another day to fix those. I was surprised to find that that module had not had changes made to it since late 2015.Fleet Lists (talk) 08:15, 25 January 2019 (UTC)
 * OK. This was a while ago and I don’t remember the situation, so your guess is as good as mine as to what needs to be done here. Glad to hear you’ve mostly figured it out. —  Earwig   [alt]   talk 14:18, 25 January 2019 (UTC)

Rejected AFC submissions and AFC statistics
Last year a new AFC review result, "rejected", was introduced. It is more severe, more final, than "declined" in that it doesn't give the submitter a path to improve and resubmit the draft. (A random example is User:Naveengrande/sandbox.)

Now that the reject option is being used, questions are arising about when it should be used, how much it's being used, whether it's being used properly, etc.

EarwigBot shows recently rejected submissions the same way as recently declined ones on Template:AFC statistics. It would be useful if one could distinguish the rejects on that page. Perhaps EarwigBot could display them in a different section from "declined", or with "rejected" in the notes column. Is something like that an enhancement you'd be willing to make? --Worldbruce (talk) 15:37, 28 January 2019 (UTC)
 * Thanks for the suggestion and for letting me know about the new status. I added 'rejected' as a note for the declined section. It will take a while for the whole table to update, but freshly declined submissions should have it starting now. —  Earwig   talk 03:49, 29 January 2019 (UTC)

The Signpost: 31 January 2019
 * Read this Signpost in full * Single-page * Unsubscribe * MediaWiki message delivery (talk) 06:51, 31 January 2019 (UTC)

definitions.net
Hey,

You might want to look into adding definitions.net onto the Wikipedia mirror list, I've been going through Category:Articles with improper non-free content and quite a few of them, after looking at various archives, appear to be copied from Wikipedia, generating false copyvio reports.

Thanks,

 SITH   (talk)   16:30, 8 February 2019 (UTC)
 * Thanks for the suggestion. Added. —  Earwig   talk 23:23, 10 February 2019 (UTC)
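For readers wondering how a mirror list can suppress false reports like these, here is a minimal sketch of host-based filtering. It is not the tool's actual implementation; `MIRROR_DOMAINS` and `is_mirror` are hypothetical names, and the set contains only the one domain mentioned in this thread.

```python
from urllib.parse import urlsplit

# Only the domain discussed above; a real list would be much longer.
MIRROR_DOMAINS = {"definitions.net"}


def is_mirror(url, mirrors=MIRROR_DOMAINS):
    """True when the URL's host is a listed mirror or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in mirrors)
```

Matching on the full host or a dot-prefixed suffix avoids accidentally treating an unrelated domain that merely ends with the same letters as a mirror.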

Copyvio Detector
Hi, I am unable to access the Copyvio Detector. It shows "502 Bad Gateway" and "The server timed out". Please fix it. I think the main problem is that the server is getting slow. X ain36 ( talk ) 08:18, 16 February 2019 (UTC)
 * Please look two threads up. —  Earwig   talk 17:12, 16 February 2019 (UTC)

Copyvio Detector not working
Hey Ben, the copyvio detector quit working a couple hours ago, with the page failing to load but not timing out. If I let it spin long enough it shows a 502 Bad Gateway. Any assistance you can offer to get it working again would be most appreciated. Thanks, — Diannaa 🍁 (talk) 23:05, 1 February 2019 (UTC)
 * It's working again! in fact it's zippy and full of pep. Thank you, — Diannaa 🍁 (talk) 01:04, 2 February 2019 (UTC)
 * Well, I see some bizarre errors in the log that I've never seen before, like we're running out of memory. I'll see if I can defend against this for the future. —  Earwig   talk 01:16, 2 February 2019 (UTC)
 * Hi Ben, the copyvio detector is not working. I'm not sure how long it's been down; it failed to load on my first attempt to use it this morning and it's been down for at least half an hour. Any assistance would be appreciated. Thanks, — Diannaa 🍁 (talk) 13:14, 13 February 2019 (UTC)
 * I kicked it, think it's OK now. This looks like the same issue as before. Didn't have a chance to investigate then, but I'll try to do it later when I have some free time. —  Earwig   talk 13:42, 13 February 2019 (UTC)
 * Thanks so much Ben. I don't know how I ever got along without this tool, so helpful for copyright cleanup. — Diannaa 🍁 (talk) 13:49, 13 February 2019 (UTC)
 * Hi Ben. The page is once again failing to load :/ Could you please take a look? Thanks, — Diannaa 🍁 (talk) 02:36, 15 February 2019 (UTC)
 * It looks like the bot is running - do you just mean the webpage? — xaosflux  Talk 02:50, 15 February 2019 (UTC)
 * There's two different tools. The reason I posted here is because Earwig's copyvio detector tool is not working. It spins for a while and then produces a 502 Bad Gateway. Eran's CopyPatrol is also failing to load; the last time I was able to use the page properly was at around 03:02 UTC. — Diannaa 🍁 (talk) 03:45, 15 February 2019 (UTC)
 * This time it’s definitely not my fault! Toolforge has been experiencing an unlikely combination of issues that would bring down most tools using a database for anything. That’s presumably why CopyPatrol was affected too. I’m not sure when things will fully stabilize. I will kick it in a little bit, but I don’t know how long that will last. —  Earwig   [alt]   talk 12:40, 15 February 2019 (UTC)
 * Thanks. I have some cases that will be impossible to solve without your tool, and not having it triples the time it takes to do the checks, so anything you can do to keep it working in the interim would be appreciated. — Diannaa 🍁 (talk) 14:39, 15 February 2019 (UTC)

Just following up. Unfortunately, things on Labs are in even worse shape now, and there doesn't seem to be anything I can do to fix it myself. Will continue to keep an eye out, but I think I just have to wait for now. —  Earwig   talk 03:31, 16 February 2019 (UTC)
 * Just a "thanks" for writing and supporting this tool. I turned to it today for a DYK check ... hope it's back soon! ☆ Bri (talk) 17:17, 16 February 2019 (UTC)
 * Update: The issues will likely not be resolved until Tuesday at the earliest. — Diannaa 🍁 (talk) 17:34, 16 February 2019 (UTC)
 * Well, I rewrote the tool to remove the dependency on the broken part of Toolforge. We seem to be OK for now. Since I'm not sure how this change will affect performance in general, I will continue to monitor things throughout the day. —  Earwig   talk 19:26, 16 February 2019 (UTC)

Forbidden error on earwig
Hi, I keep getting:

An error occurred while using the search engine (Google Error: HTTP Error 403: Forbidden). Try reloading the page. If the error persists, repeat the check without using the search engine.

When using Earwig's copyvio tool.

Any advice, RhinosF1(chat) (status)(contribs) 21:48, 24 February 2019 (UTC)
 * There's a daily limit on the number of searches with Google that was exceeded. It will reset at midnight. —  Earwig   talk 22:04, 24 February 2019 (UTC)
 * Thanks, RhinosF1(chat) (status)(contribs) 22:12, 24 February 2019 (UTC)
 * RhinosF1, I think it's at Midnight Pacific time, where Google's servers are located. — Diannaa 🍁 (talk) 00:55, 25 February 2019 (UTC)
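If the reset really does happen at midnight Pacific time, as suggested above, the wait can be computed from any UTC timestamp. This sketch assumes Python 3.9+ for `zoneinfo` and takes the reset time from the thread as given; `time_until_reset` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption from the thread: the quota resets at midnight Pacific time.
PACIFIC = ZoneInfo("America/Los_Angeles")


def time_until_reset(now_utc):
    """Return the time remaining until the next midnight in Pacific time."""
    local_now = now_utc.astimezone(PACIFIC)
    next_day = local_now.date() + timedelta(days=1)
    next_midnight = datetime(
        next_day.year, next_day.month, next_day.day, tzinfo=PACIFIC
    )
    # Subtracting aware datetimes compares them as absolute instants.
    return next_midnight - now_utc
```

For the timestamp of the original report (21:48 UTC on 24 February 2019), this gives a wait of 10 hours 12 minutes, since Pacific Standard Time is eight hours behind UTC on that date.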

Quote Box
Have only used this tool recently and it seems great. Can I comment it does not seem to identify content within Template:Quote box in the article compare pane giving an increased risk of false positives unless the article is checked. If it is not possible to do this would it be advisable to indicate to users they need to manually check this? Thank you. Djm-leighpark (talk) 18:27, 25 February 2019 (UTC)
 * That's strange, because I thought it did look inside quote boxes. Do you have an example page? I tried in my sandbox and it seems to work. —  Earwig   talk 02:28, 26 February 2019 (UTC)
 * The 18:48 version of this page ... to be absolutely clear, it matches the text of the quote in red; however, in the left compare pane the user (i.e. the person running the tool) cannot see that it is inside a quote (without looking at the article). The issue is with the quote One of my proudest moments ... America (by P. R. Brown) not being easily identifiable as a quote in the left-hand pane. Hope it makes sense what I am trying to say. Thank you. Djm-leighpark (talk) 03:21, 26 February 2019 (UTC)
 * Oh, I see, you're saying that the text inside the quote box is not identified as being part of a quote. That's true. I think this falls under the general disclaimer that all results from the tool need to be manually reviewed. False positives can also come from inline quotes in the article text as well as things like book titles and long proper nouns, and detecting these would be difficult. —  Earwig   talk 03:48, 26 February 2019 (UTC)
 * That's fair enough. I do wonder if the emphasis on the tool initiation page of "Be aware that other websites can copy from Wikipedia, so check the results carefully, especially for older or well-developed articles", without any mention of doing a manual check of the results for quotes, can be misleading ... perhaps especially with articles such as Dead to the World Tour and this source. It's just a thought from a user. One other thought would be to make the Submit button inactive once the tool is launched ... I've now got used to looking for the spinning working icon from the Chrome browser, but an active-looking Submit button holds my eye and I am so tempted to press it again! Just a couple of thoughts. Thank you. Djm-leighpark (talk) 04:19, 26 February 2019 (UTC)
 * Those are reasonable suggestions, thank you. I'll see what I can do. —  Earwig   talk 02:05, 27 February 2019 (UTC)

Video tutorial regarding Wikipedia referencing with VisualEditor
Hi, I have received a grant from WMF to support production of a video tutorial regarding creating references with VisualEditor. I anticipate that the video will be published in March 2019. If this tutorial is well received then I may produce additional tutorials in the future for English Wikipedia and possibly other projects such as Commons and Spanish Wikipedia. If you would like to receive notifications on your talk page when drafts and finished products from this project are ready for review, then please sign up for the project newsletter.

Regards, --Pine✉ 00:30, 28 February 2019 (UTC)

The Signpost: 28 February 2019
 * Read this Signpost in full * Single-page * Unsubscribe * MediaWiki message delivery (talk) 11:16, 28 February 2019 (UTC)

Project Tagging based on Category
Hi. I know that quite a few pages that should be tagged with the Children's Lit WikiProject banner lack them. I was wondering if articles lacking the project banner in the following two categories (inclusive) could be tagged: Category:Children's literature and Category:Young adult novels? Best, Barkeep49 (talk) 02:00, 28 December 2018 (UTC)
 * From a cursory look, this should be possible. I'll let you know when I start/finish the task, or if I have any questions before I start, probably within the next couple days. — Earwig   talk  03:11, 28 December 2018 (UTC)
 * Just checking in on this. Thanks and Best, Barkeep49 (talk) 02:19, 13 January 2019 (UTC)
 * Apologies for the delay, I had to do some work to migrate the bot to a new backend on Toolforge. I'll try to start this when I come home from work tomorrow. — Earwig   talk  07:45, 14 January 2019 (UTC)
 * Here's the full list of categories the bot will process (all subcategories recursively of those two you mentioned): User:The Earwig/Sandbox/Children's Lit. Can you help me look through this and remove anything that doesn't belong? It seems mostly OK, but there are some things I imagine we don't want to tag, like anything including "video game"... — Earwig   talk  07:56, 15 January 2019 (UTC)
 * I chopped a few hundred from the list - the project has generally covered derivative properties to some extent and so when that connection felt strong I left it but when it got too faraway from the original book (or if it was not a literary property to begin with), I removed it. I also removed many of the comic/manga categories as only a smaller percentage of those would be covered in our scope - its intended audience would have to be children or young adults which is not the case for a substantial percentage of comics/manga. Let me know if you have any other questions and thank you for your ongoing help with this. Best, Barkeep49 (talk) 18:04, 15 January 2019 (UTC)
 * Excellent, that's exactly what I needed. The bot is running now. — Earwig   talk  03:18, 16 January 2019 (UTC)
 * Thanks. I'm abashed to admit I already knew this because an article on my watchlist got the banner... Thanks for all your assistance. Best, Barkeep49 (talk) 05:49, 16 January 2019 (UTC)
 * I paused the task until I get home and can look a bit more carefully. I see we’ve been tagging films based on children’s books (see the bot’s recent contribs); I understand the consideration for derivative works, but do you think the relationships are clear enough in general to tag automatically? — Earwig [alt]   talk  22:34, 16 January 2019 (UTC)

Hello! I'm curious as to why Don Paterson has been tagged with the Children's Literature project banner. I don't associate him with children's literature, and nothing in the article or its categories seems to support this. Am I missing something obvious? --Deskford (talk) 20:41, 16 January 2019 (UTC)
 * The connection comes from the Costa Book Awards; he is in the category of winners, which is in a category of children’s literary awards. This is an incorrect relationship, as the CBA does not look exclusive to children’s literature. I’ll correct this when I get home. — Earwig [alt]   talk  21:41, 16 January 2019 (UTC)
 * Ah, that makes sense. Thanks! --Deskford (talk) 21:54, 16 January 2019 (UTC)
 * I've recently reverted EarwigBot's edits to Talk:Tommen Baratheon, Talk:Arya Stark, Talk:Bran Stark, and Talk:Rickon Stark, which added the WikiProject Children's Literature banner to each talk page. While the characters are children, A Song of Ice and Fire is definitely not children's literature, so I'm wondering why this happened. -- Ted Edwards  21:19, 16 January 2019 (UTC)
 * Thank you for pointing that out. This is coming from Category:Child characters in literature, which is in Category:Children's literature, a clearly incorrect relationship. We’ll fix this. — Earwig [alt]   talk  21:41, 16 January 2019 (UTC)
 * Earwig anything I can do to be of assistance at this point? Best, Barkeep49 (talk) 02:06, 17 January 2019 (UTC)
 * See my comment above in case it got lost; I think we should be a little more careful with the categories that pertain to derivative works like films. While some of those works might be in scope, there's a high enough false-positive rate that I don't think a bot determination is safe. If we pare down the list a bit more, I'll feel more comfortable restarting the task. I can also have the bot revert its taggings for certain categories that we decide were mistakes (like a couple of the ones mentioned above)—this has happened before, so I'm somewhat used to it and it's not a problem. — Earwig   talk  03:25, 17 January 2019 (UTC)
 * Just an update that this newsletter has been requested to go out and so hopefully I'll be able to get some help with this update soon. Best wishes, Barkeep49 (talk) 18:17, 13 February 2019 (UTC)
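
The false positives reported in this thread (Don Paterson, the Stark children) all follow from the recursive category walk: one loosely related subcategory pulls in everything beneath it. Below is a minimal sketch of that walk over a hand-written category graph. It is not EarwigBot's code; the `SUBCATS` data, the "Costa Book Award winners" category name, and the `collect_categories` helper are assumptions made up for illustration.

```python
from collections import deque

# Hypothetical slice of the category graph (parent -> direct subcategories).
# The names are illustrative, not copied from the live category tree.
SUBCATS = {
    "Children's literature": [
        "Child characters in literature",
        "Children's literary awards",
    ],
    "Children's literary awards": ["Costa Book Award winners"],
}


def collect_categories(roots, subcats, excluded=frozenset()):
    """Breadth-first walk: return the roots plus every subcategory reachable
    from them, skipping excluded categories and anything below them."""
    ordered = []
    visited = set()
    queue = deque(roots)
    while queue:
        cat = queue.popleft()
        if cat in visited or cat in excluded:
            continue
        visited.add(cat)
        ordered.append(cat)
        queue.extend(subcats.get(cat, []))
    return ordered


everything = collect_categories(["Children's literature"], SUBCATS)
pruned = collect_categories(
    ["Children's literature"],
    SUBCATS,
    excluded={"Child characters in literature", "Children's literary awards"},
)
```

Excluding a category also prunes everything reachable only through it, which is why trimming a few parent categories from the list removes whole groups of taggings at once.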

So, I finished going through the bot's tagging and have reverted what I consider mistagged (by category, primarily non-written works or people/books with only dubious connections to children). This leaves about 4000 of the original 5000 taggings (for the first half of the category list). While idly spot-checking afterwards, I found unreverted yet questionable examples like Rush Limbaugh and Laura Bush that came from a category I hadn't thought to re-check: American children's writers. The problem is that often cats are used for non-defining classification, which isn't necessarily unreasonable—those people have published books for children—but I think you would agree that they aren't well known enough for that to place them within the project's scope? Maybe I am wrong, but it's enough that I'm nervous to rerun the bot, even on the new doubly reduced list. Hmm... —  Earwig   talk 04:47, 4 March 2019 (UTC)
 * I would agree we shouldn't have Rush Limbaugh and Laura Bush tagged, and the issue of people who've sometimes written for children but not always (e.g. Gaiman) certainly caused concern the first time through. Where does that leave things then? Best, Barkeep49 (talk) 04:53, 4 March 2019 (UTC)
 * I'm not sure. Some cats in the list should definitely be fine, if they exclusively contain in-scope works of literature, like Polish children's novels. I don't have a problem running the bot on these. In contrast, I don't feel comfortable running "Works based on"-type categories because these are often in other genres and only tenuously related (and the pages that are in-scope usually fall under another category anyway), so I'll probably remove these. Unfortunately that still leaves about 2/3 of the list. I'm not sure what to do with articles about people, which is a large number of them. I'm wondering if there is a reliable semi-automated test to decide whether a person is in-scope? I'm thinking of looking to see whether the article lead mentions "children", but I'm not sure how well this will work. —  Earwig   talk 05:05, 4 March 2019 (UTC)
 * For the categories which are troublesome, are you able to just have the bot log where it would tag? I would then go through and remove the big red flags. In spot checking the first 50 A's in that category the hit rate was very high (the only possible question marks would be Britt Allcroft, E.J. Altbacker, and Aubrey Ankrum, with no clear-cut nos like Limbaugh or Bush). Now that's for everyone, so it includes people already tagged. Presumably the error rate for untagged people would be higher, but in an essential category like American children's writers I really am wondering if it would be within a margin the project would find OK, especially as they will get rated (most of the activity that happens on the project is article assessment at the moment). Best, Barkeep49 (talk) 05:24, 4 March 2019 (UTC)
 * I can definitely do that. I'll follow up over the next day or so. —  Earwig   talk 05:26, 4 March 2019 (UTC)
 * Sorry that took so long, Barkeep49. I updated User:The Earwig/Sandbox/Children's Lit with the full list of untagged/unprocessed pages after running the bot through another 50 categories. —  Earwig   talk 07:49, 17 March 2019 (UTC)
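
The semi-automated test floated above (does the article lead mention "children"?) could be prototyped roughly as follows. This is only a sketch under stated assumptions: the lead is taken to be everything before the first line starting with `==`, and the function names `lead_section` and `lead_mentions_children` are hypothetical, not part of the bot.

```python
import re

# Matches "children" and "children's" as whole words, case-insensitively.
CHILDREN_PATTERN = re.compile(r"\bchildren\b", re.IGNORECASE)


def lead_section(wikitext):
    """Return the text before the first section heading (a line starting with '==')."""
    lines = []
    for line in wikitext.splitlines():
        if line.startswith("=="):
            break
        lines.append(line)
    return "\n".join(lines)


def lead_mentions_children(wikitext):
    """True if the lead section contains the word 'children'."""
    return bool(CHILDREN_PATTERN.search(lead_section(wikitext)))


in_scope = lead_mentions_children(
    "Jane Doe is an American author of children's books.\n== Career ==\nMore text."
)
out_of_scope = lead_mentions_children(
    "John Doe is a radio host.\n== Books ==\nHe has also written children's books."
)
```

A keyword test like this would still need a human pass over its output, since a lead can mention children for unrelated reasons.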

Nomination for deletion of Template:List of crambid genera
Template:List of crambid genera has been nominated for deletion. You are invited to comment on the discussion at the template's entry on the Templates for discussion page.  Zack mann  (Talk to me/What I been doing) 21:33, 19 March 2019 (UTC)

The Signpost: 31 March 2019
News, reports and features from the English Wikipedia's weekly journal about Wikipedia and Wikimedia
Read this Signpost in full · Single-page · Unsubscribe · Global message delivery 15:41, 31 March 2019 (UTC)
 * From the editors: Getting serious about humor
 * News and notes: Blackouts fail to stop EU Copyright Directive
 * In the media: Women's history month
 * Discussion report: Portal debates continue, Prespa agreement aftermath, WMF seeks a rebranding
 * Featured content: Out of this world
 * Arbitration report: The Tides of March at ARBCOM
 * Traffic report: Exultations and tribulations
 * Technology report: New section suggestions and sitewide styles
 * News from the WMF: The WMF's take on the new EU Copyright Directive
 * Recent research: Barnstar-like awards increase new editor retention
 * From the archives: Esperanza organization disbanded after deletion discussion
 * Humour: The Epistolary of Arthur 37
 * Op-Ed: Pro and Con: Has gun violence been improperly excluded from gun articles?
 * In focus: The Wikipedia SourceWatch
 * Special report: Wiki Loves (50 Years of) Pride
 * Community view: Wikipedia's response to the New Zealand mosque shootings

EarwigBot not working
It hasn't edited for 3 days (I noticed it wasn't working when task 3 (creating AfC categories) wasn't running). Just wanted to let you know in case you weren't already aware. Thanks, --DannyS712 (talk) 04:15, 14 April 2019 (UTC)
 * Thanks for letting me know. It should be back up now, and I think I've fixed the auto-restart so this should be prevented in the future. —  Earwig   talk 06:19, 14 April 2019 (UTC)
 * It still hasn't edited yet... --DannyS712 (talk) 06:20, 14 April 2019 (UTC)
 * It's not supposed to yet. The AFC status page gets updated hourly, and the category creation runs nightly at 00:00 UTC. —  Earwig   talk 06:41, 14 April 2019 (UTC)
 * Oh, okay. --DannyS712 (talk) 06:43, 14 April 2019 (UTC)

The Signpost: 30 April 2019
 * Read this Signpost in full * Single-page * Unsubscribe * MediaWiki message delivery (talk) 17:37, 30 April 2019 (UTC)

ArbCom 2019 special circular

This message was sent to all administrators following a recent motion. Thank you for your attention. For the Arbitration Committee, Cameron11598 02:49, 4 May 2019 (UTC)

Administrator account security (Correction to Arbcom 2019 special circular)
ArbCom would like to apologise and correct our previous mass message in light of the response from the community.

Since November 2018, six administrator accounts have been compromised and temporarily desysopped. In an effort to help improve account security, our intention was to remind administrators of existing policies on account security — that they are required to "have strong passwords and follow appropriate personal security practices." We have updated our procedures to ensure that we enforce these policies more strictly in the future. The policies themselves have not changed. In particular, two-factor authentication remains an optional means of adding extra security to your account. The choice not to enable 2FA will not be considered when deciding to restore sysop privileges to administrator accounts that were compromised.

We are sorry for the wording of our previous message, which did not accurately convey this, and deeply regret the tone in which it was delivered.

For the Arbitration Committee, -Cameron11598 21:04, 4 May 2019 (UTC)

Question about copyvio detector functioning
Howdy - I just happened upon some startling behaviour in the copyvio detector, and wanted to ask whether this is a known thing or a fluke. Draft:Nathaniel Bartlett comes out squeaky-clean, but when running the tool on the identical draft once it was moved to mainspace, it finds the full-page copyvio. The difference here must be the AfC header, I guess... is that known behaviour? If the AfC header has this capacity to throw off copyvio detection, maybe it would be worth thinking about a function to strip it from an article before comparison? After all, AfC is probably one of the heaviest users of the tool - bit of a scary scenario. Cheers -- Elmidae (talk · contribs) 22:01, 9 May 2019 (UTC)
 * Hi Elmidae. The AfC header does not make a difference here; we already strip out templates from the article text before we start looking for matches. (The exact article text you see on the results page is what we try to find copies of, and in this case, you can see that neither includes the template.) However, there is another difference: the one where we missed the violation has its categories rendered as normal wikilinks (prefixed with colons), and this makes them show up in the article text when normally they wouldn't. Because of an unlucky sequence of events, this is enough for us to fail to find the correct source. If you're interested in a more detailed explanation of why, I've written one up below. The main takeaway, though, is that this kind of outcome is always a risk because of how the tool works, but in general it should be uncommon enough that the tool remains useful.
 * For the full explanation, I'll need to go into a bit of detail about how the tool finds possible sources. The problem it's trying to solve is that we have a large string of text and we need to query a search engine with that text to search for exact (or very close) matches. We can't paste the entire article into Google, because Google doesn't accept strings that large and it would miss cases where sentences are added or rearranged. Instead, we divide the article into chunks of text (about sentence length, 10-20 words), and search for each chunk independently, the idea being that at least one of them should be a near-verbatim copy of the plagiarized source (if one exists) and will give a hit. However, we can't search for every single chunk in a particular article: an article might have hundreds of sentence-sized chunks of text, and Google limits the number of searches we can make per day, so we can only make up to 8 searches per article. This means we have the task of selecting about 8 representative sentences from throughout the article in hopes that at least one of them will contain the violation, if one is present. (We do this by picking a sentence from the start of the article, then the end, then the middle, then around the 25% mark, and so on, until we run out of text or reach 8 chunks.) For articles that are heavily copied, the odds of this working out are quite good, but we sometimes get very unlucky, like we did here. Because those wikilinks added text to the end of the article, our algorithm ended up picking 8 chunks for which not a single one returned the correct match in Google. (If you're curious what it searched for, I've reproduced the queries below.)

Violation found:


 * 1) Nathaniel Bartlett (April 22, 1727 – January 11, 1810), pastor of the Congregational Church of Redding, Connecticut during the
 * 2) The History of Redding, Connecticut Puritan Protagonist- President Thomas Clap of Yale College University of North Carolina
 * 3) The Bartlett family, however, was firmly united in support of the American cause.
 * 4) Jonathan Bartlett (1764–1858) served as co-pastor with his father for a few years, but resigned due to ill health prior to his
 * 5) (Russell Bartlett was living in Cooperstown, Otsego County, New York, and Daniel Collins Bartlett was living in Amenia, Dutchess
 * 6) 1753-1810, was one of the numerous Colonial American clergymen who played an active role during the American Revolution.
 * 7) of Kraus-Thompson Organization Ltd.
 * 8) animosity between neighbors in so small a community, and no doubt many families experienced divided loyalties as well.

Violation missed:


 * 1) Nathaniel Bartlett (April 22, 1727 – January 11, 1810), pastor of the Congregational Church of Redding, Connecticut during the
 * 2) the American Revolution :Category:American Revolution chaplains
 * 3) In addition to verbal assaults on the enemy, Bartlett supported the war effort by officiating as Military Chaplain to
 * 4) The Rev.
 * 5) Upon his death in 1858, the Rev.
 * 6) 1753-1810, was one of the numerous Colonial American clergymen who played an active role during the American Revolution.
 * 7) Congregationalist ministers :Category:People from Guilford, Connecticut :Category:Yale Divinity School alumni :Category:Clergy
 * 8) Army General Israel Putnam's Division during their encampment in Redding the winter of 1778/79.
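
The selection order described above (start, end, middle, around the 25% mark, and so on, capped at 8 searches) can be sketched as follows. This is an illustration rather than the tool's actual code: the exact order after the 25% mark is an assumption (repeated halving of the remaining gaps), and `sample_positions` and `pick_chunks` are hypothetical names.

```python
from fractions import Fraction


def sample_positions():
    """Yield 0, 1, 1/2, 1/4, 3/4, 1/8, 3/8, ... as fractions of the article length."""
    yield Fraction(0)
    yield Fraction(1)
    denom = 2
    while True:
        for numer in range(1, denom, 2):
            yield Fraction(numer, denom)
        denom *= 2


def pick_chunks(chunks, limit=8):
    """Pick up to `limit` chunks spread across the article, in sampling order."""
    picked = []
    used = set()
    for pos in sample_positions():
        # Stop once we have enough chunks or have run out of text.
        if len(picked) >= limit or len(used) == len(chunks):
            break
        index = int(pos * (len(chunks) - 1))
        if index not in used:
            used.add(index)
            picked.append(chunks[index])
    return picked


# Appending a few chunks of trailing text shifts most of the picks.
before = pick_chunks(list(range(100)))
after = pick_chunks(list(range(104)))
```

In this sketch only the first chunk and one later pick stay the same once four chunks are appended, which matches the behaviour in the two lists above: a little extra text at the end of the article changes most of the queries.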


 * Thinking about this further, I believe the tool should have stripped out the disabled category links as well, because despite being "article text", they should not appear in any sources. This is something I can add in the future. However, it's important to keep in mind that because of how the chunking logic works, as long as we don't search for every chunk, and we can't, there's always a chance that we could miss the violation. Something to keep in mind. Thanks. —  Earwig   talk 04:04, 10 May 2019 (UTC)
 * Thank you, that's both informative and interesting! So in essence, it's a bit of potluck of whether a given selection of chunks contains detectable material; and a random frame-shift mutation (e.g. by adding a few lines of category text) may result in a selection that registers entirely differently. That's heuristics for you, I guess :) Cheers -- Elmidae (talk · contribs) 14:08, 10 May 2019 (UTC)
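
The fix suggested above, stripping disabled category links before chunking, might look roughly like this. It is a regex-based sketch, not the tool's implementation: it assumes the links appear in the wikitext as `[[:Category:...]]`, and `strip_disabled_category_links` is a hypothetical helper. A proper wikitext parser would be more robust than a regular expression.

```python
import re

# Matches colon-prefixed ("disabled") category links such as
# [[:Category:American Revolution chaplains]], with or without a pipe label.
DISABLED_CATEGORY_LINK = re.compile(
    r"\[\[\s*:\s*Category\s*:[^\[\]]*\]\]", re.IGNORECASE
)


def strip_disabled_category_links(wikitext):
    """Remove [[:Category:...]] links so they never reach the chunking step."""
    return DISABLED_CATEGORY_LINK.sub("", wikitext)


cleaned = strip_disabled_category_links(
    "He served in [[Redding, Connecticut]]. "
    "[[:Category:American Revolution chaplains]]"
)
```

Ordinary wikilinks are left alone, so article prose is unchanged and only the trailing category text is removed.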