User talk:JL-Bot/Archive 4

TAR / Questionable & Unicode
Perl is deprecating the use of strings with code points over 0xFF in XOR. This impacts the WP:JCW/TAR and WP:JCW/CRAP matching logic for some citations. I have changed the logic to "de-Unicode" characters prior to doing the string matching. This change is in the results just uploaded for both. In general, it seems to be working for the better, but there are now some additional false positives. For example in TAR3, #240 is now picking up "Палеонтологический Журнал" which is the Russian name, but #299 is now picking up "КМЕ" which is a false positive (while it is only a 1 letter difference from NME, the Cyrillic K was causing the original XOR to miss it). There could also be some unintended oddities. If anyone sees anything weird, please let me know. Thanks. -- JLaTondre (talk) 22:20, 24 August 2018 (UTC)


 * Yes, it seems to be working better. Very good at catching Russian/Cyrillic names that are actual matches with journals. False positives are rare and easy to exclude. Headbomb {t · c · p · b} 23:45, 24 August 2018 (UTC)
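For readers following along, the "de-Unicode" step described above can be sketched roughly as follows. This is an illustrative Python sketch, not the bot's Perl code: the lookalike table, the function names, and the position-by-position comparison (standing in for the XOR comparison) are all assumptions made for the example.

```python
import unicodedata

# Hypothetical lookalike table (an assumption, not the bot's actual mapping):
# Cyrillic capitals that are visually identical to Latin capitals.
HOMOGLYPHS = str.maketrans({
    "\u0410": "A", "\u0412": "B", "\u0415": "E", "\u041a": "K",
    "\u041c": "M", "\u041d": "H", "\u041e": "O", "\u0420": "P",
    "\u0421": "C", "\u0422": "T", "\u0425": "X",
})


def de_unicode(text: str) -> str:
    """Fold lookalike letters to Latin, then drop combining accents."""
    folded = text.translate(HOMOGLYPHS)
    decomposed = unicodedata.normalize("NFKD", folded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def differing_positions(a: str, b: str) -> int:
    """Count positions where the folded strings differ (length gap included)."""
    a, b = de_unicode(a), de_unicode(b)
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches + abs(len(a) - len(b))
```

With this folding, the Cyrillic "КМЕ" becomes "KME" and sits one character away from "NME", which is consistent with the new false positive reported at #299.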

tld|JCW-include
Could the bot support this?

The idea would be that instead of declaring something like

and hope nothing creeps up in the future, we could just have

Headbomb {t · c · p · b} 00:33, 25 August 2018 (UTC)
 * So in TAR, exclude anything that doesn't match the include? Yes, that would be doable (though it will miss future typos). Did you want to support both? That would take a little more work. -- JLaTondre (talk) 13:34, 28 August 2018 (UTC)


 * It would be supporting both. The idea is that things like ABA Journal in WP:JCW/Target10 have like 20 exclusions to set up, and as soon as some other 3-letter organization creates a journal it'll also be picked up. It should still pick up redirects as normal though (tweaked the template to make the intent clear), so if something like American Bar Association (1935-) is created and redirects to ABA Journal, we'd see that. It would just be telling the bot "don't bother looking for variants". Headbomb {t · c · p · b} 14:08, 28 August 2018 (UTC)


 * Though if you have only time to work on either grouping for WP:CRAPWATCH or this, grouping has a higher priority. Headbomb {t · c · p · b} 14:16, 28 August 2018 (UTC)


 * Actually put this one on pause for a little while. I need to put my thinking hat on. Headbomb {t · c · p · b} 14:18, 28 August 2018 (UTC)

JCW target matching
Lots of journals are named something like If you find  at the end of something, then consider that equivalent to the same string without The Official Whatever.
 * Foobar: Official Journal of the Blah Society
 * Foobar: The Official Journal of Blah Society
 * Foobar: Official Organ of the Blah Society
 * Foobar: The Official Journal of thr Blah Society

The idea is that Official Journal of the European Union and The Official Journal of the International Hepato Pancreato Biliary Association are legit, but Obesity Reviews : an Official Journal of the International Association for the Study of Obesity = Obesity Reviews. Headbomb {t · c · p · b} 00:32, 19 August 2018 (UTC)


 * To confirm, this is for the initial processing (i.e. WP:JCW/ALPHA) and not just the TAR processing? -- JLaTondre (talk) 20:23, 19 August 2018 (UTC)
 * This would be for TAR and CRAP. Headbomb {t · c · p · b} 00:07, 20 August 2018 (UTC)
 * Implemented. Updated TAR has been uploaded so you can check it out. -- JLaTondre (talk) 00:00, 1 September 2018 (UTC)


 * Great to hear! I'll take a look and get back to you! Headbomb {t · c · p · b} 00:08, 1 September 2018 (UTC)


 * Seems to work bang on. Next up, grouping! Headbomb {t · c · p · b} 01:29, 1 September 2018 (UTC)
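The "Official Journal of ..." rule discussed in this thread can be sketched as below. This is a hedged Python illustration only: the regular expression and the function name are assumptions, and it assumes the clause is always introduced by a colon, as in the examples given. Titles that start with "Official Journal" have no leading title before a colon, so they are left alone.

```python
import re

# Assumed pattern: "<title> : [an|the] Official Journal|Organ of ...".
# At least one character of real title must precede the colon.
OFFICIAL_SUFFIX = re.compile(
    r"^(?P<title>.+?)\s*:\s*(?:(?:an|the)\s+)?official\s+(?:journal|organ)\s+of\b.*$",
    re.IGNORECASE,
)


def strip_official_suffix(name: str) -> str:
    """Return the title without a trailing 'Official Journal of ...' clause."""
    match = OFFICIAL_SUFFIX.match(name)
    return match.group("title") if match else name
```

Because the pattern only looks at the words "official journal/organ of", a typo later in the clause (such as "thr") does not prevent the match.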

Fail to pickup an entry?
On WP:JCW/Target8, we have
 * Bulletin of the Atomic Scientists
 * The Bulletin of the Atomic Scientists

However, it failed to pick up for some reason. Headbomb {t · c · p · b} 18:01, 19 September 2018 (UTC)
 * Bulletin of the Atomic Scientist
 * Looking at B36 shows:
 * The spaces on the first one are the issues as from the software's perspective, they are different targets. It's also why it's not properly resolving the title type. Looking at Bulletin of the Atomic Scientist shows the page is:
 * The bot is not properly handling spaces within a redirect. That's an easy fix. I made the change and will re-run to verify it. Unfortunately, it's in the dump parsing which is the first and longest step... -- JLaTondre (talk) 00:13, 20 September 2018 (UTC)
 * Fixed, updated results saving now. -- JLaTondre (talk) 22:19, 20 September 2018 (UTC)
 * Yup. Not a whole lot of entries changed, but it makes a big difference on selected journals (e.g. Acta Crystallographica went from #206 to #154). Movement in WP:JCW/Target9/WP:JCW/Target10 is mostly due to better exclusions. Thanks. Very much looking forward to the next dump and last few bits of polish on WP:JCW/CRAP. Headbomb {t · c · p · b} 00:58, 21 September 2018 (UTC)
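As an illustration of the redirect fix described above, a parser can collapse stray spaces and underscores in the redirect target so that both spellings resolve to the same page. This is a minimal Python sketch under assumptions: the regular expression and function name are hypothetical and do not reproduce the bot's dump parser.

```python
import re

# Assumed shape of a redirect line: "#REDIRECT [[Target]]", case-insensitive,
# with optional whitespace around the brackets and the target.
REDIRECT = re.compile(r"^\s*#REDIRECT\s*\[\[\s*([^\]|#]+)", re.IGNORECASE)


def redirect_target(wikitext: str):
    """Return the normalized redirect target, or None if not a redirect."""
    match = REDIRECT.match(wikitext)
    if not match:
        return None
    # Treat underscores and runs of whitespace as a single space.
    return re.sub(r"[\s_]+", " ", match.group(1)).strip()
```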

Tweak to JL-Bot?
Thinking out loud here, when it comes to certain things, e.g. half a million redirects to OMICS Publishing Group, things don't quite rise up to the level of being featured on WP:JCW/TAR, but it'd still be useful to have a centralized page to have an idea of what's linked, and what's typoed, and all that.

I'm thinking of having a sort of WP:JCW/CRAP, where the bot compiles things as it would in WP:JCW/TAR, but for specific targets (declared at User:JL-Bot/Questionable.cfg). What's possible here? Headbomb {t · c · p · b} 06:49, 3 August 2018 (UTC)
 * Yes, generating common targets based on a list would be an easy extension. I don't understand the intention of the TARGET2+ stuff? Thanks. -- JLaTondre (talk) 12:27, 4 August 2018 (UTC)

Basically it's a manual way of creating 'groups' of targets. For instance

would be shorthand for

+
 * Anything that redirects to any of those.
 * Typos and variants

For the simpler case of

This would look something like

Headbomb {t · c · p · b} 15:43, 4 August 2018 (UTC)
 * So the first parameter will always be a page? But the second+ parameters can be pages or categories? -- JLaTondre (talk) 13:48, 5 August 2018 (UTC)
 * Well I suppose in theory it could be a category, but I can't really conceive of case where, in practice, you'd have a category without a main article. Headbomb {t · c · p · b} 15:50, 5 August 2018 (UTC)

Break
First cut here. It does not have the hierarchy in the Entries column. How important is that? The existing TAR logic doesn't easily lend itself to that. If needed, I'll figure out a way to squeeze it in, but it will take longer. Other than that, let me know if it is matching what you were looking for. If it does, I'll integrate it into the normal bot running. -- JLaTondre (talk) 23:09, 5 August 2018 (UTC)

Do you mean the alphabetical order in "Target"? Not very important. In the second column, it'd be very nice to have a sort of bulleted hierarchy
 * Link 1
 * Link 1 redirect
 * Link 1 typo
 * Link 2
 * Link 2 redirect
 * Link 2 typo

Typos and redirects being omitted if they aren't used, but the direct links would be included even if they aren't used. Headbomb {t · c · p · b} 00:14, 6 August 2018 (UTC)

Looking at the first cut, a few things. First this.

But also entries 10/23/34 (Frontiers in Psychology/Frontiers in Plant Science/Frontiers in Endocrinology) should have been grouped with the first entry Frontiers Media, as declared in

See Mockup (entries #1, #2 and #4). For entry #2, I only merged Abstract and Applied Analysis, Advances in High Energy Physics and BioMed Research International, but you can imagine the other journals of Category:Hindawi Publishing academic journals being done the same way. Headbomb {t · c · p · b} 00:24, 6 August 2018 (UTC)


 * It should also pickup redlinks + typos of those redlinks. E.g.  should report Asian Journal of Chemistry (7) (per WP:JCW/A66), and matches for things like Asan Journal of Chemistry if they exist. Headbomb {t · c · p · b} 01:11, 6 August 2018 (UTC)
 * By hierarchy, I meant the grouping. The existing TAR code isn't set up to handle that so will take me a bit to get that in there. I'll check into the missing page(s). -- JLaTondre (talk) 20:26, 6 August 2018 (UTC)
 * For the Asian Journal of Chemistry, it doesn't have a target (target of &mdash; on A66). Therefore, when searching for common targets, it doesn't return anything. I'm thinking that instead of matching on targets (like TAR does), this should really match on display values. If the display value has a target, then it would search for other pages with the same target. Does that make sense? Or am I missing something? -- JLaTondre (talk) 20:53, 7 August 2018 (UTC)


 * Well for declared targets that are redlinks, the target is the redlink itself. So  means match  +variants. Structure that however makes sense, code-wise. Headbomb {t · c · p · b} 21:02, 7 August 2018 (UTC)

You could also feed Asian Journal of Chemistry in to get additional variants (Asian J. Chem./Asian J Chem and typos of said variants (Asian J-Chem). Headbomb {t · c · p · b} 21:08, 7 August 2018 (UTC)
 * Would your site be able to handle this extra load without choking on itself? Headbomb {t · c · p · b} 21:09, 7 August 2018 (UTC)

Selected format
I would like to limit the format of User:JL-Bot/Questionable.cfg to not allow "|" within the notes or source fields (i.e. it is only used to separate the parameters of the JCW-selected template). See this change. That significantly reduces the complexity of parsing that template and reduces the chances for error. -- JLaTondre (talk) 19:30, 12 August 2018 (UTC)
 * Sure, make the changes you want, I'm at a family event for the next few hours. In the long run, it will limit what we can put in note/source (e.g. other templates) though, but that shouldn't be too much of a big deal. At least for now. Headbomb {t · c · p · b} 19:38, 12 August 2018 (UTC)
 * After reflection, there is an easy way to handle it. As long as the notes & sources fields are always at the end (i.e. the bot will ignore anything after whichever one comes first), we will be good. The benefits of sleeping on it. ;-) Thanks. -- JLaTondre (talk) 22:51, 14 August 2018 (UTC)


 * Sure that works! Do you have an ETA for a prototype? Headbomb {t · c · p · b} 23:33, 14 August 2018 (UTC)
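The "notes and sources always at the end" approach can be sketched like this: split on "|" only up to the first free-text field and keep everything after it untouched, so pipes inside nested templates in a note cause no trouble. This is a hedged Python sketch; the parameter names `note=` and `source=` and the function name are assumptions for illustration, not the bot's actual parser.

```python
def parse_selected(line: str) -> dict:
    """Split a one-line template call; text from note/source onward stays raw."""
    body = line.strip()
    if not (body.startswith("{{") and body.endswith("}}")):
        raise ValueError("not a template call")
    body = body[2:-2]
    # Locate whichever trailing free-text field comes first (assumed names).
    cut = min(
        (pos for pos in (body.find("|note="), body.find("|source=")) if pos != -1),
        default=len(body),
    )
    head, tail = body[:cut], body[cut + 1:]
    name, *targets = [part.strip() for part in head.split("|")]
    return {"template": name, "targets": targets, "trailing": tail}
```

Only the part before the first note/source field is split on pipes, which is what keeps the parsing simple.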

Second Version
I uploaded the next version. It should be catching the cases like "Asian Journal of Chemistry" now. I still need to work on the grouping and having the bot save it to the wiki with paging. I also haven't done anything with LTWA abbreviations. I will continue working on it, but progress will be a bit slow as I have some other things occupying me also. -- JLaTondre (talk) 00:26, 17 August 2018 (UTC)
 * That's fine, we can do a lot with bad grouping and no-LTWA lookups even if it's not 100% ideal. Looking forward to both these things being implemented though! (Of the two, grouping would be the most beneficial I'd argue). Headbomb {t · c · p · b} 01:13, 17 August 2018 (UTC)


 * btw, even if you haven't done any code improvements, a new upload of WP:CRAPWATCH would still be very useful. Headbomb {t · c · p · b} 14:28, 23 August 2018 (UTC)
 * I uploaded a new version. -- JLaTondre (talk) 21:59, 24 August 2018 (UTC)

Grouping
Ran into the following situation: For, most of the pages within the category redirect to Allied Academies so that is already their target. So how do you want this represented in the table? Just listed under Allied Academies? Listed under Allied Academies and also as its own? In other words, this: Or this: I would assume the first would be sufficient, but the second is if I take the original description literally. -- JLaTondre (talk) 15:23, 1 September 2018 (UTC)

The first, yes. The hierarchy for the 'entries' column would be

Headbomb {t · c · p · b} 15:36, 1 September 2018 (UTC)

Grouping, take 1
First version of grouping has been posted based on the 20180901 dump. It's not perfect - the target is always getting listed on the entries side even if it has no citations. I'll need to look into that as well as integrating it into the main bot run so the bot uploads the results with pagination. -- JLaTondre (talk) 22:47, 3 September 2018 (UTC)

Looks good. It'll need a few refinements, but it's on the right track. In general, something like the current
 * International Journal of Environmental Research and Public Health
 * Int J Environ Res Public Health (26 in 24)
 * Int. J. Environ. Res. Publ. Health (1 in 1: 1)
 * Int. J. Environ. Res. Public Health (7 in 7)
 * International Journal of Environmental Research and Public Health (170 in 155)

should be instead (e.g. if you have hits on the "first level", put the numbers on the "first level") and can just be This, however (no hits on the first level, but hits on the second level) is correct, since we can see the underlying redirect structure. Whereas
 * International Journal of Environmental Research and Public Health (170 in 155)
 * Int J Environ Res Public Health (26 in 24)
 * Int. J. Environ. Res. Publ. Health (1 in 1: 1)
 * Int. J. Environ. Res. Public Health (7 in 7)
 * Algorithms (journal)
 * Algorithms (5 in 5: 1, 2, 3, 4, 5)
 * Algorithms (5 in 5: 1, 2, 3, 4, 5)
 * Croatian Wikipedia
 * Wikipedija (2 in 2: 1, 2)


 * MDPI

would be useless, since there's no hit to it, or to its redirects. Headbomb {t · c · p · b} 00:11, 4 September 2018 (UTC)
 * Changes made. -- JLaTondre (talk) 15:24, 15 September 2018 (UTC)
 * Seems to only be missing this part
 * Algorithms (journal)
 * Algorithms (5 in 5: 1, 2, 3, 4, 5)
 * which can just be
 * Algorithms (5 in 5: 1, 2, 3, 4, 5)
 * Headbomb {t · c · p · b} 22:15, 15 September 2018 (UTC)
 * Fixed. -- JLaTondre (talk) 18:14, 6 October 2018 (UTC)

Moved to /Questionable
I moved whatever was at /Selected# to /Questionable# btw (including /Selected.cfg to /Questionable.cfg). This is a clearer name, but JCW-selected remains the same. I have some additional plans for that template and the compilation, but those can be done once /Questionable is polished and the last few kinks worked out and the multipage stuff implemented. Headbomb {t · c · p · b} 01:04, 17 September 2018 (UTC)


 * Could we get a new run of WP:CRAPWATCH (at its new location), even if there isn't any other improvements to the code? Headbomb {t · c · p · b} 15:58, 24 September 2018 (UTC)
 * 20181001 dump results uploaded. This weekend, if all goes well, I hope to complete the remaining items as well as roll it into the regular run. -- JLaTondre (talk) 21:03, 3 October 2018 (UTC)
 * Fixed the "(journal)" case above & implemented saving as part of the bot run. It is doing 1500 lines per page (actual output will be longer as it keeps adding rows until the last row added makes the total over 1500, example: total is 1450, next row is 75, final total will be 1525). Since the number of lines in a row can vary much more significantly for questionable targets than for the common targets, using lines seemed better than the number of rows. Looking at using redirects in false positives next. -- JLaTondre (talk) 18:21, 6 October 2018 (UTC)


 * Amazing. I'll deep review for little things, but the quick review results look good. Headbomb {t · c · p · b} 18:28, 6 October 2018 (UTC)
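The paging rule described above (keep adding rows until the running total goes over 1500 lines) can be sketched as follows. This is an illustrative Python version; the function name and the representation of rows as line counts are assumptions.

```python
def paginate(row_line_counts, limit=1500):
    """Group row indexes into pages; a page closes once its total exceeds limit."""
    pages, current, total = [], [], 0
    for index, lines in enumerate(row_line_counts):
        current.append(index)
        total += lines
        if total > limit:
            # The row that pushed the total over the limit stays on this page.
            pages.append((current, total))
            current, total = [], 0
    if current:
        pages.append((current, total))
    return pages
```

For the example in the thread, `paginate([1450, 75, 10])` puts the first two rows on one page with a total of 1525 lines and starts a new page with the third row.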

WP:CRAPWATCH question
In #6 (OMICS Publishing Group), you have the same false positive that happens in several journals. To set up exclusions, do we need to use

to cover all instances, or individual exclusions like



The former would be much more useful. Headbomb {t · c · p · b} 14:34, 7 September 2018 (UTC)
 * So ignore all redirects to a target as well as the target? Yeah, that is doable. Just for questionable? Or for TAR also? -- JLaTondre (talk) 22:56, 7 September 2018 (UTC)


 * Well /TAR works fine as is. /Questionable is the one with the repetitions, so this would be for /Questionable. Headbomb {t · c · p · b} 03:24, 8 September 2018 (UTC)
 * With respect to the groupings, I assume if the group heading (example: "Biomedical Research" in the "Allied Academies" table example further up the page) is excluded, everything under that group should be excluded as well (even if not matching an exclusion)? -- JLaTondre (talk) 23:57, 13 September 2018 (UTC)


 * Might as well, yes. Headbomb {t · c · p · b} 21:43, 14 September 2018 (UTC)


 * This one is still not implemented. Headbomb {t · c · p · b} 18:37, 6 October 2018 (UTC)
 * Yes, that was what I said below I was going to work on next. ;-) -- JLaTondre (talk) 18:41, 6 October 2018 (UTC)
 * Exclusions based on redirects has been implemented & uploaded. Please review. -- JLaTondre (talk) 00:17, 7 October 2018 (UTC)


 * Seems to work! I think everything here on this page can be archived. I'll have something else, but it'll be an easy thing to do. Headbomb {t · c · p · b} 00:57, 7 October 2018 (UTC)

Portal talk:Scotland
Hello JL-Bot / JLaTondre I'm wondering why your recent updates to recognized content at Portal talk:Scotland no longer include the content of "Former featured articles", "Featured Lists", "Former good articles", or "Did you know? articles", amongst others?

I did notice that before the edit by TheTranshumanist these (and the other items) were included. Is this some consequence of a community discussion that I was unaware of? It's a pity because this previous content was of immense value for me in terms of efficiency of working on EN:WP.

I'd appreciate your reply. - Cactus.man  ✍  20:12, 7 October 2018 (UTC)
 * In the edit you link, TheTranshumanist removed those data types from the bot configuration. The bot only provides the types requested by the project. As those types were no longer requested, they were no longer provided. -- JLaTondre (talk) 23:00, 7 October 2018 (UTC)

Up targets to /15?
I've finally processed and cleaned up the first 1000 entries of WP:JCW/TAR last month or so. Going through them once per dump is really quick now, so we can up the entries to 1500 or even 2000 for targets. Headbomb {t · c · p · b} 01:20, 11 October 2018 (UTC)
 * Expanded to 1500. Easy to go to 2000 if you decide you want that after looking at the 1500. -- JLaTondre (talk) 13:46, 13 October 2018 (UTC)


 * Thanks. It's going to take me a while to hit the new 500. Lots of exclusions to set up, redirects to create, typos to flag, insource:/Foobar/ searches to do and cleanup... Headbomb {t · c · p · b} 14:33, 13 October 2018 (UTC)

Methods in Molecular Biology doesn't pick up Methods in Molecular Biology (Clifton, N.J.) ?
See Entry #1022 in WP:JCW/Target11. Headbomb {t · c · p · b} 14:44, 13 October 2018 (UTC)
 * Methods in Molecular Biology (Clifton, N.J.) has . That isn't valid syntax and I'm surprised the redirect works. I've updated the redirect parser to catch that case. -- JLaTondre (talk) 16:56, 13 October 2018 (UTC)

WT:CRAPWATCH exception
Could you put an exception for this one. Headbomb {t · c · p · b} 01:10, 14 October 2018 (UTC)


 * Wikipedia talk:WikiProject Academic Journals/Journals cited by Wikipedia/Questionable2


 * Wikipedia talk:WikiProject Academic Journals/Journals cited by Wikipedia/Questionable3


 * Wikipedia talk:WikiProject Academic Journals/Journals cited by Wikipedia/Questionable4


 * Wikipedia talk:WikiProject Academic Journals/Journals cited by Wikipedia/Questionable5


 * Wikipedia talk:WikiProject Academic Journals/Journals cited by Wikipedia/Questionable6



Should also all redirect to Wikipedia talk:WikiProject Academic Journals/Journals cited by Wikipedia/Questionable1. Headbomb {t · c · p · b} 01:10, 14 October 2018 (UTC)
 * Should be working. -- JLaTondre (talk) 13:34, 14 October 2018 (UTC)

Question about disambiguators
In 2011, you wrote "... [the bot] should now should properly detect all the (journal), (magazine), or (newspaper) variants.", referring to cases like Nature (journal) vs Nature, Flight (magazine) vs Flight, etc...

What disambiguators are supported here? Because it would be useful if this was extended to say,


 * Journal > Magazine > Newspaper > Website > Database > Encyclopedia > Book > Publisher

Headbomb {t · c · p · b} 22:38, 17 October 2018 (UTC)
 * Wow, 2011! It has been that long? It did just the first three. It was an easy expansion for the remaining types. There were only two changes. Results are uploading now. -- JLaTondre (talk) 23:47, 17 October 2018 (UTC)


 * Well there's gonna be a few more now that I know this is supported. E.g. eLS (encyclopedia), but that's likely going to be in the next dump. Which hopefully will have the remaining CRAPWATCH things sorted out. Headbomb {t · c · p · b} 00:06, 18 October 2018 (UTC)
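The priority chain "Journal > Magazine > Newspaper > ..." can be read as: for a cited name, try each disambiguated title in order and take the first one that exists. A minimal Python sketch follows, with the caveat that the function name and the use of a plain set of existing page titles are assumptions for illustration.

```python
DISAMBIGUATORS = (
    "journal", "magazine", "newspaper", "website",
    "database", "encyclopedia", "book", "publisher",
)


def resolve_title(cited, existing_titles, chain=DISAMBIGUATORS):
    """Return the first existing 'Name (disambiguator)' page, else the name."""
    for suffix in chain:
        candidate = f"{cited} ({suffix})"
        if candidate in existing_titles:
            return candidate
    return cited
```

Keeping the chain as a parameter means it can be shortened or reordered without touching the lookup itself.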

Category links
In WP:JCW/C43, you have something like

this should be

With a : in front of the category. Headbomb {t · c · p · b} 22:01, 25 October 2018 (UTC)
 * Done. I also did the same with File: links (probably not likely to ever happen, but just in case). -- JLaTondre (talk) 13:32, 27 October 2018 (UTC)
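The leading-colon fix can be illustrated with a small helper. This is a hedged Python sketch with a hypothetical function name: a wikilink to a Category: or File: page needs a ":" prefix, otherwise the page is categorized or the file is embedded instead of being linked.

```python
def safe_link(title: str) -> str:
    """Build a wikilink, prefixing ':' for Category: and File: titles."""
    if title.lower().startswith(("category:", "file:")):
        return f"[[:{title}]]"
    return f"[[{title}]]"
```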

Skip the (journal) bypass if it's tagged with R from unnecessary disambiguation / in Category:Redirects from unnecessary disambiguation
E.g. Evol Dev in WP:JCW/E31 uses Evol Dev, and the bot fetches the information from a pointless page (Evol Dev (journal)), rather than the good one (Evol Dev). Since Evol Dev (journal) is tagged with R from unnecessary disambiguation / categorized in Category:Redirects from unnecessary disambiguation, the bot should just make use of Evol Dev as if Evol Dev (journal) did not exist. Headbomb {t · c · p · b} 13:35, 8 November 2018 (UTC)
 * LOL You like to complicate things, don't you? ;-) Shouldn't be too much trouble. -- JLaTondre (talk) 18:40, 10 November 2018 (UTC)


 * I don't, but people refuse to delete those redirects. Headbomb {t · c · p · b} 22:16, 10 November 2018 (UTC)
 * Implemented. Couple of comments though:
 * Evol Dev (journal) is currently not tagged with R from unnecessary disambiguation. I manually edited my parsed data set so that when the bot processed it, it would recognize it as one and set the output correctly. You will need to update the actual page to add the template before the next dump if you wish it to keep doing that.
 * In doing this, I realized that for this request, I had only half implemented the change. There are two places in the code that are impacted (how things are counted & how they are outputted). I had only done the counting. I fixed the output as well which means there are a lot more changes than the original two.
 * Everything looks good to me, but there are lot of deltas so I could have missed something. Results are uploading. Let me know if you see anything wrong. -- JLaTondre (talk) 01:18, 12 November 2018 (UTC)
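The request in this thread amounts to one extra condition on the disambiguator lookup: a "(journal)" page is skipped when it is flagged as an unnecessary disambiguation. A rough Python sketch follows; the function name and the two input sets are assumptions, and how the bot actually detects the template or category is not shown here.

```python
def resolve_skipping_unnecessary(cited, existing_titles, unnecessary,
                                 chain=("journal", "magazine")):
    """Like a plain disambiguator lookup, but ignore flagged redirects."""
    for suffix in chain:
        candidate = f"{cited} ({suffix})"
        # 'unnecessary' holds titles tagged R from unnecessary disambiguation.
        if candidate in existing_titles and candidate not in unnecessary:
            return candidate
    return cited
```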

More things that don't count
You know how 'Series' or 'Part' don't count for matching purposes? I thought 'Supplement' was also covered, but it turns out it's not.

So here's a few more things that should be ignored for matching purposes i.e, if you find Foobar Supplement, Foobar, New Series, or Foobar (N.F.), they should be grouped with Foobar in WP:JCW/TAR (and WP:JCW/CRAP too). Headbomb {t · c · p · b} 16:40, 5 November 2018 (UTC)
 * Supplementum
 * Supplement
 * Suppl.
 * Suppl
 * Nouvelle Série
 * New Series
 * N.S.
 * NS
 * Neue Folge
 * N.F.
 * NF
 * Done. Let me know if you see anything odd. -- JLaTondre (talk) 01:11, 7 November 2018 (UTC)


 * I'll take a look! So far, it looks like it's doing a few nice pickups. Headbomb {t · c · p · b} 01:39, 7 November 2018 (UTC)
 * You could add also "Monographs/Monograph/Monogr./Monogr". I'm also debating if adding "Letters/Letter/Lett./Lett." would be more helpful than not. A trial with the "Letters" stuff in would yield a lot of insight. If it's not helpful, it could be yanked. Headbomb {t · c · p · b} 14:19, 7 November 2018 (UTC)
 * Version with both uploaded. -- JLaTondre (talk) 22:10, 7 November 2018 (UTC)


 * There's some issues. Things like Letters to Nature stop being picked up for Nature (journal) (first diff line in ). And things like The Powys Review Letters started being picked up for Physical Review. Headbomb {t · c · p · b} 22:51, 7 November 2018 (UTC)
 * I don't understand the Letters to Nature case. Since it's a redirect to Nature, it shouldn't be impacted. I'll investigate that. "The Powys Review" was actually added to "Physical Review Letters" which makes sense. "Physical Review Letters" contains various forms of "Phys Rev Lett". If "lett" is stripped from those, you are left with only a two character difference ("rev" & "review" are treated the same, "the" is also stripped). -- JLaTondre (talk) 23:38, 7 November 2018 (UTC)
 * Yeah the second one makes sense. The Letters to Nature however doesn't. There's a few cases like that, and it always seems to be with leading stuff, rather than trailing stuff. Could be wrong about that though. Headbomb {t · c · p · b} 23:54, 7 November 2018 (UTC)
 * Letters to Nature (and similar cases) should be fixed. -- JLaTondre (talk) 18:25, 10 November 2018 (UTC)

Seems to work fine. As a side note, I've cleaned up everything up to WP:JCW/Target15. Could increase it to /20 or even /25. Headbomb {t · c · p · b} 21:58, 13 November 2018 (UTC)
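To make the "things that don't count" rule from this section concrete, here is a hedged Python sketch that strips one trailing marker from the list before comparison. The regular expression and function name are assumptions, and it only handles trailing markers (optionally in parentheses or after a comma), not the leading cases such as "Letters to Nature" discussed above.

```python
import re

# Markers taken from the list in this section.
IGNORED_MARKERS = [
    "Supplementum", "Supplement", "Suppl.", "Suppl",
    "Nouvelle Série", "New Series", "N.S.", "NS",
    "Neue Folge", "N.F.", "NF",
]

# Longest markers first so "Supplementum" is tried before "Supplement".
_ALTERNATIVES = "|".join(
    re.escape(marker) for marker in sorted(IGNORED_MARKERS, key=len, reverse=True)
)
TRAILING_MARKER = re.compile(
    rf"(?:[\s,]+|\s*\()(?:{_ALTERNATIVES})\)?\s*$", re.IGNORECASE
)


def strip_series_marker(title: str) -> str:
    """Remove one trailing series/supplement marker for matching purposes."""
    return TRAILING_MARKER.sub("", title)
```

The pattern requires a separator before the marker, so a title that merely ends in the letters "NS" or "NF" is left unchanged.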

Weirdness

 * /

There's weird stuff going on there. Headbomb {t · c · p · b} 17:06, 12 November 2018 (UTC)
 * In what way? They all match the "Journal > Magazine > Newspaper > Website > Database > Encyclopedia > Book > Publisher" logic. Those forms are the first one in that chain (,, & ) and the two redirects resolve to the correct locations. -- JLaTondre (talk) 21:45, 12 November 2018 (UTC)
 * Well, for instance, the old version linked to Stylus, a dab page that lists both Stylus Magazine and The Stylus as possible entries. Maybe that simply means that Stylus (magazine) needs to be created and redirected to the dab page though. I'll give it a thought now that I know what is happening. Headbomb {t · c · p · b} 19:26, 13 November 2018 (UTC)
 * this certainly is an issue though. American Conservative redirects to The American Conservative, but the bot thinks its meant to refer to American Conservative (book). Likewise for (a magazine, now overruled by a website). Headbomb {t · c · p · b} 19:36, 13 November 2018 (UTC)


 * I think for this bit of logic, it's best to stop at "Journal > Magazine" and forget "Newspaper > Website > Database > Encyclopedia > Book > Publisher". Headbomb {t · c · p · b} 19:45, 13 November 2018 (UTC)
 * Reverted the changes & also removed the newspaper case. Re-running now so updated results will be in a bit. -- JLaTondre (talk) 23:44, 14 November 2018 (UTC)


 * Nothing was uploaded btw. And if you could increase targets to /20 or even /25. that would be great. Headbomb {t · c · p · b} 18:40, 15 November 2018 (UTC)
 * A day is still a bit, isn't it? ;-) Sorry, was interrupted. Uploading now along with the /25. -- JLaTondre (talk) 23:13, 15 November 2018 (UTC)
 * Also another weird thing WP:JCW/Target1 lists Science twice. Once with the dab, the other without. Headbomb {t · c · p · b} 02:29, 16 November 2018 (UTC)
 * Fixed. -- JLaTondre (talk) 15:43, 18 November 2018 (UTC)

any word on the new dump being processed? Headbomb {t · c · p · b} 20:16, 26 November 2018 (UTC)
 * Holiday delayed it. Should be up now. -- JLaTondre (talk) 12:24, 27 November 2018 (UTC)

Dab issue?
In WP:JCW/Target3, entry #293, RNA links to RNA instead of RNA (journal). Headbomb {t · c · p · b} 01:15, 28 November 2018 (UTC)
 * The problem is caused because "RNA (journal)" and "Rna" both normalize to "rna". However, Rna is a redirect to RNA and there was no Rna (journal) equivalent (though you have since created one). When the common normalization is run, it's picking the "Rna" target over the "RNA (journal)" target. I can easily put a workaround for that case in the common output, but I'd rather solve the selection logic instead. However, since you created the redirect, shouldn't be an issue with next dump run. I'll still look at it for future cases. -- JLaTondre (talk) 22:17, 29 November 2018 (UTC)
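The collision described here can be reproduced with a toy normalization. The Python sketch below is hypothetical: the normalization (drop a trailing "(journal)" or "(magazine)" and lowercase) and the tie-break rule (prefer the disambiguated title) are assumptions chosen to illustrate one possible selection logic, not the fix that was eventually made.

```python
import re
from collections import defaultdict


def normalize(title: str) -> str:
    """Assumed normalization: drop a trailing disambiguator and lowercase."""
    return re.sub(r"\s*\((?:journal|magazine)\)$", "", title).lower()


def pick_targets(titles):
    """For titles sharing a normalized key, prefer the disambiguated one."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalize(title)].append(title)
    return {
        key: sorted(group, key=lambda t: (not t.endswith(")"), t))[0]
        for key, group in groups.items()
    }
```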

Is JL-Bot supposed to be updating "Women writers articles by quality and importance"?
Hi. I'm not sure if this is the place to ask the question, so apologies for cross-posting. Regarding WP:WikiProject Women writers, I've noticed that the table on our mainpage, "Women writers articles by quality and importance" isn't updating. JL-Bot is updating other areas of the Articles section, so is it supposed to be updating the table, too? Thank you. --Rosiestep (talk) 22:16, 2 January 2019 (UTC)
 * Do you mean this table? If so that's updated by, which currently has some issues. There is some information at Wikipedia talk:Version 1.0 Editorial Team/Index and Bots/Noticeboard, but long story short is that people are looking into it (the new operator is if you want to contact them about where things are). Headbomb {t · c · p · b} 22:51, 2 January 2019 (UTC)
 * Yes,, that is the table which needs updating. I'll contact as per your suggestion. --Rosiestep (talk) 22:57, 2 January 2019 (UTC)

Don't strip final comma?
In this article, we have The Transactions of the Linnean Society of London, Series 2,. However in WP:JCW/Target10, this is reported as The Transactions of the Linnean Society of London, Series 2. Not really sure why the final comma is stripped, but it should be kept. Headbomb {t · c · p · b} 00:29, 7 January 2019 (UTC)
 * There was a specific step to strip trailing commas. I feel like that was requested, but not seeing anything in the archives. Before there was the TAR processing, I believe some cleanup was requested to remove some minor differences and make more entries match (another one is to remove '' at the end). I have removed the comma one and re-ran. Please look at the results and see what you think. I will also create a documentation page that describes all the manipulations (template processing, clean-up, normalization, etc.). It would be good to have a listing for future reference. It might take a couple of days to get to that, though. -- JLaTondre (talk) 02:12, 8 January 2019 (UTC)


 * I think that was mostly for WP:JCW/POP purposes back when it was our main way of prioritizing work. WP:JCW/TAR could get closer to raw entries to allow for cleanup and standardization. Commas and other garbage should be stripped in the comparison step, but ultimately reported. Whitespace can still be normalized, since the reader wouldn't see that. Headbomb {t · c · p · b} 02:18, 8 January 2019 (UTC)


 * Anyway, I cleaned all instances of final commas with User:JCW-CleanerBot. Headbomb {t · c · p · b} 02:33, 8 January 2019 (UTC)
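The split suggested above (strip commas only when comparing, but report the entry as cited, with whitespace normalized) can be sketched as two separate values per entry. This is a minimal Python illustration with assumed function names; the real comparison step does more than this.

```python
import re


def display_value(raw: str) -> str:
    """What gets reported: whitespace normalized, punctuation kept."""
    return re.sub(r"\s+", " ", raw).strip()


def comparison_key(raw: str) -> str:
    """What gets compared: the display value minus trailing commas."""
    return display_value(raw).rstrip(",").rstrip()


def group_for_report(entries):
    """Group entries by comparison key while keeping each reported form."""
    groups = {}
    for raw in entries:
        groups.setdefault(comparison_key(raw), []).append(display_value(raw))
    return groups
```

With this split, an entry ending in "Series 2," still groups with "Series 2", but the trailing comma remains visible in the report so it can be cleaned up at the source.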

It seems to have reprocessed everything but WP:JCW/TAR btw. Headbomb {t · c · p · b} 14:09, 8 January 2019 (UTC)
 * I only finished the 'regular' output yesterday. The remaining is running now. -- JLaTondre (talk) 23:55, 8 January 2019 (UTC)


 * Have you been doing some tweaks to WP:JCW/CRAP logic too? It's been a few times when you rerun the bot that the output changes on that page, beyond the 'fresh dump'. Headbomb {t · c · p · b} 04:34, 9 January 2019 (UTC)
 * No, there have been no logic changes for the questionable processing. Any changes would be the result of upstream changes in the data or your configuration settings. -- JLaTondre (talk) 00:18, 10 January 2019 (UTC)


 * What causes changes like (see e.g. the bunch of Open journals in Bentham Science Publishers) then? Because I can't see anything in the config settings that would cause that. Headbomb {t · c · p · b} 00:42, 10 January 2019 (UTC)
 * Not sure. The Open journals that have been removed are the duplicate ones (per the section above). The redirect versions are still there. However, I haven't deployed any changes that should cause that, nor am I seeing any changes in the pages themselves. I'm currently replacing the existing code to improve speed and remove the remaining duplicates, so I'm not going to spend time isolating it on the old version. -- JLaTondre (talk) 19:15, 13 January 2019 (UTC)

/r/science
While /r/science is fine in mainspace, in wikipedia space that causes issues. /r/science would be needed.

See the line that reads
 * The New American (3 in 3: 1, 2, 3)

in WP:JCW/Questionable2. Could apply to other places too. Headbomb {t · c · p · b} 00:56, 11 February 2019 (UTC)
 * Fixed. -- JLaTondre (talk) 23:31, 11 February 2019 (UTC)


 * It did seem to cause some weird collateral changes too. Two weird edits:, . Everything else was fine. Headbomb {t · c · p · b} 07:44, 12 February 2019 (UTC)


 * Also when the bot creates the Questionable8 talk redirects, there are some issues . Headbomb {t · c · p · b} 07:46, 12 February 2019 (UTC)
 * Redirects fixed. -- JLaTondre (talk) 20:45, 15 February 2019 (UTC)

Bot seems to be ignoring Unreliable fields and Mirrors and forks in WP:CRAPWATCH/SETUP
For instance, there is a and a selection tree in there. But I see no 'Alternative medicine' or 'Wikipedia:Mirrors and forks' sections in WP:CRAPWATCH anywhere, despite many of those journals and publications being cited. Headbomb {t · c · p · b} 16:18, 8 February 2019 (UTC)
 * Because those sections don't follow the proper syntax. The needs to be at the start of the line as in User:JL-Bot/Questionable.cfg. Anything else is ignored (to avoid picking the documentation earlier in the page, etc.). In those two sections, the entries are *  which will not be picked up. The asterisk is redundant as the template adds one. I will remove them and re-run the questionable processing. -- JLaTondre (talk) 16:33, 9 February 2019 (UTC)


 * Wow, the brain fart on that one. Thanks for finding it! Headbomb {t · c · p · b} 17:49, 9 February 2019 (UTC)


 * I set up a bunch of new exclusions to deal with the new influx. If you could rerun when you have time, that would be great. Headbomb {t · c · p · b} 21:36, 9 February 2019 (UTC)
 * Done. -- JLaTondre (talk) 20:19, 10 February 2019 (UTC)


 * Could you give it another go when you've got the chance? Nothing that's pressing though, so if you've got a few code updates planned in the next few days it can wait until then, but it's good to have a refreshed baseline after big updates to WP:CRAPWATCH/SETUP + User:JL-Bot/Citations.cfg. Headbomb {t · c · p · b} 09:13, 11 February 2019 (UTC)
 * Running. Will have the /r/science fix. -- JLaTondre (talk) 23:32, 11 February 2019 (UTC)


 * And another run? This will likely be the last one needed before a code update / next dump. Also feel free to increase targets to /30. Headbomb {t · c · p · b} 06:52, 15 February 2019 (UTC)
 * In process. -- JLaTondre (talk) 20:46, 15 February 2019 (UTC)

Minor tweak for the next run
Instead of

just do

Same for the magazines. I've updated the JCW-Main to call JCW-Letter when needed. Headbomb {t · c · p · b} 04:46, 16 February 2019 (UTC)


 * Also, there seems to be little point in having separate JCW-exclude/MCW-exclude templates, so I'd suggest excluding things from both lists using either template (i.e. if JCW-exclude is used, exclude things from both JCW/TAR and MCW/TAR lists, and if MCW-exclude is used, exclude things from both JCW/TAR and MCW/TAR lists). There may be corner cases, but I haven't found them yet. Headbomb {t · c · p · b} 07:12, 16 February 2019 (UTC)
 * Okay. -- JLaTondre (talk) 21:35, 17 February 2019 (UTC)


 * Updated to 'JCW-Main' after a page move. Headbomb {t · c · p · b} 08:44, 23 February 2019 (UTC)
 * Likewise, JournalsPrevNext is now JCW-PrevNext. The MCW structure has been updated with the same conventions too, with MCW instead of JCW. See Category:Journals Cited by Wikipedia templates. Headbomb {t · c · p · b} 20:20, 23 February 2019 (UTC)
 * Both of these changes (ignores & templates) were implemented in the last run. -- JLaTondre (talk) 11:33, 26 February 2019 (UTC)

WP:CRAPWATCH tweak
I've given a major, major expansion to WP:CRAPWATCH/SETUP and the list now draws from multiple sources. Could you take the source / note parameters of JCW-selected and add it to the target in the list? E.g. something like

{|class=wikitable !Rank !Target/Group (Source) !Entries (Citations, Articles) !Total Citations !Distinct Articles
 * 24
 * Pharmacognosy Reviews {{JCW-selected-source| is fine, if At doesn't exist. But At existing and not pointing to Nature (journal) should exclude At from the matches. Likewise in the TAR entry for New Scientist (#6) it matches things that clearly don't point to New Scientist, like NEWSru, Science (journal), Sun Journal and News24. Those should all be filtered out, and relatively early on, so you're not compounding the issue by looking for variants of NEWSru, Sun Journal, etc... for New Scientist. Headbomb {t · c · p · b} 00:40, 12 March 2019 (UTC)
 * Ugh, I overlooked that check in the prior version. I put it back in and am running it. Hopefully the next version looks better. Sorry about that! -- JLaTondre (talk) 02:23, 12 March 2019 (UTC)
 * It can't not look better haha. I'll have more comments, but at the moment, it's hard to even read the compilation and find out what's good/bad behaviour. The new templated versions seems fine, as far as structure goes, although some CRAP matching probably needs a bit of refinement. I'll know more once the new upload is up. Headbomb {t · c · p · b} 02:36, 12 March 2019 (UTC)


 * In future runs, use ISO format for dates. (https://en.wikipedia.org/w/index.php?title=Template:JCW-date&curid=60140457&diff=887330264&oldid=886911175). Headbomb {t · c · p · b} 02:42, 12 March 2019 (UTC)
 * It's still appending extra stuff for searches . Not a critical fix that needs another run though. Headbomb {t · c · p · b} 08:23, 12 March 2019 (UTC)
 * Should really be fixed now, but will wait on another run to upload. -- JLaTondre (talk) 01:25, 13 March 2019 (UTC)
 * I uploaded a few of Journal/A pages to verify fixed, but not the whole set. Will do that when everything else looks good. -- JLaTondre (talk) 01:47, 14 March 2019 (UTC)
 * Done. -- JLaTondre (talk) 17:32, 14 March 2019 (UTC)

New version, part 2
Alright, now that the uploaded version makes more sense, we can read things more sanely.

The first thing is that the matching algorithm is obviously too aggressive where 'small' names are concerned, so it's got to be made less aggressive. For example, The Astrophysical Journal matches

• CAN Journal

• CAP Journal

• CAPjournal

• CD Journal

• CE Journal

• CW Journal

• EPA Journal

• F.A.O. Journal

• GAS Journal

• Gas Journal

• IK-Journal

• JAG Journal

• Journal 1816-1845

• K. B. Journal

• KB Journal

• KPA Journal

• L'Aut' Journal

• L2 Journal

• LL Journal

• LLJournal

• M/C Journal

• MAPS Journal

• MC Journal

• N. Y. J. Suppl.

• P&S Journal

• RT Journal

• S.A.E. Journal

• SAE Journal

• SAPS Journal

• SEJournal

• THE Journal

• The CPA Journal

• The GB Journal

• The KPA Journal

• The Law Journal

• The SAE Journal

• The UMAP Journal

• The WAC Journal

• U of L Journal

• UMAP Journal

• W.E. Journal

Presumably through one of its small redirects. I'm guessing the algorithm goes something like


 * A) Search for, fetch redirects such as  / /  (which form set A)
 * B) Search for normalized variants and expanded variants of set A (which form set B, having an / /  in there somewhere)
 * C) Search for typos of set B (which form set C), finding  and most others as a typo of one or more of  / /
 * D) Exclude articles from set C which don't point back to, but keep the redlinks (e.g. most of them)

Here by 'normalized variant', I mean normalizing  → , ignoring Series/Supplement/Letters/Trailing garbage etc... By 'expanded variant' I mean artificially adding 'Journal/Magazine' to see if you can get a match.

So for small names, or perhaps in general, what it should do is instead Headbomb {t · c · p · b} 09:15, 12 March 2019 (UTC)
 * A) Search for, fetch redirects such as  / /  (which form set A)
 * B) Search for normalized variants and expanded variants of set A (which form set B)
 * C) Search for typos of set B (which form set C)
 * D) Throw away things that don't point back to, keep the redlinks (which form set D)
 * E) Search for expanded variants of D (which form set E)
 * F) Throw away things that don't point back to, keep the redlinks (which form set F)
 * The processing actually works as follows:
 * For each target to be processed:
 * Step 1: Find all citations that resolve to that target
 * Step 2: Find all redirects to the target
 * Step 3: For each of the above, use their normalization to find other citations with the same pattern. Matches are defined as:
 * For strings of <3 characters, require an exact match
 * For strings of 3-5 characters, allow 1 character delta
 * For strings of 6-20 characters, allow 1-2 character deltas
 * For strings of 21+ characters, allow 1-3 character deltas
 * Step 4: Toss out any false positives
 * Step 5: Toss out any that resolve to articles
 * For CRAP, the logic is the same except that for step 2 it uses the additional parameters in the configuration line.
 * In the "The Astrophysical Journal" case, it has a redirect of "Ap J" which normalizes to "apjournal". That is what causes the above hits. "J" is expanded to "journal" so that cases like "Nature J." and "Nature Journal" match. This is the logic that has been used for a long time. The problem is that redirects like these are now being pulled in. I can see four solutions:
 * Option 1: Switch the normalization from "j" -> "journal" to "journal" -> "j". This would reduce the string size, which would reduce the tolerance. If this was done, only "CAP Journal" in the above list would be a hit. However, I don't think this is a good idea as: a) it would stop catching typos of "journal"; and b) it would stop catching cases where there is not a space between the term and "journal".
 * Option 2: Continue to normalize as is, but if both normalizations being compared end in "journal", strip the "journal" from both and compare the remainders. Use the existing delta rules. In this case, that would result in "ap" being compared and all of the above would be tossed. It should still catch the a & b cases from the above option.
 * Option 3: Mark these as false positives and be done with it. This is the first run with the redirect addition, so it should stabilize.
 * Option 4: Revert the inclusion of redirects.
 * Let me think on it and see if I come up with any other options. If not, I'll give 2 a shot. -- JLaTondre (talk) 01:22, 13 March 2019 (UTC)
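The length-based tolerance and the "strip a shared trailing journal" idea described above can be sketched as follows. This is only an illustration, not the bot's code: the thread does not say how deltas are counted, so Levenshtein edit distance is assumed here, the tolerance is assumed to be taken from the length of the target's string, and the names `levenshtein`, `allowed_delta` and `is_match` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions.

    Assumption: the bot's real delta measure is not described in the thread.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def allowed_delta(length: int) -> int:
    """Maximum number of differing characters for a string of this length."""
    if length < 3:
        return 0   # exact match required
    if length <= 5:
        return 1
    if length <= 20:
        return 2
    return 3


def is_match(target: str, candidate: str, strip_suffix: bool = True) -> bool:
    """Compare two normalized strings under the length-based tolerance.

    With strip_suffix=True, a trailing "journal" shared by both strings is
    removed first, so only the remainders are compared.
    """
    suffix = "journal"
    if strip_suffix and target.endswith(suffix) and candidate.endswith(suffix):
        target = target[: -len(suffix)]
        candidate = candidate[: -len(suffix)]
    # Assumption: the tolerance is derived from the target's length.
    return levenshtein(target, candidate) <= allowed_delta(len(target))
```

Under these assumptions, `is_match("apjournal", "capjournal", strip_suffix=False)` is a hit (9 characters, one delta), while with the suffix stripped the comparison is "ap" against "cap", which needs an exact match and is therefore tossed. A misspelling such as "apjuornal" does not end in "journal", so it is still compared in full and still caught.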


 * Of those, #2 seems the best (likewise for 'magazine'), or at least worth giving a try to see how it goes. Not sure how well my suggested algorithm above would perform / how easy to implement it would be. It might be better, it might be worse.
 * But manually marking them as false positives is... they're just so many of them. Headbomb {t · c · p · b} 01:48, 13 March 2019 (UTC)
 * I implemented #2 and re-ran for TAR. Take a look at it and see what you think. If good, I will update CRAP as well. I did not do Part 5 (below) yet as wanted to see the changes separately (in case something went weird). -- JLaTondre (talk) 01:46, 14 March 2019 (UTC)
 * It's brought things down to a much, much, more manageable level. Run it all (although feel free to keep #5 separate). Headbomb {t · c · p · b} 02:30, 14 March 2019 (UTC)
 * CRAP done. -- JLaTondre (talk) 17:33, 14 March 2019 (UTC)

New version, part 3
The bot stopped at WP:JCW/Questionable10, leaving WP:JCW/Questionable11/WP:JCW/Questionable12/WP:JCW/Questionable13/WP:JCW/Questionable14 useless. They should get CSD'd when no longer needed. Headbomb {t · c · p · b} 09:21, 12 March 2019 (UTC)
 * Already handled. The bot lets me know which pages are no longer needed & I delete them after the run. Normally, I would be around when it completes. -- JLaTondre (talk) 20:39, 12 March 2019 (UTC)

New version, part 4
In WP:JCW/CRAP, several entries are missing their source. E.g. entry #951 in WP:JCW/Questionable10 (Pattern Recognition in Physics) is missing BLJ from WP:CRAPWATCH/SETUP. Headbomb {t · c · p · b} 09:24, 12 March 2019 (UTC)
 * Fixed, but will wait on another run to upload. -- JLaTondre (talk) 01:25, 13 March 2019 (UTC)
 * Still have some cases not working. Will look into it more. -- JLaTondre (talk) 17:34, 14 March 2019 (UTC)

This is possibly related to entries like

which choke because of the pipe in the note. I fixed that, although I don't know if that's going to fix these issues. Headbomb {t · c · p · b} 07:44, 15 March 2019 (UTC)
 * Most cases were due to a typo that I didn't see for staring at. I've fixed that, but haven't updated as the content task has been running for the past day (I need to work on making that more efficient also). You are correct that pipes would also create an issue. If they can be avoided, I'd rather not have to deal with them. -- JLaTondre (talk) 21:10, 15 March 2019 (UTC)
 * Should be avoidable in general, although it'd be useful if they were handled. Not a priority in the least though. Headbomb {t · c · p · b} 03:16, 16 March 2019 (UTC)
 * Implemented. If you see any cases it doesn't handle, please let me know. -- JLaTondre (talk) 19:13, 16 March 2019 (UTC)
 * There's nothing that needs it at the moment, but I'll update it later to have shorter notes in certain cases. Headbomb {t · c · p · b} 19:27, 16 March 2019 (UTC)
 * Still many cases it doesn't handle. See WP:JCW/Questionable10, entries 902–905 and 909. There are more (e.g. 31 entries in WP:JCW/Questionable9). Headbomb {t · c · p · b} 06:14, 17 March 2019 (UTC)
 * Ugh, got wrapped up in the pipe case & forgot to validate the change with other cases. Should be fixed. -- JLaTondre (talk) 16:26, 17 March 2019 (UTC)

New version, part 5
In WP:JCW/CRAP (and in TAR, but mostly in CRAP), there's a lot of  types of acronyms, which match other unrelated   type of acronyms.

So if you've got something which is one single, all caps word, don't look for typos. This way something like  doesn't match   but only   + capitalized variants +.

Alternatively, perhaps simpler, there could be a final step that removes allcaps acronyms that don't match the initial input save in capitalization.

This way if you have  in WP:CRAPWATCH/SETUP (or at a redirect to something that would get picked up by WP:CRAPWATCH/SETUP), you'd keep   if it's found, but would throw away.

Headbomb {t · c · p · b} 10:07, 12 March 2019 (UTC)
 * Doable. I will roll in with changes to Part 2 above. -- JLaTondre (talk) 01:31, 13 March 2019 (UTC)
 * Was this part implemented? Because entry #912 in WP:JCW/Questionable10 still matches NELS to JELS, for example. Headbomb {t · c · p · b} 06:09, 17 March 2019 (UTC)
 * No, I decided to do it separately so that I could verify each change. My next step was to ask for an example to test against. You beat me to that. ;-) -- JLaTondre (talk) 16:27, 17 March 2019 (UTC)
 * Looking forward to this being implemented. I believe it's the last 'major' thing that needs to be in for false positives to fall back to a manageable level. Could be wrong about that, but very much looking forward to it. Headbomb {t · c · p · b} 16:30, 17 March 2019 (UTC)
 * Initial version implemented. Generating TAR & CRAP output (local). Probably won't have time to look at it and validate until tomorrow. -- JLaTondre (talk) 18:51, 17 March 2019 (UTC)
 * Cool. I'll be doing a bunch of additions to the exclusions in the meantime, so if your tests show that nothing's blown up tomorrow, do a fresh run then before uploading. Headbomb {t · c · p · b} 19:00, 17 March 2019 (UTC)
 * It looks good to me. I'm re-running & will upload. By the way, in those bunch of additions you made, you entered a number that you didn't have to because they would be excluded by this test. -- JLaTondre (talk) 23:03, 18 March 2019 (UTC)
 * Yup. Hence below. I tried avoiding most that wouldn't get picked up under the new rules, but I didn't go out of my way to triple check I only included the ones that would get picked up, especially since I haven't seen the new rules in action yet. The main concern was to get rid of as many false positives as possible before the new run, at least so that WP:JCW/CRAP wouldn't be so massive. I left a couple of long entries as they were, since the lack of a typo hierarchy made it hard to gauge if they were false positives, or legit pickups. Headbomb {t · c · p · b} 23:17, 18 March 2019 (UTC)

Doesn't seem to work. Taking WP:JCW/Questionable9, you've got entries 802, 818, 823, 829, 832, 833... and many many others, all matching inexact all caps acronyms. Headbomb {t · c · p · b} 07:19, 19 March 2019 (UTC)
 * It's only ignoring all uppercase words of the same length. For "International Journal on Research Methodologies in Physics and Chemistry", the configuration has "IJRMPC" (6 letters), but the reported result is "IJRAP" (5 letters). I interpreted the request that way based on your examples all being the same length, but on re-reading, I see that wasn't stated. I can change it so that if the search term is all uppercase, it throws out any results that are all uppercase. -- JLaTondre (talk) 03:07, 20 March 2019 (UTC)
 * Ah I see, my initial request was ambiguous there, yes. Headbomb {t · c · p · b} 08:47, 20 March 2019 (UTC)
 * Changed to ignore uppercase words of any length (when target is also an uppercase word). Running now. -- JLaTondre (talk) 00:47, 21 March 2019 (UTC)
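As a rough sketch of the rule as finally described (when the search term is a single all-uppercase word, discard all-uppercase results of any length that are not that same word), something like the following would do. The function names are hypothetical and the acronym test is an assumption, since the thread does not say how the bot decides that a term is an "uppercase word".

```python
from __future__ import annotations


def is_acronym(term: str) -> bool:
    """True for a single word made only of uppercase letters, e.g. 'NELS'.

    Assumption: anything with spaces, digits or punctuation is not an acronym.
    """
    return term.isalpha() and term.isupper()


def filter_acronym_hits(target: str, candidates: list[str]) -> list[str]:
    """Drop all-uppercase candidates that differ from an all-uppercase target.

    Candidates that are not acronyms (e.g. capitalized variants) are kept,
    and nothing is filtered when the target itself is not an acronym.
    """
    if not is_acronym(target):
        return list(candidates)
    return [c for c in candidates if not is_acronym(c) or c == target]
```

With the NELS/JELS pair from entry #912, `filter_acronym_hits("NELS", ["NELS", "JELS", "Nels", "NELSS"])` keeps "NELS" and "Nels" and drops both "JELS" and the different-length "NELSS".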

New version, part 6
To cut down on false positives, some words shouldn't count as far as the "length" of the string is concerned.


 * Bulletins/Bulletin/Bull./Bull
 * Journals/Journal
 * News
 * Newsletter/Newsl.
 * Magazine/Mag.
 * Proceedings/Proceeding/Proc./Proc
 * Reviews/Review/Rev./Rev
 * Online
 * Transactions/Transaction/Trans./Trans

So if you have something like say, then for purpose of comparison, the string length should be 2, rather than 6. Headbomb {t · c · p · b} 17:44, 14 March 2019 (UTC)
 * Done. Will reprocess & post. -- JLaTondre (talk) 16:02, 16 March 2019 (UTC)
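One way to picture "words that don't count toward the length" is the sketch below. The word list is taken from the post above; the tokenization, the helper name `effective_length`, and the example titles are assumptions for illustration and do not come from the bot.

```python
import re

# Generic words from the list above, lowercased and without trailing dots.
GENERIC_WORDS = {
    "bulletins", "bulletin", "bull",
    "journals", "journal",
    "news",
    "newsletter", "newsl",
    "magazine", "mag",
    "proceedings", "proceeding", "proc",
    "reviews", "review", "rev",
    "online",
    "transactions", "transaction", "trans",
}


def effective_length(title: str) -> int:
    """Count letters and digits in a title, skipping the generic words.

    Assumption: words are runs of alphanumeric characters, so punctuation
    and spaces never count toward the length.
    """
    words = re.findall(r"[^\W_]+", title.lower())
    return sum(len(w) for w in words if w not in GENERIC_WORDS)
```

For a hypothetical title such as "AB News", the six letters shrink to an effective length of 2, which under the delta rules discussed earlier would require an exact match instead of allowing typos.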

New version, part 7

 * Using Questionable configuration as of 2019-03-15T08:47:19Z
 * Using False Positive configuration as of 2019-03-16T05:07:25Z

These can be shoved in  to give {|

Making use of e-id and q-id on TAR/CRAP pages as relevant. Headbomb {t · c · p · b} 06:54, 17 March 2019 (UTC)
 * Done. -- JLaTondre (talk) 18:49, 17 March 2019 (UTC)

New version, part 9
In, the following exclusion was setup

however, it is not respected in WP:JCW/Questionable9 (entry #804). Headbomb {t · c · p · b} 12:24, 19 March 2019 (UTC)

Likewise, it had

but those were still included in WP:JCW/Questionable1 (entry#3) Headbomb {t · c · p · b} 13:13, 19 March 2019 (UTC)
 * Fixed. -- JLaTondre (talk) 02:48, 20 March 2019 (UTC)

Seems to work, although some of the associations were lost, for example, used to suppress 'Biologue' from WP:JCW/Questionable1, because Biologue was a match for Biology (journal), which is under MDPI. It's not really the end of the world, since they can be re-declared, but it would be useful to have those exclusions back (especially if the 3-level hierarchy is implemented) since they were already done/working before. Headbomb {t · c · p · b} 09:03, 20 March 2019 (UTC)


 * To be clear, it's not that Biologue couldn't get picked up. If there was a different MDPI journal (named 'Biologia'), then it would be a match for that and should be reported as is. Just that if matching Biology (journal) is the only reason it's included under MDPI, then it should be excluded. Headbomb {t · c · p · b} 09:10, 20 March 2019 (UTC)
 * Changed to ignore at both levels -- the main questionable entry (the behavior just implemented) or the additional targets (the original behavior). Running now. -- JLaTondre (talk) 00:50, 21 March 2019 (UTC)

New JCW-exclude format
I've added a crapton of exclusions (so a rerun would do wonders, even if you don't have the new dump yet). However, we're hitting the template expansion limit, so in addition to the 'normal' format

could you support

? Headbomb {t · c · p · b} 13:01, 21 March 2019 (UTC)
 * The new format isn't implemented yet, but once it's supported, we could likely get User:RonBot to merge and sort entries. Headbomb {t · c · p · b} 13:13, 21 March 2019 (UTC)
 * Actually put this on hold. The 'new' format blows up post-template expansion, so it doesn't make pages easier to load/edit. Maybe later, but for now this isn't needed. Headbomb {t · c · p · b} 00:57, 23 March 2019 (UTC)

New run
Thanks for the refresh. I just set up a bunch of exclusions for the expanded crapwatch, so another run would be rather productive at the moment. Headbomb {t · c · p · b} 09:58, 26 February 2019 (UTC)
 * Will re-run later today. There were a couple of new templates in new citations that I will update the code to handle. -- JLaTondre (talk) 11:34, 26 February 2019 (UTC)


 * There's some weird stuff going on with WP:MCW/TAR, first Spin (magazine) ≠ Scan Magazine doesn't seem to work (entry #18). There are other examples where the most recent exclusions didn't kick in on the WP:MCW/TAR pages. Many seem related to (magazine) entries, but Pacific RailNews ≠ RiaNews also didn't seem to work on WP:MCW/Target2. Second, the counts are a bit different for similar entries. E.g. Billboard (4365 in 1844) going up to Billboard (4445 in 1880). Headbomb {t · c · p · b} 06:54, 27 February 2019 (UTC)
 * For the exclusions, I believe the issue was the processing was from before those entries were added to the configuration page. While the save timestamp is after they were added, the run had actually occurred before that and was uploaded later (normally doesn't happen, but sometimes the processing gets broken up). I re-ran the target processing and it is excluding them. I will update both the target and questionable output to include the timestamps of the configuration page for future reference. For the Billboard case, the diff is comparing the results of the 02/01 dump with those from the 02/20 dump so they should differ. -- JLaTondre (talk) 02:34, 28 February 2019 (UTC)

It's been a while since the new dump came out. Even if you don't have time to implement the latest tweaks, a new run would be useful. Headbomb {t · c · p · b} 09:52, 8 March 2019 (UTC)
 * The old version is running. I hope to have the new version wrapped up this weekend. -- JLaTondre (talk) 03:30, 9 March 2019 (UTC)
 * Bot seems to have choked on the Crapwatch. Did everything else fine though. Headbomb {t · c · p · b} 12:33, 9 March 2019 (UTC)
 * Questionable pages saved. -- JLaTondre (talk) 13:43, 9 March 2019 (UTC)
 * See, although this will be superseded by the new format below eventually. I fixed the pages, so no need to re-run. Headbomb {t · c · p · b} 14:00, 9 March 2019 (UTC)
 * I did set up a bunch of new exclusions, but the changes would be relatively minimal (mostly affecting TAR25 to TAR30). We're approaching an asymptotically stable set here (at least with the current algorithms). A new run would be nice if you want to run this overnight while you sleep, but it's certainly not critical (and could wait until after the new format is implemented). Headbomb {t · c · p · b} 16:16, 11 March 2019 (UTC)
 * Current version uploading pre-dates the new exclusions, but you will need more (see User talk:JL-Bot). -- JLaTondre (talk) 21:39, 11 March 2019 (UTC)

When could we expect a new run? I was hoping to review the latest logic with a new dump, and have a final exclusion pass before publishing that signpost piece before the end of the month. Headbomb {t · c · p · b} 09:12, 24 March 2019 (UTC)
 * Saving now. -- JLaTondre (talk) 00:54, 26 March 2019 (UTC)
 * I've added a bunch of crapwatch exclusions. If you rerun now, I can submit the Signpost piece for publication. Headbomb {t · c · p · b} 15:13, 26 March 2019 (UTC)
 * Running. It will take a couple of hours. -- JLaTondre (talk) 21:16, 26 March 2019 (UTC)
 * Hmmm, seems that when the number of citations are the same, the order is random. I will add an additional sort level (number of articles) to avoid this type of flip flop. -- JLaTondre (talk) 01:09, 27 March 2019 (UTC)
 * Done. Should be consistent from here on out. -- JLaTondre (talk) 01:26, 27 March 2019 (UTC)
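The extra sort level mentioned above amounts to a multi-key sort. A minimal sketch, assuming each entry is a dictionary with hypothetical `target`, `citations` and `articles` fields; the final tie-break on the target name is an addition here for full determinism and is not something the thread says the bot does.

```python
def rank_entries(entries):
    """Order entries by citations, then articles, both descending.

    Assumption: the target name is used as a last tie-break so the order
    never depends on the input order.
    """
    return sorted(
        entries,
        key=lambda e: (-e["citations"], -e["articles"], e["target"]),
    )


# Made-up data: two entries tie on citations and are separated by articles.
sample = [
    {"target": "Journal B", "citations": 10, "articles": 4},
    {"target": "Journal A", "citations": 10, "articles": 7},
    {"target": "Journal C", "citations": 12, "articles": 1},
]
ranked = [e["target"] for e in rank_entries(sample)]
```

Because every key is compared explicitly, feeding the same entries in a different order yields the same ranking, which is what avoids the flip-flop between runs.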
 * Cool beans. Things look good, so I've submitted my article to the Signpost. I've updated some notes and I'll be polishing some more stuff, but nothing major that would require reruns. Feel free to do one on the 30th since the Signpost will be published on the 31st, hopefully with my article in it. Headbomb {t · c · p · b} 01:33, 27 March 2019 (UTC)
 * Actually, now that List of Dove Medical Press academic journals was expanded, there is an opportunity for another run prior to the 'final' one on the 30th. Headbomb {t · c · p · b} 22:51, 27 March 2019 (UTC)
 * Also with a bunch of additions from updates to Beall's lists, it'd be worth a run. Headbomb {t · c · p · b} 15:26, 29 March 2019 (UTC)

I just finalized the latest exclusions and notes to deal with the Dove journals and latest expansion of the list. One final run before the signpost publication would take care of every little nitpicky thing. By the time the new dump gets around, the traffic should have died down on the page, and there won't be as much freaking out if we have false positives. Headbomb {t · c · p · b} 04:37, 31 March 2019 (UTC)
 * Running. Results will be up in a couple hours. -- JLaTondre (talk) 17:45, 31 March 2019 (UTC)

Signpost piece
I'd appreciate some support here (concerning the publication of User:Headbomb/Crapwatch) if you think this is a good initiative. Headbomb {t · c · p · b} 12:10, 28 March 2019 (UTC)
 * Looks like it is moving forward. -- JLaTondre (talk) 17:48, 31 March 2019 (UTC)

Need help
I tried to set up JL-Bot, but I'm not sure if it's correct. There are two pages: WikiProject Civil Rights Movement/Open tasks and WikiProject Civil Rights Movement/Quality articles. I created both pages on 22 March 2019, but nothing has appeared. I thought the bot ran once a week. Please correct me if I'm incorrect. Mitchumch (talk) 04:50, 1 April 2019 (UTC)
 * Nominally once a week. It did not run this weekend due to a conflict. I have run it against those two pages. I'll now have it do a normal run. -- JLaTondre (talk) 22:38, 1 April 2019 (UTC)
 * It looks good. Thank you. Mitchumch (talk) 23:03, 1 April 2019 (UTC)

Dump finally up
It took a while, but there's finally a usable April dump. Headbomb {t · c · p · b} 09:06, 10 April 2019 (UTC)