User talk:The Transhumanist/OutlineDedupeHolding.js


 * This script is under development and is not yet functional.

When completed, this script will remove duplicate list items from a section (e.g., See also, a holding bin, General concepts, or a Lists section). That is, it will remove from the current section every topic that also exists anywhere else in the body of the page (not counting templates).

= Script's workshop =
 * This is the work area for developing the script and its documentation. The talk page portion of this page starts at Discussions, below.

Description / instruction manual for User:The Transhumanist/OutlineDedupeHolding.js

This script is useful for culling lists of links gathered in a holding area (in an outline) while they await placement in the outline. It can also help clean up the General concepts, Lists, and See also sections of outlines.

General approach
(general approach goes here)

More specifically, starting at the beginning...

Desired/completed features

 * Completed features are marked with ✅


 * Check for existence of holding sections
 * Check each holding section itself for duplicates, and remove them
 * Remove items in holding sections that duplicate items in the rest of the outline
 * Remove empty holding sections
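Checking for a holding section (the first feature above) comes down to finding its heading in the wikitext and the point where the section ends. A minimal sketch, assuming the outline uses standard == Heading == syntax; findSection is a hypothetical helper name, not part of any existing script:

```javascript
// Hypothetical helper: find a section by heading text in a page's wikitext.
// Returns character offsets { headingStart, bodyStart, end }, or null if absent.
function findSection(wikitext, name) {
  // A heading line: 2 to 6 "=" signs, the title, then the same number of "=" signs.
  const headingRe = /^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$/gm;
  const headings = [...wikitext.matchAll(headingRe)].map((m) => ({
    level: m[1].length,
    title: m[2],
    start: m.index,
    bodyStart: m.index + m[0].length,
  }));
  const i = headings.findIndex((h) => h.title === name);
  if (i === -1) return null;
  // The section runs until the next heading of the same or a higher level,
  // so subsections such as "Other" under "See also" stay inside it.
  const next = headings.slice(i + 1).find((h) => h.level <= headings[i].level);
  return {
    headingStart: headings[i].start,
    bodyStart: headings[i].bodyStart,
    end: next ? next.start : wikitext.length,
  };
}
```

Usage: `const other = findSection(text, "Other");` then `text.slice(other.bodyStart, other.end)` is the holding section's content, and a `null` result means the section does not exist.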

Holding sections
I was discussing using "Place these" as the name of a holding section, but that would violate SRTA. A subheading under See also called "Other" would be less obtrusive. But a section called "Other" might already exist elsewhere in the outline; I'll deal with that when I encounter it.

This script will run on several holding sections:
 * Other (under See also)
 * See also
 * General concepts
 * General "subject name" concepts
 * Lists
 * "Subject name" lists

It should remove a holding section if it is empty.
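A sketch of the empty-section check, assuming "empty" means nothing but blank lines before the next heading, and that a section whose only content is a subsection (for example, See also containing Other) is not empty. removeSectionIfEmpty is a hypothetical name:

```javascript
// Hypothetical helper: delete the heading of a section that has no content.
function removeSectionIfEmpty(wikitext, name) {
  const heading = /^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$/;
  const lines = wikitext.split("\n");
  const start = lines.findIndex((line) => {
    const m = heading.exec(line);
    return m !== null && m[2] === name;
  });
  if (start === -1) return wikitext;
  const level = heading.exec(lines[start])[1].length;

  // Scan forward to the next heading of any level.
  let end = start + 1;
  while (end < lines.length && !heading.test(lines[end])) end++;

  const bodyIsBlank = lines.slice(start + 1, end).every((line) => line.trim() === "");
  // A deeper heading right after this one means the section still has a subsection.
  const next = end < lines.length ? heading.exec(lines[end]) : null;
  const hasSubsection = next !== null && next[1].length > level;
  if (!bodyIsBlank || hasSubsection) return wikitext;

  lines.splice(start, end - start); // drop the heading and its blank lines
  return lines.join("\n");
}
```

Because a subsection counts as content, calling it for "Other" first and then for "See also" removes both headings when both are empty.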

Relevant scripts
This script should process the See also section, the General concepts section, the Lists section, and the "Other" section.

See User:Ucucha/duplinks (highlights duplicate links, which means it must find them).

See User:Evad37/duplinks-alt (highlights duplicate links, which means it must find them).

See User talk:The Transhumanist/RedlinksRemover.js (edits an article to delete redlinked list entries; adapt it to delete duplicate list entries that don't have an annotation).

Based on the discussion below, RedlinksRemover.js probably has all the techniques this script needs: regex applied to removing list items, in a nested loop.
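The regex part of that technique might look like the following sketch. It assumes a list item counts as "unannotated" when its line holds nothing but bullets and one link (optionally piped), and it matches the title exactly as written, with no case or underscore normalization. removeBareListItems is a hypothetical helper, not code taken from RedlinksRemover.js:

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: remove bullet lines that consist of nothing but a link to `title`.
// Items with an annotation (any text after the link) are left in place.
function removeBareListItems(wikitext, title) {
  const item = new RegExp(
    // ^\*+          one or more bullets at the start of a line
    // \[\[TITLE     the link, optionally followed by a |piped label
    // \]\][ \t]*    then only trailing spaces before the end of the line
    `^\\*+[ \\t]*\\[\\[${escapeRegExp(title)}(?:\\|[^\\]]*)?\\]\\][ \\t]*(?:\\n|$)`,
    "gm"
  );
  return wikitext.replace(item, "");
}
```

In the nested loop, the outer loop would walk the titles to drop and each pass would rewrite only the holding section's text, e.g. `for (const title of duplicates) holding = removeBareListItems(holding, title);`.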

Rough rough talk-through
This script conducts semi-automated editing, and therefore needs to be run from a menu item. It should not run by default.
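A sketch of the menu hookup, assuming the link goes in the Tools portlet (p-tb) and that a separate function does the actual editing. mw.util.addPortletLink is MediaWiki's standard helper for adding such links; the label, element id, and runDedupe callback are hypothetical. mw is passed in as a parameter only so the function can be exercised outside a browser:

```javascript
// Add a "Dedupe holding sections" link to the Tools menu. Nothing runs until it is clicked.
// Requires the 'mediawiki.util' module to be loaded before this is called.
function addDedupeMenuItem(mw, runDedupe) {
  const item = mw.util.addPortletLink(
    "p-tb",                    // portlet: the Tools menu
    "#",                       // href: the click handler does the work instead
    "Dedupe holding sections", // link text (hypothetical label)
    "t-dedupe-holding",        // element id (hypothetical)
    "Remove holding-section list items that duplicate items elsewhere in the outline"
  );
  if (item) {
    item.addEventListener("click", (event) => {
      event.preventDefault(); // stay on the page instead of following "#"
      runDedupe();
    });
  }
  return item;
}
```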

Script dependencies
= Discussions =


 * This is where the actual talk page starts for User:The Transhumanist/OutlineDedupeHolding.js. Please post your discussion threads below.

Loading dependencies
By the way, do I need to load any dependencies for the following code?

Do all "mw." lines have dependencies? The Transhumanist 13:40, 1 January 2018 (UTC)
 * is always available, and contains lots of other useful stuff – see mw.config. Other mw modules, detailed at ResourceLoader/Core modules, aren't necessarily available unless you load them – see ResourceLoader/Migration_guide_(users) for further details. - Evad37 [talk] 14:10, 1 January 2018 (UTC)
 * Thank you. Reading them now. Also, I've added these links to User:The Transhumanist/Outline of scripts, for future reference. The Transhumanist 22:12, 5 January 2018 (UTC)
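Following that advice, a script can read mw.config straight away but should wait for any other module it uses. A minimal sketch, assuming the script needs only mediawiki.util (for example, for mw.util.addPortletLink); startWhenReady and main are hypothetical names, and mw is passed in only for testability:

```javascript
// Read configuration immediately; defer anything that needs a non-core module.
function startWhenReady(mw, main) {
  // mw.config is always available, so it is safe to read right away.
  const pageName = mw.config.get("wgPageName");
  // mw.util lives in the 'mediawiki.util' ResourceLoader module; load it before use.
  return mw.loader.using(["mediawiki.util"]).then(() => main(pageName));
}

// In the user script itself, something like:
// startWhenReady(mw, (pageName) => { /* add the menu item, etc. */ });
```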

Leveraging TrueMatch data mining
[To User:Evad37]

[Referring to TrueMatch] This is very exciting. That's one more step that gets sped up in one of the approaches to building city outlines:


 * Step 1: Create a city outline using Template:Outline city ✅


 * Step 2: Find more links using TrueMatch ✅ and/or various ViewAsOutline scripts ✅


 * Step 3: Transfer links to outline (its holding section) via copy/paste ✅ or Send (planned set of scripts)


 * Step 4: Dedupe the links in outline's holding section, using OutlineDedupeHolding.js (planned script)


 * Step 5: Use TopicPlacerFromBin.js (planned script) to move the links from the holding section to their final resting places in the outline.


 * Step 6: Process the outline with RedlinksRemover ✅

Steps 3, 4, and 5 are currently done manually, but 3 (copy/paste) isn't as tedious, so 4 and 5 have priority.

Since deduping is more complicated after links are placed, developing this script first makes the most sense.

Example of using the tools so far...
The Outline of Chicago is currently being drafted, using the above steps...

Step 1: See all the redlinks? Those come from Template:Outline city. Many of the links in the template do not apply to Chicago, so they turn red, but you never know what is going to turn red when you first start a city outline, and redlinks are time-consuming to remove by hand. So we populate the outline with all the topics we can find, and then strip out the redlinks in step 6. RedlinksRemover doesn't remove red entries that have children; it just delinks them. But when the outline first starts out, most of the redlinks don't have children. If we strip them out too soon, we'll wind up having to type many of them back in when we find child topics for them.

Step 2: Here's where StripSearchSorted with TrueMatch comes in. You do some intitle searches, such as "in Chicago". Increase the limit in the URL to 5,000 to get the maximum results you can at once. For example, the search intitle:"of Chicago" produces these results: https://en.wikipedia.org/w/index.php?title=Special:Search&limit=5000&offset=0&profile=default&search=intitle%3A%22of+Chicago%22&searchToken=bvc5dp6q7ldd4ayxph2a6ixg
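The search URL can also be built rather than edited by hand. A small sketch, assuming the same index.php parameters as the URL above; it leaves out searchToken, and intitleSearchUrl is a hypothetical name:

```javascript
// Build an en.wikipedia.org Special:Search URL for an intitle: phrase search.
function intitleSearchUrl(phrase, limit = 5000) {
  const params = new URLSearchParams({
    title: "Special:Search",
    limit: String(limit), // raised from the default page size to get more results at once
    offset: "0",
    profile: "default",
    search: `intitle:"${phrase}"`,
  });
  // URLSearchParams encodes spaces as "+" and quotes/colons as %22/%3A.
  return `https://en.wikipedia.org/w/index.php?${params}`;
}
```

Usage: `intitleSearchUrl("of Chicago")` and `intitleSearchUrl("in Chicago")` give one URL per gathering search.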

Step 3: We copy and paste them to the "Place these" section in the outline (under See also). We repeat step 2 with further intitle searches (such as "of Chicago") and other gathering methods and send them all to "Place these" until we have all the topics we can find.

Step 4: The problem now is that many of the links in the "Place these" section are already in the body of the outline, like Culture of Chicago, Demographics of Chicago, and so on. Links may also be duplicated within the "Place these" section itself. Therefore, we need to dedupe this section (remove the duplicates from it). That is, each link in "Place these" that is also found in the body of the outline (not in navigation templates), or elsewhere in "Place these", gets removed.
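Matching is easier if every link target is normalized first, so that [[culture_of_Chicago|culture]] and [[Culture of Chicago]] count as the same topic. A sketch, assuming English Wikipedia's title rules that underscores equal spaces and that the first letter is case-insensitive; piped labels and #section anchors are ignored. Both function names are hypothetical:

```javascript
// Normalize a link target: drop any #anchor, treat "_" as " ", and capitalize the first letter.
function normalizeTitle(target) {
  const title = target.split("#")[0].replace(/_/g, " ").replace(/\s+/g, " ").trim();
  return title.charAt(0).toUpperCase() + title.slice(1);
}

// Collect the normalized targets of all [[wikilinks]] in a chunk of wikitext.
function linkTargets(wikitext) {
  const targets = [];
  for (const m of wikitext.matchAll(/\[\[([^\[\]|]+)(?:\|[^\]]*)?\]\]/g)) {
    const title = normalizeTitle(m[1]);
    if (title) targets.push(title); // skip same-page links such as [[#History]]
  }
  return targets;
}
```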

Step 5: In this step, you take the topics one by one from the "Place these" section, after the duplicates have been removed, and put them into the body of the outline. This is currently done by hand.

Step 6: Clean it all up with the RedlinksRemover. This tool is quick and painless – just click on the menu item. Without this tool, it is mind-numbingly tedious.

Design considerations for dedupe
Which brings us to the design of OutlineDedupeHolding.js.

Eventually, this will dedupe more than one section, but its initial version will just process entries in the "Place these" section.

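For each item, it needs to check the rest of the outline, excluding templates but including the rest of "Place these", for a matching entry. If a match is found, that item is deleted from "Place these"; if not, it goes on to the next item. Within "Place these", only the later copies of a repeated entry should be deleted; otherwise each copy would match the other and all of them would be removed.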
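One way to express this design, assuming each holding item is identified by the first link on its line and that the rest of the outline has already been reduced to a list of normalized link targets. The set is filled as the loop goes, so the first copy of a repeated item is kept. dedupeHolding is a hypothetical name:

```javascript
// Keep only the holding-section items whose link appears nowhere else.
// `holdingItems`: the list-item lines of "Place these".
// `restTargets`: normalized link targets from the rest of the outline (templates excluded).
function dedupeHolding(holdingItems, restTargets) {
  const seen = new Set(restTargets);
  const kept = [];
  for (const item of holdingItems) {
    const m = /\[\[([^\[\]|#]+)/.exec(item);
    if (m === null) {
      kept.push(item); // no link on this line: leave it alone
      continue;
    }
    const raw = m[1].replace(/_/g, " ").trim();
    const target = raw.charAt(0).toUpperCase() + raw.slice(1);
    if (seen.has(target)) continue; // already in the outline or earlier in "Place these"
    seen.add(target);
    kept.push(item);
  }
  return kept;
}
```

A Set makes each lookup constant-time, so the cost grows with the number of items rather than with items times outline size.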

My question for you
I think this one may be within my ability level to write. I just need a little guidance...

How would you go about programming it? The Transhumanist 07:37, 22 January 2018 (UTC)
 * Detecting duplicate links is a problem that has already been solved (for prose): User:Evad37/duplinks-alt. So I would suggest starting from there, and see if you can follow the approach that script takes – but you'll need to adapt it to look at wikitext rather than html, and to actually remove duplicated links. - Evad37 [talk] 08:28, 22 January 2018 (UTC)
 * Come to think of it, it's not duplicate links that I need to remove, but list items with a duplicate link in them.
 * Hmmm. Wikitext. That's it! Transcluded templates' contents don't show up in an outline's wikitext. And RedlinksRemover.js already strips entire list items out of the wikitext of outlines via regex, using nested loops. This one requires a nested-loop solution too, I think. Looping through all the list items in "Place these", applying each as a search string in a nested loop over all the list items in the rest of the outline, ought to handle the bulk of it. Then use a similar process to remove the duplicates within "Place these" itself (or do this step first). Thank you for the clue I needed. I'll let you know how it turns out. The Transhumanist 12:42, 22 January 2018 (UTC)
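The nested-loop plan, using each holding item as a search string against the other list items, might be sketched like this. A plain substring search would miss [[culture_of_Chicago|culture]] and would wrongly match [[Chicago Spire]] when searching for Chicago, so the sketch builds a regex per title. Assumptions: the first link on a line identifies the item, matching links differ at most in first-letter case and underscores, and linkPattern and dedupeNested are hypothetical names:

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a regex that finds a wikilink to `title` in wikitext.
function linkPattern(title) {
  const first = title.charAt(0);
  const firstClass = `[${escapeRegExp(first.toUpperCase())}${escapeRegExp(first.toLowerCase())}]`;
  const rest = escapeRegExp(title.slice(1)).replace(/ /g, "[ _]");
  // The target must end right there: at a pipe, a #anchor, or the closing brackets.
  return new RegExp(`\\[\\[${firstClass}${rest}[ _]*(?:\\||#|\\]\\])`);
}

// Nested loop: each holding item is searched for among the earlier holding items
// and among the list items of the rest of the outline; items that match are dropped.
function dedupeNested(holdingItems, restItems) {
  return holdingItems.filter((item, i) => {
    const m = /\[\[([^\[\]|#]+)/.exec(item);
    if (m === null) return true; // nothing to search for: keep
    const pattern = linkPattern(m[1].replace(/_/g, " ").trim());
    const earlier = holdingItems.slice(0, i);
    return !earlier.some((other) => pattern.test(other)) &&
           !restItems.some((other) => pattern.test(other));
  });
}
```

Checking only earlier holding items is what handles the within-section duplicates: the first copy survives and later copies are removed.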