User:GreenC/testcases/bigtenorg

Steps to process bigten.org

This is a real request that was made recently. The steps below are exactly what I did to process it. They assume everything goes smoothly and no problems arise that require changes to the core code of the bot, which can happen frequently. This request needed no URL transformations (step 5), which would normally require some additional code.

1. Request was created by a user:

https://en.wikipedia.org/wiki/Wikipedia:Link_rot/URL_change_requests#bigten.org

2. Create a list of articles containing the domain, and at the same time coin a new project name ('bigtenorg'):

wikiget -a "insource:bigten insource:/bigten[.]org/" | shuf > bigtenorg.auth

3. Create a skeleton source file that contains domain-specific changes:

cp urlchanger_SKELETON_HARD.nim urlchanger_bigtenorg.nim

4. Edit urlchanger_bigtenorg.nim and modify basic domain information:


# - CONFIG START

Runme.urlchangerSum       = "WP:URLREQ"               # Edit summary

Runme.urlchangerDRe       = "bigten[.]org"            # Old name: hostname/domain/path - regex
                                                      # Used to parse URLs from wikitext
Runme.urlchangerDDRe      = "bigten[.]org"            # Same as ^ - hostname/domain only - no path
Runme.urlchangerDPRe      = "bigten.org"              # Same as ^ - no regex and no path
Runme.urlchangerNRe       = "bigten[.]org"            # New name - hostname/domain - regex
                                                      # Used to identify when it's been switched to new URL
                                                      # If DRe and NRe have the same values use the same entry for each
Runme.urlchangerNPRe      = "bigten.org"              # Same as ^ - no regex
Runme.urlchangerNPPRe     = "Big Ten Conference"      # Wikitext to replace with when it finds NRe in metadata fields -  OK
Runme.urlchangerNRPRe     = "Big Ten Conference"      # Plain text string to replace named refs -  NOT OK
Runme.urlchangerTCRe      = &"(?i){mypipe}[^$]*[^$]*"
Runme.skipapicheckexception = "bigten[.]org"

# - CONFIG END
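The config keeps separate regex forms ("bigten[.]org") and plain forms ("bigten.org") of the same domain. The following is a minimal shell sketch, independent of the bot, of why the dot is bracketed in the regex forms. The helper name matches_domain and the look-alike host are assumptions made up for illustration.

```shell
# Hypothetical helper, not part of the bot: returns 0 when the URL in $2
# matches the extended regular expression in $1.
matches_domain() {
  printf '%s\n' "$2" | grep -Eq "$1"
}

# Bracketed dot: matches only a literal "."
matches_domain 'bigten[.]org' 'https://bigten.org/mbb/' && echo "regex form: match"
matches_domain 'bigten[.]org' 'https://bigtenXorg.example/' || echo "regex form: no match"

# Unescaped dot: matches any character, so a look-alike host slips through
matches_domain 'bigten.org' 'https://bigtenXorg.example/' && echo "plain string used as regex: false match"
```

This is presumably why the plain form is reserved for the fields marked "no regex", while the bracketed form is used wherever the value is applied as a pattern.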

5. Add code to do URL transformations.

None required for this project.

6. Compile medic binary

lx -n bigtenorg

7. Create project directories and files. The project name (-p) ends with the range of articles to process, i.e. run the bot on articles 1 to 1326 as listed in bigtenorg.auth, created in step #2:

wc bigtenorg.auth
1326
projectm -c -p bigtenorg.0001-1326
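As a rough sketch of how the project name relates to the article count, the range suffix can be derived from the line count of the list. The helper project_name and the four-digit zero padding are assumptions inferred from the name bigtenorg.0001-1326; they are not taken from projectm itself.

```shell
# Hypothetical helper: build "<project>.<first>-<last>" with 4-digit zero padding,
# mirroring the naming pattern seen in bigtenorg.0001-1326.
project_name() {
  printf '%s.%04d-%04d\n' "$1" "$2" "$3"
}

# A temporary stand-in list so the sketch runs anywhere; in the real
# workflow the file would be bigtenorg.auth from step 2.
list=$(mktemp)
printf 'Article A\nArticle B\nArticle C\n' > "$list"

# Arithmetic expansion strips any padding that wc adds on some systems.
count=$(( $(wc -l < "$list") ))
project_name bigtenorg 1 "$count"   # prints bigtenorg.0001-0003
rm -f "$list"
```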

8. Run the bot on 1,326 articles:

runbot -n bigtenorg.0001-1326 -v medic-bigtenorg -r 8 -f auth

9. While it is running, check the logs for known trouble areas, such as soft-404s, that the bot discovers along the way.

10. Cancel the bot and add code to handle the discovered soft-404s, i.e. edit urlchanger_bigtenorg.nim and add the following code:

# Soft-404 traps here:
if newloc ~ ("^https?://" & GX.hostname & Runme.urlchangerDRe & "/?$") and newurl !~ ("^https?://" & GX.hostname & Runme.urlchangerDRe & "/?$"):
  sendlog(Project.syslog, CL.name, url & " " & newloc & "  Redirect to home found  urlchanger7.1.3")
  return "DEADLINK"
if newloc ~ ("^https?://" & GX.hostname & Runme.urlchangerDRe & "/mbb/?$") and newurl !~ ("^https?://" & GX.hostname & Runme.urlchangerDRe & "/mbb/?$"):
  sendlog(Project.syslog, CL.name, url & " " & newloc & "  Redirect to mbb  urlchanger7.1.4")
  return "DEADLINK"

The above code is saying that if a URL redirects to the site's home page or to "/mbb", and the requested URL was not itself that page, this indicates a soft-404 and the link is treated as dead.
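The same soft-404 test can be tried outside the bot with a small shell sketch. This is an illustration under stated assumptions, not the bot's code: the helper is_soft404 is hypothetical, and the optional "www." prefix is only a guess standing in for whatever GX.hostname contains.

```shell
# Hypothetical helper, not the bot's code.
#   $1 = final location after redirects
#   $2 = the URL that was requested
#   $3 = path regex of a known landing page, e.g. '/?' or '/mbb/?'
# Returns 0 (soft-404) when the redirect lands on that page although the
# request did not ask for it.
is_soft404() {
  # "(www[.])?" is an assumed stand-in for GX.hostname
  pattern="^https?://(www[.])?bigten[.]org$3\$"
  printf '%s\n' "$1" | grep -Eq "$pattern" || return 1
  printf '%s\n' "$2" | grep -Eq "$pattern" && return 1
  return 0
}

# A deep link that ends up on the home page is treated as dead
is_soft404 'https://bigten.org/' 'https://bigten.org/news/story.html' '/?' && echo "DEADLINK"

# A link that pointed at /mbb/ in the first place is left alone
is_soft404 'https://bigten.org/mbb/' 'https://bigten.org/mbb/' '/mbb/?' || echo "not a soft-404"
```

The second condition matters: without it, legitimate links to the home page or to "/mbb" would also be flagged as dead.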

11. Kill the original project, recreate it, and re-run the bot:

projectm -x -p bigtenorg.0001-1326
projectm -c -p bigtenorg.0001-1326
runbot -n bigtenorg.0001-1326 -v medic-bigtenorg -r 8 -f auth

12. Repeat steps #9-11 until it is running clear, then run to completion.

13. After completion, follow a lengthy manual process of checking for known problems that show up in the logs. Sample steps:

(meta) if(-e logembway) cat logembway # Check these - something went wrong
(meta) grep fixcommentarchive syslog # look at diffs for problems / see also the first "error" step why those didn't get fixed
(meta) if(-e logradicalurl) cat logradicalurl | awk -F"" '{print $3}' # check for legit archive URLs and add to logradicalurl in medic.nim
(meta) grep removearchive2 cbignore # check for embedded templates that should be added to encodeWiki etc..

Modify the bot code as needed and rerun any articles as needed. To re-run a single article:

bugm -n "Feudalism" -r

14. For new archive.today links, manually verify that each one is working, following the process outlined in the docs.

15. Push 5 diffs up to Wikipedia

push2wiki -s5

16. Manually verify the diffs on Wikipedia look good and there are no problems

17. Push the remaining diffs

push2wiki -s0

18. Reprocess and upload any articles that had intervening edits by other users (edit conflicts):

push

19. Generate statistics and copy-paste into the request from step #1. Add a ✅ flag to the page.

stats bigten.org