User:GreenC/WaybackMedic/trial2


 * Ellen Forney -> Bhutanese Americans


 * Edit 1 – ❌ Added a dead link template, but the link is live. Afterward, the link was rescued by Cyberbot II (diff). North America1000 21:06, 14 April 2016 (UTC)
 * Caused by the bot encoding the "=" as %3D, which should work, but that site does not handle it. I'll remove encoding of "=" in the query. -- Green  C  02:57, 15 April 2016 (UTC)


 * Edit 2 - ✅ Correctly removed 13 wayback links blocked by robots.txt (403). -- (GreenC)
 * Edit 3 – ✅ Correctly added dead link template (the link is dead). North America1000 21:12, 14 April 2016 (UTC)
 * Edit 4 - ✅ Correctly removed one wayback link that was not found and replaced the snapshot date for three others. -- (GreenC)
 * Edit 5 - ❌ Removed a working wayback link. Logs show the Wayback API reported it was not available, and a header check also reported not available. In fact I ran this one twice, days apart, and got the same negative results. It is working now, however, and a re-run shows the link working. No explanation. Probably related to something at Internet Archive (robots.txt?) -- (GreenC)
 * Edit 6 - ❌ Known bug, now fixed; see discussion above re: webcitation. -- (GreenC)
 * Edit 7 - ✅ Correctly removed the last link, which is a hard 404. The first two links were originally hard 404s, and WaybackMedic replaced them with soft 404s as reported by the Wayback API. The bot can't detect most soft 404s in the wayback archive; I don't think any bot could. -- (GreenC)
 * Edit 8 - ✅ Correctly replaced a non-working link with working link. -- (GreenC)
 * Edit 9 - ✅ Correctly removed two non-working links. -- (GreenC)
 * Edit 10 - ✅ Fixed a formatting error. -- (GreenC)
 * Edit 11 - ✅ Correctly removed a link blocked by robots.txt. -- (GreenC)


 * Edit 12 - ✅ Correctly removed a non-working link (404) -- (GreenC)
 * Edit 13 - ✅ Correctly removed a non-working link (404) -- (GreenC)
 * Edit 14 - ❌ Correctly removed a non-working link (403), but introduced a formatting error, as noted above, due to a hidden LF character. This bug is fixed. -- (GreenC)
 * Edit 15 - ✅ Correctly removed a non-working link (404) -- (GreenC)
 * Edit 16 - ✅ Correctly removed a non-working link (302 -> page not found) -- (GreenC)
 * Edit 17 - ❌ Correctly marked 6 links as dead, but incorrectly marked one link (the 7th) as dead (the link is live). A re-run correctly detects it as live. This link is a redirect; it's possible the page was 404 and the site owners recently added the redirect, or there was some other transient condition. -- (GreenC)
 * Edit 18 - ✅ Correctly modified the snapshot dates for 3 links. -- (GreenC)
 * Edit 19 - ✅ Correctly removed a non-working link (404) -- (GreenC)
 * Edit 20 - ✅ Corrected about 6 formatting problems with spurious "1=" in cite templates. -- (GreenC)
 * Edit 21 - ✅ Fixed two malformed URLs. -- (GreenC)
 * Edit 22 - ✅ Corrected dead link. The previous version linked to this dead link (diff) and the bot edit corrected it to this live link (diff). However, the only problem with the initial link is that it had a colon at the end of it, which the bot did not remove (unlikely a bot problem, as it doesn't seem that it would be able to). I removed the colon (diff) and then removed the bot addition (diff) because the link is actually live and did not actually require having an archived link. North America1000 21:23, 14 April 2016 (UTC)
 * GreenC bot successfully removed the colon in the archive version as designed, but you are right it's outside the scope to try and fix the original URL in this case. At least it got the archive version working. -- Green  C  02:57, 15 April 2016 (UTC)
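
The fix described under Edit 1 (leaving "=" unencoded in the query) can be illustrated roughly as follows. This is only a sketch in Python, not the bot's actual code (the trial used an awk-based version); the function name `encode_url_keep_query_delims` and the example URL are hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_keep_query_delims(url: str) -> str:
    """Percent-encode unsafe characters, leaving '=' and '&' in the query as-is.

    Hypothetical helper for illustration; assumes the input is an http(s) URL.
    """
    parts = urlsplit(url)
    # Keep '/' in the path and any existing '%' escapes untouched.
    path = quote(parts.path, safe="/%")
    # Keep the query delimiters '=' and '&' so strict servers still parse it.
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


if __name__ == "__main__":
    # Encoding everything turns '=' into %3D, which the site in Edit 1 rejected.
    print(quote("id=42", safe=""))
    # Keeping '=' and '&' encodes only the space.
    print(encode_url_keep_query_delims("http://example.com/page?id=42&name=a b"))
```

Treating "%" as safe avoids double-encoding escapes already present in the URL, which is a design assumption here rather than something stated in the log.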


 * List of accidents and incidents involving military aircraft (1940–44) -> Henry Fox Talbot


 * Edit 22 - ✅ One link deleted.
 * Edit 22 - ✅ Two links deleted, one new snapshot date.
 * Edit 23 - ✅ Four links deleted.
 * Edit 24 - ✅ One link deleted.
 * Edit 25 - ✅ One link deleted.
 * Edit 26 - ✅ Two links deleted.
 * Edit 27 - ✅ Two links deleted.
 * Edit 28 - ✅ One link deleted.
 * Edit 29 - ✅ One snapshot changed.
 * Edit 30 - ✅ Two format errors fixed.
 * Edit 31 - ✅ One snapshot change.
 * Edit 32 - ✅ One snapshot change.
 * Edit 33 - ✅ One snapshot change.
 * Edit 34 - ✅ One link deleted.


 * Edit 35 - ✅ Two links deleted.
 * Edit 36 - ✅ One link deleted.
 * Edit 37 - ✅ Two snapshot changes.
 * Edit 38 - ❌ One live link incorrectly marked dead. The same bug from Edit 1 in the first set.
 * Edit 39 - ✅ One link removed.
 * Edit 40 - ✅ One link removed.
 * Edit 41 - ✅ One snapshot changed.
 * Edit 42 - ✅ One link removed.
 * Edit 43 - ✅ One link removed.
 * Edit 44 - ✅ One snapshot changed.
 * Edit 45 - ✅ Two deleted, 7 snapshots changed, 3 changed to Library of Congress.
 * Edit 46 - ✅ One deleted.
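
The deletions and snapshot changes logged above depend on an availability lookup of the kind mentioned under Edits 5 and 7. The following is a minimal sketch of such a check against the Internet Archive's Wayback availability API. It only builds the query URL and parses a response body; the function names are hypothetical, and how WaybackMedic itself interprets the response is not shown in this log.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode

# Public Wayback availability endpoint.
API = "https://archive.org/wayback/available"


def build_query(url: str, timestamp: str = "") -> str:
    """Build the availability query; timestamp is YYYYMMDD[hhmmss], optional."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urlencode(params)


def closest_snapshot(body: str) -> Optional[Tuple[str, str]]:
    """Return (snapshot_url, timestamp) if the API reports a usable snapshot.

    A reported status of "200" does not rule out a soft 404: the archived
    page may still be an error page served with a success code.
    """
    data = json.loads(body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    if closest.get("status") != "200":
        return None
    return closest["url"], closest["timestamp"]


if __name__ == "__main__":
    print(build_query("http://example.com/", "20160414"))
    print(closest_snapshot('{"archived_snapshots": {}}'))
```

A second, independent check (such as the header check mentioned under Edit 5) is still worthwhile, since the API result and the live archive can disagree temporarily.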

Results
Four errors were fixable and two were not, for an error rate of around 4% counting all changes made.

Tests

 * Trial4 76-100 - live test (May 12)
 * Trial4 51-75 - live test (May 11)
 * Trial4 26-50 - live test (May 10)
 * Trial4 01-25 - live test (May 9)
 * Trial3 76-100 - live test (April 23)
 * Trial3 51-75 - live test (April 23)
 * Trial3 26-50 - live test (April 22)
 * Trial3 01-25 - live test (April 22)
 * Test4 - dry-run test of 50 (April 21)
 * Test3 - dry-run test of 50 using new Nim version (April 20)
 * Trial2 - live test of 50 with awk-based version (April 16)