User:Moneytrees/Copyright RfC 2023

''Draft by Moneytrees and Sennecaster. This RfC is currently in its brainstorming/draft phase; please feel free to add issues/concerns here and leave comments on the talk page.''

This Request for Comment focuses on identifying problems with, and updating, the systems currently used to address copyright violations on Wikipedia.

Introduction and background
A major aspect of maintenance on Wikipedia is dealing with copyright violations and plagiarism in articles. Project-space pages related to copyright cleanup have existed since the early 2000s. As Wikipedia changed and expanded over the years, certain pages and processes were deprecated and superseded. There are now three main venues for copyright cleanup:
 * WP:Copypatrol (since 2016), which is a live feed of edits recently tagged by a bot for potential copyright issues.
 * WP:Contributor copyright investigations (CCI, since 2009), which investigates the major edits of users with histories of copyright violations. Anyone can submit a request, but only CCI clerks or uninvolved administrators can open a case.
 * WP:Copyright problems (CPN, since ~2003), where articles with substantial and/or complicated plagiarized text are sent.

There has always been a substantial backlog in the area, which became significantly worse in the mid-2010s after several editors experienced in the area retired. Editors with histories of copyright violations took a long time to be blocked, and blocks were often not reported to CCI. Efforts in the area were revived on a larger scale in 2020, resulting in more blocks and more CCIs being opened. Although things in the area are better than they were a few years ago, there are still large backlogs and a very large amount of work to be done. The area is due for an update.

What are the problems

 * The backlog is unmanageable in its current state.
 * CCIs are open for too long.
 * Not all copyright blocks are immediately reported to CCI. Cited websites may go offline over the years and become inaccessible, leaving it uncertain whether content was copied.
 * Editors with several warnings are not blocked soon enough.
 * Editing at CCI is too tedious and time consuming.
 * High rate of burnout among copyright editors.
 * Editors find the area daunting to contribute in, which limits participation. There is not enough general understanding of the work.
 * The work also requires a lot of prior knowledge to get involved in, and the instructions are poor (contributors who are knowledgeable have little time to go and BOLDly update all the instructions).
 * Low bus factor in the area – things slow down when key editors aren’t around.
 * Limited resources – articles/books are often behind paywalls and difficult to access.
 * CPN is tedious to work through. Dates need to be manually set, and certain articles need to be deleted after seven days.
 * Low bus factor means that certain actions, especially admin actions or ones that benefit from multiple perspectives, can be delayed by over a month.
 * Reliance on front-end processes (NPP and CopyPatrol) to find violators.
 * CopyPatrol covers all users, but it is limited to what TurnItIn can find and has a high rate of false positives. NPP is not equipped to deal with complex copyvio and mostly relies on Earwig’s copyvio detector (with Google search on), so it misses violations copied from books and translations. Earwig doesn't have consistent access to TurnItIn and can only really find obvious website violations.

General answers

 * Increase automation in the area.
 * Make it easier for the general editing population to participate.
 * Change the layout of CCIs to make cleanup more convenient.
 * Keep better track of who is warned for copyright violations and when.

Specific answers

 * Create a filter that warns an editor who is adding content that closely matches a source, similar to the warning shown when adding a link on the spam blacklist. An iThenticate report would be generated for comparison when the edit is made.
 * Lower the tolerated threshold of repeated copyright violations before a block
 * Current standard: 5 warnings before a block, or usually upon the discovery of multiple violations after a CCI/CPN/CopyPatrol report, at the discretion of admins
 * Proposed change: 3 warnings (we can add a clause about timeframes; 3 in 1-2 years is concerning, while warnings from a decade ago with no recent problems shouldn’t lead to a block)
 * Codify procedures surrounding copyright blocks
 * Structured appeals process similar to AE
 * Have every use of a warning template logged by a bot
 * Add to WP:NEVERUNBLOCK to avoid ill-advised unblocks that result in further copyright violations
 * Add to policy/guideline that copyright blocks should be indefinite
 * Adminbot for CPN, run by a trusted botop or potentially even multiple maintainers
 * The bot could delete fully blanked articles or articles tagged with "presumptive deletion" after seven days, and could automate the "bumping" of dates at CPN so that it isn't manual
 * Reasoning for multiple maintainers: on multiple occasions in the past, Eranbot has gone down, and the maintainer has not been able to implement repairs or restart the bot in time. This would be alleviated by multiple maintainers who each have secure access.
 * Downside: an adminbot requires a high level of security and community trust, and having multiple maintainers would be difficult
 * Have a page where warnings for copyright violations are logged. It would be updated by a bot every time a warning template is used, with a rolling archive. Kind of like AElog?
 * Post a notice on WP:AN whenever a major CCI is opened to increase community awareness
 * Improve documentation pages to lower the barrier to entry
 * Convert CopyPatrol from a soft redirect to an information page describing how to use the tool and evaluate reports to determine what course of action is appropriate (tag for revdel, send to CP, attribute, etc.), similar to User:Moneytrees/CCI guide
 * Write an actual guide to CopyPatrol
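
As a rough illustration of the proposed warning threshold with a timeframe clause, the check could be sketched as below. This is only a sketch under stated assumptions: the two-year (730-day) window is one reading of the "1-2 years" suggested above, and the function names and data layout are hypothetical rather than part of any existing bot or tool.

```python
from datetime import date, timedelta

# Assumed values: the draft suggests 3 warnings within 1-2 years;
# 730 days is one illustrative choice, not a decided figure.
WARNING_THRESHOLD = 3
WINDOW = timedelta(days=730)


def recent_warnings(warning_dates, today, window=WINDOW):
    """Return the warnings that fall inside the window ending today."""
    cutoff = today - window
    return [d for d in warning_dates if cutoff <= d <= today]


def meets_block_threshold(warning_dates, today,
                          threshold=WARNING_THRESHOLD, window=WINDOW):
    """True when the number of recent warnings reaches the threshold."""
    return len(recent_warnings(warning_dates, today, window)) >= threshold


if __name__ == "__main__":
    today = date(2023, 6, 1)
    # Hypothetical editors: one warned long ago, one warned recently.
    old = [date(2012, 1, 5), date(2012, 3, 9), date(2013, 7, 1)]
    recent = [date(2022, 2, 1), date(2022, 11, 15), date(2023, 4, 20)]
    print(meets_block_threshold(old, today))     # False
    print(meets_block_threshold(recent, today))  # True
```

Under these assumptions, three warnings from a decade ago do not reach the threshold, while three within the last two years do. Any actual block would still be at the discretion of admins.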