Wikipedia:WikiProject Copyright Cleanup/2023 backlog drive

For new users
Firstly, thank you for taking the time to help clear the backlog! Your efforts are appreciated. Copyright is a complex and nuanced topic, so we recommend that you start on the easier backlogs, linked below. For CCI, these are mainly pages that involve copying from non-free websites, so no offline research is required. Category-wise, all the suspected violations should have a source URL.

An exhaustive list of instructions for handling text-based copyright violations is available at the top of the copyright problems page. A good guide on how to start editing at CCI is User:Moneytrees/CCI guide. Please do not hesitate to ask any experienced editor for help. A brief rundown of handling CCIs, which is no substitute for reading the relevant pages, is below:
 * Check for dead links; if there are any, use IABot to restore them.
 * Run the page through Earwig's copyright detector to get a cursory score. Mirrors often copy from Wikipedia, so make sure to identify these and ignore them.
 * Check the article's sources and compare them to the existing text. WP:REX may be helpful for hard-to-access sources.
 * If you have identified any possibly infringing content with a source:
   * Check the page's licence: is it compatible per WP:COMPLIC?
   * If the content is not compatible, remove or rewrite it with a link to the source material in the edit summary
   * Remove the diff from the CCI page and mark it with y. Mark the article talk with CCI
 * If you have identified any possibly infringing content without a source:
   * In case of content added by repeat copyright violators at CCI, the content may be presumptively removed
   * Please note this in your edit summary, linking to the CCI page if applicable
   * Otherwise, if you still suspect the content of being plagiarised from a non-free source, removing it under other policies (e.g. if it's unreferenced) may be appropriate.
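
For editors who find a decision table easier to follow, the rundown above can be sketched as a small Python function. This is only an illustration, not a tool used by the project: the names `Action` and `triage` and all of the parameters are hypothetical. The rundown does not say what to do when the source's licence is compatible, so the sketch assumes that these steps call for no removal in that case.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Outcomes named in the rundown, plus a fallback (an assumption)."""
    REMOVE_OR_REWRITE = "remove or rewrite, linking the source in the edit summary"
    PRESUMPTIVE_REMOVAL = "presumptively remove, noting this in the edit summary"
    CONSIDER_OTHER_POLICIES = "consider removal under other policies"
    NO_REMOVAL = "no removal called for by these steps"


def triage(
    source_found: bool,
    licence_compatible: Optional[bool] = None,
    repeat_violator_at_cci: bool = False,
    still_suspected: bool = False,
) -> Action:
    """Map the situation for possibly infringing content to an outcome."""
    if source_found:
        # With a source, the licence check (WP:COMPLIC) decides the outcome.
        if licence_compatible is None:
            raise ValueError("check the source page's licence first")
        if licence_compatible:
            # Assumption: the rundown is silent on this case.
            return Action.NO_REMOVAL
        return Action.REMOVE_OR_REWRITE
    # Without a source, repeat violators at CCI allow presumptive removal.
    if repeat_violator_at_cci:
        return Action.PRESUMPTIVE_REMOVAL
    if still_suspected:
        return Action.CONSIDER_OTHER_POLICIES
    return Action.NO_REMOVAL
```

For example, `triage(source_found=True, licence_compatible=False)` gives `Action.REMOVE_OR_REWRITE`. The function only names the outcome; updating the CCI page and the article talk page remains a manual step.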

For returning users
Welcome back, and thanks for taking part. This drive focuses mainly on CCI, and the rewards system is set out below.

Rewards system
For articles at CCI... For everything else...
 * Handling a diff <1k bytes - one point
 * Handling a diff >1k bytes - two points
 * Handling any article - two points
 * Reviewing all diffs of an article - four points
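
As a worked example of the arithmetic, the point values listed above could be tallied along the following lines. This is a minimal sketch rather than the drive's actual scoring script; the function names are hypothetical. It assumes that "1k bytes" means 1000 bytes, that a diff of exactly that size scores as the smaller tier, and that points for separate activities simply add up. None of these is stated in the list.

```python
from typing import Iterable

# Point values copied from the rewards list.
POINTS_SMALL_DIFF = 1    # handling a diff <1k bytes
POINTS_LARGE_DIFF = 2    # handling a diff >1k bytes
POINTS_ANY_ARTICLE = 2   # handling any article
POINTS_ALL_DIFFS = 4     # reviewing all diffs of an article

# Assumption: "1k" is 1000 bytes, and exactly 1000 bytes counts as small.
DIFF_THRESHOLD_BYTES = 1000


def diff_points(size_bytes: int) -> int:
    """Points for handling one diff of the given size."""
    if size_bytes > DIFF_THRESHOLD_BYTES:
        return POINTS_LARGE_DIFF
    return POINTS_SMALL_DIFF


def drive_total(
    diff_sizes: Iterable[int],
    articles_handled: int = 0,
    articles_fully_reviewed: int = 0,
) -> int:
    """Sum a participant's points across all four activities."""
    return (
        sum(diff_points(size) for size in diff_sizes)
        + POINTS_ANY_ARTICLE * articles_handled
        + POINTS_ALL_DIFFS * articles_fully_reviewed
    )
```

For instance, `drive_total([200, 1500, 999], articles_handled=1, articles_fully_reviewed=1)` comes to 1 + 2 + 1 + 2 + 4 = 10 points under these assumptions.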

Beginner friendly CCIs

 * Rtkat3
 * Werldwayd
 * 20220720

Category backlogs to clear

 * Category:Copied and pasted articles and sections with url provided
 * Category:Articles with close paraphrasing
 * Category:Suspected copyright infringements without a source
 * Category:Copied and pasted articles and sections
 * Category:Articles with improper non-free content

Construction
Currently, there are significant backlogs in the three principal queues of copyright cleanup: CCI, CP and CopyPatrol. Other parts of the project have made significant progress in clearing their backlogs by gamifying reviews and providing rewards for a certain number of points. Whilst a backlog drive is appealing, a gamified approach may not be effective with respect to copyright.

The Backlog (August 2023)
Based on rough estimates and database counts, copyright backlogs on Wikipedia are:
 * CCI currently has over 100,000 remaining diffs to be reviewed
 * CopyPatrol currently has ~70 open reports at a time
 * CP is at a manageable level for now

Rough ideas

 * Backlog drive where we reward points for older CCIs
 * Focus on a large CCI that's easier for beginners to tackle (Rtkat3, Werldwayd, etc.)
 * Tackle low-risk stuff towards the end of CCIs
 * Clear out Category:Copied and pasted articles and sections with url provided, so it doesn't have to be listed at CP
 * Not too big, so that we could evaluate each one like a CCI review
 * Bot to collate number of articles fixed

Rewards system
Most backlog drives make use of a point/article system, and this would make sense here: barnstars, etc. could be given out for certain criteria in a similar manner to the GAN drive. Tallying points can be automated relatively easily: the NPP drive made use of bots to collect data such as the backlog size and user points.

The main problem is quality. Unlike in the drives above, it is much more difficult to review individual users' work, not only because of the sheer number of pages but also because far fewer editors have sufficient copyright experience than have the GAN/NPP experience needed for those drives. However, we could probably still maintain a relatively high standard by checking a set sample, the size of which will have to be decided. One per 25 pages may be a good starting point, but if this proves to be an issue we can amend it as appropriate.
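
To make the sampling idea concrete, a spot check at one per 25 pages could be drawn roughly as follows. This is a sketch under stated assumptions, not an agreed procedure: the function name `spot_check_sample` and its parameters are hypothetical, the sample is drawn at random, the sample size is rounded down, and at least one page is always checked when a user has handled any pages at all.

```python
import random
from typing import List, Optional, Sequence


def spot_check_sample(
    pages: Sequence[str],
    rate: int = 25,
    seed: Optional[int] = None,
) -> List[str]:
    """Pick roughly one page per `rate` pages handled by a user."""
    if rate < 1:
        raise ValueError("rate must be at least 1")
    if not pages:
        return []
    # Assumption: round down, but never check fewer than one page.
    sample_size = max(1, len(pages) // rate)
    # A seeded generator makes the draw reproducible for other reviewers.
    rng = random.Random(seed)
    return rng.sample(list(pages), sample_size)
```

With 100 handled pages and the default rate, this returns four distinct pages for a reviewer to check. Changing `rate` is how the sample would be amended if one per 25 proves too few or too many.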