User:Wiae/Copyvio heuristics

General heuristics

 * Presumption of copyright makes it tempting to label all copying from online sources "copyvio" by default
   * But you should still check, because edge cases occur frequently
 * Get used to what Wikipedia mirrors look like
   * In 95% of cases you can identify mirrors from the URL or from wiki-style references within the body
   * If the URL looks spammy, there's a decent chance it's not original content
   * Probability of mirror increases as a function of prose quality?
 * Differentiating original sources from non-wiki mirrors
   * Critical because the actual licensing terms may only appear on the original website
   * Smell test: would I have expected site X to write about topic Y?
 * The "copied from somewhere" argument?
 * Little community appetite to revdel hundreds of diffs
 * Even properly licensed text can still be promotional
   * Is WP:DONATETEXT clear enough about this?
   * UX?
 * Check the following locations:
   * Bottom of the page
   * Links with language like "terms" ("terms of service", "terms of use", etc.)
   * "About us" page
   * "Contact" page
 * Consider whether text in the following areas may in fact be suitably licensed:
   * Government works
     * Stuff that looks like it's not copied from USGov but actually is (globalsecurity?)
   * International bodies (United Nations, etc.)
   * Collaborations with private industry
     * A recent example I found was EarthScope Consortium, where a CC BY 4.0 license for "media" was buried on a "How to Cite" page
   * Some journal articles
   * Museum content
 * Earwig is nice, but copyvio cannot be reduced to a score
   * False positives
     * Lists of works are not creative (common with academics and artists)
   * Close paraphrasing can happen even with a low Earwig score