Wikipedia:Bare-text copies

Bare-text copies of web pages (also known as text-dump copies) are sometimes created on Wikipedia by copying a web page and pasting it into a Wikipedia page, often a user page or user sandbox page. Highlighting, copying, and pasting a web page captures only its text, not its functionality, structure, formatting, or any attribution information. The result is bare text that may or may not be readable, but it is certainly not how the designer of the web page intended it to function. Sometimes creating the bare-text copy is the only thing the originator ever did on Wikipedia. Another editor who comes across it will typically see from its history that it was created several years earlier by a user with very few edits, consisting mainly or entirely of creating the copy. It is not obvious why the user created the bare-text copy.

If the copy is of a random web page rather than another Wikipedia page, and the originator has otherwise not been an active editor, it can be tagged for speedy deletion under U5 (misuse of Wikipedia as a web host by non-contributors). If the originator has made other edits to Wikipedia, so that the page is not U5-eligible, there may be other grounds for deletion. The copyright violation detector at https://copyvios.toolforge.org may be able to identify the page that the bare-text copy was made from; if so, the copy should be tagged for speedy deletion under criterion G12 (copyright violation). (Even if very few sites on the Internet take copyright seriously, Wikipedia takes copyright very seriously.)

Sometimes the bare-text copies are of Wikipedia pages; these are unwikified copies, and U5 does not apply to them. Users will sometimes "correctly" copy a Wikipedia page from its source wikitext rather than from the screen, in which case the copy retains the original page's functionality. Irrespective of the method, the user page guidelines prohibit copies of article-space pages (with some niche exceptions of a temporary nature), because they fail attribution requirements and are redundant content forks that reflect a historical state of the article and grow progressively more stale over time. They do not fall within any of the criteria for speedy deletion, but such copies should be nominated for deletion at Miscellany for Deletion, where they are normally deleted by consensus after seven days.

Occasionally, multiple bare-text copies of the same page are created by different accounts at approximately the same time. They were probably created by one person using multiple accounts, which is sockpuppetry. If they were created within the past three months, the accounts that created them should be reported at Sockpuppet investigations. Editors who misuse Wikipedia by creating such copies sometimes also misuse it by operating multiple accounts.

There are at least two questions about bare-text copies: one that cannot be answered and one that can. The question that cannot be answered is why users create such purposeless copies. Answering it is not necessary in order to answer the other question, what to do about them: they should be tagged or nominated for deletion through the appropriate deletion process.

Text dumps as valid contributions
Some care is needed, however. Editors can create bare-text drafts in their user space (or elsewhere) that are barely distinguishable from bare-text copies, and that may really have been copied from somewhere, but the original place may be their word processor or some other off-site personal editing space. If there is no evidence that the text dump is a copyright violation, it resembles the start of a potential article, and it does not need to be deleted for reasons other than those stated above, the creator should simply be asked about its origins. New editors should not be bitten if their original efforts are entirely unwikified and malformed in various ways; they should be helped instead. A more concerning case is material generated by a large language model, which is another way text dumps originate, since it also involves copying from a website or an application. New editors should be given guidance about the problems associated with these tools.