Wikipedia talk:Arbitration Committee/Requests for comment/Article creation at scale/Archive 2

Introduction
This is the pre-RfC workshop for the RfC(s) about article creation and deletion at scale. Per the rules below, please feel free to add proposed issues or solutions; other suggestions, comments, questions or replies should be made within your own section.

This pre-RfC discussion has been announced at the articles for deletion talk page, the Arbitration Noticeboard, the administrators' noticeboard, the Bot policy talk page, and the Village pump (policy).

Background
Page-related actions done at scale can overwhelm the community's ability to adequately monitor and participate effectively. The issue is exacerbated in the case of article creation at scale because it escapes the normal notification system.

In the past, Wikipedia did not discourage article creation at scale (see Definitions below) under the assumption that this was the best way to achieve broad coverage of vast subjects such as sports, plant and animal life, and geography. An existing policy already requires a bot request for approval for automated or semi-automated creation. More recently, concerns have been raised in multiple venues that the continuing creation of such articles has overwhelmed editors’ ability to track and assess these articles, and that the churn has become a waste of time and a cause of disruption. In an August 2022 decision, the Arbitration Committee (ArbCom) ordered an RfC addressing "how to handle mass nominations at Articles for Deletion" (termed "AfD at scale").

A strong argument has been made that article creation at scale (sometimes known as mass, rapid, or large-scale creation) is one of the causes of dysfunction at AfD with regard to article deletions at scale, and that addressing this issue is a necessary precursor to the ArbCom-ordered RfC addressing AfD at scale.

Purpose of this discussion
This discussion is to identify the issues with article creation/deletion at scale, to workshop initial proposals in preparation for the RfC(s), and to decide how to handle the RfCs.

Specifically, you are asked to address the questions:


 * 1) What are the primary problematic issues surrounding article creation or deletion at scale which should be addressed in policy? (Proposed issues)
 * 2) How might we address these issues? (Proposed solutions)
 * 3) How should we structure the discussions? That is, do we need to run two RfCs, or can we run one? And if we do need two, do they need to be run consecutively or can they overlap?

Rules

 * 1) All editors are required to maintain a proper level of decorum. Rudeness, hostility, casting aspersions, and battleground mentality will not be tolerated. Inappropriate conduct will result in a partial block (p-block) from this discussion.
 * 2) The sole purpose of this discussion is to identify problematic issues surrounding article creation/deletion at scale and to workshop proposed solutions to be used in the resulting RfC(s). It is not a venue for personal opinion on past creation or creators of such articles or about previous tolerance of such creations, nor is it a venue for personal opinion on past mass deletions. Editors posting off-topic may be p-blocked from this discussion.
 * 3) All comments must be about problematic issues and proposed policy changes surrounding article creation/deletion at scale or about structuring the resulting RfC(s). Comments about any contributor are prohibited and will result in a p-block from this discussion. Any violations will be reverted, removed, or redacted.
 * 4) Please do not make changes in issues/solutions that have already been posted. Anyone is permitted to post additional issues/solutions, below the existing ones. Moderators may at their discretion merge, edit, or condense issues/proposals at any point in the process. Any user may suggest such changes.
 * 5) Please make all proposals within seven days of the start of this discussion. Subsequent proposals may be brought up in an editor's own section on the talk page for consideration and inclusion at the discretion of the moderators.
 * 6) Please use subsections to number proposed solutions to correspond to a particular issue; that is, if you have a second proposed solution for Issue 1, number that as Proposed solution 1.2 and insert it between Proposed solution 1 and Proposed solution 2.
 * 7) This discussion will be unthreaded. Please create your own section within the comments section, placing your username in the section header. Within your own section you may present your opinions on the proposed issues or proposals to be addressed, post questions to other editors, or respond to other editors. Threaded comments will be moved or removed by moderators/clerk.
 * 8) Within their comment section each editor is limited to 800 words, including questions to and replies to other editors. (word count tool) Overlength statements will be collapsed until shortened.
 * 9) If you believe someone has violated these rules, please speak to a moderator on their talk page, not here. If you believe the moderators are behaving inappropriately, please speak to an arbcom member on their talk page or by email.
 * 10) This discussion will be open for at least 7 days and will be closed by the moderators at their discretion.
 * 11) Per their order, any appeals of a moderator decision may only be made to the Arbitration Committee at WP:Arbitration/Requests/Clarification and Amendment.

Moderators of this discussion
The Arbitration Committee has appointed two moderators for this discussion and the RfCs: Additional clerking help:

Statistics for mass creation

 * 1) Editors who have created more than seven articles in the past week, including lists and disambiguation pages
 * 2) Editors who have created more than seven articles in the past week, excluding lists and disambiguation pages
 * 3) Editors who have created more than ten articles in June
 * 4) Editors who have created more than ten articles in July
 * 5) Editors who have created more than ten articles in August
 * 6) Editors who have created more than 100 articles in the past year
 * 7) Editors who have created more than 100 articles in the past year, by month
 * 8) Editors who created more than 10 articles in 2021, by month
 * 9) Editors who created more than 10 articles in 2020, by month
 * 10) Editors who created more than 10 articles in 2019, by month
 * 11) Editors by number of articles created in the past five years

Notes:
 * 1) None of these contain redirects that were converted into articles by the listed editor, but they do contain redirects that were converted into articles by other editors. I'm looking into fixing the latter; the former can be fixed for smaller datasets, but is too intensive for larger ones.
 * 2) External link counts can be suggestive of an article's quality, but they can also be meaningless: a low number may be because a large number of offline sources were used, while a high number may be because a template that provides links to a large number of database sources was added.


 * 1) Articles by editor by day over one year (1138 editor-days exceeded 10 articles; 163 exceeded 25)
 * 2) Articles by editor by week over one year (922 editor-weeks exceeded 20 articles; 150 exceeded 50)
 * 3) Articles by editor by month over one year (640 editor-months exceeded 40 articles; 123 exceeded 100)
 * 4) Articles by editor by year since 2020 (1156 editor-years exceeded 80 articles; 407 exceeded 200)

Note that these do attempt to exclude false positives from editors converting redirects created by the original editor, but some still exist, and this attempt does result in some false negatives. This is also the reason why a hard technical limit will be difficult; we will need some way to identify editors converting redirects into articles, and count those articles towards their count rather than towards the count of the original article creator. BilledMammal
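As a rough illustration of the attribution problem described above, here is a minimal Python sketch of how editor-day, editor-week, editor-month, or editor-year counts could be tallied while crediting a redirect-to-article conversion to the converting editor rather than to the redirect's creator. The `Creation` record, its field names, and the use of calendar buckets (ISO weeks, calendar months) are hypothetical assumptions for illustration only; this is not how the queries linked above were actually written, and it assumes the hard part (detecting the conversion and who made it) has already been done.

```python
from collections import Counter
from datetime import date
from typing import NamedTuple, Optional


class Creation(NamedTuple):
    """Hypothetical record: one page that became an article."""
    title: str
    page_creator: str            # user who first created the page (possibly as a redirect)
    converted_by: Optional[str]  # user who turned a redirect into an article, if any
    became_article: date         # date the page became an article


def credited_editor(c: Creation) -> str:
    # Credit the converting editor when a redirect was expanded into an article.
    return c.converted_by or c.page_creator


def period_key(d: date, period: str) -> tuple:
    # Calendar buckets (an assumption): ISO week for "week", calendar month for "month".
    if period == "day":
        return (d.year, d.month, d.day)
    if period == "week":
        iso = d.isocalendar()
        return (iso[0], iso[1])
    if period == "month":
        return (d.year, d.month)
    if period == "year":
        return (d.year,)
    raise ValueError(f"unknown period: {period}")


def editor_periods_over(creations, period: str, threshold: int) -> dict:
    """Return {(editor, period_key): count} for every editor-period above the threshold."""
    counts = Counter(
        (credited_editor(c), period_key(c.became_article, period)) for c in creations
    )
    return {key: n for key, n in counts.items() if n > threshold}
```

For example, if editor A creates eleven articles and one redirect on the same day, and editor B later converts that redirect into an article, A is credited with 11 articles for that day and B with 1.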

Proposed questions for first of two RfCs
These are the suggested solutions to the issue of article creations at scale that I was able to distill from this workshop. Note that I’ve intentionally combined/shortened/simplified as much as possible, so please point out if I've:
 * 1) Combined proposed questions that need to be separate
 * 2) Left out an important consideration or rationale
 * 3) Missed something altogether

I've created sections below these for endorsement/nonendorsement and any comments or suggestions. Valereee (talk) 14:14, 7 September 2022 (UTC)

1. Clarify SNG policy
 * Clarify at WP:N to make explicit whether each specific SNG directly confers notability independent of GNG and to eliminate contradictions. Require all creations under SNGs that do not confer notability to have at least one source which would plausibly contribute to GNG. (Note: there was another suggestion to require 2 sources, which I'd originally thought to add as an alternative, but I thought it might discourage consensus.)

2. New Creations Report
 * Develop a bot to produce a report listing new creations that is sortable/filterable by editor, category, time range.

3. Creator-at-scale permission
 * Create a userright to allow creation at scale. Users without this permission would be prevented from creating more than 25 articles/day or 50/week or 100/month or 500/year. Create a dedicated forum to request this right and where requesting and granting this right can be discussed.

4. Require consideration of alternatives to creation
 * Create policy to require consideration of alternatives to creation, with sanctions for those who do not adhere to such policy.

5. Clarify WP:BEFORE
 * Creations under SNGs can be assumed to be cited to the best readily-available sources.

6. Clarify SNG policy
 * Clarify at WP:N to make explicit whether each specific SNG directly confers notability independent of GNG and to eliminate contradictions.

7. Require a GNG-quality source
 * Require all articles created under SNGs (other than those which confer notability) to have at least one source which would plausibly contribute to GNG: that is, that constitutes significant coverage in an independent reliable source.

8. Mass creations noticeboard
 * Create a dedicated noticeboard to allow for consensus for, notifications of, reports of, and discussions of mass creations and the sources used for such creations. (Details to be developed there.)

Please endorse/not endorse for inclusion in the RfC or make suggestions for each question within the sections below.

Question 1: Clarify SNG policy

 * Endorse. I think going with "one source" is the right call; people who want two sources will just say so in their !vote, and the closers will be perfectly capable of determining the consensus. Levivich😃 16:43, 7 September 2022 (UTC)
 * Reading the comments below, I can see the benefit of splitting the first sentence of Q1 from the second sentence and just running some version of the 2nd sentence, e.g. "require any articles on topics that must meet GNG to have at least one GNG source", or some variation thereof. This punts on the question of which topics must meet GNG, but I think that's OK (as it doesn't relate solely to mass creation). Levivich😃 01:22, 8 September 2022 (UTC)
 * Agree with Levivich. However, I would reword Require all creations under SNGs that do not confer notability to have at least one source which would plausibly contribute to GNG. to Require all creations not under SNGs that confer notability to have at least one source which would plausibly contribute to GNG., to make it clear that this restriction also applies to creations that are not under any SNG. BilledMammal (talk) 18:38, 7 September 2022 (UTC)
 * @BilledMammal, it seems like that broadens it significantly. Valereee (talk) 19:38, 7 September 2022 (UTC)
 * I don't think so, unless the intent was for it to not apply to creations that are not covered by any SNG? BilledMammal (talk) 19:45, 7 September 2022 (UTC)
 * Endorse.—S Marshall T/C 22:50, 7 September 2022 (UTC)
 * Oppose the "clarify SNG vs GNG" part. I don't necessarily think this is a bad idea, in principle, but it's a very broad thing to do that is mostly unrelated to the issue at hand, mass creations of low-quality stubs, and something that is better done incrementally as an individual process for SNGs rather than as some kind of catch-all where we can expect most participants to be unfamiliar with the requirements of the individual subjects under discussion. Additionally I think doing this clarification well requires some thoughtfulness about its long-term consequences that might be lost in a poll of editors inflamed by the mass-stub issue. Instead, it is reasonable to expect many respondents to take the position "we must do something about mass creations, this is something, therefore we must do it", regardless of whether the proposed clarifications actually affect mass creations. Additionally, this part is misleadingly named, and inappropriately bundled, because "Require all creations under SNGs that do not confer notability to have at least one source" is not about clarification of SNGs. I would be supportive of polling on a separate bullet point that is just this requirement, with a better title and without the imposition of a new process to clarify SNGs. —David Eppstein (talk) 23:14, 7 September 2022 (UTC)
 * PS "to eliminate contradictions": if this means, eliminate places where the policy says two things that contradict each other, about the same articles, then that's again a laudable goal (although, as above, beyond the scope of this RfC). If it means, fit all SNGs to a single Procrustean bed, eliminating all ways in which some of them do things differently than others, then it's a total non-starter. For one thing, it would eliminate the stricter rules for NCORP. For another, it would eliminate most of our articles about living academics. Essentially, it would eliminate all SNGs, because what would be the point of having a SNG that could only say to follow GNG? At the very least, the wording here is far too ambiguous. —David Eppstein (talk) 05:59, 8 September 2022 (UTC)
 * It doesn't mean reducing all SNGs to the same level; if it did, we might as well get rid of them. When I made the proposal above, I referred specifically to clarifying whether or not they grant notability independent from GNG, because that isn't clear in many cases. Vanamonde (Talk) 06:23, 8 September 2022 (UTC)
 * Oppose in current form. While I am not against clarifying the relationship of SNGs to the GNG, I do not support the wording on the number of sources required. My position is that simply requiring one source per article does not address the problem of large numbers of articles being created from entries in a database, which is part of what triggered this RfC, and was involved in the case about Carlossuarez46. I would support either requiring two reliable sources, as proposed by BilledMammal, or, as I proposed at the start of this discussion, requiring one additional reliable source in addition to any source from a database. - 23:19, 7 September 2022 (UTC) Donald Albury 23:29, 7 September 2022 (UTC) (re-signed)
 * "simply requiring one source per article does not address the problem of large numbers of articles being created from entries in a database" but this would require one GNG source, which would address the problem of articles sourced only to database sources. Levivich😃 01:21, 8 September 2022 (UTC)
 * Assuming you believe that there isn't any database in the world that would constitute "a GNG source", which is not something the community has decided ...and if you think there isn't, then I invite you to look at this database entry, which contains about 400 complete sentences about the subject. WhatamIdoing (talk) 01:47, 8 September 2022 (UTC)
 * To take this requirement even farther to the point of absurdity: would prohibiting articles created from entries in a database mean we are disallowed from using Google (a database) to find sources for our new articles? —David Eppstein (talk) 06:04, 8 September 2022 (UTC)
 * Both of these arguments are straw men. There is no proposal to disallow database sources, it's to require GNG sources. If a database source meets GNG then it would be a GNG source. And David I'm sure you understand the difference between citing to a database and using a database to find a source, and I'm sure you don't cite to Google search results. :-) Levivich😃 13:35, 8 September 2022 (UTC)
 * @Donald Albury, are you satisfied, with the understanding that in order to support GNG, a simple short mention -- what I think you mean when you talk about database entries, as opposed to the significant coverage in the entry WAID is linking to above -- wouldn't be sufficient and so Q1 would require at least one other source? Valereee (talk) 14:49, 8 September 2022 (UTC)
 * Endorse but make the one source "in-depth, detailed coverage" that satisfies V and NOR.  Atsme 💬 📧 23:52, 7 September 2022 (UTC)
 * Neither V nor NOR require in-depth detailed coverage of their sources. Simple claims can be based on simple sources. The requirement for a source to have depth is purely a GNG thing, not V or NOR. —David Eppstein (talk) 06:07, 8 September 2022 (UTC)
 * Oppose in current form per David Eppstein. This conflates multiple things, misses important aspects (e.g. consideration of consequences of changes) and on its own will not solve the main problems. Thryduulf (talk) 00:06, 8 September 2022 (UTC)
 * I doubt this will result in a clear, simple resolution. Even if I am wrong and the RFC results in this proposal being not only agreed to, but also all of the pages updated with clear statements, we will still have fights over what constitutes a "source which would plausibly contribute to GNG", because everyone knows that two sentences about *my* important subjects indicates notability, but that twice as many sentences about *their* worthless subject are not only useless but probably also copied from a secret press release after bribing the publisher.  We will also struggle because we have never resolved whether the GNG's requirement of "multiple sources" that are independent, secondary, reliable and containing significant coverage means that a source that is independent, reliable and SIGCOV but not technically secondary (e.g., WP:PRIMARYNEWS) is something that "counts" towards notability, and we definitely haven't found an objective or consistent way to measure significant coverage (two consecutive sentences?  200 words? Ten severable facts that belong in an encyclopedia article?).  Bottom line:  I think this will fail to reach a decision, and even if it does, I think it will fail to solve the problems as they appear in individual articles.  WhatamIdoing (talk) 01:02, 8 September 2022 (UTC)
 * Endorse with a few caveats. Based on the arguments I'm seeing at sports AfDs, we should make it clear that a single SIGCOV source is required to avoid speedy deletion but not necessarily sufficient to meet GNG/SNG or pass AfD. Regarding GNG vs SNG, a common point of conflict is that "meets either the general notability guideline (GNG) below, or the criteria outlined in a subject-specific notability guideline (SNG) listed in the box on the right" (from WP:N) is often interpreted to mean that an article is notable if it meets any criteria within a SNG, even if the SNG lead states that it is subordinate to GNG, so it would be good to clear up. This could be as simple as adding a note at WP:N to that effect. –dlthewave ☎ 01:47, 8 September 2022 (UTC)
 * Weakly oppose as stated. Clarify at WP:N to make explicit whether each specific SNG directly confers notability independent of GNG and to eliminate contradictions. This could be read as if we are clarifying the relationship to GNG of each SNG, individually. Require all creations under SNGs that do not confer notability to have at least one source which would plausibly contribute to GNG. What about "require all GNG-based SNGs..." to cut down on ambiguity? I would definitely support any proposal that required at least one (or two, or three, or five...) GNG-contributing source for new creations, but anticipate resistance if we don't also have an idea of how we would enforce this. Also, based on the way this is going at NSPORT, we will definitely have editors insisting an injury report or listicle entry or anything with the subject's name in the headline or single sentences they believe "demonstrate significance" (e.g. "X, a preeminent Y-er with a remarkable career lasting 11 decades, won the prestigious Z award yesterday.") are SIGCOV, so we'll probably end up addressing that eventually too (not in this RfC). JoelleJay (talk) 02:46, 8 September 2022 (UTC)
 * Weak oppose the clarification portion, and I agree that this question is really two parts. No comment for now on the second. My intuition is that this (very broad) question intersects too narrowly with an RfC on "article creation at scale/en masse". I haven't had much time for Wikipedia recently, but I'm really disappointed we got this far -- and are soon having the full RfC?? -- without considering examples. Are people disgruntled over WP:NACADEMIC? (Mass-created stubs on academics would be funny to see. FIRST LAST (born XXXX) is a PROFESSION at the INSTITUTE studying TOPIC. Their dissertation was called TITLE (YEAR). ) Over the panoply at WP:NSPORTS? (I see a lot of athletes at NPP, but I dare not patrol those articles because I simply haven't the brain cells to understand that guideline.) Or maybe it's WP:GEOLAND? Then, even if whatever SNG in question were finally deemed subservient to GNG, would that solve the issue of mass creation? Ovinus (talk) 03:34, 8 September 2022 (UTC)
 * Endorse: Clarification of any policy that is frequently misinterpreted is a good thing, and misinterpretation of these policies does appear to be a problem with some mass creators. Contravening a clearly understandable policy becomes a behavioural problem, allowing a different set of remedies for unacceptable conduct. How and where these policies should be clarified is another question. Differences in application of the policies may be appropriate for occasional article creators vs. creators of batches of similar stubs · · · Peter Southwood (talk): 05:14, 8 September 2022 (UTC)
 * Endorse, and no objections to a split. I don't understand some of the arguments above; the sufficiency of NSPORTS for supporting creation or continued existence is responsible for approximately half the deletion-related conflict we see; how is it too specific an issue for this discussion? Vanamonde (Talk) 06:12, 8 September 2022 (UTC)
 * NSPORTS has been the subject of a recent referendum that made big changes in our interpretation of it and is a fresh wound that needs healing, not immediate reopening. Most of the recent conflict has been because those changes have not had time to settle and become established, especially among some editors who were content with the old consensus, and because they involve a lot of changes to which actual articles we should keep. In that specific case, I think asking for another poll and another do-over is a mistake. But the proposed wording goes far beyond NSPORTS and asks us to revisit the details and independence of all SNGs. That is a huge can of worms that I would prefer not to open and especially not to subject to what is essentially the whim of a torch-and-pitchfork mob focused on a monster and not paying attention to the nearby straw roofs that their torches are lighting on fire. —David Eppstein (talk) 06:17, 8 September 2022 (UTC)
 * I can understand not wanting to revisit a difficult conversation, but if we are to avoid discussing NSPORTS and other SNGs used to justify mass creation, we should give this RfC up right now. The issues are inseparable. Vanamonde (Talk) 06:21, 8 September 2022 (UTC)
 * NSPORTS, under the new consensus, does not support mass creation. Neither does NGEO. Both of those are (now) the type of SNG that merely suggests to editors situations where sourcing is likely to exist, but defer to the GNG in requiring that the sourcing actually exist. What needs clarification is not what these guidelines say about notability, but rather how clearly actual notability (and not just the likelihood of notability) needs to be demonstrated at article creation time. The wording of the question is also problematic in a different way: it is worded in a way that assumes that there are only two kinds of SNG (those that defer to GNG and those that are independent of GNG). There is a third kind: SNGs that go beyond GNG in their restriction on what kinds of sources can convey notability. Both NCORP (which requires that sources be nonlocal) and, arguably NPOL (which at least as practiced, if not in its literal wording, prevents using coverage of unelected candidates for notability) are of that type. For articles that would fall under one of those SNGs, should the source that is provided meet the stricter standards of the SNG? Your question doesn't say. And who is to judge which SNG or GNG is the right choice for any individual article? It's not always an easy question (for instance the line between WP:PROF and WP:AUTHOR can be very unclear). —David Eppstein (talk) 06:34, 8 September 2022 (UTC)
 * Completely endorse. Would go with two sources as well. --WhoIs 127.0.0.1 ping/loopback 06:39, 8 September 2022 (UTC)
 * See, User:Vanamonde93, this kind of answer is exactly why it is a very bad idea to bundle things together. You have asked two questions, and received a positive answer to one that would be used as evidence of consensus for the other even though it does not address it at all. —David Eppstein (talk) 06:57, 8 September 2022 (UTC)
 * But I didn't bundle them, and am unopposed to unbundling, so I don't see why you are directing that remark at me. To answer your point above; NSPORTS never did support mass creation explicitly, but was still used as justification for it. As such the recent RfC changes nothing. NGEO does confer notability independent of GNG, for a subset of geographic features that meet GEOLAND (and if two admins disagree on this point, it makes the need for a clarification that much more obvious). The question doesn't address NCORP at all, and I'm well aware that it's more restrictive than GNG (I've said so elsewhere over the course of this discussion). Most fundamentally, "how clearly actual notability (and not just the likelihood of notability) needs to be demonstrated at article creation time" can't be tackled when thousands of creations, and thousands of AfD !votes, have treated the likelihood of notability as actual notability. We need community consensus affirming that those are different, and how they are different. Vanamonde (Talk) 07:02, 8 September 2022 (UTC)
 * Re: "have treated the likelihood of notability as actual notability. We need community consensus affirming that those are different": then state that much more specifically and unambiguously as a poll question, rather than asking "whether each specific SNG directly confers notability independent of GNG", a completely different question that the SNGs already largely state answers to. If you want consensus that likely notability is different from actual notability, and that actual notability needs to be demonstrated, you won't get it from a question that doesn't address that issue. The only reason to ask "whether each specific SNG directly confers notability independent of GNG" is to change what the SNGs already state about which kind of SNG each one is. If you didn't want to change that relation, you could just read the SNG instead of asking in a poll. Changing the individual consensus of all the SNGs at once, as a byproduct of a process focused on something else, is exactly what I'm opposed to. Clarifying the interpretation of "likely notability" is much better focused but is not what the current wording asks for.
 * But now I'm confused about your role. Are you a moderator here, overseeing the process in a neutral way and making sure order and consensus are maintained, or are you leading the charge, pushing for change and setting an agenda that is worded in a non-neutral way that guides participants in the direction you think they should be guided? I thought it was the former but this interaction tends to make me think it is the latter instead. —David Eppstein (talk) 07:10, 8 September 2022 (UTC)
 * Flattered as I am to be confused with, we're not the same person. I'm not a moderator, and I certainly have opinions about this issue, so I wouldn't have volunteered to be one. I think we need community consensus affirming that most SNGs do not confer automatic notability, but the question is worded more broadly out of fairness; if the community at large wants to declare that all SNGs do, that's an option they ought to have. Hence my support for the current question. I assume there will be considerable wordsmithing before it's actually posted, and also these aren't actual proposals, I assume; we'd want the community to !vote on what clarification would look like. Vanamonde (Talk) 07:24, 8 September 2022 (UTC) (Added post-ec): I wish you were right that we could "just read the SNG"; but we can't. Because WP:N is confusingly worded, because language use isn't consistent, and because some woolly notion of AfD conventions has not infrequently been used to argue against a simple reading of the SNGs. If we don't change the status of any SNGs, we're still left with wording and convention issues. Vanamonde (Talk) 07:24, 8 September 2022 (UTC)
 * Oh, have I confused your identities? Oops. I do apologize, and would not have been as argumentative had I not been confused. —David Eppstein (talk) 07:27, 8 September 2022 (UTC)
 * I'm sorry, what about my endorsing the question being in the RFC (and endorsing the proposers initial thoughts about 2 sources) make it 'the kind of answer' that implies any consensus beyond it should be in the RFC? Did you reply to the correct comment? I don't really understand the reply and it doesn't seem to make much sense. --WhoIs 127.0.0.1 ping/loopback 08:03, 8 September 2022 (UTC)


 * Lol...for anyone who was wondering why we needed the first part of the discussion to be unthreaded and to allow limited word counts, this is why. Valereee (talk) 12:40, 8 September 2022 (UTC)


 * Comment - This doesn't touch GeoStub articles which are a massive problem. SPORTSBIO articles already have this requirement. What exactly is this directed to? Species stubs? FOARP (talk) 14:26, 8 September 2022 (UTC)

Question 2: New creations report

 * This seems like a potentially useful thing that could support any outcome of this RfC, but doesn't require consensus here to do, so maybe not necessary to include. &mdash; Rhododendrites  talk \\ 15:14, 7 September 2022 (UTC)
 * I agree with Rhododendrites. Any interested person should feel free to go ahead and work on this proposal now. isaacl (talk) 15:42, 7 September 2022 (UTC)
 * I agree with the above: it's a great idea, and should be removed because it doesn't need consensus and its removal would simplify the RFC. Interested editors should feel free to go work on it (and to ping me if they want help). Levivich😃 16:45, 7 September 2022 (UTC)
 * I'll leave this here for a day or so to give others a chance to chime in, unless someone pings me to say 'We've started work on this.' :D Valereee (talk) 17:33, 7 September 2022 (UTC)
 * The database side of this is dead simple; it "just" needs a friendly UI. Example without filtering, though that's just as easy. —Cryptic 23:12, 7 September 2022 (UTC)
 * Endorse. Should be in place regardless of the outcome. --Enos733 (talk) 17:45, 7 September 2022 (UTC)
 * This idea seems like it has a lot in common with the existing new pages feed. ~  ONUnicorn (Talk|Contribs) problem solving 17:51, 7 September 2022 (UTC)
 * Also with Special:Log/create. And Petscan can filter for new pages by category (though not by creating user). —Cryptic 22:58, 7 September 2022 (UTC)
 * I feel this is out of scope.—S Marshall T/C 22:52, 7 September 2022 (UTC)
 * Agree with Rhododendrites: useful but doable without polling and therefore not poll-worthy. —David Eppstein (talk) 22:58, 7 September 2022 (UTC)
 * Agree; in fact, any and all stats that we can get are A-OK with me!  Atsme 💬 📧 23:54, 7 September 2022 (UTC)
 * I agree with Rhododendrites. Collecting stats and producing reports doesn't need consensus. Thryduulf (talk) 00:07, 8 September 2022 (UTC)
 * I don't think that the description as written will work. Specifically, I don't think that "filterable by category" is achievable in a wikitext table (which is the only way I know to have anything "sortable" on wiki) because articles can have multiple cats.  I think this would have to be done in Toolforge.  WhatamIdoing (talk) 01:06, 8 September 2022 (UTC)
 * Yes please -- I'd attempt to make it, but I haven't the time. Numbers are needed to inform the full RfC. (Btw, where did this oft-cited 25 articles/day figure come from? Divine inspiration?) Ovinus (talk) 03:37, 8 September 2022 (UTC)
 * Does not require consensus and would be useful to very useful, depending on what it reports. Should there be a project discussion to consider what we would like to see in the output? When could we see a prototype? · · · Peter Southwood (talk): 05:50, 8 September 2022 (UTC)

Question 3: Creator-at-scale permission

 * Maybe I'm misreading the data above but it looks like 500/yr, in which case I wonder if an actual userright is necessary, as opposed to just having a list of authorized editors somewhere. A "pseudo-userright" like AWB would probably be easier to implement. Otherwise the thresholds and wording, etc., look good. Levivich😃 16:58, 7 September 2022 (UTC)
 * @Levivich, it would limit all -- more than 25 a day, and more than 50 a week, and more than 100 in a month, in addition to 500 in year. All four levels of creation would be captured. It was to make sure of capturing someone who, for instance in the case of the 500, sat down every weekend and created 24 articles over a period of 40 hours, but also those who created 100 in a week then didn't edit for six. Valereee (talk) 17:14, 7 September 2022 (UTC)
 * Maybe it needs a clarification? Valereee (talk) 17:27, 7 September 2022 (UTC)
 * means by technical limitation, i.e. a software rate limit, as opposed to "would not be allowed to"? That might be good to clarify, like . I'm not sure exactly what the thing that would do the prevention is called (edit filter?). Levivich😃 17:56, 7 September 2022 (UTC)
 * So we're confident a software rate limit that ignores the creation of redirects is doable? Just checking. Valereee (talk) 18:16, 7 September 2022 (UTC)
 * Maybe? I think this is something we would need to ask the WMF for. BilledMammal (talk) 18:35, 7 September 2022 (UTC)
 * Maybe also split this one into two: (1) what specific rate limits for a rate limit policy, if any (I still like the proposed thresholds of 25d/50w/100m/500y), and (2) whether/how to enforce that, e.g. a software rate limit, a userright to exceed it, or both. Levivich😃 16:55, 8 September 2022 (UTC)


 * Endorse. I presume the numbers are flexible, but this is the narrowest way forward to address the problem of mass creation. --Enos733 (talk) 17:47, 7 September 2022 (UTC)
 * I don't believe this is suitable, or would find consensus - it would have too much of an impact on genuinely highly productive editors, and it wouldn't end up constraining mass creation as once granted an editor would be free to make as many articles as they like. In particular, I imagine Lugnuts would have ended up with the right, and considering how long it took to revert his autopatrolled right I don't think it would have been revoked quickly.
 * Instead, we need a policy that constrains actual mass creation; something that says if you want to create more than 10 highly similar articles, you need approval for that group. BilledMammal (talk) 18:35, 7 September 2022 (UTC)
 * The question here is whether the question goes to an RFC. I think that the proposal here is close to your alternative - allowing a highly productive article creator to get an advanced user right (approved by a group) to create more than (10/25/X) articles in a specific period - Enos733 (talk) 20:32, 7 September 2022 (UTC)
 * I'm suggesting a slightly different one go to RfC. My objection is that this will require productive but otherwise unproblematic editors to go through the process, while also not constraining problematic ones. The ideal here would be to maintain something similar to WP:MASSCREATE; require each "group" of mass created articles to receive consensus, rather than editors with this permission having carte blanche to create whatever they like. BilledMammal (talk) 20:37, 7 September 2022 (UTC)
 * @BilledMammal, you see asking for a userright one time -- a userright which like any other can be removed if it's abused, which means it's not really 'carte blanche' -- as more onerous than asking for consensus for every planned mass creation? Valereee (talk) 12:50, 8 September 2022 (UTC)
 * It will be more onerous for prolific editors who don't engage in mass creation, and that is something I want to avoid. In addition, I don't think we want to approve mass creation by editor, I think we want to approve it by mass creation; while we might approve an editor mass creating articles on topic A, that doesn't mean we would approve the same editor mass creating articles on topic B.
 * In addition, I'm not convinced that we'll remove the user right in a timely manner if it is abused; my earlier example of Lugnuts abusing autopatrolled for years demonstrates that. BilledMammal (talk) 04:17, 9 September 2022 (UTC)


 * This may have in some part come from my proposed solution #10 above. On thinking about a technical intervention more, I'm thinking that it would be hard to find a number that is (a) high enough that it wouldn't snare too many people who aren't working on "mass creation" but just creating a lot of unrelated articles, while also (b) low enough that it addresses the concerns expressed in recent conversations about mass creation. I'm increasingly of the mind that we shouldn't even include in this RfC the creation of lots of completely unrelated articles and focus on creation of articles on a theme/topic/boilerplate/based on the same sourcing. If that's the case, the number of users the community would trust to do that without asking permission first would be very small, I suspect, and could just be documented as being exempt from whatever process we create. &mdash; Rhododendrites  talk \\ 20:40, 7 September 2022 (UTC)
 * I feel the numbers here set the bar for mass-creation far too high. 1 a day is mass creation imo.—S Marshall T/C 22:53, 7 September 2022 (UTC)
 * This is the part of these proposals that I think is the most appropriately focused on mass creations, so I support some mechanism like this to watch over and if necessary limit them. It doesn't have to be this specific mechanism. The choice of threshold is obviously an issue, but I think it's appropriate to set it at a level where sustained but hand-crafted creation of articles does not run into artificial limits, but where any kind of mechanical or boilerplate reproduction of article content does. For reference, I believe my own article creation to be "sustained but hand-crafted", with typical numbers of 1–3 new articles per day, usually targeted as being start-class. —David Eppstein (talk) 23:06, 7 September 2022 (UTC)
 * Oppose. I think any attempt to meter article creation this way is a problem. It is not the number of articles a user creates per se that is the problem, it is the large number of poorly-sourced stubs (generally produced from a database) that a few editors have created that is the problem. I think requiring that new articles meet minimum standards of sourcing will go a long way in dealing with the problem without having to create rules about how many articles an editor can create in some time period. - Donald Albury 23:27, 7 September 2022 (UTC)
 * Oppose. Number of creations isn't a problem, it's the content that matters. Redirects would also need to be excluded, especially if you go any lower than the figures above, yet that would allow the mass creation of redirects which is sometimes problematic. In other words it would create problems that don't currently exist without solving ones that do. Thryduulf (talk) 00:12, 8 September 2022 (UTC)
 * Endorse a limit – NPP is swimming in a 10k+ backlog right now. Back in June we were close to a 16k backlog. But worse, if these mass creations are autopatrolled, then they go straight to main space and get indexed w/o being reviewed. If they are not autopatrolled, then it's a root canal to get the non-notable/promo/unsourced stubs removed. Our CSD language is far too exacting. For example, a 2 sentence stub announcing a brand new stadium that was built in a little city in South America - new stadium with nothing notable beyond new stadium exists and the Mayor is excited about it, yada yada – the locals want the world to know they have a new stadium so they create a stub in WP. NPP reviews the stub, checks the sources, and can see that the stub is promotion because the sources are promoting the new stadium. We tag it G11, an admin comes along and rejects the G11 because admins don't check the sources, and for them, it has to be "blatant promotion" they can actually see in the stub. So now NPP has to take it to AfD and there goes another 7 days down the tubes, and then along comes another admin and decides to relist it. I suggested (tongue in cheek) at WT:V that when an admin rejects a CSD or PROD by NPP, that admin should be responsible for fixing the stub or article. It's easy to reject, but trying to make a stub/article worthy of inclusion when it is not is a time sink. Maybe we should consider unbundling that part of the bit and allow experienced NPP reviewers that right so they can do their job quickly and efficiently? The threat of bots and human bot-like creations is a real problem for NPP that needs to be controlled, or maybe we need a bot to automatically reject creations that don't meet certain criteria - or better yet, prevent publish if certain criteria are not met - like if you forget to add an edit summary you cannot publish changes. Otherwise, every day is like Groundhog Day at NPP, and we suffer a lot of burnout.  
Atsme  💬 📧 01:06, 8 September 2022 (UTC)
 * This is a compound question, which is a bad idea for an RFC. It's fine to ask "Should we have some sort of permissions system, like a user right, that allows people to get pre-approval before manually creating an unusually high number of articles?"  It's also fine to ask "If we decide to have a pre-approval system for non-bot high-volume article creation, then are these proposed levels –   n per day, p per week, q per month, and r per year – about right, too restrictive, or too permissive?"  You don't want people saying "I oppose the whole idea because it's a great idea except the suggested numbers are wrong" (and that will happen, even if you tell people not to do it).   WhatamIdoing (talk) 01:13, 8 September 2022 (UTC)
 * Agree with points by Donald and WAID. But also agree that a question on restricting the ability to create many short, undersourced articles needs to be in the RfC in some form. In the comments I asked if it would be technically feasible to track the stub-status of articles created by a particular editor, and mandate that they can have no more than X number created after Y date that are both a) <Z bytes excluding infoboxes etc./stub-class, and b) do not contain a SIRS with significant prose coverage that could contribute to GNG. If they exceed that number, they have to go back and expand/source a stub before they can create another such article. And if the "at least one GNG source" thing passes, we could raise the coverage bar to "demonstrably meets GNG". JoelleJay (talk) 03:08, 8 September 2022 (UTC)
 * Assuming stub status always reliably correlates with article size (it doesn't), it seems likely that it would be possible for a bot to track the number of articles created in a given time period that are smaller than Z bytes, excluding infoboxes ("etc." would obviously need refinement), and also the number of sources in an article (assuming all sources use a CS1/CS2 template). would be completely impossible to do automatically as there is no way for a bot to know whether a source is independent, reliable, or how much relevant prose it contains. Even humans who frequently address the question regularly disagree completely about whether a given source "demonstrably meets the GNG" or not, so this is not something even an AI could reliably do. Thryduulf (talk) 09:03, 8 September 2022 (UTC)
 * I mostly wasn't considering automating the GNG source determination, just the article size restriction. However, we can exclude certain types of database sources outright as non-GNG-contributing (e.g. soccerbase, used in ~19k articles), which could be detected by filters. We require a GNG-quality source for NSPORT presumptions to apply at AfD, so editors are already watching out for that sort of thing when looking over athlete stubs; and if the generalized requirement gets passed the larger community would be making that determination regularly too. But the main thing would be to quantify "stub backlogs", with the editors who have reached the threshold then given a notice on their TP, and all subsequent stub-level articles they create would have to pass NPP (even if they're autopatrolled). If they continue making such articles that are flagged by NPP/other editors as lacking GNG sources, that would trigger escalating warnings and admin action. JoelleJay (talk) 19:51, 8 September 2022 (UTC)
 * @JoelleJay, w/re: a noticeboard -- perhaps that would include discussions of whether a particular database can ever be considered to provide significant coverage? WhatamIdoing has given an example (The database problem, collapsed below but feel free to unhat if you'd like to discuss there) of a database entry they argue does provide sigcov. Valereee (talk) 14:55, 9 September 2022 (UTC)
 * The "database problem" wouldn't be an issue for the vast majority of cases in sports, where the sources are obviously unusable for GNG, like Soccerway or Transfermarkt. We can also avoid it entirely by clarifying sources must have significant prose coverage that does not just regurgitate stats. A noticeboard would definitely be useful though. JoelleJay (talk) 19:47, 9 September 2022 (UTC)
 * WAID's point includes that some databases (and other sources -- Burke's Peerage, for example) shorthand things that once written out in prose do rise to the level of sigcov. Valereee (talk) 19:57, 9 September 2022 (UTC)
 * "Shorthand" still isn't significant prose coverage. It doesn't matter if editors can proseify data; literally every database with any collection of facts could be proseified to produce several sentences. But A) such databases should definitely be treated as primary as they do not contain independent secondary analysis; B) reproducing the contents of a database in prose form doesn't make the content compliant with NOTDIRECTORY; C) a considerable amount of OR, non-NPOV, and potentially CV goes into proseifying data: if we're expanding each fact in a directory entry, we run the risk of plagiarism (see all the "X Greatest Ys" lists that are truncated in our articles because reproducing them would violate copyright); and if we are expanding only a selection of a comprehensive collection of facts, how do we choose which ones are important enough to be in an encyclopedia? We have no independent validation that any particular item is DUE, because the data has not been discussed by someone independent of it. A database containing all international competition results for an athlete will make no distinction in importance between the Olympics and a match between Liechtenstein and any other team (which will necessarily have to be "international"). A second database containing all the athlete's domestic results suffers the same problem. So if it's impossible to write a NPOV, non-directory article solely from two different examples of the same broad class of source, we cannot consider any member of that class to be GNG-compliant. JoelleJay (talk) 21:32, 9 September 2022 (UTC)
 * Endorse in principle, especially a non-technical solution, but noting the concerns by Donald Albury, WhatamIdoing et al. I believe that 25 articles/day is too high a bar. Unfortunately BM's data is a bit coarse grained at the moment, but observing, assuming each editor edits only five days a week (perhaps that's generous), we'd have 15 to 20 editors in the "mass creation" zone. Haven't had the time to look at their articles. Ovinus (talk) 03:44, 8 September 2022 (UTC)
 * I think one is more what you are looking for; 58 distinct editors created more than 25 articles in a single day in the past year, with this happening 163 times between them. I would note that this includes editors creating disambiguation pages and similar, but I would still agree that 25 is too high of a bar, while some of the other numbers are both too high of a bar and too low of a bar; the issue isn't editors creating large numbers of articles, the issue is editors mass creating articles, as the latter tend to be of low quality and require significant work from other editors, while the former do not. BilledMammal (talk) 05:28, 8 September 2022 (UTC)
 * Agree with that this is a compound question. We first need to establish what constitutes mass creation/creation "at scale" and how it will be measured before it is possible to decide whether a permission system will be reasonable, practicable, and enforceable, or just creeping bureaucracy with no practical value. A test run might help answer those questions. Another question is how one would qualify, and what the process would be for revoking the permission. · · · Peter Southwood (talk): 05:25, 8 September 2022 (UTC)
 * Endorse. The question itself is a good one. Presumably the numbers can be debated in the RfC. Scolaire (talk) 16:38, 9 September 2022 (UTC)
 * Endorse in principle, but I would prefer to emphasize quality over quantity if possible. The primary issues at play are rapid creation of poorly sourced stubs (not rapid creation of well-sourced articles on notable topics) and/or large numbers of creations by newer users that don't clearly pass GNG and flood NPP. If this were to be a pseudoright, perhaps an assignment somewhat similar to autopatrolled could work, with exact numbers and guidelines (also for rate limits) to be determined at the RfC. Complex / Rational  20:54, 9 September 2022 (UTC)
 * I'm not sure that the problem is "poorly sourced stubs". The locus of the problem might be "very short stubs".  Some editors think that very short pages are embarrassing no matter how many sources are listed, and it doesn't much matter to them if the sources are stellar.  In fact, in many cases they actually won't know what the source's quality is, unless they happen to be familiar with the subject area.  In medical subjects, it's not especially unusual to see a stub that contains two basic sentences plus six or eight journal articles.
 * @ComplexRational, you've made me wonder if we need a straw poll (more of a gauge-the-mood question than a make-a-rule discussion) that says something like "Agree or disagree: It would be better for Wikipedia to have long articles on fewer subjects than to have short articles on more subjects."  I've seen several discussions in the last month suggesting that editors might support this. WhatamIdoing (talk) 01:12, 10 September 2022 (UTC)

Question 4: Require consideration of alternatives to creation

 * I don't know how this would work in practice. A subject should meet English Wikipedia's standards for having an article in order for an article on it to be created, but it's still an editorial decision if a new article is the best way to organize the information comprised within the overall domain. Thus editors are always having to judge if content on a subject best fits within another article or a separate article. The only way I can think of to establish if this has been done is for all new articles to be discussed first. If that is the intent, then I think it should be proposed directly. isaacl (talk) 15:52, 7 September 2022 (UTC)
 * Maybe I'm not using my imagination, but a policy to require consideration of alternatives to creation, with sanctions for those who do not adhere to such policy seems like a near-non-starter. The broader community is pretty reluctant to erect barriers to creating content except where it's clearly enforceable, can be clearly communicated, and would clearly prevent more problematic content than good content, and I'm not sure this would qualify. &mdash; Rhododendrites  talk \\ 16:41, 7 September 2022 (UTC)
 * As above, I don't see how it is possible to require consideration or to determine who has not followed this requirement. "You must think about this, or else!" :-) Maybe it could be rephrased into something broader, like asking whether our PAGs should be changed to encourage alternatives to creation, and if so, how? Levivich😃 16:50, 7 September 2022 (UTC)
 * I included this here because it was a serious suggestion, but I too was unsure how this could be made to work. Finally decided 'Just moderating here.' :D Valereee (talk) 17:31, 7 September 2022 (UTC)
 * So, basically like AtD, which is also touted as "required" and "a policy" despite no language existing on what violating it would look like or how to enforce it... JoelleJay (talk) 03:11, 8 September 2022 (UTC)
 * I too feel this is unworkable.—S Marshall T/C 22:54, 7 September 2022 (UTC)
 * Let alone not knowing how this would work, I'm not even sure what it's supposed to mean. I have to have read someone's policy page on alternatives before creating? I have to have, in my mind at the time of creation, the possibility of alternatives? I have to go through some checkbox saying I know there are alternatives? Anything I can think of that this might mean either comes across to me as ineffective or both ineffective and bureaucratic. —David Eppstein (talk) 23:09, 7 September 2022 (UTC)
 * I think it's supposed to mean that you consider things like making a list article with 10 (detailed) list items instead of 10 articles. WhatamIdoing (talk) 01:42, 8 September 2022 (UTC)
 * That doesn't explain what "consider" means, which is more my question. —David Eppstein (talk) 07:25, 8 September 2022 (UTC)
 * I don't see anything to react to in this proposal. - Donald Albury 23:28, 7 September 2022 (UTC)
 * Oppose per David Eppstein. Also, I'm unsure what is even meant by "alternatives to creation" - adding content to existing articles? Doing nothing? Writing a draft (or is that creation)? Asking somebody else to create it for me? Thryduulf (talk) 00:15, 8 September 2022 (UTC)
 * Nope, not seeing it.  Atsme 💬 📧 01:17, 8 September 2022 (UTC)
 * How would we objectively measure compliance? I cannot see how this could work. It looks unmeasurable, unenforceable, and more likely to be used for personal attacks than anything constructive. Convince me otherwise with evidence. · · · Peter Southwood (talk): 05:31, 8 September 2022 (UTC)

Question 5: Clarify WP:BEFORE

 * I think it's a good question but it's about deletion not creation IMO. Whether articles can be assumed to be cited to the best readily-available sources depends on what sources they are required to have when they are created; thus the answer to this Q5 depends on the answer to Q1. This should be a question in the second RFC about deletion. Levivich😃 16:53, 7 September 2022 (UTC)
 * Yeah, I waffled on whether this belonged in this RfC. Valereee (talk) 17:29, 7 September 2022 (UTC)
 * I think we should be allowed to presume the best sources in the article at the end of a full AfD are the best available. That idea might fit better in the second RfC.—S Marshall T/C 22:57, 7 September 2022 (UTC)
 * Oppose. I think any relaxation of the principle that deletion-nominators should actually perform BEFORE, and its proposed replacement by a principle that articles as written can be assumed to have been written with the best possible sources, is a bad idea based on false premises. —David Eppstein (talk) 23:00, 7 September 2022 (UTC)
 * Oppose. Out-of-scope for an RfC about article creation. - Donald Albury 23:31, 7 September 2022 (UTC)
 * Oppose per David Eppstein and Donald Albury. Clarifying BEFORE would be good, but it needs to be strengthened (e.g. always requiring someone to look for sources in the place they are most likely to exist before nominating on the grounds of verifiability or notability) and enforced rather than weakened as suggested here (contrary to WP:V which states that articles must be verifiable not that they must be verified). Thryduulf (talk) 00:19, 8 September 2022 (UTC)
 * If it is determined that this question is out of scope then so be it; however, I will go on the record to say BEFORE is essential and should be the first (mandatory?) step prior to deletion. We are getting too many articles at AfD despite NEXIST and CONTN – the process is being misused for discussions that belong on the article TP or with the article creator. I also need specifics as to what some believe is in need of clarity. Atsme 💬 📧 01:24, 8 September 2022 (UTC)
 * This is not a functional RFC question. It might be helpful to have a discussion around BEFORE, but instead of doing this, this proposed question asks editors to vote on a statement of fact, when they don't have the information necessary to determine whether the statement is true or false (especially for any subjects they're unfamiliar with).  Also, I suspect that what's intended here isn't "In our experience, certain articles (almost) always have the best readily-available sourcing at the time of creation", but instead "We never require editors to follow BEFORE if they believe that an article was created under an SNG".  And that, by the way, highlights another problem:  How would the AFD nom even know whether a given article is a "creation under SNGs"?  Articles don't come with color-coded badges that say "I'm a GNG subject" or "I'm an SNG subject".   WhatamIdoing (talk) 01:54, 8 September 2022 (UTC)
 * Support, although it might be more appropriate for the second RFC, yes. This is a vital open question with clear relevance to article creation, since many of the people who have most prolifically created articles have cited the interpretation that makes it mandatory in a way that makes it clear that they are leaning on this belief as part of what makes mass article-creations possible. To respond to David Eppstein specifically - there is currently clearly no consensus backing your opinion that WP:BEFORE searches are mandatory; no one, to my knowledge, has ever been sanctioned for "failing" to perform the search that you believe they are required to perform (nor could they be, since there's no consensus backing that interpretation and a clear contradiction between it and WP:BURDEN; or between it and WP:NEXIST, which merely says that such searches are "strongly encouraged.") If you believe that BEFORE searches ought to be mandatory, you should be pushing for an RFC to clearly establish this, but do not state or imply that it is mandatory currently - there is no consensus backing that position. --Aquillion (talk) 01:57, 8 September 2022 (UTC)
 * Clarification may be useful, but how would one objectively measure compliance? Without a measure for compliance, how would it be enforceable? Either we require evidence of sufficient notability or we do not. I think we should require strong evidence of notability when creating "at scale" (batches of articles on closely related topics), as those editors would be expected to be competent, but keep the status quo for occasional creation to remain reasonably friendly to new editors and occasional article creators. · · · Peter Southwood (talk): 05:44, 8 September 2022 (UTC)
 * Support wholeheartedly. BEFORE is not currently required, just encouraged. I'd like to see a compromise here. Make BEFORE a requirement, and make the BEFORE search be sources IN or MENTIONED IN the article. Can't type up ref formatting? We got you. Make us go on a wild goose chase to prove YOUR work is encyclopedic? That's a no. --WhoIs 127.0.0.1 ping/loopback 06:43, 8 September 2022 (UTC)

Question 6: Clarify SNG policy
Clarify at WP:N to make explicit whether each specific SNG directly confers notability independent of GNG and to eliminate contradictions. Please limit yourself to a single brief comment. If you must argue a point with another editor, please just take it to their talk; if they convince you to change your mind, come back and revise your single brief comment. Remember that we aren't !voting on these questions, simply trying to refine them and gain consensus on including them in the RfC.

Discussion of Q6

 * Splitting this from Q1. Valereee (talk) 15:37, 8 September 2022 (UTC)
 * Weak non-endorse, as this question could, and I think on balance should, be run as a separate RFC altogether, because it's a broad question (there are lots of SNGs) and it affects non-mass-created articles just as much as mass-created articles. Levivich😃 16:59, 8 September 2022 (UTC)
 * Weak non-endorse for exactly the same reasons as Levivich. Thryduulf (talk) 21:49, 8 September 2022 (UTC)
 * Perfectly fine question for a different time (e.g., 2023). Also, if anyone is making a note to run this later, please spend a while contemplating the other obvious way to address GNG/SNG questions, namely asking editors whether they want a rule that says "The English Wikipedia will not have any articles on subjects for which editors cannot find at least two independent/third-party reliable sources, which together contain enough information to write a short encyclopedia article about the subject," which solves the problem in a different way.  WhatamIdoing (talk) 02:23, 9 September 2022 (UTC)
 * Agree with the previous three editors. Any future discussion would also need to carefully consider WP:NCORP and other SNGs that apply stricter limits than GNG. BilledMammal (talk) 04:25, 9 September 2022 (UTC)
 * Endorse, because the vast majority of conflict that I have seen at AfD related to mass creation and deletion has to do with GNG and SNG differences, and conflicts in their interpretation. I do not object to handling it separately, but if we don't handle it, nothing else we come up with is going to be meaningful. Vanamonde (Talk) 08:42, 9 September 2022 (UTC)
 * Basically agree with first four responders above. This will likely be useful to pursue, but maybe later, as a separate RFC. - Donald Albury 13:32, 9 September 2022 (UTC)
 * As far as article creation goes, this is probably not necessary if Q7 is asked. It's probably more relevant to the AfD RfC. Scolaire (talk) 16:35, 9 September 2022 (UTC)

Question 7: Require a GNG-quality source
Please limit yourself to a single brief comment. If you must argue a point with another editor, please just take it to their talk; if they convince you to change your mind, come back and revise your single brief comment. Remember that we aren't !voting on these questions, simply trying to refine them and gain consensus on including them in the RfC.

Proposed wording 1: Require all articles created under SNGs (other than those which confer notability) to have at least one source which would plausibly contribute to GNG: that is, that constitutes significant coverage in an independent reliable secondary source.

Proposed wording 2: Require all articles (except those not required to meet GNG) to have at least one source which would plausibly contribute to GNG: that is, that constitutes significant coverage in an independent reliable secondary source.

Proposed wording 3: Require all WP:MASSCREATEd articles to have at least one source which would plausibly contribute to GNG: that is, that constitutes significant coverage in an independent reliable secondary source.

Please in your response indicate whether you'd endorse one, two, or all three for inclusion in the RfC, not whether you'd personally support or oppose. Here we're just trying to refine and gain consensus on wording to include in the RfC. Valereee (talk) 17:41, 8 September 2022 (UTC)

Discussion of Q7
In general, #2 is the closest to what we want, as I agree that we should avoid using the ambiguous term "mass creation", and there is no reason to permit articles covered by the GNG to not have at least one source from creation. However, to address my earlier concerns I would suggest All articles (except those not required to meet GNG) must include at least one source which would plausibly contribute to GNG: that is, that constitutes significant coverage in an independent reliable source. Articles must not be created in or moved to article space without meeting this requirement, but meeting it is not sufficient to demonstrate notability. BilledMammal (talk) 00:02, 9 September 2022 (UTC)
 * Splitting this from Q1, revising for comments. Valereee (talk) 15:38, 8 September 2022 (UTC)
 * Endorse as written but I think it would benefit from two changes: (1) "Require all articles created under SNGs (other than those which confer notability)..." ->, and (2) "...significant coverage in a reliable source" -> or . I would also endorse this if it said "Require all WP:MASSCREATEd articles...". Levivich😃 16:57, 8 September 2022 (UTC)
 * @Levivich, are we then inadvertently placing into policy that all articles need only a single GNG source? I try for three before I move to article space, and that's what I look for at AfC. Valereee (talk) 17:12, 8 September 2022 (UTC)
 * I think clarifies that? Right now, we don't require any GNG sources; I'd say this intentionally places into policy "at least one". Further clarifications might be  (or mass created). Levivich😃 17:18, 8 September 2022 (UTC)
 * Trying to avoid "mass-created" because of the definition problem. :D I'm hoping it'll be less muddy at the AfD RfC. Valereee (talk) 17:21, 8 September 2022 (UTC)
 * True, 'at least' does do that. Hm...maybe offer a couple wordings, ask for feedback? Jeez. This workshop. WTaF possessed me? Valereee (talk) 17:31, 8 September 2022 (UTC)
 * The leading theory is that you and Xeno went out drinking one night and one of you stole the other's phone and emailed arbcom signing up for this, and the other did the same as retaliation. Maybe the easiest thing to do is delete this whole section since no one else has commented here yet, and offer a Q7a/Q7b/Q7c/whatever, and see which version gets the most endorsement? Levivich😃 17:35, 8 September 2022 (UTC)
 * oops too late lol...go on about the WP:MASSCREATEd version, though. I thought that was just about bots, am I misreading? Valereee (talk) 17:43, 8 September 2022 (UTC)
 * (I don't want to directly clerk this page, but maybe move my !vote and all the replies down to Discussion? I can !vote again later on the two proposed wordings.) WP:MASSCREATE says "automated or semi-automated", so I don't think it's limited to bots. Also, it's part of WP:BOTPOL, which also includes WP:MEATBOT; the policy is (IMO) consistent in saying there is no difference between a bot edit and a bot-like human edit. Anyway, if we say "Require WP:MASSCREATEd articles...", that punts the definition of "mass created" to WP:MASSCREATE, and the community can adjust it there (now or in the future). For those editors who are concerned that this question isn't sufficiently focused on mass-creation, it's one possible way to more narrowly focus it. Levivich😃 17:56, 8 September 2022 (UTC)
 * This is the discussion for this question. The discussion below (now titled "General discussion (please discuss specific proposed questions above in their own sections)" was added at some point for general discussion by someone else, I think. By the time I saw it there were multiple responses, so I didn't remove. I'll add the masscreate wording as a third. Valereee (talk) 18:31, 8 September 2022 (UTC)
 * I think one source is insufficient and would strongly recommend two (in particular, I would strenuously object if the RFC moves forward without a two-source requirement as an option.) Two sources is necessary for the WP:GNG, which specifies "multiple", and from a common-sense perspective is necessary to write neutral articles, since any article with only a single source is by definition placing undue weight on it. --Aquillion (talk) 07:05, 9 September 2022 (UTC)
 * The problem again becomes one of complexity. If we end up proposing two or three different wordings, and then have to figure out how to offer 'require 2' as a second option for each, are we getting into the RfC becoming a train wreck? Valereee (talk) 13:46, 9 September 2022 (UTC)
 * Endorse 2 & 3 - I think 1 and 2 are saying the same thing but 2 says it better, so we should present 2 and 3 as options in the RfC. Levivich😃 22:33, 8 September 2022 (UTC)
 * Dislike all - as they are not really connected to mass-creation or I don't see a way to automate review of sourcing to prevent the mass creation of articles (that is all mass created articles will go to mainspace where there will be a process for deletion). If any of these are going to an RFC, my preference is #3. --Enos733 (talk) 22:54, 8 September 2022 (UTC)
 * Effectively, what we want to do is say that all articles must have at least one (or two) sources that plausibly contribute to GNG from creation. The issue with the current wording here is that it isn't clear whether we are requiring one source from creation or in general - and given that editors abuse WP:SPORTSCRIT #5 to argue that one source is sufficient to grant notability, this lack of clarity will cause issues.
 * Agreed with BilledMammal, but would also add the requirement that the source be secondary. JoelleJay (talk) 00:57, 9 September 2022 (UTC)
 * Agreed. BilledMammal (talk) 02:47, 9 September 2022 (UTC)
 * Adding that; if anyone objects, speak up! Valereee (talk) 13:05, 9 September 2022 (UTC)
 * All of these are basically pointless, as there is no agreement about what constitutes WP:SIGCOV. These amount to "Prevent disputes by requiring articles to meet a highly disputed, subjective, variable criterion".  It doesn't matter whether any of them are debated or adopted, because none of them will solve the actual, practical problem.  WhatamIdoing (talk) 02:28, 9 September 2022 (UTC)
 * There is some agreement; some sources are uncontroversially considered to not be WP:SIGCOV, while others are uncontroversially considered to be WP:SIGCOV. In between the two there is some controversy, but that's why the words plausibly contribute to GNG are used; for this area of controversy, that is where we need things like AfD, rather than an uncontroversial move to draft space.
 * This also means that this wouldn't be pointless, as almost all mass created articles are created with sources that are uncontroversially not significant coverage. For example, Lugnuts used this article as the sole source for Enrique Hernández (weightlifter); no one would consider that to be significant coverage, and this rule would have prevented that article from being created unless Lugnuts spent the time to find a proper source. BilledMammal (talk) 02:47, 9 September 2022 (UTC)
 * Here's what I could write from that archived source alone:
 * "Enrique Hernandez (b. 13 July 1945) is a male weightlifter. When he was 23 years old, he competed in the 1968 Olympics in Mexico City, where he represented Puerto Rico. A small man, he stood 157 cm high and weighed 60 kg at the time. In the 1968 Summer Olympics, he competed in three events:  military press, snatch, and clean and jerk.  His overall standing at the end of the games was 13th place.  He ranked 8th for the military press, lifting 115 or 120 kg on each of three tries.  He ranked 13th for snatch, lifting between 95 and 102.5 kg each time.  He ranked 16th on clean and jerk, managing 122.5 to 135 kg each time. Hernandez was born in Santurce, Puerto Rico."


 * That's nine sentences, so still a stub, but it's not a bad stub (except, of course, for any errors I might have made because I don't know what any of these events are). I'd say that a source that can be turned into a substantial stub is a source that provides significant coverage.  Wouldn't you?   WhatamIdoing (talk) 16:31, 9 September 2022 (UTC)
 * Absolutely not. That interpretation would permit inclusion of literally every middle school athlete on maxpreps. It also runs afoul of NOTDIRECTORY since it's literally indistinguishable in content from an actual directory. JoelleJay (talk) 20:35, 9 September 2022 (UTC)
 * The middle school comment sounds like an Appeal to consequences to me. There's no problem with WP:NOTDIRECTORY in what I've written, as it's not any kind of list at all.  NOTDIR's components are "simple listings", "lists or repositories of loosely associated topics", "cross-categorizations", "genealogical entries", TV guides, or business-related lists, and this little stub falls into none of those categories.
 * The point of this sub-thread is that BilledMammal said "no one would consider that [source] to be significant coverage", and I not only disagreed with him, but provided a worked example of exactly why someone might genuinely believe that this source constituted significant coverage.
 * The problem isn't whether The Community™ would agree that this is/isn't SIGCOV; the problem is that some editors would genuinely, sincerely, believe that it did. They might, in fact, believe that it was "plausible, adj.:  Seemingly or apparently valid, likely, or acceptable; conceivably true or likely", and all we have on the other side is editors claiming that it's inconceivable that any of our >>100K editors would ever hold a different view from them.
 * This might be a perfectly fine rule to have, in that magical future when we all agree (more or less) what SIGCOV actually means, why we require it, and how to tell whether a subject has it. Until such a point in time, adopting or not-adopting this rule would simply be a waste of effort.   WhatamIdoing (talk) 00:41, 10 September 2022 (UTC)
 * But it is not plausible; an editor who claims that is WP:SIGCOV either has WP:CIR issues, or WP:IDHT issues. The idea of "plausible" isn't that no one can believe it's SIGCOV, it's that no editor with a reasonable understanding of and respect for community norms can believe it is SIGCOV.
 * I would also point out that I think they meant WP:NOTDATABASE, not WP:NOTDIR, with the issue being that it is a replica of a database entry. The fact that you have converted the database entry to prose and padded it with superfluous words and a little WP:OR doesn't change that. BilledMammal (talk) 00:51, 10 September 2022 (UTC)
 * @BilledMammal, I invite you to find my name in the lists of the top 10 editors ever of WP:N and the policy on how to write policies and guidelines and also WP:RFC, and then consider whether you really meant to say that I don't have a reasonable understanding of community norms. WhatamIdoing (talk) 01:20, 10 September 2022 (UTC)
 * If someone believes such a source can plausibly meet GNG, then they can be disabused of that on their talk page like we do for every other content issue. Our guidelines say that GNG-contributing sources need to be secondary, which non-prose items in a database are not (just like census results or official records or raw research data or tabulated results of surveys or questionnaires); that notability is necessary to prevent "indiscriminate inclusion" of topics, which would exclude entries sourced solely to databases; and that Moreover, not all coverage in reliable sources constitutes evidence of notability for the purposes of article creation; for example, directories and databases. JoelleJay (talk) 01:18, 10 September 2022 (UTC)
 * It turns out that secondary databases are a thing. WhatamIdoing (talk) 01:44, 10 September 2022 (UTC)
 * (Funny that that footnote calls advertisements a reliable source.) WhatamIdoing (talk) 01:45, 10 September 2022 (UTC)
 * That's obviously not "secondary" in the sense used on Wikipedia. I use PROSITE all the time; in fact I literally have three tabs open right now. This one is querying UniProtKB for motifs found in a particular protein. No human was involved in publishing that specific page, it's just another result of a database search. JoelleJay (talk) 01:53, 10 September 2022 (UTC)
 * Speaking as the editor who created pages like Independent does not mean secondary and Identifying and using primary sources to clear up some serious misunderstandings about what secondary sources are:
 * I don't know whether Wikipedia editors consider a secondary database to be a WikiJargonSecondarySource. Our rules don't necessarily follow any of the multiple and conflicting versions of the real-world definitions.
 * I can tell you that we have never had a rule that says a secondary source requires a human being involved in publishing the specific page. (Also, I suspect you mean "creating that specific page" rather than "publishing that specific page".) WhatamIdoing (talk) 02:04, 10 September 2022 (UTC)
 * Of course a "secondary database" isn't secondary in the wikipedia sense. It's only secondary in the sense that it includes more higher-order annotated data like protein motifs and interaction profiles rather than just straight sequences. And regarding a "human rule": A secondary source provides an author's own thinking based on primary sources, generally at least one step removed from an event. It contains an author's analysis, evaluation, interpretation, or synthesis of the facts, evidence, concepts, and ideas taken from primary sources. Do you think "an author's own thinking" includes, like, the raw output of protein docking software? JoelleJay (talk) 02:44, 10 September 2022 (UTC)
 * A source that contains information that can be put into a stub isn't the same as a source that provides significant coverage demonstrating that English Wikipedia's standards for having an article has been met. Routine mentions of a person's activities can often be found in sources which would fill out a stub. From the notability guideline, significant coverage addresses the topic directly and in detail. Consensus discussion at the talk page for the sports notability guideline has agreed that routine sports coverage is not suitable for meeting the requirements of the general notability guideline. I agree with you that making blanket statements about databases (or potentially even about all topics in one database) may be subject to individual exceptions, and thus might not be sufficiently helpful. isaacl (talk) 20:43, 9 September 2022 (UTC)
 * You're mixing criteria. It's fine to say "Routine coverage doesn't count", but it'd be silly to believe that routine coverage can't "address the topic directly and in detail".  That database entry is probably routine coverage.  But you'd never say that the database record doesn't "address the topic directly" (It's got his name right there at the top, and everything in it reports what he did himself at a particular event.  This isn't a source that's about his family, or his country, or about weightlifters in general – it is 100% directly about the subject) or "in detail" (it names his exact height and weight, and the precise amount of weight he lifted each of three times on a specific day; how much more detail do you expect?  What he ate for breakfast each morning?).
 * I've long had my doubts about the sentence we use to describe SIGCOV. For example, I suspect that we actually want sources, when viewed in combination with each other, to address a subject comprehensively rather than in detail.  But that's a problem for a separate RFC, not for this one, and I see no point in proceeding with this question until we have an agreement on what SIGCOV is.  In the meantime, this source might be SIGCOV, and "non-routine" isn't mentioned in any of the above proposals. WhatamIdoing (talk) 00:52, 10 September 2022 (UTC)
 * I didn't say that routine coverage doesn't address a topic directly. I agree that in the context of the notability guideline page, the word "detail" is being used in the sense of providing comprehensive discussion of the subject's different aspects and history. I'm not aware of any areas where routine coverage is considered to be evidence that the general notability guideline is met (though I do know there is disagreement on what coverage is considered routine); if you know of some examples, I would appreciate learning about them. isaacl (talk) 01:08, 10 September 2022 (UTC)
 * Part of the problem is that everyone has their own idea of what counts as "routine". Is a news story saying that the FDA approved a drug for Alzheimer's "routine"?  It's certainly expected, and that kind of news story happens every single time they approve a drug, and some editors would say it's therefore "routine".   But there has never yet been an FDA-approved drug that wasn't considered GNG-notable, and the media attention at the time of approval is one of the key factors for this.   WhatamIdoing (talk) 01:50, 10 September 2022 (UTC)
 * For that case, the ensuing comprehensive coverage of the drug is suitable for demonstrating that the general notability guideline is met. isaacl (talk) 05:08, 10 September 2022 (UTC)
 * (It's got his name right there at the top, and everything in it reports what he did himself at a particular event. This isn't a source that's about his family, or his country, or about weightlifters in general – it is 100% directly about the subject). sports-reference.com literally has a page for every single Olympic weightlifter and every single Puerto Rican Olympian; just because it has a UI that lets you access tabulated results on one particular athlete at a time doesn't mean the source (the database itself) is directly covering each one in detail. I could export the 1968 Men's Featherweight event as a CSV and apply a filter to include only the elements associated with Enrique; that doesn't make me, or the filter, or Excel suddenly a source of SIGCOV. No human wrote that page, all it is is the result of a database query. And even in cases where we do have proseified stats, without someone independent actually interpreting them and publishing their original analysis, that coverage remains primary. JoelleJay (talk) 01:45, 10 September 2022 (UTC)
 * Do you think the database is indirectly covering the subject?
 * We have never required that a subject be a particular percentage of a source. It's okay if only 1% of a book directly addresses the subject; it's therefore okay if only 1% of any source directly addresses the subject.
 * I don't believe that we have ever required that humans create the sources. With the rise of software that can write plausible-sounding text, we might want to consider that.  Creating such a rule would require us to stop citing Google Earth (looks like it's in a couple hundred articles), online maps, etc. WhatamIdoing (talk) 01:58, 10 September 2022 (UTC)
 * Do you genuinely believe that the results of a database query comprise secondary independent SIGCOV? Seriously? In what possible way is that a secondary interpretation of data? We have policy against basing articles on primary sources specifically because [s]econdary or tertiary sources are needed to establish the topic's notability and avoid novel interpretations of primary sources.
 * And no one said we can't use databases as sources, they just don't count toward GNG. JoelleJay (talk) 02:26, 10 September 2022 (UTC)
 * I wrote a bit about this in my suggestions above, but generally speaking I would advise that the policy say something like "any good-faith assertion that a source satisfies the GNG is sufficient." Individual cases where there is a dispute over whether a source is sufficient should go to WP:AFD as usual, whereas if someone is repeatedly presenting sources that cannot possibly be considered good-faith efforts to satisfy the GNG (after being informed of the problem), that is a conduct issue better handled at WP:ANI or the like. But the purpose of this policy is to act as a general pointer towards what the ideal expectations are, plus establishing a justification for sanctions against people who flagrantly and repeatedly disregard them; the purpose isn't to allow articles to be deleted instantly without going through AFD just because someone quibbles over the source. This is not so different from how we handle sourcing anywhere else. --Aquillion (talk) 07:05, 9 September 2022 (UTC)
 * I'm a little leery of adding complexity unless it's really necessary to explain the idea. We already have "plausibly contributes", does 'good faith assertion' really add much to that? Valereee (talk) 13:29, 9 September 2022 (UTC)
 * "Plausibly contributes" leads to disputes about whether it's really plausible according to all of us.
 * "Good-faith assertion" means "I might be incompetent or gullible, but I am not lying or deliberately trying to harm Wikipedia when I tell you that I personally think this contains what I personally call significant coverage".
 * I still think this whole section is pointless. We need a definition of SIGCOV before we can address this problem. WhatamIdoing (talk) 16:36, 9 September 2022 (UTC)
 * Endorse 1 & 2, prefer 1. Three would prohibit mass creation of articles that aren't required to meet GNG, which is an overreach. Vanamonde (Talk) 08:47, 9 September 2022 (UTC)
 * I'm not happy with the wording of any of the three alternatives. If we are trying to limit the creation of large numbers of poorly-sourced articles, then whatever we adopt should apply to all articles, and not exempt articles under some SNG. I also think requiring two sources is a good idea, although getting that accepted may be harder than requiring one reliable source. I am interested in restricting the mass creation of articles based on a single database, such as was the case with many articles created by and . For example, exempting articles about populated places from the sourcing requirement because they fall under the Notability (geographic features) SNG would not have prevented Carlossuarez46 from creating that large number of very short, often very mistaken, articles. I cannot support any solution that leaves that big a hole. Please note that this is not about notability; my concern is about preventing the mass creation of poorly sourced articles that require major efforts to clean up. - Donald Albury 13:58, 9 September 2022 (UTC)
 * @Donald Albury, WhatAmIDoing has addressed multiple times their concerns about making a blanket assumption that all entries in all databases inherently cannot meet SIGCOV, there's a section below (The database problem) I've collapsed. Valereee (talk) 14:13, 9 September 2022 (UTC)
 * ETA: if you'd like to continue that discussion with them, feel free to unhat. Valereee (talk) 14:17, 9 September 2022 (UTC)
 * Also, none of these proposed wordings exempts all SNGs, just those that do not confer notability; that issue is addressed in Q6, but the general comments so far indicate that should be a separate question. Valereee (talk) 14:20, 9 September 2022 (UTC)
 * If we are trying to limit the creation of large numbers of poorly-sourced articles...
 * It is not clear that we are trying to do this, though I imagine some editors would like this outcome. Even if we are, it is not clear to me that any of these proposals would have that result. WhatamIdoing (talk) 00:55, 10 September 2022 (UTC)
 * Without any mention of what happens if an article doesn't have the required sources, these are all incomplete, but I strongly oppose 1 and 2. Small numbers of articles do not cause the problems that this is about solving, and existing processes handle issues with them well, so there is absolutely no justification for adding more barriers and more bureaucracy to article creation, all of which bite good-faith editors. I'm less opposed to 3, but I'm still opposed: it's all very vague, and I worry that something like this is going to be (or at least attempted to be) rigidly enforced, leading to arguments about whether something was mass-created, whether it could plausibly contribute to the GNG, whether it's plausibly reliable or not, etc. I also fully endorse what WAID said. Thryduulf (talk) 14:54, 9 September 2022 (UTC)
 * Endorse any or all. However, I think it should require two sources, simply because GNG says "reliable sources" (plural). Also, you might consider adding "significant coverage", because this would eliminate the likes of Olympedia, but not the sort of database talks about.  Scolaire (talk) 16:29, 9 September 2022 (UTC)
 * I feel that articles should have at least two good quality sources, not just one. Because an article with only one good quality source is either (1) plagiarism of that source, (2) obfuscated plagiarism of that source, or else (3) not really based on its only decent quality source.  We really need to combine at least two good quality sources to get outside copyvio territory.  Even a two-source minimum doesn't solve the sportsperson problem because sports bios are inevitably sourced to the hinder pages of several local newspapers.—S Marshall T/C 20:56, 9 September 2022 (UTC)
 * I think you're wrong about plagiarism and copyvio. I just wrote a faux stub above, from a single source, on a subject that I know nothing about.  It's not plagiarism (I told you exactly where I got this information from), and it's not a copyright violation (I never took even so much as four consecutive words from the source).  If I can write from a single source without falling afoul of Plagiarism and Copyright violations, then so can others. WhatamIdoing (talk) 00:57, 10 September 2022 (UTC)
 * You know plagiarism isn't just word-for-word copying...if there's any original creativity behind the organization of data, reproducing the data with the same presentation can also be infringement. Even the simple fact of selecting particular data can be creative, so reordering the data and giving it a new treatment might not be protective either. JoelleJay (talk) 02:09, 10 September 2022 (UTC)
 * That'd be copyright infringement, not plagiarism.</pedant> There's also the unlikely possibility that you're prosifying a copyright trap. —Cryptic 02:48, 10 September 2022 (UTC)
 * Plagiarism is using information written by someone else without crediting them, so as long as the content is cited, it hasn't been plagiarized. Regarding copyright infringements, facts and ideas can't be copyrighted, only a specific expression of them. I can learn something from one source, and then retell what I learned in my own words without infringing copyright. isaacl (talk) 05:15, 10 September 2022 (UTC)

Question 8: Mass creations noticeboard
Create a dedicated noticeboard to allow for consensus for, notifications of, reports of, and discussions of mass creations and the sources used for such creations. (Details to be developed there.)

Discussion of Q8

 * Endorse --Enos733 (talk) 15:33, 9 September 2022 (UTC)
 * Endorse on the proviso that there is no prohibition on the same venue also handling (discussions about) mass deletions, mass moves, etc. if there is consensus for it to do so (to be established in subsequent discussion(s)). Thryduulf (talk) 16:25, 9 September 2022 (UTC)
 * I didn't include those simply because I didn't want to get out of scope or muddy the waters, but my intention with this wording is not to exclude other mass actions. I think it's possible such a proposed specific inclusion could be made during the RfC. Valereee (talk) 16:45, 9 September 2022 (UTC)
 * I was not proposing their inclusion be proposed here, just that the option to propose them later remains available. Thryduulf (talk) 17:18, 9 September 2022 (UTC)
 * Wondering if we should remove "dedicated" as implying only for mass creations? Valereee (talk) 19:10, 9 September 2022 (UTC)
 * Perhaps "specific" would work better? Thryduulf (talk) 21:32, 9 September 2022 (UTC)
 * Endorse. Scolaire (talk) 16:31, 9 September 2022 (UTC)
 * Endorse, to provide a centralized/dedicated venue (though also encompassing the types of discussions described by Thryduulf) and to keep non-conduct issues away from AN/ANI. Complex / Rational  20:58, 9 September 2022 (UTC)
 * Endorse as an RfC question, whether I'd support this would depend on the details. I can just see a flood of AfD !votes along the lines of "keep, needed prior discussion at MAR". Which would be unhelpful. Vanamonde (Talk) 05:16, 10 September 2022 (UTC)

General discussion (please discuss specific proposed questions above in their own sections)

 * Your response is requested for these questions. – MJL ‐Talk‐☖ 22:42, 7 September 2022 (UTC)
 * I am guessing I got pinged because I left a comment, in my role as an arb? Regardless, because of that role I feel it's not my place to weigh in on these questions. Barkeep49 (talk) 23:49, 7 September 2022 (UTC)
 * This is correct. I pinged everyone with a signature above. – MJL ‐Talk‐☖ 01:44, 8 September 2022 (UTC)
 * I realise I've just opposed all five suggestions, but one of them doesn't require consensus and is not a solution in itself (but an aid to other processes that may be), and none of the others will solve the actual issues (and at least some will create new problems). I won't have time for a couple of days, I don't think, but perhaps someone could transfer the ideas at Mass action review here for workshopping. Thryduulf (talk) 00:23, 8 September 2022 (UTC)
 * @Thryduulf, sorry to ping you if you're busy, but I'm not sure whether that board might be out of scope here unless you're seeing WP:MARV as possibly the place Rhodo is suggesting people go to post notice they're planning to use a single source to create multiple similar articles? Valereee (talk) 16:07, 8 September 2022 (UTC)
 * That wasn't my original intention when drafting the page (as I hadn't thought of that as something that would be needed), but if there is consensus that such a venue is something we want then having a single venue to discuss all aspects of mass actions would seem logical. Thryduulf (talk) 21:47, 8 September 2022 (UTC)
 * Are we taking it as a given that creation of policy compliant articles on any scale is acceptable, and that the problem and proposed remedies refer to non-compliant articles only? · · · Peter Southwood (talk): 05:56, 8 September 2022 (UTC)
 * I am not sure how clarifications of GNG or the SNGs affect mass creation. These proposals appear to only address the creation of stub-class articles and articles that are already or would be flagged as needing sources. --Enos733 (talk) 16:11, 8 September 2022 (UTC)
 * I believe that requiring a GNG source at creation (Q7=yes) would effectively stop mass creation altogether, because you can't mass-create (i.e., 25/day or more) articles when you need to find a GNG source for each topic. Finding a GNG source can't be automated or semi-automated. So I see the "requirements at creation" issue to be a fundamental issue upon which rate-limitation issues depend. In other words, if Q7 failed, I'd be a strong supporter of Q3, but if Q7 passed, I'd probably be neutral on Q3. Levivich😃 17:03, 8 September 2022 (UTC)
 * Your suggestion would appear to lead to the subjective assessment of what constitutes a GNG source. And since there is no technological barrier of how (from either the size or any sourcing) an article can get placed in the mainspace, I do not see how this suggestion actually prevents mass creation, other than potentially helping to determine if a user warrants a block or other sanction. - Enos733 (talk) 17:31, 8 September 2022 (UTC)
 * Well we eventually have to do that at AfD, and AfC/NPP already make subjective assessments of article merit, so having a sourcing requirement enforced at creation would just redistribute an existing burden and likely reduce it. JoelleJay (talk) 20:55, 8 September 2022 (UTC)
 * All unsourced BLPs are already supposed to be PRODded, or speedily deleted. NOCITE - Enos733 (talk) 22:59, 8 September 2022 (UTC)
 * , that isn't correct though. You could mass-create under NPOL or GEOLAND, for instance; and mass-creation of articles about recognized vertebrate taxa is still fundamentally possible, if slightly harder. My problem is more that unless there's consensus that NSPORTS (for instance) does not grant independent notability, mass-creation under that will continue. Vanamonde (Talk) 08:50, 9 September 2022 (UTC)
 * Maybe I'm misunderstanding, but this item appears to have gone off the rails. The cited issue was mass creations, and various proposals for source-requirements were then phrased without mentioning mass creation. This should be explicitly tied back to mass creations. Are we seriously discussing speedy delete for a single good article by a new user, because they didn't know to cite easily available sources? A new user isn't creating an article under any SNG or GNG, they are making an ignorant but good faith first effort. It becomes an even bigger problem when an SNG could plausibly apply, but where appealing to that SNG is unnecessary or undesired. (i.e. failing an SNG is irrelevant if GNG is satisfied.) Alsee (talk) 17:13, 8 September 2022 (UTC)
 * I don't think we're discussing speedy delete at all? Levivich😃 17:14, 8 September 2022 (UTC)

An alternative to a multi-question/multi-variable sequence of RfCs
Valereee, I wasn't sure where to put this, but it seemed useful to discuss here rather than your talk page. If it's disruptive or confusing down here, feel free to move it to my section above.

There are many people who have been following the issues of mass creation/deletion. What if, instead of treating every variable as unknown for the purposes of an RfC, we worked together to workshop an actual process for mass creation based on what we've seen in the various threads: a process that could be refined later, but that provides a starting point? Large, multi-stage, multi-question RfCs are thorough, and can produce good results, but they can also be complicated, result in some confusing/contradicting outcomes, and produce results that are hard to modify or implement. The risk of proposing a specific process for the community to !vote on is that the specificity has the potential to lose people who feel passionately about a particular detail, but it can also be productive in giving people something actionable to work with (and later implement). I'm thinking of, for example, when WP:NCORP was completely overhauled, and we had an RfC about using the rewrite as the new starting point rather than debating each and every change. &mdash; Rhododendrites  <sup style="font-size:80%;">talk \\ 20:52, 7 September 2022 (UTC)

A process to workshop
So you want to create a bunch of articles.

Does this guidance apply to you?
 * 1) Are you planning to create more than 50 new articles in the span of a month or 500 in the span of a year?
 * 2) Are those articles on a similar topic, similar theme, or are they based on the same set of sources?
 * 3) Will the articles be created manually, rather than through use of a bot or tools like AutoWikiBrowser (these must go through the Bot Approvals Group)?

If the answer to all of these is yes, this guidance applies to you. (Note that even if the answer is no, if an uninvolved administrator has determined your editing fits within the spirit of these requirements, you will still be expected to follow them).

You must post a notice to [new venue to be created] with the following information:
 * 1) The approximate number of articles you will create
 * 2) The approximate time frame for creation
 * 3) A description of the overall topic/theme
 * 4) Which notability criteria you will be using, and the kind of sourcing you will use to demonstrate that each article meets the criteria

[some additional work on how long these discussions stay open, who approves them, if there's an appeals process, etc. could be added here or deferred to a separate RfC on process for that new venue]

Mass created articles must include sufficient sourcing to show notability, and cannot be based only on simple statistical databases. While there are no firm requirements about the level of quality an article must reach when created, many in the community have a strong preference for mass created articles to be more than one- or two-sentence stubs.

If articles are created after [date this goes into effect] that do not comply with these rules, notice should be posted at [the new venue] for review. An uninvolved administrator may, at their discretion, and with feedback from the community, speedy delete the articles under [criterion TBD, but it should be one that allows refunds], draftify/userfy, or in unusual circumstances even allow keeping the articles and requiring they go through AfD. &mdash; Rhododendrites  <sup style="font-size:80%;">talk \\ 20:52, 7 September 2022 (UTC)

Discussion (proposing a process)

 * I think that's okay – could be a workable process, could be helpful – but I don't think it solves all the problems. Some of the problems described above have relatively little to do with mass creation per se.  Consider, e.g.:
 * complaints about the NPP backlog, even though most of them aren't mass-created articles, and even though most mass-created articles are really quick and easy to process.
 * complaints about article quality, even though mass-creation of FAs, or heavily sourced jewels of stubs, is just as much mass creation as two-sentence stubs.
 * I saw some articles created recently by an editor in one of these mass-creation discussions. They had a small infobox and said things like:  "Geographic Place is a place near Other Place in State, Country. Part of a Film was filmed there."  The sources were a single government database (for the location) and a single article (for the film's name).  There were a few of these.  Even if they were pre-approved, consider the effects from the two viewpoints above:
 * If this editor doesn't have Autopatrolled, then the creation of those articles means that someone in NPP has to look at them. It ultimately doesn't much matter whether one editor writes n of these or n editors each write one of these; n articles created is still n articles for NPP to process, and because of how burdensome we've made NPP over the years, the NPP reviewer might spend more time running their checklist than the editor spent writing the two-sentence, two-source article.  You can probably imagine why some NPPers shudder at the idea of anyone mass-creating any articles.  (NPP used to be all about CSD, but these days, they're trying to be one-stop shopping for every aspect of quality control, including everything from article titles to stub-tagging, even including typo fixing.)
 * It's kind of a lousy article. This doesn't bother me, personally, but it does bother some editors.  They are disgusted by the idea of "inadequate" articles.  They don't want "embarrassing" articles.  They don't actually care about mass creation per se, except to the extent that mass creation is sometimes associated with the creation of extremely brief articles.  So for this group, your process doesn't directly address their real problem (short, boring articles on subjects I don't care about), and it might even actively authorize an increase in their problem.
 * I think we could address these problems (e.g., stop telling NPPers to be human grammar checkers; agree on whether embarrassing articles and imperfect edits are still part of the glorious wiki process, or if only perfect editors are welcome now), and these problems are only partly connected to mass creation, but I don't think your proposed process will appeal to either of these groups, because they have problems that it will not solve. WhatamIdoing (talk) 02:28, 8 September 2022 (UTC)
 * complaints about the NPP backlog - I think it does get at this by requiring that any mass creation post a request with some basic information about notability/sourcing. That should ensure anyone reviewing at NPP should have an easy time. These are all compromises, of course.
 * complaints about article quality - There's the line above that starts While there are no firm requirements about the level of quality.... I feel like it's about what we could find consensus for (a recommendation). I'm quite doubtful that a proposal to require a certain size/quality would find consensus, and the request process should ensure any debates over sourcing re: SNG/GNG are sorted out in advance. It does sort of move those debates rather than solve them, but I think finding a SNG vs. GNG solution is outside the scope of this RfC anyway.
 * The sources were a single government database (for the location) and a single article (for the film's name) - If the source for the location is a database, that's addressed above (Mass created articles must include sufficient sourcing to show notability, and cannot be based only on simple statistical databases). The point of a process like this would be to ensure something like that doesn't get pre-approved.
 * doesn't much matter whether one editor writes n of these or n editors each write one of these - but the latter is outside the scope of this RfC, isn't it?
 * some NPPers shudder at the idea of anyone mass-creating any articles - I do get that, even if it's hard to draw clear lines about what "mass-creating" means. Regardless, again, for this approach of proposing a process IMO it's important to try to find something that would be broadly acceptable even if none of the sides feel like they entirely got their way.
 * It's kind of a lousy article - Sort of similar to above, but I think this is one of the perspectives that has the potential to sink the productivity of this RfC. There's just no way a large RfC is going to codify requirements for quality beyond something really, really minimal (like enough sourcing to show notability and a recommendation that you do more than a couple sentences). YMMV. &mdash; Rhododendrites  <sup style="font-size:80%;">talk \\ 03:01, 9 September 2022 (UTC)


 * Number of articles may be difficult to estimate. Greater than N should be an option. Time frame is also going to be tricky. Perhaps a declaration of a maximum daily or weekly rate would be more practicable. Third and fourth criteria look suitable. Clarification may be requested and should be provided if reasonable.
 * It is possible that an editor may start with no specific intention to mass create, but find themself doing it as a natural progression. They should not be penalised for this.
 * A tool for measuring compliance would be useful.
 * Would this be a setting permission or just a finding at the discussion?
 * What about revoking permission?&middot; &middot; &middot; Peter Southwood (talk): 06:40, 8 September 2022 (UTC)
 * Greater than N should be an option. - By this do you mean propose a process but ask a separate question about the specific numbers? If so, that seems reasonable to me.
 * It is possible that an editor may start with no specific intention to mass create - If we're only talking about creating a large number of articles on a similar topic/theme/using the same sourcing, this is a rule that people will simply need to be aware of (like people are unaware of 3RR until they are, at which point they realize that they'll need to count). Creating that many iterative articles in a month isn't something that just sneaks up on you, I don't think (though I've never done it myself). Ultimately there's not really any difference between someone who mass creates 50 articles in a month and, oops, didn't realize, and someone who just mass creates 50 articles and didn't request permission first. The difference, I suppose, is awareness, and certainly new rules require some flexibility to ensure people are aware of them. Perhaps I've misunderstood.
 * Would this be a setting permission or just a finding at the discussion? - I don't understand. You mean permission (or lack thereof)? I'd think it would just be an "ok" rather than something technical. &mdash; Rhododendrites  <sup style="font-size:80%;">talk \\ 03:12, 9 September 2022 (UTC)


 * Does rate or number have to be specified? Seems like that's the weak link when it comes to article creation. I think many of the editors who are mass creating probably have discovered a source and just keep moving through that source until they get to the end of it.
 * To take one of the quarry searches provided in the above Statistics section and sort for shortest average content, here are 100+ extremely short stubs on species in the genus Carex, created in maybe three months and all sourced to the same database. Before that it was all the entries in a different genus, same database. This editor is working their way through the database in order and creating an article for each entry. They work inconsistently: a few articles a day, then a day off, then ten articles. (Courtesy ping to )
 * Surely this is something we'd want the process to include, but that rate wouldn't necessarily be captured by 50 per month or 500 per year, and I don't imagine this editor is necessarily planning to create at that rate...it probably happens one month and not the next, or one year and not the next. Valereee (talk) 14:00, 8 September 2022 (UTC)
 * This is a good point, and a good reason to workshop this (I don't know how a sprawling, many-question RfC would quite address all of these sorts of situations, either). The process above includes Mass created articles must include sufficient sourcing to show notability, and cannot be based only on simple statistical databases. It may be worth lowering the threshold when databases are concerned (more than 20/month or 200/year, for example). &mdash; Rhododendrites  <sup style="font-size:80%;">talk \\ 03:16, 9 September 2022 (UTC)
 * @Rhododendrites, yes, my concern is that this is a pretty complex proposal. Could we boil it down to a single idea, like "Create a noticeboard for reporting and discussing mass creations (see WP:MARV for a current similar discussion)." The exact details perhaps could later be worked out there -- what 'mass creation' entails, for instance, and which planned creations would need to be reported up front to allow for consensus to be gained before the work is done? Valereee (talk) 13:02, 9 September 2022 (UTC)
 * That's why I suggested putting a limit on the number of stubby, undersourced articles an editor can have in their creations, and requiring them to expand/source them before they can add another stub. JoelleJay (talk) 20:43, 9 September 2022 (UTC)

The database problem
Summary: Some entries in some databases may represent significant coverage

There are several comments above that contrast "GNG sources" and "databases". I think this is overly simplistic, and I am concerned that this is going to turn into a destructive meme during the RfC.

On the one hand, there are databases that are not independent, do not contain significant coverage, or are otherwise not reliable for any encyclopedic purpose. See, e.g., a database that matches ISBNs with bibliographic data about the book registered to that number. It may be reliable, but it does not have much information in it. You can write some information from a small record like that ("Alice Expert wrote a book, The Sun is Really Big, in 2007") but you can't really turn it into a whole encyclopedia article.

On the other hand, there are database records like https://omim.org/entry/609423, which contains both more complete sentences in prose and more inline citations than most of our articles ever will.

While the extremes might be tolerably obvious, in the extensive middle ground, it will be difficult for editors to decide, fairly and without bias, which ones contain enough information to "count". There will always be a tendency for editors to evaluate the amount of content in a database entry according to whether or not they believe the subject is "worthy". The sports fans will always approve of databases about sports; the Wikipedia is serious business folks will always prefer databases about academic subjects.

And then there is the other problem, which is that some people can get more out of some database entries than others of us.

Consider https://www.fishbase.se/summary/Entomocorus-benjamini.html which I found in a stub created in 2009. A lot of us are going to look at that (please do glance at it now) and say "Ugh, what a useless source". There is not a single sentence on the page. And others of us are going to say that it's a good source with tons of information in it. For those who don't "speak" biology, here's what that source says, in plain English:


 * This fish was described by Carl H. Eigenmann in 1917.
 * This is a kind of catfish.
 * Specifically, it's one of the Driftwood catfish.
 * All catfish are Ray-finned fish, which means they're bony and their fins are supported by multiple thin bones (not just cartilage, and not a solid bone).
 * This particular fish belongs to the genus Entomocorus.
 * The genus name comes from the Greek, and means something like "sharp eye".
 * They are a type of freshwater fish.
 * They live in deep water, which is called the Demersal zone (not the layer of water right on the bed, but just above that).
 * This means they are a kind of ground fish.
 * Since they live in deep freshwater, we know they live in lakes or deep rivers, rather than streams or oceans.
 * They live in warm water.
 * They live in the tropics.
 * They're found in the Madeira River basin.
 * This means they live in the middle of South America.
 * We probably don't know much about their development processes or if there is anything unique about their biology. (We'd leave that out of a Wikipedia article, because while it's probably true that we don't know anything about, e.g., sexual dimorphism in this species, this source also can't tell us if someone published a paper on exactly that subject yesterday.)
 * They're small: The maximum recorded standard length (not counting the length of the tail fin) is 7 cm (3 in) long.
 * They have not been evaluated for possible inclusion in the International Union for Conservation of Nature's red list of endangered species, nor for the anti-poaching work of CITES or Convention on the Conservation of Migratory Species of Wild Animals.
 * They aren't poisonous.
 * They don't attack humans.
 * It's mentioned in a chapter of ISBN 85-7430-361-5, which could be a ==Further reading== entry.
 * The phylogenetic diversity is estimated to be low, so it's quite similar to related species.
 * It's not likely to go extinct as a result of fishing by humans.

That's 15 to 20 severable facts from the main page of a single database entry that doesn't contain a single complete sentence. Just what I've written here is almost long enough to qualify for WP:DYK.

If you click through some of the links to subpages, you find that the fish has been reported in at least two countries (Bolivia and Brazil), that it has a Valid name (that's a thing for animals), that it's been entered into the Catalogue of Life (which will interest Wikidata more than us), that it's found in inland waters (which you already knew, if you knew where the Madeira River is, but, hey, now it's officially a statement that this source Directly supports), and that it's a Native species of the Madeira region and endemic in the Neotropical realm, plus a complete taxonomic hierarchy (perfect for filling out those infoboxes) and a list of specific places (turns out it's deep rivers, not lakes) where it's been reported in the academic literature, including citations to those reports.

This single source contains plenty of objective, encyclopedic information. But not everyone can see that, even if they're genuinely trying. All some people can see is "The article contains two sentences and the lone cited source is Greek to me."

I hope this illustrates the two problems of talking about "databases". I think the end result of talking about "database sources" is going to be destructive. We're going to end up with one-size-fits-none claims, and with people dismissing rich sources of information because they don't understand them, rather than because of any limitations in the databases themselves. I think we need to be more descriptive and specific, like "I don't want editors mass-creating articles from sources that contain very few actual facts that would be appropriate for an encyclopedia article. It doesn't actually matter whether that fact-deficient source is 'a database entry' or 'a long feature story in a gossip rag'. We need sources that contain a lot of information, no matter what format that information is presented in."

Of course, if your main issue is "I don't want editors mass-creating two-sentence articles", then that's a separate problem. But I still encourage you to not blame "databases" when it would be possible to use only that database to write a much longer article. WhatamIdoing (talk) 04:23, 9 September 2022 (UTC)

Closing workshopping
Closing this as I think we've got enough input. The RfC will be at WP:ACAS and will be announced in various fora. No confirmed timeline yet, sorry! Valereee (talk) 14:47, 10 September 2022 (UTC)


 * @MJL, feel free to archive the entire page, we'll want it blank when we start the RfC. Valereee (talk) 14:48, 10 September 2022 (UTC)