User talk:BilledMammal/LUGSTUBS alternatives

While both WP:LUGSTUBS and WP:LUGSTUBS2 closed as successful, the closer HJ Mitchell did later comment on their talk page that even among the supporters there isn't an appetite for a huge RfC on 1k at a time every few months. This is a concern that I have had since the first LUGSTUBS, and before we even start to consider what topics to contain within the third RfC I want to have a discussion on how to address this.

One obvious solution is to drastically escalate the number of articles we bring through these RfCs; perhaps instead of 1000 articles, the next RfC would contain 10,000. To support editors in parsing this I would create dozens of sublists organized by sport, nationality, era, etc, as well as any other subset requested by editors.

However, for LUGSTUBS2 we decided against such a massive escalation as the first LUGSTUBS was a close debate, and LUGSTUBS2 was similarly close. As such, I want to consider a few alternatives.

First, a WP:PROD style process where editors can tag articles meeting the LUGSTUBS criteria. Once tagged, the tag would only be removable if the LUGSTUBS conditions for restoration to mainspace were met; if the tag had not been removed within one month/three months/six months, the article would be moved to draftspace and could only be restored if the LUGSTUBS conditions were met. I have two concerns with this option; I am not certain about the appetite for such a process among the community, and the large scale tagging that it would require may become problematic.

Second, the creation of a new Wikiproject - sports.wikipedia.org. Unlike enwiki, which is intended to only cover encyclopedic topics, it would be intended to be comprehensive; to cover every verifiable sportsperson. To avoid it becoming a full fork, with the resulting duplication of effort and reduced coverage of sports on both sites, it would remain inextricably connected to enwiki.

This could be done through a few measures; first, sportswiki will only contain articles on individuals who are not presumed notable on enwiki; for individuals who are presumed notable it will instead provide a redirect to that article, or transclude the article. This means that articles on sportspeople on enwiki will never be deleted (with a few exceptions, such as WP:COPYVIO); instead, they will be transwikified to sportswiki. Similarly, when an article on sportswiki is improved to the point of being presumed notable it is moved to enwiki.

Second, sportswiki would not have its own project space; with the exception of notability guidelines, WP:NOTDATABASE, and parts of WP:PRIMARY it would follow and contribute to our policies. Issues and discussions that require the use of project space would instead take place on enwiki.

The hope of this is that it would make it less controversial to remove these LUGSTUBS - and similar articles by editors like BlackJack etc - from mainspace, by preserving them and by continuing to enable readers to access them. However, it would require sports wikiproject buy in, which may not be forthcoming, and it would require support from the WMF who may be reluctant to allow the creation of a Wikiproject that wouldn't operate independently but would instead be an appendage of a large project.

So far, I haven't been able to see any other alternatives; BeanieFan11 suggested a cleanup project, but as discussed on my talk page I can't see that working as it lacks the scale required to address the problem of mass creation. If anyone can see others that they think might work, please suggest them; otherwise, I would be interested in feedback on these three alternatives.

Pinging some of the editors pinged or involved in the previous discussion; while I intend to open this to the broader community through a discussion at VPI before acting, I believe a pre-discussion discussion will allow that discussion to be more productive. BilledMammal (talk) 17:38, 27 September 2023 (UTC)

Discussion

 * We could design a template that says "This is a biography of a living person created by an editor who was site-banned for disruptive behaviour", and then use a script to tag every Lugnuts-created BLP with it. Add text to say any good faith editor can remove that tag if they've reviewed the article, they've checked it, and either it isn't a BLP, or else they agree to adopt it (which means they watchlist it and undertake to revert blatant vandalism and other obviously BLP-violating changes to it).—S Marshall T/C 18:15, 27 September 2023 (UTC)
 * In my opinion, before we do any drastic removal of tens-to-hundreds of thousands of potentially notable articles, we should attempt my plan: a competition for, say, two months, similar to the WikiCup, where users get points depending on how many non-notable articles they AFDed are successfully deleted and how many "sub-stubs" are expanded, with bonus points etc. for different criteria (e.g. being a Lugnuts creation) and numerous barnstars and awards given out. I believe we almost certainly have enough interested editors in the sports debate to get a large participation - and if we get, say, 100 users who each nominate five articles for deletion and improve five, then we've got 1,000 articles dealt with (including 500 improved) in a much less controversial and time-wasting way (and we could repeat it every few months, just like these RFCs) - and of course, if the event is unsuccessful you can move on to your more drastic suggestions. BeanieFan11 (talk) 18:16, 27 September 2023 (UTC)
 * For others here, you should also see WikiProject National Football League/Football biography cleanup - something in the football area similar to my suggestion (albeit without awards and a timeframe), which has already got several times the number of articles improved as the Lugstubs draftification RFCs. BeanieFan11 (talk) 18:19, 27 September 2023 (UTC)
 * Note that I've set up a rough draft of my proposed contest here. BeanieFan11 (talk) 12:36, 29 September 2023 (UTC)
 * Do you have breakdowns for the rest of the lugstubs? Like, how many of the 90k meet "the lugstubs criteria" (created by lugnuts, only DB sources, no significant edits by others... I forget what else is in the criteria)? Is it like 90k or like 10k? And then, how do those lugstubs break down: how many are BLPs v. non-BLPs? How many by sport, is it almost all football? I think if we knew more about what was in the pile, we'd have better ideas for how to go through it. BTW I realize figuring this stuff out involves a ton of work, which I am not expecting you or anyone else to take on, but I'm curious how much you already know based on the work you've already put in, and how much more work would be involved in finding out the rest of the answers. Levivich (talk) 18:25, 27 September 2023 (UTC)
 * And "small" is the other part of the criteria. I haven't run a query over all of Lugnuts creations, but among Olympians there are approximately 30,000 articles that meet the criteria. Unfortunately, I can't answer the other questions you have asked; while they are all comparatively simple to answer when I have a working query base, the WMF recently updated how external links are stored in the backend, so before I can produce the answers I need to fix my queries - I'll do that sometime next month. BilledMammal (talk) 18:38, 27 September 2023 (UTC)
 * I've created a number of tables that may help to answer these questions:
 * User:BilledMammal/LUGSTUBS by country (A to J)
 * User:BilledMammal/LUGSTUBS by country (J to Z)
 * User:BilledMammal/LUGSTUBS by sport (A to G)
 * User:BilledMammal/LUGSTUBS by sport (G to M)
 * User:BilledMammal/LUGSTUBS by sport (M to Z)
 * User:BilledMammal/LUGSTUBS by sport (Z to Z)
 * User:BilledMammal/BLP LUGSTUBS
 * User:BilledMammal/All LUGSTUBS (A to J)
 * User:BilledMammal/All LUGSTUBS (J to Z)
 * Unfortunately, I needed to split the pages for some of them as they exceeded the maximum page length. If anyone is interested in a list for just a single country/state/sport, let me know and I can easily provide it.
 * A few caveats:
 * The sources whose inclusion in an article was not considered evidence of WP:SIGCOV differ slightly from query to query. I found myself running up against the maximum query length, and the only way I could get under that length was to remove some of the least common sources from the query.
 * The sources are from a preliminary list, which can be found here. While I have glanced over the sources, I haven't done an in depth investigation of all of them. Note that they are in index form, as that is the form provided and used by the database, and that we can likely exclude most of them without a significant drop in the number of articles under consideration. The full list of sources used in the LUGSTUBS can be found here.
 * The articles only include sports biographies. Biographies outside sports, and non-biography sub-stubs, are not intended to be included.
 * The country count is incomplete; it doesn't include former countries, or athletes who are classified under a sub-national unit such as England, California, or Chicago. Including former countries will be relatively easy, but I'm not yet sure how to include sub-national units. As such, I will not yet be presenting a count of articles by country, as the result will be inaccurate and misleading.
 * There are:
 * 43,357 LUGSTUBS
 * 28,027 of which are BLP's
 * By sport, we have:
{| class="wikitable sortable mw-collapsible mw-collapsed"
! Sport !! Articles
|-
| Athletics || 5201
|-
| Australian sports || 1383
|-
| Badminton || 13
|-
| Basketball || 784
|-
| Biathlon || 445
|-
| Boxing || 1297
|-
| Canadian sport || 897
|-
| Cricket || 5624
|-
| Cycling || 3934
|-
| Fencing || 1598
|-
| Figure skating || 103
|-
| Football || 2395
|-
| Golf || 24
|-
| Gymnastics || 1312
|-
| Handball || 328
|-
| Ice Hockey || 531
|-
| Martial arts || 2904
|-
| Olympics || 33606
|-
| Pakistan Super League || 3
|-
| Professional wrestling || 1
|-
| Referees || 231
|-
| Rowing || 2700
|-
| Rugby union || 13
|-
| Running || 3884
|-
| Sailing || 1295
|-
| Skiing and Snowboarding || 1924
|-
| Softball || 74
|-
| Speed skating || 666
|-
| Sports || 579
|-
| Swimming || 3277
|-
| Tennis || 61
|-
| Triathlon || 8
|-
| Volleyball || 503
|-
| Water sports || 12
|-
| Women's sport || 5850
|}
 * I suspect we are missing a few sports here, although most articles are accounted for - out of the 43,357 articles under consideration, 43,249 are counted in at least one sport. BilledMammal (talk) 12:22, 29 September 2023 (UTC)
 * This is awesome, thank you. My first thought is the obvious one: more than half of lugstubs are BLPs, and I wonder if anyone else thinks it's a good idea to handle those in one group, either the same draftification proposal, or maybe allowing these to be BLPPRODed (although I'd prefer draftifying 28k articles to prod-ing them). Seems like these are the priority? Levivich (talk) 21:43, 29 September 2023 (UTC)
 * I think that would be a good idea; handle our most sensitive articles immediately, and then handle the rest at a more leisurely pace. For the purposes of actually getting the proposal through RfC one option may be to limit us to Olympian BLP's; this still covers 22,639 articles (down only 5,388) but might make the scope more palatable? I haven't created a tidy table for this, but a list of such BLP's can be seen here. BilledMammal (talk) 22:05, 29 September 2023 (UTC)
 * To brainstorm a little more; we exclude ~150 sources from these BLP lists (the issue is templates, like Authority Control, adding external links). I am concerned that this number might cause issues in any RfC, but if we cut it down to two sources there are still 16,309 articles (72% of Olympian BLP LUGSTUBS); cutting it down to ten there are still 19,704 (87%). Cutting it down to twenty leaves 21,029 (93%), and we are starting to get to the point of diminishing returns; unless we can find a way to identify which external links are provided only by templates I think we need to accept that we'll have a non-trivial number of false negatives in the batch, just to ensure that the proposal is sufficiently easy for editors to review and understand, though I'm not sure what the right middle point is. BilledMammal (talk) 23:50, 29 September 2023 (UTC)
 * Here are some questions I'm thinking about:
 * Should there be another set of articles at all, or should the process going forward be something altogether different from looking at another set of articles?
 * If it is another set, should the proposal be for draftification or some other option(s)? (E.g., extending blp prod, csd, something else.)
 * If it is another set, should one set in particular be proposed, or should several options for sets be proposed? (E.g.: 43k lugstubs, 28k blp lugstubs, 22k Olympian blp lugstubs, 16k Olympedia/Sports-Reference Olympian blp lugstubs, or something smaller than that.)
 * If it's not another set, what else would it be?
 * Not sure where the happy medium is between flexible complexity and strict simplicity, but will give it some more thought. Levivich (talk) 04:50, 30 September 2023 (UTC)
 * Those aren't easy questions. I do think we need to continue using batch processing, but there may be an alternative that I haven't been able to think of.
 * I like the idea of providing editors with several options; it could work well. I suspect the best way to phrase it would be to ask editors if they support draftifying all Lugstubs, and only if they do not do they need to consider the subsets.
 * If we do go that route I think we should continue to propose draftification, as I think introducing too many novelties in the same RfC will result in a reduced chance of consensus, but there may be a better option out there. BilledMammal (talk) 15:32, 5 October 2023 (UTC)
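The coverage figures quoted in this thread can be checked with a short script. This is only a minimal sketch in Python, not the query that produced the lists; the constant and function names are hypothetical, and the counts are copied from the comments above. It recomputes the share of Olympian BLP LUGSTUBS retained at each number of excluded sources, and the average number of articles gained per additional excluded source, which is one way to see the diminishing returns.

```python
# Counts quoted in the discussion above (assumed to be accurate as posted).
ALL_BLP_LUGSTUBS = 28_027
OLYMPIAN_BLP_LUGSTUBS = 22_639

# Articles remaining when only the N most common sources are excluded.
# Keys are the number of excluded sources; values are article counts.
REMAINING_BY_SOURCE_COUNT = {2: 16_309, 10: 19_704, 20: 21_029}


def coverage(remaining: int, total: int = OLYMPIAN_BLP_LUGSTUBS) -> float:
    """Return the share of the full Olympian BLP set that is retained."""
    return remaining / total


def marginal_gain_per_source(counts: dict[int, int]) -> list[tuple[int, int, float]]:
    """Average articles gained per extra excluded source between steps."""
    steps = sorted(counts.items())
    return [
        (lo_n, hi_n, (hi_count - lo_count) / (hi_n - lo_n))
        for (lo_n, lo_count), (hi_n, hi_count) in zip(steps, steps[1:])
    ]


if __name__ == "__main__":
    dropped = ALL_BLP_LUGSTUBS - OLYMPIAN_BLP_LUGSTUBS
    print(f"Dropped by limiting to Olympians: {dropped}")
    for n, remaining in sorted(REMAINING_BY_SOURCE_COUNT.items()):
        print(f"{n:>3} sources: {remaining} articles ({coverage(remaining):.1%})")
    for lo_n, hi_n, gain in marginal_gain_per_source(REMAINING_BY_SOURCE_COUNT):
        print(f"{lo_n} -> {hi_n} sources: about {gain:.0f} articles per extra source")
```

Going from two to ten sources adds several hundred articles per extra source on average, while going from ten to twenty adds far fewer, which supports keeping the excluded-source list short if ease of review is the priority.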


 * I believe that for LUGSTUBS and similar topics, merging into a list is a solid compromise option. I spoke to this in the similar GEOLAND discussion that's currently active at VP/PR. It would take longer than mass draftification or mass deleting, but it would retain the basics in a way that's better organized and easier to maintain. Failing that: I would lose no sleep if every <2kb article on Wikipedia were deleted all at once. It is impossible for any article to have meaningful coverage at that size, and it would take no effort to recreate one if someone decided to seek out sources and write an actual article. To that end, the community has already signed off on the requirement that sports biographies have SIGCOV sources. On the opposite end of the issue, just yesterday I started working on reviving WikiProject Stub improvement. I don't expect it to have a meaningful effect on mass created sludge, but I was inspired in part by the recent microstub controversies and I think it could be a good outlet for those wishing to prove that a given type of stub is easy to expand. Thebiguglyalien  ( talk ) 22:40, 27 September 2023 (UTC)
 * "I believe that for LUGSTUBS and similar topics, merging into a list is a solid compromise option." Agreed. I don't want to propose merger as the explicit option, as I feel it is better to leave the specifics unsaid and flexible, which both gives us more options for how we implement it and prevents the specifics getting in the way of consensus, but going forward I plan to offer to do as permitted by item 5 of the proposal like I did in LUGSTUBS2.
 * "Failing that: I would lose no sleep if every <2kb article on Wikipedia were deleted all at once. It is impossible for any article to have meaningful coverage at that size, and it would take no effort to recreate one if someone decided to seek out sources and write an actual article." Nor would I; it's part of the reason I support soft-delete, as I think soft-deleting such articles would be much more palatable than hard-deleting them. My belief is that it would result in a significant benefit for the encyclopedia, both in terms of the number of substantial articles and in terms of the number of editors; I think that readers who are looking for a topic and find a micro-stub leave disappointed, without considering that they can improve it - but a reader who finds nothing is explicitly invited to fill the gap, and I believe is better encouraged to do so. Indeed, the reason I joined was because I went looking for the article on Yarra Falls but couldn't find it; if the article had existed as a sub-stub I doubt I would have felt the same compulsion.
 * "On the opposite end of the issue, just yesterday I started working on reviving WikiProject Stub improvement. I don't expect it to have a meaningful effect on mass created sludge, but I was inspired in part by the recent microstub controversies and I think it could be a good outlet for those wishing to prove that a given type of stub is easy to expand." Hopefully it works; I see BeanieFan11 is working on something similar, perhaps their stub improvement drive can take place under the auspices of WikiProject Stub improvement? BilledMammal (talk) 00:05, 30 September 2023 (UTC)
 * I hadn't even considered that benefit of cleaning up microstubs; I was thinking about it from a maintenance versus usefulness perspective. As far as projects to fix these things, WP:WikiProject Stub improvement, BeanieFan11's drive, and WP:The 50,000 Destubbing Challenge all look like they could address different aspects of the issue in conjunction with one another. So far, I'm leaning toward trialing a "stub category of the week" at stub improvement like this. If it works out, a couple sports categories could be chosen during the drive. Thebiguglyalien  ( talk ) 16:25, 5 October 2023 (UTC)