User talk:Smallbones/1000 random results

All feedback appreciated. Smallbones( smalltalk ) 17:55, 24 February 2016 (UTC)

1001?
User:Smallbones/1000 random has, dare I say it, 1000 articles. Yet User:Smallbones/1000 random results discusses a sample size of 1001. Where did the mysterious 1001st article go (or come from)? --GRuban (talk) 19:37, 24 February 2016 (UTC)


 * I was aiming for 1000 articles, but I can't count, and selected 1002. Then one got deleted before I could gather any data on it (see Articles for deletion/Corwin Samuel West). Not yet realizing that I'd originally selected 1002, I selected an additional article to replace it:


 * "307A	Schoenoplectus saximontanus	BHM	8 July	2010	61	replaced 307 December 29, 2015"

A few days later, while still gathering data, I found that another article had been merged (essentially out of existence):


 * "726	Pharmacological gene therapy	BHM	27 October	2005	173	Merged to Gene therapy 29 December 2015

merged article started 28 November 2001 22835 pvs"

This was a bit more difficult: there seem to have been two parallel articles, so which article history should I use? Since I wasn't clear on what to do and already had two "extra" articles, the natural thing seemed to be to just drop it from the sample.

I'd guess these types of fuzzy choices, made in response to unexpected situations, are pretty common when starting any data set. The only real question is whether the choices biased the data set somehow. I can't see what the bias would be; in any case it would be very small (0.1%), and the choices were made in good faith. Smallbones( smalltalk ) 02:03, 25 February 2016 (UTC)

Deleted content?
Presumably some of the 1001 had content removed as well as added? So an article that begins with 2,000 bytes, then has 250 bytes added and 50 bytes removed, would end up at 2,200 bytes.

Was any calculation done to see whether this was more or less likely in women's bios than in men's? And what would the end figures for BLPs have been if no content had been deleted? --The Vintage Feminist (talk) 12:08, 23 March 2016 (UTC)
 * I went back and checked the year-by-year data for each of the 54 bios in the matched set (not each edit or anything like that; it was very quick, so I may have made a mistake or two). There were 18 articles where the size went down in 1 or 2 years (no article went down in more than 2 years); 8 of these were women's bios and 10 were men's. Most of the drops seemed pretty small, but one for a woman was large as a percentage: from 1,868 bytes to 828 bytes, a drop of about 56%. A couple of men's bios had larger drops in absolute bytes. So nothing obvious jumps out about deletion of material from women's bios.
 * My impression, as I remember it from gathering the data, was that articles almost always seem to grow in bytes, which is consistent with the above, but that there is a lot of rejiggering of categories, where the number of characters in the category names usually goes up but fairly often goes down. I have a vague recollection of 1 or 2 incidents that were starting to look like edit wars, but I don't remember the gender of the subjects.
 * My first reaction to your question was a bit defensive, but I think this is worth saying. I think most editors involve themselves with only the best half of Wikipedia articles. More than half our articles are stubs, and most of these are not edited very often. So at first, the prospect of women's bios not growing because people come and take out material struck me as very unlikely. Doing this type of investigation has shown me just how bad, and how forgotten, half of our articles are.
 * That said, I was rather distressed to see that women's bios were not better than men's, and did not improve faster. I figured that men (most editors) were just more interested in writing men's bios, and that if highly motivated editors came along to write women's bios, those bios would do better in both ways (starting quality and improvement). Clearly I was wrong, and the problem is much deeper. It now seems highly likely to me that the main bias is in the source material: women's achievements were just not recorded as much or as often as men's.
 * I'll be back in a bit, but first I want to add some notes on how you can help (me and perhaps others) understand what's in Wikipedia and how it improves. You could:


 * give me ideas ("generate hypotheses") that can be tested on how Wiki works.
 * lend a hand and gather some data.
 * more later. Smallbones( smalltalk ) 15:16, 23 March 2016 (UTC)
 * I'm back, and I just wanted to underline that I'm trying to figure out "How Wikipedia really works" after being an editor here for more than 10 years. Sure, I know how the politicking, edit-warring, Arb-Com, and other nasty stuff works, and I've read (or at least skimmed) a dozen or so academic papers. But I always felt that what really happens here is hidden under tons of edits in millions of articles, so I could never see the forest for the trees. I'd like to be able to help others see what's working and what's not working here.
 * I assume you're interested in increasing the number and quality of women's bios. Have you considered how that's going to work?  What kind of numbers of articles and editors are going to be needed?  What those articles will be about (in a bit more detail)?  Where those editors will come from?  The semi-formal type of analysis I'm doing won't give you the exact answers to that type of question, but will often make helpful suggestions.  If you can define more specific questions or hypotheses, I can help.
 * A possible question might be along the lines of "Do we need more bios of women scientists, artists, sportswomen, singers, or actors? Or do we just need more women's bios all around?" If that type of question is of interest to you (I bet you have a better question, though), I could help.
 * It does take some time, and if you know of somebody who could pitch in with manual data gathering, or perhaps is very good with bots, it would go a lot quicker, and both of us could benefit.
 * Sincerely,
 * Smallbones( smalltalk ) 17:35, 23 March 2016 (UTC)
 * Gosh, that's prompt. I asked about the deleted content because I tend to find this sort of thing and this on bios that I've written: unexplained content removal from IP addresses, and I've had individual journal articles and individual book chapters taken off with just "irrelevant" as an edit summary. If a daughter bibliography article is needed, then it should be created rather than just censoring content. When I read the number of bytes for articles, I just thought, "yes, but those figures can go down as well as up." Thanks for your work. --The Vintage Feminist (talk) 20:45, 23 March 2016 (UTC)
 * Historic largest size vs current size. Basically, 500 are the largest they've ever been and 100-150 had larger versions in their past (prior to splits, vandalism, etc.). — Dispenser 03:08, 24 March 2016 (UTC)
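For anyone who wants to repeat the two checks discussed above (year-by-year size drops, and historic largest size vs current size), here is a rough sketch. It assumes each article's history has already been reduced to one byte count per year, oldest first; the ArticleHistory record and the function names are hypothetical, not an existing tool. Yearly snapshots will miss drops that were undone within the same year, so this only approximates a check over every revision.

```python
from dataclasses import dataclass


@dataclass
class ArticleHistory:
    # Hypothetical record: one byte count per year, oldest first.
    title: str
    gender: str  # e.g. "F" or "M" for the subject of a bio
    yearly_bytes: list[int]


def years_with_drop(sizes: list[int]) -> int:
    """Count year-over-year transitions where the article got smaller."""
    return sum(1 for prev, cur in zip(sizes, sizes[1:]) if cur < prev)


def percent_change(old: int, new: int) -> float:
    """Percentage change from old to new (negative means it shrank)."""
    return 100.0 * (new - old) / old


def summarize(articles: list[ArticleHistory]) -> dict:
    """Tally articles that shrank in any year, by gender, and those at peak size."""
    shrank = [a for a in articles if years_with_drop(a.yearly_bytes) > 0]
    by_gender: dict[str, int] = {}
    for a in shrank:
        by_gender[a.gender] = by_gender.get(a.gender, 0) + 1
    # "Largest they've ever been" approximated by latest snapshot == max snapshot.
    at_peak = sum(1 for a in articles if a.yearly_bytes[-1] == max(a.yearly_bytes))
    return {"shrank": len(shrank), "shrank_by_gender": by_gender,
            "currently_at_peak": at_peak}
```

With this, percent_change(1868, 828) gives roughly -55.7, matching the "about 56%" drop mentioned above.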

Question about your survey
Hi Smallbones, my name is Shira Klein; I'm a historian at Chapman University (Orange County, CA) and I'm currently working on a project related to Wikipedia's impact on society. I came across this tree map you created a while ago: https://commons.wikimedia.org/wiki/File:Size_of_English_Wikipedia_(1000_vol).svg I am trying to find a breakdown of Wikipedia by topic for the last year or two, or as recently as possible. Is this (https://en.wikipedia.org/wiki/User:Smallbones/1000_random) the latest data you found? Do you know if others have carried out such a survey more recently? I thought perhaps there would be a Wiki page providing this kind of information, but my Google searches haven't yielded anything. I also haven't found anything using Google Scholar or other scholarly search engines. Many thanks in advance for any ideas you might have! Shira (feel free to email me also, at sklein at chapman.edu) --Chapmansh (talk) 18:27, 26 February 2022 (UTC)