User talk:Sean.hoyland/Archive 16

Ramat Eshkol is located in Northern Jerusalem, not East Jerusalem
I included the fact that it's an Israeli settlement, and changed the location of the neighborhood from East Jerusalem to Northern Jerusalem. I also included the population of the neighborhood. Wikieditor738 (talk) 12:30, 30 January 2024 (UTC)


 * Did you read WP:ARBECR? If so, why are you editing the article rather than making an edit request on the talk page? Sean.hoyland (talk) 12:40, 30 January 2024 (UTC)
 * Where does it say that only extended-confirmed users can edit on the Ramat Eshkol article? Wikieditor738 (talk) 12:45, 30 January 2024 (UTC)
 * At the top of the talk page. But you should be aware that all articles "related to the Arab–Israeli conflict" broadly construed can only be edited by extended-confirmed users, even when that ARBPIA template is absent and even when extended confirmed protection has not been implemented on the article. Israeli settlements are clearly within scope of the restrictions. You should undo your edits and make an edit request. Someone will handle your request. Sean.hoyland (talk) 12:53, 30 January 2024 (UTC)
 * This edit indicates that you know how to undo edits. You also now know that WP:ARBECR applies to the article and that you should undo your edits and make an edit request. Are you going to do that? Sean.hoyland (talk) 14:06, 30 January 2024 (UTC)
 * Ok Wikieditor738 (talk) 14:09, 30 January 2024 (UTC)
 * I undid the edits Wikieditor738 (talk) 14:11, 30 January 2024 (UTC)
 * Thank you. Sean.hoyland (talk) 14:15, 30 January 2024 (UTC)

Can I borrow/steal your visualiser tool?
You used it ‚on me‘ and I thought it was really interesting. Is it something someone who is a little bit of a Luddite can use, and if so, would it be possible to get it?

Thanks in advance! FortunateSons (talk) 02:40, 13 February 2024 (UTC)
 * Unfortunately it's still very much work in progress, along with a number of other things I'm trying to work on mainly to help myself understand the state of affairs in ARBPIA. I wasn't planning on releasing anything, it's more about looking at what might be possible and useful. Also, I'm not a big fan of the Wikimedia Cloud Services development environment so I'm moving things over to my own environment. But let me know if you would like to see that kind of information for any editors and I'll generate the plots. Sean.hoyland (talk) 09:59, 13 February 2024 (UTC)
 * I understand, thank you very much for the kind offer! FortunateSons (talk) 10:03, 13 February 2024 (UTC)
 * Hi. Neat graphs. Can you do me? Is it possible to do my first 500 and my last 500? I'm just curious what the graphs would look like and if there's a difference. Thanks! Levivich (talk) 22:07, 15 February 2024 (UTC)
 * Sure, will do. Bear with me as I'm in the middle of migrating/rewriting/repeatedly breaking the code. Sean.hoyland (talk) 01:40, 16 February 2024 (UTC)
 * Here's your first 600. I'm connecting to Wikimedia's replica databases rather than the API, so they may not have your very latest edits, but I'll have a look. Sean.hoyland (talk) 01:52, 16 February 2024 (UTC)
 * Cool, thanks! Levivich (talk) 02:45, 16 February 2024 (UTC)
 * Last 500 edits
 * Last 20000 edits for interest.
 * I'm still a bit confused by how Wikipedia counts 'edits'. The account information web page says your enwiki edit count is over 36,000. But both the replica database (which is up to date right now) and the API (which is live) agree that your enwiki revision count is more like ~32,740, including new pages and deleted revisions. And I just pulled all of your contributions out of the API without filtering, so I'm puzzled by the mismatch... I must be missing something. Sean.hoyland (talk) 08:43, 16 February 2024 (UTC)
 * The privileges required to see those particular deleted edits are the missing something. Sean.hoyland (talk) 10:41, 16 February 2024 (UTC)
 * Thanks again for doing this--the visualization is interesting. Levivich (talk) 04:42, 20 February 2024 (UTC)
 * There's also this one for interest, an edit count heatmap - day of week vs time of day, for all of your edits...or at least the 32,777 edits the system hasn't hidden behind an event horizon. Sean.hoyland (talk) 08:41, 20 February 2024 (UTC)
 * I'll never tell what's in those 3,470 edits! muahahahaha
 * (Actually I think it's five years worth of deleted sandbox pages.)
 * Thanks once again! I think your data driven approach could help bring some objectivity to the whole "gaming" issue. Levivich (talk) 15:38, 20 February 2024 (UTC)
 * Hi sorry to bother you again, but do you know if it's possible to get cumulative lifetime "page byte size change", i.e. how much total text I've added/removed from mainspace? I'm curious what portion of the encyclopedia I've written/deleted, how many 0's after the decimal point. And then I'm curious who's written the most. Do you think that's possible to figure out? Levivich (talk) 01:50, 21 February 2024 (UTC)

You might be interested in Wikimedia's "xtools" if you haven't seen them before. There's a whole bunch of handy tools in there. They don't quite address your questions though.


 * "do you know if it's possible to get cumulative lifetime "page byte size change", i.e. how much total text I've added/removed from mainspace?" The total bytes figures are easy. How much text that actually represents is hard to say exactly, because the number of bytes per character depends on the character, but assuming most of that text is made up of normal ASCII letters, numbers etc., it will be roughly one to one, one byte per text character, I think. See below for additions and deletions for all namespaces.

Bytes added.

Bytes deleted
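The bytes added and deleted totals above can be reproduced from revision sizes. A minimal sketch, assuming each of a user's revisions is available as a pair (revision length, parent revision length) in bytes, with a parent length of 0 for page creations; the function names are hypothetical. Because each edit only records its net size change, an edit that replaces 200 bytes with 150 counts as just 50 bytes removed, so these totals understate the text actually touched. The second function shows why bytes only roughly equal characters: under UTF-8, ASCII letters take one byte each, while Arabic or Hebrew letters take two.

```python
from typing import Iterable, Tuple


def byte_totals(revisions: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Sum the positive and negative size deltas of a user's revisions.

    Each item is (new_len, parent_len) in bytes; use parent_len = 0 for a
    page creation. Returns (bytes_added, bytes_removed).
    """
    added = removed = 0
    for new_len, parent_len in revisions:
        delta = new_len - parent_len
        if delta > 0:
            added += delta
        else:
            removed -= delta  # delta <= 0 here, so this adds its magnitude
    return added, removed


def bytes_per_char(text: str) -> float:
    """Average UTF-8 bytes per character: 1.0 for plain ASCII, higher otherwise."""
    return len(text.encode("utf-8")) / len(text)


if __name__ == "__main__":
    # Hypothetical history: create a 100-byte page, grow it to 150, trim it to 120.
    print(byte_totals([(100, 0), (150, 100), (120, 150)]))  # (150, 30)
    print(bytes_per_char("Nakba"))  # 1.0
    print(bytes_per_char("نكبة"))  # 2.0 (each Arabic letter is 2 bytes in UTF-8)
```

On the replicas, the per-revision sizes would come from the `rev_len` column of the revision table, with the parent revision found via `rev_parent_id`.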

The other 2 questions,
 * "what portion of the encyclopedia I've written/deleted"
 * "who's written the most"
...are more difficult, and I'm not sure how to do that. It's easy to see the size of mainspace, just the current enwiki article content, excluding images, talk pages etc., and it's small enough to fit on a phone: 60,084,661,068 bytes, about 60 GB it seems. But that doesn't really help, because the size of each article and the portion of a user's contributions preserved over time are dynamic and not easy to track. With xtools you can see that kind of information for individual articles, e.g. Nakba (although I assume that is looking at byte volumes per user rather than how much of each user's contributions have been preserved). Sean.hoyland (talk) 11:48, 21 February 2024 (UTC)


 * Oh, you did it! Thanks! Yes I know xtools, and I think it'd be very cool if someone figured "total authorship" rather than just by-article breakdowns. (Of course, my net +2 MB to mainspace, divided by the 60 GB current size of mainspace, is not my total authorship percentage, as that would be way too high, and the total amount of text added by everyone would be orders of magnitude higher than 60 GB.)
 * Which API are you using? WikiWho? I assume you can't use that API for all 6 million articles, but in theory, it should be possible to download a recent DB dump, and then use WikiWho's code on all 6 million articles (locally), and calculate a cumulative total... right? I've played around with pywiki and the APIs a little bit but have never attempted working with a DB dump. I bet you could test whether the Pareto principle (80/20 rule) applies to Wikipedia. Hypothesis: 20% of editors are reaching 80% of readers. Levivich (talk) 19:05, 21 February 2024 (UTC)
 * There are so many nice tools available to look at Wikimedia data. For the most part I'm not using any of the APIs right now; I'm using the replica databases (although, as with the APIs, you still need to query them in a server-friendly way, by pulling the results out in bite-sized chunks, or else the server will eventually complain). I have a Wikimedia developer account and Toolforge membership, so I ssh to a Toolforge server from my own environment (in VSCode) and open an ssh tunnel through that machine to the MySQL server hosting the replicas. But you don't need a developer account, Toolforge membership or a local DB dump to talk to the replica databases. You can use Quarry, Apache Superset, or the PAWS JupyterHub deployment. I started with the PAWS environment, but there you don't have a free choice of which libraries to keep in your environment. I'm trying to make things that can use both Pandas/NumPy and Polars/Apache Arrow, but the administrator does not seem to like Polars being there and keeps removing it, hence the move from the bare-bones PAWS environment to my own environment.
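Pulling results out in bite-sized chunks can be sketched with keyset pagination: each query resumes after the last ID already seen, instead of asking the server for everything at once or re-scanning skipped rows as LIMIT/OFFSET does. A minimal sketch, assuming a simplified `revision` table with only `rev_id`, `rev_actor` and `rev_timestamp` columns; the real replicas have more columns and run on a MySQL-compatible server, so an in-memory SQLite database stands in here, and a MySQL driver would use `%s` rather than `?` placeholders. The function name and the chunk size of 1000 are arbitrary choices.

```python
import sqlite3
from typing import Iterator, List, Tuple


def fetch_in_chunks(conn: sqlite3.Connection, actor_id: int,
                    chunk_size: int = 1000) -> Iterator[List[Tuple[int, str]]]:
    """Yield one account's revisions in chunks, oldest rev_id first."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT rev_id, rev_timestamp FROM revision "
            "WHERE rev_actor = ? AND rev_id > ? "
            "ORDER BY rev_id LIMIT ?",
            (actor_id, last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume after the last row of this chunk


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, "
                 "rev_actor INTEGER, rev_timestamp TEXT)")
    # 5000 made-up revisions alternating between actor 1 and actor 2.
    conn.executemany("INSERT INTO revision VALUES (?, ?, ?)",
                     [(i, 1 if i % 2 else 2, f"2024021{i % 10}000000")
                      for i in range(1, 5001)])
    print([len(chunk) for chunk in fetch_in_chunks(conn, actor_id=1)])  # [1000, 1000, 500]
```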


 * It's a bit of a rabbit hole, once you start looking around at all of the data. I started looking because of the canvassing. I'm curious whether it's possible to assemble a set of (biometric-like) signals from user data so that I can ask a model "Who's this?" and it says "It's probably a sock of account A because of X, Y, Z". It's a deep rabbit hole. I should probably be doing something more interesting like maybe looking at the combinatorial space of 8 quarter notes, 8 eighth notes and 1 eighth rest constrained to an A major scale from which Bach selected the opening 16 notes of the 2nd movement of BWV 1015 and listen to all the combinations he didn't choose. We have such nice tools nowadays. It almost seems foolish to use them to look for sockpuppets etc.
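As a toy illustration of one biometric-like signal, accounts can be compared by the shape of their activity over the day. A minimal sketch, assuming edit timestamps in UTC; the hour-of-day profile and cosine similarity are choices made for this example, not a method described in this discussion, and a high score is weak evidence on its own because many unrelated editors share a time zone and a daily routine.

```python
import math
from collections import Counter
from datetime import datetime
from typing import Iterable, List


def hour_profile(timestamps: Iterable[datetime]) -> List[float]:
    """Fraction of an account's edits in each UTC hour of the day (24 bins)."""
    counts = Counter(ts.hour for ts in timestamps)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 24
    return [counts[h] / total for h in range(24)]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """1.0 for identically shaped profiles, 0.0 when no active hours are shared."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Made-up accounts: two daytime editors and one evening editor.
    day_a = hour_profile(datetime(2024, 3, 1, h) for h in (8, 9, 9, 10))
    day_b = hour_profile(datetime(2024, 3, 2, h) for h in (8, 9, 9, 10, 10, 9, 8, 9))
    evening = hour_profile(datetime(2024, 3, 1, h) for h in (20, 21, 22))
    print(round(cosine_similarity(day_a, day_b), 3))    # 1.0
    print(round(cosine_similarity(day_a, evening), 3))  # 0.0
```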


 * I hadn't even noticed WikiWho, so thanks for that tip. I suspect the Pareto principle will apply, but maybe the 20% will all be bots. Sean.hoyland (talk) 05:19, 22 February 2024 (UTC)
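A first check of the 80/20 idea, once per-editor counts exist (tokens attributed by WikiWho, bytes, or edits), is the share of the total held by the top 20% of contributors. A minimal sketch; the function name and the counts in the example are made up.

```python
from typing import Dict


def top_share(contributions: Dict[str, int], top_fraction: float = 0.2) -> float:
    """Share of the grand total held by the top `top_fraction` of contributors."""
    values = sorted(contributions.values(), reverse=True)
    total = sum(values)
    if total == 0:
        return 0.0
    k = max(1, round(len(values) * top_fraction))  # always keep at least one contributor
    return sum(values[:k]) / total


if __name__ == "__main__":
    # Made-up token counts for ten editors of one article.
    counts = {"A": 500, "B": 300, **{f"Editor{i}": 25 for i in range(8)}}
    print(top_share(counts))  # 0.8
```

A result near 0.8 for `top_fraction=0.2` would be consistent with the 80/20 pattern for that set of articles; whether the top group is mostly bots is a separate filter on the account names.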


 * Having looked at the WikiWho API, which is a whole new rabbit hole, I think it probably is possible to use it to rank authorship. It would boil down to token counting I guess. With their limit of "2000 requests/day for unregistered users, and also a 60 requests/minute limit for all users" it might even be possible to get a good first approximation from the replicas by targeting a sensible subset of enwiki. Their WhoColor API is another new rabbit hole - it would be interesting to color code page histories based on things like EC privileges, whether an account was blocked as a sock etc. Sean.hoyland (talk) 09:31, 22 February 2024 (UTC)
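The quoted WikiWho limits (60 requests/minute for all users, 2000 requests/day for unregistered users) can be respected on the client side with a sliding-window limiter. A minimal sketch; the class, its defaults and the treatment of a "day" as a rolling 24 hours are assumptions, and the service may count differently. The clock and sleep functions are injectable so the behaviour can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque, Iterable, List, Tuple


class RateLimiter:
    """Sliding-window limiter enforcing several (max_calls, window_seconds) limits."""

    def __init__(self,
                 limits: Iterable[Tuple[int, float]] = ((60, 60.0), (2000, 86400.0)),
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limits: List[Tuple[int, float]] = list(limits)
        self.clock = clock
        self.history: Deque[float] = deque()  # times of recent requests, oldest first
        self.longest = max(window for _, window in self.limits)

    def wait_time(self) -> float:
        """Seconds until another request is allowed under every limit (0.0 = now)."""
        now = self.clock()
        # Forget requests older than the longest window; no limit can see them.
        while self.history and now - self.history[0] >= self.longest:
            self.history.popleft()
        wait = 0.0
        for max_calls, window in self.limits:
            recent = [t for t in self.history if now - t < window]
            if len(recent) >= max_calls:
                # The request max_calls back from the newest must leave the window.
                wait = max(wait, window - (now - recent[-max_calls]))
        return wait

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until a request is allowed, then record it."""
        wait = self.wait_time()
        while wait > 0:
            sleep(wait)
            wait = self.wait_time()
        self.history.append(self.clock())
```

In a crawl over a subset of articles, `limiter.acquire()` would be called immediately before each request.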
 * It's an interesting rabbit hole though isn't it? :-) I'm guessing that however done, if I wanted to pull authorship data for 6 million articles, I should be querying a local dump and not somebody else's server, because somebody else would get mad, even if I did it in batches? I know next to nothing about APIs or API etiquette. But anyway, a sensible subset is the way to go, and frankly for my part I wouldn't care about all 6 million articles, knowing that there is a "long tail" of stubs and articles with <10 page views a day, etc. I would start with the 1,000 articles with the most page views and work downwards from there. Or alternatively the 1,000 articles with the most edits. Or maybe just WP:VA3 and up. Any set of well-read, well-edited articles would do. And yeah we're going to find out it's mostly Cluebot writing the encyclopedia :-D Levivich (talk) 19:59, 22 February 2024 (UTC)
 * One of the applications is to find "damage" caused by UPE, COI, or POV-pushing editors or sockfarms or meatfarms ("cabals"). Say, for example, Arbcom blocks a group of accounts for whatever reason (UPE, COI, off-wiki canvassing, etc.). We can find what pages those accounts have edited, either alone or together, but it would be nice to know which pages those blocked accounts have high authorship of, as those would be the first to check manually.
 * Going further, if we had some data about sort of what "average" or "normal" authorship looks like across a large sample of articles, and we had some data about what authorship looks like for, say, UPE'd or COI'd articles, that might help identify "biometric"-like patterns that could then be used to find previously-unknown problems.
 * For example I'd bet that the "authorship profile" of a known UPE-farm would look different even than the authorship profile of five editors who are interested in comic books and edit comic book articles together. You'd expect comic book articles (or any niche area) to be mostly authored by the same few editors, but I bet bona-fide UPE-farms have even a "higher" level of authorship than groups of fans or similarly-interested editors working in niche areas. Know what I mean? Levivich (talk) 20:10, 22 February 2024 (UTC)
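Once per-page authorship is available (for instance, token counts attributed to each editor by WikiWho), the question of which pages a group of blocked accounts has high authorship of reduces to ranking pages by the share of surviving text written by that group. A minimal sketch; the data structure, names and numbers are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# page title -> {editor: tokens currently attributed to that editor}
Authorship = Dict[str, Dict[str, int]]


def group_share_by_page(authorship: Authorship, group: Set[str],
                        min_share: float = 0.0) -> List[Tuple[str, float]]:
    """Rank pages by the fraction of surviving tokens written by `group`."""
    ranked = []
    for page, by_editor in authorship.items():
        total = sum(by_editor.values())
        if total == 0:
            continue
        share = sum(n for editor, n in by_editor.items() if editor in group) / total
        if share > min_share:
            ranked.append((page, share))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


if __name__ == "__main__":
    data = {
        "Page A": {"Sock1": 600, "Sock2": 300, "Regular": 100},
        "Page B": {"Sock1": 50, "Regular": 950},
        "Page C": {"Regular": 400, "Other": 600},
    }
    print(group_share_by_page(data, {"Sock1", "Sock2"}))
    # [('Page A', 0.9), ('Page B', 0.05)]
```

Comparing the distribution of these shares for a known farm against a baseline of, say, a niche fan community would be one way to test the "higher level of authorship" hunch.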
 * So much to think about here. I'm thinking along similar lines on finding ways to measure "damage". The fact that an undercover NGO Monitor employee has the highest edit count at the NGO Monitor article doesn't seem ideal. Sean.hoyland (talk) 11:25, 24 February 2024 (UTC)

BC
I knew that editor as Daveout, and tbh I had no idea at all.  nableezy  - 15:01, 26 February 2024 (UTC)
 * Their editing habits are a bit odd. Most people's editing statistics produce nice probability density functions that reflect the fact that we do sensible things like go to sleep, wake up, commute, have lunch etc. But theirs is just a wall of edits.
 * BanyanClimber - Edit sequence range 1 to 5068
 * BanyanClimber - Edit count heatmap - day of week vs time of day
 * Sean.hoyland (talk) 15:26, 26 February 2024 (UTC)
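A day-of-week vs time-of-day heatmap like the one above is a 7 × 24 table of edit counts. A minimal sketch, assuming MediaWiki's 14-digit UTC timestamp strings (YYYYMMDDHHMMSS), decoded first if the database driver returns bytes; the helper names are hypothetical. A "wall of edits" shows up as a high fraction of non-empty cells, whereas sleep, commutes and meals usually leave some hours empty.

```python
from datetime import datetime
from typing import Iterable, List


def parse_mw_timestamp(value: str) -> datetime:
    """Parse a 14-digit MediaWiki timestamp such as '20240226152600' (UTC)."""
    return datetime.strptime(value, "%Y%m%d%H%M%S")


def edit_heatmap(timestamps: Iterable[datetime]) -> List[List[int]]:
    """7 x 24 grid of edit counts: rows are weekdays (Monday = 0), columns are hours."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        grid[ts.weekday()][ts.hour] += 1
    return grid


def fraction_of_active_cells(grid: List[List[int]]) -> float:
    """Share of the 168 weekday/hour cells containing at least one edit."""
    return sum(1 for row in grid for count in row if count) / (7 * 24)


if __name__ == "__main__":
    stamps = [parse_mw_timestamp(s)
              for s in ("20240226152600", "20240226153000", "20240227020000")]
    grid = edit_heatmap(stamps)
    print(grid[0][15], grid[1][2])  # 2 1  (Monday 15:00, Tuesday 02:00)
```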

Borrowing your software tool (again)
Some of the edits by User talk:Gimmethegepgun seem fishy to me, and at least imply a rather rapid change in editing style after EC. Would you be so kind as to check them for EC-Gaming? FortunateSons (talk) 12:43, 5 March 2024 (UTC)


 * Gimmethegepgun - Edit sequence range 1 to 823
 * Gimmethegepgun - Edit count heatmap - day of week vs time of day
 * Gimmethegepgun - Revision time of day vs timestamp - color by namespace

Bytes added by namespace.

Bytes removed by namespace. Sean.hoyland (talk) 13:09, 5 March 2024 (UTC)


 * Then it’s harmless, thank you! :) FortunateSons (talk) 13:46, 5 March 2024 (UTC)

The flour massacre
Hello there mate, on what grounds did you take down my edit in the talk about the flour massacre? --Amir Segev Sarusi (talk) 13:32, 13 March 2024 (UTC)
 * It's because you don't have the extended confirmed privilege. It's a requirement for that page, and anything related to the Israel-Palestine conflict. Have a look at the WARNING: ACTIVE ARBITRATION REMEDIES section near the top of the talk page and WP:ARBECR. You can make edit requests. See WP:MAKINGEREQ. Sean.hoyland (talk) 17:27, 13 March 2024 (UTC)

Thanks :)
Hi Sean

Thanks very much for your help with the Artists4Ceasefire article. As I mentioned, I saw some edits that also looked strange, but again I do not know this topic area at all. If you are interested in looking, here are a few; I'm just flagging them because they are large removals of text without an edit summary in a similar topic area. Honestly, I wouldn't know how to identify any conspiracy theories added in this area, so the stuff deleted without explanation jumped out at me.


 * https://en.wikipedia.org/w/index.php?title=Palestinian_genocide_accusation&diff=prev&oldid=1189380850
 * https://en.wikipedia.org/w/index.php?title=Assassination_of_Sadegh_Omidzadeh&diff=prev&oldid=1197402098
 * https://en.wikipedia.org/w/index.php?title=Moshe_Klughaft&diff=prev&oldid=1206931198
 * https://en.wikipedia.org/w/index.php?title=Moshe_Klughaft&diff=prev&oldid=1206929163

Thanks

John Cummings (talk) 09:55, 14 March 2024 (UTC)

SPIs
I am confused about SPI. Although the Signs of sockpuppetry list is exhaustive, using those signs as arguments to request an SPI does not seem to be taken into consideration. It seems that in practice only one criterion allows an SPI: whether the user has restored edits by a banned SP. And then, when that does happen and an SPI is opened, an IP check and edit analyses are performed, which could produce an affirmative or possible match that results in a ban. Is that correct? And what are my options for investigating two accounts that are likely to be SPs when the restoring-edits-by-banned-SPs criterion does not apply? Makeandtoss (talk) 12:11, 10 March 2024 (UTC)
 * is probably better placed to advise, but I've filed a number of SPIs and I'm not sure I've ever included evidence of someone restoring edits by a banned user. It's certainly not a prerequisite. In general, I would say, it's useful to adopt the same approach as many prosecutors i.e. don't file a case unless and until you have a reasonably high, evidence-based, confidence of success. For reasons that have never really made sense to me, so-called fishing expeditions are not allowed. Suspicion, and even evidence of sockpuppetry, usually isn't enough for me. I have to have a reason to file a report. The user has to be doing something wrong, something harmful. Things that make the topic area worse like persistent POV pushing (civil or not), aggression and deception get my attention. The cases that seem to work best are the ones that keep it simple. Trying to limit the evidence so that it is just enough to justify the investigation seems to help. You can always add more later. Also, you can ask people to critique the report before you file it to check for weaknesses, ways to improve it. It's hard to say what kind of evidence works best given that it is always circumstantial evidence, always behavioral evidence, because that's all we have, especially when checkuser results are unavailable, ambiguous etc. The "Possible signs" list seems pretty complete. And I think it pays to be quite skeptical about that feeling that the evidence that you personally find compelling is, in fact, compelling for anyone else. Sean.hoyland (talk) 13:55, 10 March 2024 (UTC)
 * I had a pretty convincing SPI with all the signs and the evidence listed last time and it was not considered specifically because they had not reverted banned SPs.
 * As for how to build evidence, what are the options? Would asking to use one of the tools you’re using be considered bad faith if it turned out to be false, especially since the usernames in question would have to be publicized?
 * Plus I am not sure what tools you are using. Are they your own, or WP’s? Can I use them discreetly? Makeandtoss (talk) 14:06, 10 March 2024 (UTC)
 * What was that SPI that was not considered?  nableezy  - 15:01, 10 March 2024 (UTC)
 * Yes, I'm curious which case that was too. I think if someone suspects a user is a sockpuppet of a banned user there's no harm in doing some investigation. You don't need to publicize anything unless you file an SPI. My Wikipedia email is always enabled if you want a second opinion. As for tools, none of them are great. I'm trying to build my own, but identifying sockpuppets is quite a challenge, especially as some of them have become quite good at evasion and it's difficult to separate signal from noise, or even know where to look for the signals...there's so much data. But the tools that I find useful are...
 * The 'Strike out usernames that have been blocked' gadget in preferences. That helps to quickly see accounts that have already been blocked at pages that the suspected sock likes to frequent.
 * The classic Editor Interaction Analyser. Although article overlaps are not necessarily significant, they can find intersections that have a low probability of happening by chance. That kind of tool seems most useful (and effective at SPI) when you have a large set of data for someone's sockpuppets. Many intersections with multiple blocked socks is something that seems to help.
 * I make this kind of web page to help me navigate a set of socks. Having links out to their contributions and seeing which accounts left the largest footprints can help with investigation.
 * The XTools edit counter can help sometimes to get a general idea of the user. The idea of a timecard is useful for comparing editors, but I don't like the XTools implementation. Circle size just doesn't work for me so I make this kind of display, which I find easier to understand.
 * Sometimes I look at edit summary patterns, tone, things like that, but I rarely include them in SPI reports. Sean.hoyland (talk) 16:24, 10 March 2024 (UTC)
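The point about intersections with a low probability of happening by chance can be sketched by weighting each shared page by how few people edit it. A minimal sketch; the inverse-editor-count weight is an assumption of this example, not a description of how the Editor Interaction Analyser works, and the page names and counts are made up.

```python
from typing import Dict, List, Set, Tuple


def weighted_overlap(pages_a: Set[str], pages_b: Set[str],
                     editors_per_page: Dict[str, int]) -> List[Tuple[str, float]]:
    """Shared pages with weight 1 / (distinct editors of the page), highest first.

    A shared page with 4 editors is far less likely to be a coincidence than
    one with 5000. `editors_per_page` must contain every shared page.
    """
    shared = pages_a & pages_b
    scored = [(page, 1.0 / editors_per_page[page]) for page in shared]
    scored.sort(key=lambda item: (-item[1], item[0]))  # deterministic order on ties
    return scored


if __name__ == "__main__":
    suspect = {"Obscure village", "Israel", "Gaza Strip", "Minor footballer"}
    blocked_sock = {"Obscure village", "Israel", "Cooking"}
    counts = {"Obscure village": 4, "Israel": 5000}
    print(weighted_overlap(suspect, blocked_sock, counts))
    # [('Obscure village', 0.25), ('Israel', 0.0002)]
```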
 * Also, for interest, I think this ANI discussion and the associated SPI, are quite good examples of what not to do. Sean.hoyland (talk) 00:51, 11 March 2024 (UTC)
 * this SPI Sockpuppet investigations/Marokwitz/Archive. I think some of them were later indefinitely topic banned from IP articles for canvassing if I remember correctly. Makeandtoss (talk) 12:47, 11 March 2024 (UTC)
 * Okay thanks for sharing the links for the tools, I just used the tools and seems like a false alarm. Could be canvassing rather than sock puppetry. Would it be controversial if I share the edit link that raised suspicion here? Makeandtoss (talk) 13:29, 11 March 2024 (UTC)
 * Well I dont think that SPI actually has evidence of socking, just evidence of people from the same area having similar views. I think looking at those editors would show that they all edit in very different ways, have different tones, different levels of English competency. You need to be able to show there is suspicion that they are the same person, not just that they have similar views or even that they are vote-stacking.  nableezy  - 14:18, 11 March 2024 (UTC)
 * I don't think it would be controversial to share a link. Also, I agree with Nableezy on that SPI report. I find actively looking for differences, not just similarities, is useful. Sean.hoyland (talk) 15:08, 11 March 2024 (UTC)
 * and Sean.hoyland: it was this comment on the talk page, which was swiftly followed by a supportive response by someone who seems to have read the comment, read the linked article, quoted a passage from it, and written a response, all in four minutes. Both accounts are editing Jewish/Australia-related articles, which raised suspicions even more. The second account barely edits WP articles, instead spending their time on talk pages and arbitration. But now, having used your tools, it seems an unlikely case of SP. Could be canvassing. Makeandtoss (talk) 13:26, 12 March 2024 (UTC)
 * Sean.hoyland, can you please remind me which tool you were using to check activity after 500 edits? Also, this seems to be a special case where nonsensical edits were made in a sandbox before the account suddenly jumped to ARBPIA articles after 7 October 2023, even before reaching 500 edits. Makeandtoss (talk) 13:44, 12 March 2024 (UTC)
 * Maybe one of these plots. Did you see this? The way they work on enwiki, in subpages/sandboxes, matches the way they work on ptwiki. Sean.hoyland (talk) 16:20, 12 March 2024 (UTC)
 * You seem to have a specialization in data analysis; personally, it takes me time to understand these plots, if I do at all. What about them? As for the EC warning, I just saw it, and noticed the match. It's becoming quite irritating how clusters of users who seem to have the exact same opinion are appearing at the same time on certain discussions. Makeandtoss (talk) 10:38, 13 March 2024 (UTC)
 * Actually, I have no experience analyzing this kind of people-language-centric data. So, the "it takes me time to understand these plots, if I do at all" applies to me too. It's very different from the data I'm used to, which is essentially all about rocks. It seems rocks and people are quite different, so for the wiki data I'm very much in the data exploration phase. Change in activity after 500 edits is often clear on these kinds of plots, but the committed rule breakers know not to change their behavior or focus too much on the topic area. It's probably always been the case that clusters of users with roughly the same opinion appear at the same time on certain discussions in ARBPIA. It seems to be part of the dynamics of the system here. Sean.hoyland (talk) 05:03, 14 March 2024 (UTC)
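A change in activity after 500 edits can be put into numbers by comparing the share of edits to topic-area pages before and after the 500th edit. A minimal sketch; `in_topic` is a hypothetical classifier (a real one might check for the ARBPIA talk-page template or use a curated page list), and the history below is made up. As noted above, committed rule breakers avoid a visible jump, so a small difference proves nothing.

```python
from typing import Callable, Sequence, Tuple


def share_before_after(titles: Sequence[str], in_topic: Callable[[str], bool],
                       threshold: int = 500) -> Tuple[float, float]:
    """Fraction of edits to in-topic pages before and after the `threshold`-th edit.

    `titles` lists the page edited by each of the account's edits, oldest first.
    """
    def share(chunk: Sequence[str]) -> float:
        if not chunk:
            return 0.0
        return sum(1 for title in chunk if in_topic(title)) / len(chunk)

    return share(titles[:threshold]), share(titles[threshold:])


if __name__ == "__main__":
    # Made-up history: 490 sandbox edits, then 110 edits mostly in the topic area.
    history = ["User:Example/sandbox"] * 490 + ["Gaza Strip"] * 100 + ["Cooking"] * 10
    print(share_before_after(history, lambda t: t == "Gaza Strip"))  # (0.02, 0.9)
```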
 * Yeah maybe I should rely less on my hunch and more on the tools linked above. Thanks for the help. Makeandtoss (talk) 17:05, 14 March 2024 (UTC)

Question
About the ARBCOM tag, are the 500 edits inclusive of talk page edits? And are editors who did not reach 500 edits allowed to edit or vote on the talk pages of ARBPIA articles? Makeandtoss (talk) 13:16, 24 March 2024 (UTC)
 * Yes, I think it includes edits anywhere on English Wikipedia. It's subject to EC gaming review of course and I've seen cases where editors have had their edit count effectively reduced because they made lots of edits in ARBPIA before they were extendedconfirmed. They certainly can't vote or comment at RfCs, AfDs, RSN etc. They can only submit edit requests that are supposed to follow the WP:EDITXY guide. In practice, if a non-EC editor makes a very straightforward reasonable request on a talk page to fix something like a typo, people mostly seem to give them a pass and handle the request. Sean.hoyland (talk) 13:53, 24 March 2024 (UTC)
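The numeric part of the extended confirmed requirement can be written down directly. A minimal sketch, assuming the standard thresholds of an account at least 30 days old with at least 500 edits, counted across all namespaces as described above; the function and constant names are hypothetical. Administrators can also grant or remove the right manually (for example after an EC gaming review), so meeting these numbers does not guarantee the right.

```python
from datetime import datetime, timedelta

EC_MIN_EDITS = 500
EC_MIN_AGE = timedelta(days=30)


def meets_ec_thresholds(registered: datetime, edit_count: int, now: datetime) -> bool:
    """True if the account is old enough and has enough edits in any namespace."""
    return edit_count >= EC_MIN_EDITS and now - registered >= EC_MIN_AGE


if __name__ == "__main__":
    now = datetime(2024, 3, 24)
    print(meets_ec_thresholds(datetime(2024, 2, 1), 612, now))   # True
    print(meets_ec_thresholds(datetime(2024, 3, 1), 5000, now))  # False: only 23 days old
```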

Nomination of Where is Kate? for deletion
A discussion is taking place as to whether the article Where is Kate? is suitable for inclusion in Wikipedia according to Wikipedia's policies and guidelines or whether it should be deleted.

The article will be discussed at Articles for deletion/Where is Kate? (3rd nomination) until a consensus is reached, and anyone, including you, is welcome to contribute to the discussion. The nomination will explain the policies and guidelines which are of concern. The discussion focuses on high-quality evidence and our policies and guidelines.

Users may edit the article during the discussion, including to improve the article to address concerns raised in the discussion. However, do not remove the article-for-deletion notice from the top of the article until the discussion has finished. IgnatiusofLondon ( he/him • ☎️) 11:51, 1 April 2024 (UTC)

Do you think Islamophobia is at least partially in the I/p area
Looking at it I think some parts are, but I could be wrong. Thanks  Doug Weller  talk 19:37, 11 April 2024 (UTC)
 * I agree that some parts are in the topic area, or at least in the fuzzy lawless border regions. There are references to pro-Israel groups, pro-Palestinian groups, Zionism, a citation with the title "The Islamophobia Industry and the Demonization of Palestine: Implications for American Studies" etc. It seems close enough to fit into the "related content" set (Arbitration/Index/Palestine-Israel_articles) and could probably do with a template. Sean.hoyland (talk) 01:22, 12 April 2024 (UTC)
 * Thanks. That's my opinion exactly. Doug Weller  talk 07:14, 12 April 2024 (UTC)

Possible sock?
I noticed you reported BasedGuy for being a sock and since then another account has appeared with a quite similar editing pattern. Just notifying you just in case. - UtoD 15:57, 15 April 2024 (UTC)
 * Yes, thanks. I noticed that editor at Battle of Hamad. Not sure what to do about it yet. I'm curious how they noticed the article within half an hour or so of its creation. Blocking these kinds of editors is evidently ineffective. Even extended confirmed protection won't keep them away from the ARBPIA topic area for long given their dedication. Sean.hoyland (talk) 16:22, 15 April 2024 (UTC)

Signature
Thank you for the feedback on the AE post about me. I think you forgot your signature though. JDiala (talk) 19:10, 3 June 2024 (UTC)
 * Oops. Thanks for letting me know. Sean.hoyland (talk) 04:19, 4 June 2024 (UTC)

Quick request regarding AN comment
Thank you for your input at AE.

I would like to fix the issue you addressed at AE here. For me, it links to the right place, but I guess it doesn’t for others? Would this link to the right place? If so, am I just allowed to fix it after others have responded? FortunateSons (talk) 16:56, 3 June 2024 (UTC)
 * That works for me now. Yes, I think you can change a link after responses when it's a change like this. More of a fix than a change. More precision. It's helpful. Sean.hoyland (talk) 17:07, 3 June 2024 (UTC)
 * I will fix it, thank you.
 * I know we don’t always see eye to eye on content, but you thinking of me as an asset to the topic area is meaningful to me! FortunateSons (talk) 17:28, 3 June 2024 (UTC)
 * You explain your reasoning and organize it. This is very helpful in the topic area, I think. Sean.hoyland (talk) 18:01, 3 June 2024 (UTC)
 * Thank you very much :) FortunateSons (talk) 18:03, 3 June 2024 (UTC)
 * That's probably a kind of dumb question, but reading the contributions at AE feels like joining a conversation late. Did I miss something? Was there a specific sock-related issue that I am unaware of? FortunateSons (talk) 07:50, 7 June 2024 (UTC)
 * Sort of related but more about the notion of casting aspersions...JDiala said "Two people on that thread arguing against me are proven or suspected sockpuppets (Galamore and ElLuzDelSur)." as part of their AE comment. There's no evidence of socking as far as I'm aware, hence The Kip's comment. My comment is more about structural issues/unintended consequences. Sean.hoyland (talk) 08:30, 7 June 2024 (UTC)
 * Ah, that does make more sense, thank you. Do you think socks in the I/P area are still a major issue? I feel like SPI is pretty good at catching them. FortunateSons (talk) 08:34, 7 June 2024 (UTC)
 * Well, I'm not quite sure how to answer that question. Let me split it into 2 parts. I think there are multiple topic ban evading socks active in the topic area right now and I don't think the EC requirements have proven to be an effective barrier in that regard (despite being one of their objectives) for a variety of reasons, at least for the dedicated socks. But I'm not sure about the extent to which it can be described as "a major issue" because it depends on the behavior of the suspected sock, and that varies a lot. The question of when to file an SPI report, when it is "the right move", "for the best" etc., is not clear in my mind. Should it be in every case or only when there is problematic behavior? Blocked socks can simply come back, and they do, again and again, for years. It would be better if there were a technical barrier, but that's difficult. One of the things that concerns me is that people tend to focus on easy problems they can solve rather than more difficult and perhaps more significant problems. Focusing on tone over honesty may be an example. If I think about a question like "is it a major issue", it depends on relative values, content creation vs honesty vs disruption vs bias etc. It's confusing, especially because socks come back. On the one hand I don't think the presence of well-behaved ban evading socks is intrinsically a major issue, on the other hand I think the presence of people willing to employ deception, regardless of what they do here, is a major issue, partly because it creates 2 classes of editors, the sanctionable and the unsanctionable. Sean.hoyland (talk) 09:28, 7 June 2024 (UTC)
 * That does make sense, particularly making the distinction between „well-behaved“ (thereby indirectly harmful) and directly disruptive socks, as well as different ways of measuring harm to the project. I don’t believe that there is a perfect answer to this sort of question, and would (as a person, not as an editor) probably also draw a distinction between those that were simply blocked or tbanned for technical violations and those that caused severe harm to individual members of the project or the project as a whole, for whom I think some sort of quiet return should be impossible (to any part of the project, not just en.wiki).
 * Some technical measure would be really nice, but I know a few people that do “disruptive activism” on social media sites, and even the very large platforms struggle with effectively suppressing this sort of stuff, so I’m not sure if it’s doable for us. However, if it could be done, it would be great to add some more sting to the sanctions.
 * I agree with deception being more important than tone, but (acknowledging my own lack of technical skills) I don’t think we can truly know with people who are decently competent, thereby creating an evolutionary pressure instead of an actual remedy. Personally, I know that I myself focus mostly on requesting actions to be taken about tone and “surface-level-conduct” issues, but do know that those dealing with sockmasters are probably doing a lot more per minute of time spent. FortunateSons (talk) 12:27, 7 June 2024 (UTC)


 * As for whether the community is good at catching them via SPI, one way to address that question is to look at the delay between registration and blocking as a sock, how long was the sock's account usable. Things don't look great from that perspective e.g. one sockmaster or another sockmaster Sean.hoyland (talk) 11:49, 7 June 2024 (UTC)
 * That’s quite interesting, thank you. I always really enjoy looking at this sort of data. Just out of curiosity, are the ones that get away mostly sleeper agents that get ‘activated’, or are many of those active for 1000s or 10,000s of edits before getting caught? FortunateSons (talk) 12:30, 7 June 2024 (UTC)
 * 1000s of edits before getting caught is not very unusual e.g. here, you can see some edit counts. Sean.hoyland (talk) 12:45, 7 June 2024 (UTC)
 * Hmmm, that is quite concerning. However (even with survivorship bias), it doesn’t seem like they are getting much better on duration, only on edit count, so that’s nice. Do you think that some technical measures could be promising? FortunateSons (talk) 12:51, 7 June 2024 (UTC)
 * I don't really have any idea what proportion of socks for any given sockmaster are detected, so that's a problem. Visibility into this issue isn't great. As for technical measures, the LITIS labs' SocksCatch could detect between 90% and 95% of sockpuppets depending on the approach used, and that was a long time ago now in machine-learning-time. However, Wikimedia doesn't appear to have followed up on it. Sean.hoyland (talk) 13:06, 7 June 2024 (UTC)
 * That makes sense.
 * It is unfortunate that they didn’t follow up on it, it does look promising. Do you know why? It’s not like there is a lack of money. FortunateSons (talk) 13:11, 7 June 2024 (UTC)
 * Maybe because of the state of this workboard. Sean.hoyland (talk) 13:43, 7 June 2024 (UTC)
 * I know very little about tech, but I’m guessing there is not supposed to be this much stuff? FortunateSons (talk) 13:51, 7 June 2024 (UTC)
 * They look very busy, working on all sorts of ML related things. That's good I guess. Sean.hoyland (talk) 13:58, 7 June 2024 (UTC)
 * That makes sense, thanks FortunateSons (talk) 14:03, 7 June 2024 (UTC)
 * btw, I should clarify that I think concerns about tone in the topic area are important and there does need to be some kind of moderating force to keep things within limits or else it just gets unpleasant to do anything there. Sean.hoyland (talk) 12:07, 7 June 2024 (UTC)
 * Of course, I got that, don’t worry. I am mindful of not dragging someone somewhere for a single conduct violation (particularly if they have no prior warnings), but I find some types of “continuous low-level disruption” in combination with tone issues to be unpleasant enough for all involved that it’s just better for the project if that kind of editor edits something else for an indefinite (but not necessarily infinite) amount of time. FortunateSons (talk) 12:36, 7 June 2024 (UTC)
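The "how long was the sock's account usable" measure mentioned above is just the gap between account registration and the block. A minimal sketch, assuming the registration and block timestamps have already been pulled out as ISO 8601 strings (the thread does not say where they come from); the function names and the placeholder accounts are hypothetical.

```python
from datetime import datetime
from statistics import median


def usable_days(registered: str, blocked: str) -> float:
    """Days an account stayed usable: block time minus registration time.

    Both arguments are ISO 8601 timestamps such as "2023-01-01T00:00:00".
    """
    delta = datetime.fromisoformat(blocked) - datetime.fromisoformat(registered)
    return delta.total_seconds() / 86400


def summarize_usable_days(accounts: dict) -> dict:
    """Summarize usable lifetimes for {account: (registered, blocked)}."""
    days = [usable_days(reg, blk) for reg, blk in accounts.values()]
    if not days:
        raise ValueError("no accounts to summarize")
    return {
        "count": len(days),
        "median_days": median(days),
        "max_days": max(days),
    }


if __name__ == "__main__":
    # Placeholder timestamps for illustration only, not real accounts.
    example = {
        "SockA": ("2023-01-01T00:00:00", "2023-01-11T00:00:00"),
        "SockB": ("2023-01-01T00:00:00", "2023-04-11T00:00:00"),
    }
    print(summarize_usable_days(example))
    # {'count': 2, 'median_days': 55.0, 'max_days': 100.0}
```

Reporting the median alongside the maximum keeps one very long-lived account from dominating the picture, which matters when a few socks survive for thousands of edits.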

Fyi
@Sean.hoyland I tried to answer your question at WP:RSN itself, since other noticeboard readers may also be curious to understand what's being discussed. Since my answer was detailed, I used a collapse template on our discussion. I hope you do not mind me collapsing the discussion there.

Last but not least, my request that you join in developing the draft is a very sincere and genuine one. Happy editing Bookku   (talk) 05:13, 9 June 2024 (UTC)


 * Also, though the draft article is largely philosophical and theological, with a much smaller political side to it, please watchlist the draft article and let me know, whenever possible, if any aspect of it comes under CTOP and needs correction at any time. Bookku   (talk) 05:29, 9 June 2024 (UTC)


 * , thanks, no, of course I don't mind you collapsing the discussion. And thanks for your thoughtful response and invitation. Sean.hoyland (talk) 05:38, 9 June 2024 (UTC)

editor similarity
Hi, you mentioned some method for estimating the distance between editors in "metric space". I'd be curious what you mean by this and where I can find more information. Thank you! DMH223344 (talk) 15:29, 21 June 2024 (UTC)
 * It's a bit of a long story. Years ago, I read about some work using machine learning to identify socks. It has been rattling around in my mind ever since. Unfortunately, I don't have access to a bunch of A100s and the enwiki database is too big, so machine learning is not an option. But the fact that the system could apparently discover features that enabled it to detect 90%+ of socks suggests that it is possible to do some kind of proximity ranking of accounts, albeit in the very high dimensional space of a neural network. Since then, I've been wondering what those features were (the network is a black box so I have no idea) and whether looking at distances between editors in much lower dimensional spaces might still be able to provide clues about sockpuppets. I've just started looking at this, and it's a bit of a rabbit hole, but it might have some potential. My comment was based on this test output (I've anonymized it here). I suggested a match based on the low Wasserstein distance between the editors in a particular space (I'll omit the details). I really have no idea whether it is a good match because my test set is small right now, I just happened to be looking at the test output at the time (although the Editor Interaction Analyser suggests it might be a decent match). It's possible to construct all sorts of spaces from editor data and I don't know which ones could be useful. Also, there are many things that I haven't gotten around to doing and aren't clear to me yet, like the relationship between proximity and dimensionality, how many samples per editor are needed (I have to pull them out of the database), whether this approach can 'predict' the outcomes of previous SPI reports etc. Sean.hoyland (talk) 17:19, 21 June 2024 (UTC)
 * interesting! one thing we should be very aware of with any method like this is the false positive rate. It's of course essential to have an idea of how well we've controlled this rate using the proposed method. There are of course many ways we can do that, depending on what data we have and how it was collected.
 * Some brief comments without knowing much of the details of what you're describing: I would *not* expect a neural network based approach to work well for this task. More generally, I would not expect a supervised approach to work well. I also don't think GPUs are required, although I'm guessing some approaches might not be open to us without access to many CPUs and large working memory.
 * Feel free to share any other details you might have, even if you consider them overwhelming or disjointed thoughts. DMH223344 (talk) 17:50, 21 June 2024 (UTC)
 * In general, I would say that a method which tries to identify pairs of user accounts that are socks will not work unless we make some strong assumptions or come up with a clever way to identify candidate pairs. Otherwise our FPR will likely be huge. DMH223344 (talk) 17:53, 21 June 2024 (UTC)
 * I would not have expected an ML-based approach to work well for this task either because I assumed the signal to noise ratio would be problematic. And yet it does work, quite well it seems. There are several papers. The fact that these systems are doing something (opaquely) that works is encouraging in the sense that it shows there are features there and not just in our imaginations when we see patterns connecting users. I'm trying a no-assumptions, just-do-math approach. Sean.hoyland (talk) 18:17, 21 June 2024 (UTC)
 * I don't disagree that an ML-based approach would work. It was the supervised framing that I was unsure of. But of course I could be wrong! DMH223344 (talk) 19:04, 21 June 2024 (UTC)
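The thread deliberately omits which space the Wasserstein comparison was done in, so the following is only a hypothetical sketch of the general idea. It assumes each editor is represented by a one-dimensional sample (here, the UTC hour of each edit, an illustrative choice), computes the 1-Wasserstein (earth mover's) distance as the area between the two empirical CDFs, and ranks every editor pair from closest to farthest. The function names and feature are assumptions, not the method described above.

```python
from bisect import bisect_right
from itertools import combinations


def wasserstein_1d(u: list, v: list) -> float:
    """1-Wasserstein (earth mover's) distance between two empirical samples.

    Computed as the area between the two empirical CDFs, which is how W1
    is defined for one-dimensional distributions.
    """
    if not u or not v:
        raise ValueError("both samples must be non-empty")
    u, v = sorted(u), sorted(v)
    points = sorted(set(u) | set(v))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on the interval [left, right).
        cdf_u = bisect_right(u, left) / len(u)
        cdf_v = bisect_right(v, left) / len(v)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def rank_editor_pairs(samples: dict) -> list:
    """Return (distance, editor_a, editor_b) for every pair, closest first.

    `samples` maps an editor name to a list of feature values, e.g. the
    UTC hour (0-23) of each edit. Treating hours as a line rather than a
    circle (23 and 0 are far apart here) is a deliberate simplification.
    """
    pairs = [
        (wasserstein_1d(samples[a], samples[b]), a, b)
        for a, b in combinations(sorted(samples), 2)
    ]
    return sorted(pairs)


if __name__ == "__main__":
    # Placeholder data for illustration only, not real editors.
    toy = {
        "EditorA": [9, 10, 10, 11],
        "EditorB": [9, 10, 11, 11],
        "EditorC": [20, 21, 22, 23],
    }
    for dist, a, b in rank_editor_pairs(toy):
        print(f"{a} - {b}: {dist:.2f}")
```

If SciPy is available, `scipy.stats.wasserstein_distance` computes the same one-dimensional quantity. The all-pairs ranking also shows why the false positive rate raised above matters: n accounts give n(n-1)/2 pairs, so 10,000 accounts already mean 49,995,000 comparisons, and even a 0.1% false positive rate would flag roughly 50,000 unrelated pairs.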

Technical question
Hey, how many edits per account are required so that your software can actually (relatively reliably) group them together? Can a “full picture” be created out of multiple low-edit accounts? FortunateSons (talk) 09:14, 24 June 2024 (UTC)
 * I really have no idea. At this stage I don't have any evidence that it could reliably group identical twins who shared a room, edited together, with a passionate interest in rare Mongolian hats. I'm in the stumbling around in the dark, bumping into things phase where I keep realizing 'oh, to do that, I need to be able to do this first, but how?' My normal approach to problem solving is to not think about it and do something else. Puzzlingly this works quite well for mysterious reasons, albeit slowly. As for the question 'Can a “full picture” be created out of multiple low-edit accounts', maybe, but I guess only if there is high confidence that combining them boosts a signal for a single source rather than introduces noise. Sean.hoyland (talk) 10:37, 24 June 2024 (UTC)
 * That's unfortunate, but if I find two editors with an obsessive interest in Gugu hats, I'll let you take a swing at it. :)
 * I can definitely relate to this style of problem solving, and wish you the best of luck with the work.
 * I'm asking because I seem to have made an acquaintance who is trying to get me to join their topic-specific discussions through improper means, and while I'm dealing appropriately with the messages themselves, I was curious whether the half dozen accounts with ~10 edits each are enough to create a profile that can be compared to others with an above-random chance of success. Some of them caught joint checkuser blocks and some haven't, so... FortunateSons (talk) 11:15, 24 June 2024 (UTC)
 * Well, it's always nice to make new friends I guess... That kind of behavior using disposable accounts to gain email access for canvassing is characteristic of Category:Wikipedia_sockpuppets_of_AndresHerutJaim, the guy that caused this ArbCom case. Sean.hoyland (talk) 12:36, 24 June 2024 (UTC)
 * Yeah, that’s the guy that some of them got CU’d for. I had only skimmed the case before, and didn’t connect the name with the person. FortunateSons (talk) 12:57, 24 June 2024 (UTC)
 * I would be interested to know which ones have checkuser blocks, to see how they were logged and categorized. Inconsistencies in logging and categorization can be a bit of a weak link, making potentially valuable information invisible. Sean.hoyland (talk) 12:50, 24 June 2024 (UTC)
 * I’m happy to provide the usernames of all the accounts that have sent me messages by email, assuming that’s permitted? FortunateSons (talk) 12:55, 24 June 2024 (UTC)
 * In the ArbCom case they said "Private evidence (including emails) can be sent to the Committee at arbcom-en@wikimedia.org". I think they still want people to let them know about inappropriate canvassing emails. I assume it's okay for you to list the usernames here. There's no loss of privacy. But what do I know? Very little apparently, so it might be worth verifying with someone who knows what they are doing first. Maybe ScottishFinnishRadish knows. Sean.hoyland (talk) 13:04, 24 June 2024 (UTC)
 * I’m erring on the side of caution and will send them by email for now. If you find a categorisation error or two, I would have no issue with you doing with it what you wish (in line with policy).
 * If an admin later says that it was not appropriate, I would expect you to then delete the mail :) FortunateSons (talk) 13:23, 24 June 2024 (UTC)
 * No problem. Always happy to delete emails. Email received. Sean.hoyland (talk) 14:04, 24 June 2024 (UTC)
 * Perfect, if something productive comes of it, I would really appreciate a quick update either here or through mail. :) FortunateSons (talk) 14:05, 24 June 2024 (UTC)
 * I think I can probably see some of the blocked accounts...maybe... e.g. Sean.hoyland (talk) 13:11, 24 June 2024 (UTC)
 * FortunateSons (talk) 13:51, 24 June 2024 (UTC)