User talk:Tzusheng/sandbox/Wikipedia:Wikibench/Entity:Diff/719347634/719359416

Damage
@PriusGod @Actualcpscm I'd like to hear more about your rationales for why this edit is damaging. Overlinking isn't obviously vandalism or damaging to the article. IMO labelling overlinking as damaging would cause an AI model detecting vandalism to be oversensitive to minor edits like this. —*Fehufangą (✉ Talk · ✎ Contribs) 06:39, 2 July 2023 (UTC)


 * In my interpretation, the purpose of the "damaging/not damaging" part of the label is not to characterize the diff as vandalism or not, but simply to indicate that it causes or exacerbates a policy issue, no matter how small - i.e. here, with the policy regarding overlinking. You can see a previous discussion here regarding how exactly we seek to define "damage"; there are some interesting points, and I'd like to see your interpretation.
 * If I understand the goal of the project correctly (and if I'm totally off the mark, do let me know), we are here to generate labels not for training purposes, but simply to evaluate the gap (if it exists) between the opinions of models like ORES and LiftWing, and the community's interpretations of what is harmful and how to examine the intent behind edits. With a better, data-driven understanding of that gap, we would be able to provide a more complete understanding of how editors should apply those models' interpretations. PriusGod (talk) 07:33, 2 July 2023 (UTC)
 * I think it is perfectly fine to have differing points of view in this exercise. It shows that we all have different interpretations and experiences when it comes to applying guidelines and norms for such borderline cases. – robertsky (talk) 07:39, 2 July 2023 (UTC)
 * 100% agree - a healthy amount of differences of opinion (in the right places!) is the foundation of establishing positive consensus.
 * Also, just as a general note to any reader - I know I keep using the word "policy" for my definition of "damaging/not damaging," but I mean it in a more general sense - maybe a better way for me to describe my interpretation of "damaging" would be "anything that goes against the strongest available consensus that governs the change." This would include deviations from guidelines or even widely cited essays, but not controversial-but-well-supported changes to the status quo, as the latter would be hard to strongly characterize as damaging unless they become part of an edit war. PriusGod (talk) 07:48, 2 July 2023 (UTC)
 * There is a majority for “not damaging” so I changed the primary label. I hope you’re fine with it. TenWhile6 (talk | SWMT) 09:00, 2 July 2023 (UTC)
 * Thanks @Fehufanga, @PriusGod, @Robertsky for the discussion and @TenWhile6 for bold editing! Yes, Wikibench aims to facilitate the curation of Wikipedians' labels so that we may compare them to AI predictions and identify the gap between AI and the community's consensus. A better understanding of the gap can be helpful in various ways. For example, it helps us understand where AI works better or worse so that we may use AI more confidently or cautiously in those cases. It also helps compare two AI systems (e.g., ORES vs. LiftWing) when we aim to choose the better one for deployment. On the other hand, having differing viewpoints in the labeling process is perfectly fine, as @Robertsky mentioned. In fact, this is another goal of Wikibench, which aims to surface cases where Wikipedians have not reached a consensus yet, even though we might think we did. The discussions about these cases can hopefully help develop clearer definitions, policies, or guidelines for making judgements and taking actions on Wikipedia. I hope this makes sense! Tzusheng (talk) 14:50, 2 July 2023 (UTC)
 * I think a big issue here is that the binary classification of harmful vs. not harmful is not sufficiently granular to distinguish between minor harms and vandalism. The confidence scale gets overloaded to represent both degree of harm and likelihood of harm in this case. Alpha3031 (t • c) 15:10, 3 July 2023 (UTC)
 * I know I’m a little late to the party, but I don’t think this should be labeled as harmful. Yes, it is overlinking, but it isn’t “damaging”, merely misguided. - 🔥𝑰𝒍𝒍𝒖𝒔𝒊𝒐𝒏 𝑭𝒍𝒂𝒎𝒆 (𝒕𝒂𝒍𝒌)🔥 18:14, 12 July 2023 (UTC)
 * Yeah, TenWhile6 edited the primary label, which is the one used by the software - the other user labels are more for tracking other users' opinions, which might inform discussion about the harm of the edit. PriusGod (talk) 18:22, 12 July 2023 (UTC)

The primary label has been edited
TenWhile6 edited the primary label that you previously submitted. If you disagree with the change, please kindly engage in a discussion on this talk page and consider seeking a third opinion if needed. TenWhile6 (talk | SWMT) 08:58, 2 July 2023 (UTC)