Talk:Data breach/GA1

GA Review

Nominator: 20:47, 26 March 2024 (UTC)

Reviewer: Chipmunkdavis (talk · contribs) 14:07, 31 March 2024 (UTC)

Starting to look at this one. Initial impression is that it's surprisingly short, given what feels like a large topic. Will be looking at broadness rather than comprehensiveness, so just an initial note. Another first impression is that it seems written from a US-centric perspective (and the 2005 date here contradicts the linked Security breach notification laws; I have not yet checked the sources to dig into this). The sources look modern and reliable, and the images are all PD and created by the nominator. This was a recent complete overhaul, so stability is not immediately obvious, although it seems a clear improvement on the previous version and the overhaul was performed by another editor. I likely have limited to no access to the majority of these sources, but will have a more detailed look at this later. CMD (talk) 14:07, 31 March 2024 (UTC)


 * I can send pdfs of the sources if you want. (t · c)  buidhe  16:42, 31 March 2024 (UTC)
 * Buidhe, thank you for the offer; perhaps if you could share Fowler 2016 and Solove & Hartzog 2022 that would be very helpful. CMD (talk) 09:23, 1 April 2024 (UTC)
 * CMD, Buidhe, any further progress on this review? Ideally it should be wrapped up pretty soon. —Ganesha811 (talk) 15:42, 14 April 2024 (UTC)
 * I sent CMD the materials but he seems to be busy right now. It's not a huge rush (t · c)  buidhe  00:47, 15 April 2024 (UTC)

Apologies for the delay, it took a while to figure out the article and its sources. Thank you for the materials, which I used for some of the spot-checks and other notes below:

Definition History and prevalence
 * "Since the advent of data breach notification laws in 2005, reported data breaches have grown dramatically." is a very odd second sentence. It feels like referring to a national set of laws? Further, is the reporting linked to the laws, or does it simply reflect the internet becoming more important and widely used? Is 2005 worth emphasising at this point in the lead?
 * Rewrote
 * "Data breaches are most commonly caused either by a targeted cyberattack, an opportunistic attack, or inadvertent information leakage" could use some more explicit expansion in the body, it is mostly implied.
 * Rewrote
 * "Data breaches are most commonly caused either by a targeted cyberattack, an opportunistic attack, or inadvertent information leakage." This seems an odd collection of items. None are used elsewhere in the article, and I'm not intuiting a significant difference between a targeted and opportunistic attack.
 * The difference between opportunistic and targeted attacks is covered in the perpetrator section—essentially when any target will do versus when the attacker wants to attack a particular system. However, I removed from the lead as potentially unnecessary.
 * "...including accidental disclosure of information" does not seem different from the earlier "inadvertent information leakage"
 * Rewrote
 * Regarding long-term risk as covered in "people whose data was compromised are at elevated risk of identity theft for years afterwards and a significant number will become victims of this crime", this will depend on the data being leaked; the risk is mostly elevated only with PII data. (A leak of hashed/salted passwords is a data breach, but not helpful for identity theft.) It is perhaps worth specifically mentioning that it is personal/PII data.
 * A leak of hashed passwords would not qualify as a data breach by most definitions, but I've edited to clarify.
 * The note that definitions vary should perhaps be placed before the giving of a specific definition.
 * I wrote it like this because the definition initially given is essentially the one used by the sources and covering the article's scope, although exact details may vary.
 * How does the note on company disclosure fit within this section?
 * While some definitions include other information, the laws (entirely) and sources (almost entirely) discuss breaches of personal information. I put this in the definition section to hint at the article's actual content and scope without going into OR.
 * Data breach notification laws notes that laws were first put into place in 2002, although perhaps with a different nuance/focus. Looking into the source, 2005 does appear to match the "widespread" part of your text. The text should be adjusted, however, to note the US focus, as the laws were widespread only in that country.
 * After more research, I think the source is wrong and I'm fixing it accordingly.
 * More information on other countries seems like it is needed here, GDPR is mentioned in a later section but surely it would fit into history?
 * Moved
 * Speaking also to that source, it places emphasis on California as the leading state. Given the prominence California played and still plays in tech, it is probably due a mention in the second paragraph where the "legislatures around the United States" history is given.
 * Done
 * "In 2016, researcher Sasha Romanosky estimated that data breaches outnumbered other security breaches by a factor of four." Noting that in the source this excludes phishing, which is included as a data breach cause in a later section.
 * Clarified the definition he is using.
 * "In the 2000s, the dark web—parts of the internet where it is difficult to trace users and illicit activity is widespread—began to be set up, increasing in the 2010s with the advent of untraceable cryptocurrencies such as Bitcoin. Information obtained in data breaches is often offered for sale there." While this is all true, it seems incomplete to jump to this without noting that this data can also be available on the regular internet and its many forums. The source does mention illicit marketplaces in general before focusing on the Silk Road/dark web. (The Silk Road (marketplace) is actually probably due an explicit mention, although this is slightly beyond GACR consideration.)
 * I checked multiple sources and none of them said much of anything about non-dark web forums for selling data. However, I revised the text to avoid any implication that the dark web is the only forum for sale.
 * Similarly, there are other platforms for information sharing beyond the dark web these days, Telegram (software) is a common one and it feels the article could mention something about this.
 * Even specifically searching for Telegram's use regarding data breaches I cannot find much of anything, so I think it is likely UNDUE.
 * The mention of ransomware feels off-topic.
 * Maybe I could be clearer on this, but ransomware can qualify as a data breach according to many definitions, because it results in "the unauthorized... loss of personal information". It is definitely covered by multiple sources as a type of data breach.


 * Regarding this section I almost think it would be better to narrow it to just "prevalence" or split it to different parts of the article, so we wouldn't have the legal information or technologies used to perpetrate breaches in two different places. Buidhe paid (talk) 17:38, 22 April 2024 (UTC)

Perpetrators Causes Breach lifecycle Consequences Laws

On a general note, there are various examples of breaches scattered throughout the text. It is a surprise that Pegasus (spyware) is not one of them, given its prominence. Overall, the writing of the current text gives off an odd feeling, like a textbook rather than an encyclopaedia. Some specific mentions of this are made above, but it's a noticeable tone throughout. At the same time, many areas feel truncated, touching only obliquely on what their topic of discussion is. There is some room for expansion in the body and the lead. Much of the information is US-focused. Ideally it should be globalized for comprehensiveness; at the very least, however, the facts and figures constrained to the US should clearly reflect this in their presentation in the article. It would be good to get a picture of how closely these items reflect the whole corpus of sources used. No issues specifically found for NPOV, stability, or image licensing. Best, CMD (talk) 13:33, 21 April 2024 (UTC)
 * The second paragraph of Perpetrators seems to be more about Consequences? I am also unsure if the methods of communication are relevant to this article.
 * Moved to consequences
 * "The threat of data breach or revealing information obtained in a data breach can be used for extortion, often using ransomware technology (where the criminal demands a payment in exchange for not activating malicious software)." I am not sure about this link. As with the above mention of ransomware, ransomware is usually a process where a hacker will encrypt data; it is not a function of data leaks but of other security challenges. As an aside, for this source, the page is given as 14, but it seems to be citing 13.
 * Removed.
 * It is probably worth noting, given that the first point in Causes is about encryption, that phishing and other methods get around encryption. It should probably be clear that the non-encryption data breaches are the low-level opportunistic breaches, and that encryption is just one tool interlinked with the other causes. Also worth noting that the wording seems targeted only at symmetric cryptography; asymmetric cryptography would have two keys, one for encryption and one for decryption.
 * Added mention of hashing
 * Relatedly, the chart in this section is different from the text. Neither is incorrect; the text instead needs to be more holistic. The note at the end about security and zero risk should be expanded upon near the start, highlighting that breaches are a function of there being data at all and that security is about a layered approach of risk management, rather than implying there are clear distinct causes for breaches.
 * Revised the text to not use bullet points, I hope this is what you meant by more holistic. I also added a mention of defense in depth.
 * Regarding the second point on phishing, phishing is a subset of Social engineering (security), and it would be better to discuss that broader concept rather than focusing on just the one method.
 * Done
 * The note on hiring a CISO is another odd textbook-style piece of writing that feels like a diversion from the topic. It basically says that to have security, one should invest in security, which feels like a self-explanatory point. I think that this section on prevention is trying to get at the concept that firms often underprioritize security relative to what they perhaps could (should?), but that does not come through in the language, which reads more as a prescriptive guide to companies.
 * Rewrote
 * Perhaps one way this section could be less-textbook is by noting that there are industry standards/frameworks for data breach response, and work from that framing. Presumably this (eg. NIST compliance) is the expertise provided by the outsourcing. NIST and a couple of other examples are mentioned in Fowler 2016 pg 51 and 210.
 * Done
 * The sentence on the balance of funding vs outcomes is also I feel trying to get at the broader concept of risk management, and how simple steps cause substantial initial rises in security but there are diminishing returns of unit security per unit investment.
 * It seems intuitive that there would be diminishing returns, but I can't find sources that put it that way.
 * The other advice for prevention on paranoia and proactive action also reads as a how-to. I would also suggest that this is well-known within the cybersecurity field, and does not need specific in-text attribution.
 * I actually ended up removing most of it per your comments above.
 * The standalone sentence on data non-collection/deletion is perhaps worth expanding on, but that is perhaps beyond strict GA needs.
 * Expanded a bit
 * "A penetration test can then verify that the fix is working as expected" in the Response section feels like it explicitly goes back to the Prevention section, where pen testing is already mentioned.
 * Penetration test verifies that the system is secure, thus it has a role in both prevention and recovery from attacks.
 * "After the breach is fully contained, the company can then work on restoring all systems to operational" seems another sentence that fits in a how-to guide but doesn't seem necessary for an encyclopaedia.
 * Removed
 * I'm having trouble interpreting the graph in Consequences. The dots seem to be aligned as they would in a bar chart by the 2023 number. The 2022 number is in the graph text but doesn't seem represented. I'm assuming circle size is a second indication of scale? And I'm not sure what the lines are trying to tell me. Not strictly a GA issue, but perhaps it could be looked into. A better caption would help a lot here, especially as it is providing definite figures in a section saying the figures are tricky to calculate.
 * I only added the graph because Chris Troutman when implementing my edit decided to keep another graph that was even worse. I couldn't find any really good graphs that seemed informative, so I would be happy to remove it.
 * "Due to increased remediation efforts in the United States after 2014, this risk decreased significantly from one in three to one in seven." This felt unlikely as a statistic, far too high. Checking the source, this appears to be specifically credit card data breaches, rather than data breaches as a whole.
 * Removed this stat as perhaps overly specific.
 * "Measures to protect data from a breach are typically absent from the law or vague, in contrast to the more concrete requirements found in cybersecurity law." I'm not seeing the distinction here; wouldn't data breach laws fall under cybersecurity? Is it a contrast to non-data-related items in those laws, or to other laws? (I am unable to figure out which part of the source this stems from.)
 * I'm not entirely sure what they mean by this, so I removed the clause.
 * "Beginning with California in 2005, all 50 states have passed their own general data breach notification laws", perhaps also worth mentioning that Alabama was the last, in 2018 (same source). Further, perhaps this timeline should be in history and a simple summary noting the current position should be here?
 * Partly done, although I wonder if the history section should be split as I proposed above.
 * Added mention of Pegasus
 * Unfortunately, a lot of the overview sources I could find are distinctly US focused, so it's hard to ensure global coverage. Buidhe paid (talk) 06:41, 23 April 2024 (UTC)

Break

 * Thanks for addressing or explaining all my points above, I'm impressed how much the article has changed. Summaries and some follow-ups below:
 * Lead is reading well now.
 * Surprised you can't find anything on Telegram, it's getting its own dedicated papers. Perhaps it is still rather new and has not percolated to more general sourcing, so if you feel it is undue compared to everything else in current sourcing that is fine by me. Something to watch out for though.
 * Added mention based on this source
 * I've looked a bit more into this assertion about ransomware, and would be interested in some explanation. Forgive the non-scholarly sources, but I'm used to distinctions such as this one between the two. However, reading up such as in here, perhaps I am behind the times and the lines between them are blurred, as ransomware attacks now sometimes (often? usually?) include data extraction. I do think the article needs clarity on this point to show the relevance; "a type of malware that encrypts data storage" does not fit in with the definition given on the page.
 * Per the definition cited in the article, a data breach includes the loss of data, which could be caused by ransomware. Data exfiltration is not synonymous with a breach. However, it seems that different laws and/or sources may use different definitions when it comes to ransomware that does not exfiltrate data (although if you can modify data, it's usually possible to access it). I think I will remove this sentence because it could be confusing or contradict other sources.
 * Reading the reformulated history and prevalence section, it still feels like a good idea to include a general note from California to Alabama that might bridge the gap before GDPR. Regarding your idea of splitting, I can see how it might work in that case too. If you do split, a simple note of data breaches increasing over time would suffice to frame the legal development if you want to focus on prevalence separate to that.
 * Done
 * On the cause section, there is still a bit of disconnect between the image and the text. The image causes seem to focus heavily on different types of data storage, which is not covered in the text. Looking at the image on its own, it's not immediately clear why say "Portable device" is different to "Physical loss". My initial guess is that it might be a statement on the security of those devices, but Hacking/malware is a separate category too.
 * After looking for an alternate image, I recognize that different credible sources have a wide variety of different graphs, probably resulting from different methodologies and/or definitions and changes over time or by industry. I have removed the graph because it could mislead readers into thinking it has more validity beyond the sample examined by the authors than it really does.
 * "The software vendor is not legally liable for the cost of breaches, thus creating an incentive to make cheaper but less secure software", would help to specify if this is a particular jurisdiction. If it is the US, then perhaps wording such as "Even in the US..." would help while also linking it to the legal history already covered and thus providing implications that there is less elsewhere.
 * Looking deeper, it seems that this detail may vary by jurisdiction, although the source doesn't specify. Reworded to "rarely", because even in jurisdictions where product liability may apply, it rarely does.
 * I really like this split into technical and human causes, including the linking of the two at the end. The rewrites in Breach lifecycle were also very effective.
 * I would remove the graph in Consequences for now, although if remade into a simpler bar graph it may be quite effective.
 * Done
 * CMD (talk) 12:28, 29 April 2024 (UTC)
 * The new rearrangements seem sensible, 1a and 1b passed. 2a and 2b clearly passed, and spotchecks earlier revealed no notable problems regarding 2c and 2d. Seems as broad as the current sourcing allows, and I didn't find anything else major and high quality in a quick look. Tangents handled above, so 3a and 3b met. Article seems neutral (4) and stable (5). Hopefully new images can be generated in the future, but not a blocker here if they don't exist for 6a and 6b, the current graph image is claimed as copyright-ineligible which seems plausible. Best of luck with ongoing work, passing, CMD (talk) 11:52, 5 May 2024 (UTC)