Data breach

A data breach, also known as data leakage, is "the unauthorized exposure, disclosure, or loss of personal information".

Attackers have a variety of motives, from financial gain to political activism, political repression, and espionage. There are several technical root causes of data breaches, including accidental or intentional disclosure of information by insiders, loss or theft of unencrypted devices, hacking into a system by exploiting software vulnerabilities, and social engineering attacks such as phishing where insiders are tricked into disclosing information. Although prevention efforts by the company holding the data can reduce the risk of data breach, it cannot bring it to zero.

The first reported breach was in 2002 and the number occurring each year has grown since then. A large number of data breaches are never detected. If a breach is made known to the company holding the data, post-breach efforts commonly include containing the breach, investigating its scope and cause, and notifications to people whose records were compromised, as required by law in many jurisdictions. Law enforcement agencies may investigate breaches, although the hackers responsible are rarely caught.

Many criminals sell data obtained in breaches on the dark web. Thus, people whose personal data was compromised are at elevated risk of identity theft for years afterwards and a significant number will become victims of this crime. Data breach notification laws in many jurisdictions, including all states of the United States and European Union member states, require the notification of people whose data has been breached. Lawsuits against the company that was breached are common, although few victims receive money from them. There is little empirical evidence of economic harm to firms from breaches except the direct cost, although there is some evidence suggesting a temporary, short-term decline in stock price.

Definition
A data breach is a violation of "organizational, regulatory, legislative or contractual" law or policy that causes "the unauthorized exposure, disclosure, or loss of personal information". Legal and contractual definitions vary. Some researchers include other types of information, for example intellectual property or classified information. However, companies mostly disclose breaches because it is required by law, and only personal information is covered by data breach notification laws.

Prevalence
The first reported data breach occurred on 5 April 2002 when 250,000 social security numbers collected by the State of California were stolen from a data center. Before the widespread adoption of data breach notification laws around 2005, the prevalence of data breaches is difficult to determine. Even afterwards, statistics per year cannot be relied on because data breaches may be reported years after they occurred, or not reported at all. Nevertheless, the statistics show a continued increase in the number and severity of data breaches that continues. In 2016, researcher Sasha Romanosky estimated that data breaches (excluding phishing) outnumbered other security breaches by a factor of four.

Perpetrators
According to a 2020 estimate, 55 percent of data breaches were caused by organized crime, 10 percent by system administrators, 10 percent by end users such as customers or employees, and 10 percent by states or state-affiliated actors. Opportunistic criminals may cause data breaches—often using malware or social engineering attacks, but they will typically move on if the security is above average. More organized criminals have more resources and are more focused in their targeting of particular data. Both of them sell the information they obtain for financial gain. Another source of data breaches are politically motivated hackers, for example Anonymous, that target particular objectives. State-sponsored hackers target either citizens of their country or foreign entities, for such purposes as political repression and espionage. Often they use undisclosed zero-day vulnerabilities for which the hackers are paid large sums of money. The Pegasus spyware—a no-click malware developed by the Israeli company NSO Group that can be installed on most cellphones and spies on the users' activity—has drawn attention both for use against criminals such as drug kingpin El Chapo as well as political dissidents, facilitating the murder of Jamal Khashoggi.

Technical causes
Despite developers' goal of delivering a product that works entirely as intended, virtually all software and hardware contains bugs. If a bug creates a security risk, it is called a vulnerability. Patches are often released to fix identified vulnerabilities, but those that remain unknown (zero days) as well as those that have not been patched are still liable for exploitation. Both software written by the target of the breach and third party software used by them are vulnerable to attack. The software vendor is rarely legally liable for the cost of breaches, thus creating an incentive to make cheaper but less secure software.

Vulnerabilities vary in their ability to be exploited by malicious actors. The most valuable allow the attacker to inject and run their own code (called malware), without the user being aware of it. Some malware is downloaded by users via clicking on a malicious link, but it is also possible for malicious web applications to download malware just from visiting the website (drive-by download). Keyloggers, a type of malware that records a user's keystrokes, are often used in data breaches. The majority of data breaches could have been averted by storing all sensitive information in an encrypted format. That way, physical possession of the storage device or access to encrypted information is useless unless the attacker has the encryption key. Hashing is also a good solution for keeping passwords safe from brute-force attacks, but only if the algorithm is sufficiently secure.

Many data breaches occur on the hardware operated by a partner of the organization targeted—including the 2013 Target data breach and 2014 JPMorgan Chase data breach. Outsourcing work to a third party leads to a risk of data breach if that company has lower security standards; in particular, small companies often lack the resources to take as many security precautions. As a result, outsourcing agreements often include security guarantees and provisions for what happens in the event of a data breach.

Human causes
Human causes of breach are often based on trust of another actor that turns out to be malicious. Social engineering attacks rely on tricking an insider into doing something that compromises the system's security, such as revealing a password or clicking a link to download malware. Data breaches may also be deliberately caused by insiders. One type of social engineering, phishing, obtains a user's credentials by sending them a malicious message impersonating a legitimate entity, such as a bank, and getting the user to enter their credentials onto a malicious website controlled by the cybercriminal. Two-factor authentication can prevent the malicious actor from using the credentials. Training employees to recognize social engineering is another common strategy.

Another source of breaches is accidental disclosure of information, for example publishing information that should be kept private. With the increase in remote work and bring your own device policies, large amounts of corporate data is stored on personal devices of employees. Via carelessness or disregard of company security policies, these devices can be lost or stolen. Technical solutions can prevent many causes of human error, such as encrypting all sensitive data, preventing employees from using insecure passwords, installing antivirus software to prevent malware, and implementing a robust patching system to ensure that all devices are kept up to date.

Prevention
Although attention to security can reduce the risk of data breach, it cannot bring it to zero. Security is not the only priority of organizations, and an attempt to achieve perfect security would make the technology unusable. Many companies hire a chief information security officer (CISO) to oversee the company's information security strategy. To obtain information about potential threats, security professionals will network with each other and share information with other organizations facing similar threats. Defense measures can include an updated incident response strategy, contracts with digital forensics firms that could investigate a breach, cyber insurance, and monitoring the dark web for stolen credentials of employees. In 2024, the United States National Institute of Standards and Technology (NIST) issued a special publication, "Data Confidentiality: Identifying and Protecting Assets Against Data Breaches". The NIST Cybersecurity Framework also contains information about data protection. Other organizations have released different standards for data protection.

The architecture of a company's systems plays a key role in deterring attackers. Daswani and Elbayadi recommend having only one means of authentication, avoiding redundant systems, and making the most secure setting default. Defense in depth and distributed privilege (requiring multiple authentications to execute an operation) also can make a system more difficult to hack. Giving employees and software the least amount of access necessary to fulfill their functions (principle of least privilege) limits the likelihood and damage of breaches. Several data breaches were enabled by reliance on security by obscurity; the victims had put access credentials in publicly accessible files. Nevertheless, prioritizing ease of use is also important because otherwise users might circumvent the security systems. Rigorous software testing, including penetration testing, can reduce software vulnerabilities, and must be performed prior to each release even if the company is using a continuous integration/continuous deployment model where new versions are constantly being rolled out.

The principle of least persistence—avoiding the collection of data that is not necessary and destruction of data that is no longer necessary—can mitigate the harm from breaches. The challenge is that destroying data can be more complex with modern database systems.

Response
A large number of data breaches are never detected. Of those that are, most breaches are detected by third parties; others are detected by employees or automated systems. Responding to breaches is often the responsibility of a dedicated computer security incident response team, often including technical experts, public relations, and legal counsel. Many companies do not have sufficient expertise in-house, and subcontract some of these roles; often, these outside resources are provided by the cyber insurance policy. After a data breach becomes known to the company, the next steps typically include confirming it occurred, notifying the response team, and attempting to contain the damage.

To stop exfiltration of data, common strategies include shutting down affected servers, taking them offline, patching the vulnerability, and rebuilding. Once the exact way that the data was compromised is identified, there is typically only one or two technical vulnerabilities that need to be addressed in order to contain the breach and prevent it from reoccurring. A penetration test can then verify that the fix is working as expected. If malware is involved, the organization must investigate and close all infiltration and exfiltration vectors, as well as locate and remove all malware from its systems. If data was posted on the dark web, companies may attempt to have it taken down. Containing the breach can compromise investigation, and some tactics (such as shutting down servers) can violate the company's contractual obligations.

Gathering data about the breach can facilitate later litigation or criminal prosecution, but only if the data is gathered according to legal standards and the chain of custody is maintained. Database forensics can narrow down the records involved, limiting the scope of the incident. Extensive investigation may be undertaken, which can be even more expensive than litigation. In the United States, breaches may be investigated by government agencies such as the Office for Civil Rights, the United States Department of Health and Human Services, and the Federal Trade Commission (FTC). Law enforcement agencies may investigate breaches although the hackers responsible are rarely caught.

Notifications are typically sent out as required by law. Many companies offer free credit monitoring to people affected by a data breach, although only around 5 percent of those eligible take advantage of the service. Issuing new credit cards to consumers, although expensive, is an effective strategy to reduce the risk of credit card fraud. Companies try to restore trust in their business operations and take steps to prevent a breach from reoccurring.

For consumers
After a data breach, criminals make money by selling data, such as usernames, passwords, social media or customer loyalty account information, debit and credit card numbers, and personal health information (see medical data breach). Criminals often sell this data on the dark web—parts of the internet where it is difficult to trace users and illicit activity is widespread—using platforms like .onion or I2P. Originating in the 2000s, the dark web, followed by untraceable cryptocurrencies such as Bitcoin in the 2010s, made it possible for criminals to sell data obtained in breaches with minimal risk of getting caught, facilitating an increase in hacking. One popular darknet marketplace, Silk Road, was shut down in 2013 and its operators arrested, but several other marketplaces emerged in its place. Telegram is also a popular forum for illegal sales of data.

This information may be used for a variety of purposes, such as spamming, obtaining products with a victim's loyalty or payment information, identity theft, prescription drug fraud, or insurance fraud. The threat of data breach or revealing information obtained in a data breach can be used for extortion.

Consumers may suffer various forms of tangible or intangible harm from the theft of their personal data, or not notice any harm. A significant portion of those affected by a data breach become victims of identity theft. A person's identifying information often circulates on the dark web for years, causing an increased risk of identity theft regardless of remediation efforts. Even if a customer does not end up footing the bill for credit card fraud or identity theft, they have to spend time resolving the situation. Intangible harms include doxxing (publicly revealing someone's personal information), for example medication usage or personal photos.

For organizations
There is little empirical evidence of economic harm from breaches except the direct cost, although there is some evidence suggesting a temporary, short-term decline in stock price. Other impacts on the company can range from lost business, reduced employee productivity due to systems being offline or personnel redirected to working on the breach, resignation or firing of senior executives, reputational damage, and increasing the future cost of auditing or security. Consumer losses from a breach are usually a negative externality for the business. Some experts have argued that the evidence suggests there is not enough direct costs or reputational damage from data breaches to sufficiently incentivize their prevention.

Estimating the cost of data breaches is difficult, both because not all breaches are reported and also because calculating the impact of breaches in financial terms is not straightforward. There are multiple ways of calculating the cost to businesses, especially when it comes to personnel time dedicated to dealing with the breach. Author Kevvie Fowler estimates that more than half the direct cost incurred by companies is in the form of litigation expenses and services provided to affected individuals, with the remaining cost split between notification and detection, including forensics and investigation. He argues that these costs are reduced if the organization has invested in security prior to the breach or has previous experience with breaches. The more data records involved, the more expensive a breach typically will be. In 2016, researcher Sasha Romanosky estimated that while the mean breach cost around the targeted firm $5 million, this figure was inflated by a few highly expensive breaches, and the typical data breach was much less costly, around $200,000. Romanosky estimated the total annual cost to corporations in the United States to be around $10 billion.

Notification
The law regarding data breaches is often found in legislation to protect privacy more generally, and is dominated by provisions mandating notification when breaches occur. Laws differ greatly in how breaches are defined, what type of information is protected, the deadline for notification, and who has standing to sue if the law is violated. Notification laws increase transparency and provide a reputational incentive for companies to reduce breaches. The cost of notifying the breach can be high if many people were affected and is incurred regardless of the company's responsibility, so it can function like a strict liability fine.

, Thomas on Data Breach listed 62 United Nations member states that are covered by data breach notification laws. Some other countries require breach notification in more general data protection laws. Shortly after the first reported data breach in April 2002, California passed a law requiring notification when an individual's personal information was breached. In the United States, notification laws proliferated after the February 2005 ChoicePoint data breach, widely publicized in part because of the large number of people affected (more than 140,000) and also because of outrage that the company initially informed only affected people in California. In 2018, the European Union's General Data Protection Regulation (GDPR) took effect. The GDPR requires notification within 72 hours, with very high fines possible for large companies not in compliance. This regulation also stimulated the tightening of data privacy laws elsewhere. , the only United States federal law requiring notification for data breaches is limited to medical data regulated under HIPAA, but all 50 states (since Alabama passed a law in 2018) have their own general data breach notification laws.

Security safeguards
Measures to protect data from a breach are typically absent from the law or vague. Filling this gap is standards required by cyber insurance, which is held by most large companies and functions as de facto regulation. Of the laws that do exist, there are two main approaches—one that prescribes specific standards to follow, and the reasonableness approach. The former is rarely used due to a lack of flexibility and reluctance of legislators to arbitrate technical issues; with the latter approach, the law is vague but specific standards can emerge from case law. Companies often prefer the standards approach for providing greater legal certainty, but they might check all the boxes without providing a secure product. An additional flaw is that the laws are poorly enforced, with penalties often much less than the cost of a breach, and many companies do not follow them.

Litigation
Many class-action lawsuits, derivative suits, and other litigation have been brought after data breaches. They are often settled regardless of the merits of the case due to the high cost of litigation. Even if a settlement is paid, few affected consumers receive any money as it usually is only cents to a few dollars per victim. Legal scholars Daniel J. Solove and Woodrow Hartzog argue that "Litigation has increased the costs of data breaches but has accomplished little else." Plaintiffs often struggle to prove that they suffered harm from a data breach. The contribution of a company's actions to a data breach varies, and likewise the liability for the damage resulting for data breaches is a contested matter. It is disputed what standard should be applied, whether it is strict liability, negligence, or something else.