User:Thisismyusername31/sandbox

Data sanitization
Data sanitization involves the secure and permanent erasure of sensitive data from datasets and devices to guarantee that there remains no residual data that can be recovered even through extensive forensic analysis. Data sanitization has a wide range of applications but it is mainly used for clearing out old personal electronic devices or for the sharing and use of large datasets that contain sensitive information. The main strategies for erasing personal data from devices are physical destruction, cryptographic erasure, and data erasure. Data sanitization methods are also applied for the cleaning of sensitive data, such as through heuristic based methods, machine learning based methods, and k-source anonymity.

This erasure is necessary because an increasing amount of data is moving to electronic and online storage, which poses a privacy risk if an old device is resold to another individual. The importance of data sanitization has risen in recent years as private information is increasingly stored in an electronic format and larger, more complex datasets are used to distribute private information. Electronic storage has expanded and enabled more private data to be stored, and therefore requires more advanced and thorough data sanitization techniques to ensure that no data is left on a device once it is no longer in use. Technological tools that enable the transfer of large amounts of data also allow more private data to be shared. With the increasing popularity of cloud-based information sharing and storage in particular, data sanitization methods that ensure that all shared data is cleaned have become a major concern.

Clearing devices
The main use of data sanitization is for the complete clearing of devices and destruction of all sensitive data once the device is no longer in use. This is an important stage in Information Lifecycle Management (ILM), an approach for ensuring privacy and data management throughout the usage of an electronic device, as it ensures that all data is completely destroyed and unrecoverable when devices reach the end of their lifecycle.

There are three main methods of data sanitization for complete erasure of data: physical destruction, cryptographic erasure, and data erasure. All three erasure methods aim to ensure that deleted data cannot be accessed even through advanced forensic methods, which maintains the privacy of individuals’ data even after the mobile device is no longer in use.

Physical destruction

Physical destruction involves the manual destruction of the media on which data is stored. This method uses mechanical shredders to break devices, such as phones, computers, hard drives, and printers, into small individual pieces, or degaussers to render magnetic storage unreadable.

Degaussing is most commonly used on magnetic storage media, such as hard disk drives (HDDs), and involves the use of high-energy magnetic fields to permanently disrupt the functionality and memory storage of the device. It is not effective on solid-state drives (SSDs), which do not store data magnetically. When a magnetic medium is exposed to this strong field, any stored data is neutralized and cannot be recovered or used again.

Physical destruction generally ensures that data is completely erased and cannot be used again. However, the waste produced by mechanical shredding can be damaging to the environment. Furthermore, once a device is physically destroyed, it can no longer be resold or reused.

Cryptographic erasure

Cryptographic erasure involves the destruction of the secure key, or passphrase, that is used to protect stored information. Data encryption involves the creation of a secure key so that only authorized parties can gain access to the stored data. The permanent erasure of this key ensures that the private data stored can no longer be accessed. Cryptographic erasure is commonly implemented by the manufacturers of the device itself, as encryption software is often built into the device. Encryption with key erasure involves encrypting all sensitive material in a way that requires a secure key to decrypt the information when it needs to be used. When the information needs to be deleted, the secure key can be erased. This provides greater ease of use than other software methods because it involves a single deletion of the secure key rather than the deletion of each individual file.
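
The key-erasure idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EncryptedStore` class and its method names are hypothetical, and the SHA-256 keystream is a toy stand-in for the vetted cipher (such as AES) that a real self-encrypting device would use.

```python
import hashlib
import secrets


def _keystream(key: bytes, label: str, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction, not a vetted cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + label.encode() + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


class EncryptedStore:
    """Hypothetical store that only ever keeps ciphertext."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)  # the secure key
        self._blobs = {}                     # name -> ciphertext

    def _xor(self, name: str, data: bytes) -> bytes:
        if self._key is None:
            raise PermissionError("key has been erased; data is unrecoverable")
        stream = _keystream(self._key, name, len(data))
        return bytes(a ^ b for a, b in zip(data, stream))

    def write(self, name: str, plaintext: bytes) -> None:
        self._blobs[name] = self._xor(name, plaintext)

    def read(self, name: str) -> bytes:
        return self._xor(name, self._blobs[name])

    def crypto_erase(self) -> None:
        # One deletion (the key) makes every stored blob unreadable.
        self._key = None
```

After `crypto_erase()` the ciphertext is still present in the store, but `read()` can no longer recover it. On a real device, the erasure is only as reliable as the removal of every copy of the key.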

Cryptographic erasure is often used for data storage that does not contain as much private information, since errors can occur due to manufacturing failures or human error during the process of key destruction. This makes the outcome of the erasure less certain than with other methods. This method allows the encrypted data to remain on the device and does not require that the device be completely erased. As a result, the device can be resold to another individual or company, since the physical integrity of the device itself is maintained.

Data erasure

The process of data erasure involves masking all information at the byte level through the insertion of random 0s and 1s on all sectors of the electronic equipment that is no longer in use. This software-based method ensures that all previously stored data is completely hidden and unrecoverable, which ensures full data sanitization. The efficacy and accuracy of this sanitization method can also be analyzed through auditable reports.

This method ensures complete sanitization while also maintaining the physical integrity of the electronic equipment so that the technology can be resold or reused. This ability to recycle technological devices makes data erasure a more environmentally sound version of data sanitization. This method is also the most accurate and comprehensive since the efficacy of the data masking can be tested afterwards to ensure complete deletion. However, data erasure through software based mechanisms requires more time compared to other methods.
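
As a rough sketch of the overwrite-and-verify process described in this section, the Python function below overwrites a single file with random bytes and returns a small report. The function name `overwrite_file` and the report fields are assumptions made for illustration. Overwriting a file in place does not guarantee that every physical block is replaced (for example on SSDs or journaling file systems), so this is not a substitute for a certified erasure tool.

```python
import hashlib
import os


def overwrite_file(path: str, passes: int = 1, chunk_size: int = 64 * 1024) -> dict:
    """Overwrite a file in place with random bytes and return an audit-style report."""
    if passes < 1:
        raise ValueError("passes must be at least 1")
    size = os.path.getsize(path)
    expected = ""
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            written = hashlib.sha256()
            remaining = size
            while remaining > 0:
                block = os.urandom(min(chunk_size, remaining))
                f.write(block)
                written.update(block)
                remaining -= len(block)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
            expected = written.hexdigest()

    # Verification step: read the file back and compare it with what was written last.
    found = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            found.update(block)

    return {
        "path": path,
        "bytes_overwritten": size,
        "passes": passes,
        "verified": found.hexdigest() == expected,
    }
```

The returned dictionary plays the role of the auditable report: it records what was overwritten and whether the read-back check matched.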

Necessity of data sanitization
There has been increased usage of mobile devices, Internet of Things (IoT) technologies, cloud-based storage systems, portable electronic devices, and various other electronic methods to store sensitive information. Implementing effective erasure methods once a device is no longer in use has therefore become crucial to protect sensitive data. Due to the increased usage of electronic devices in general and the increased storage of private information on these devices, the need for data sanitization has become much more urgent in recent years.

Applications of data sanitization
Data sanitization methods are also implemented for privacy preserving data mining, association rule hiding, and blockchain-based secure information sharing. These methods involve the transfer and analysis of large datasets containing private information that needs to be sanitized before being made available online so that no sensitive material is left vulnerable. Data sanitization is used to ensure that privacy is maintained in the dataset by clearing any sensitive information prior to its use.

Privacy preserving data mining
Privacy Preserving Data Mining (PPDM) has a wide range of uses and is an integral step in the transfer or use of any large data set. It is also commonly linked to blockchain-based secure information sharing within supply chain management systems. Areas of application include:

 * 5G data
 * Internet of Things (IoT) technologies, such as Alexa and Google Home
 * Healthcare industry, using large datasets
 * Supply chain industry, usage of blockchain and optimal key generation

Privacy preserving data mining and data sanitization work in tandem to clear large datasets containing sensitive information so that they can be utilized by individuals or companies for analysis. The aim of privacy preserving data mining is to ensure that private information cannot be leaked or accessed by attackers and that sensitive data is not traceable to the individuals who submitted it. Privacy preserving data mining aims to maintain this level of privacy for individuals while also maintaining the integrity and functionality of the original dataset. In order for the dataset to remain usable, necessary aspects of the original data need to be preserved during the process of data sanitization. This balance between privacy and utility has been the primary goal of data sanitization methods.

Certain models of data sanitization delete or add information to the original database in an effort to preserve the privacy of each subject. These heuristic-based algorithms are becoming more popular, especially in the field of association rule mining. Heuristic methods involve specific algorithms that use pattern hiding, rule hiding, and sequence hiding to keep specific information hidden. This type of data hiding can be used to cover wide patterns in data, but is not as effective for protecting specific information. Heuristic-based methods are not as well suited to sanitizing large datasets; however, recent developments in the field have analyzed ways to tackle this problem. One example is the MR-OVnTSA approach, a heuristics-based sensitive pattern hiding approach for big data, introduced by Shivani Sharma and Durga Toshniwal. Its full name is the ‘MapReduce Based Optimum Victim Item and Transaction Selection Approach’, and it aims to reduce the loss of important data while removing and hiding sensitive information. It takes advantage of algorithms that compare steps and optimize sanitization.

An important goal of PPDM is to strike a balance between maintaining the privacy of users that have submitted the data and enabling developers to make full use of the dataset. Many measures of PPDM directly modify the dataset and create a new version that makes the original unrecoverable. They strictly erase any sensitive information and make it inaccessible to attackers.

One type of data sanitization is rule-based PPDM, which uses defined computer algorithms to clean datasets. Association rule hiding is the process of data sanitization as applied to transactional databases. Transactional database is the general term for data storage used to record transactions as organizations conduct their business. Examples include shipping payments, credit card payments, and sales orders. One survey of the field analyzes fifty-four different methods of data sanitization and presents four major findings on their trends.
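
To make association rule hiding more concrete, the following Python sketch measures the confidence of a rule in a small transactional database and then lowers it by removing the consequent item from supporting transactions. It is a simplified illustration, not any specific published algorithm: the function names, the sample orders, and the confidence threshold are assumptions, and the antecedent and consequent are assumed to be disjoint.

```python
from typing import List, Set


def count(transactions: List[Set[str]], itemset: Set[str]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(transactions: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """Share of transactions containing the antecedent that also contain the consequent."""
    base = count(transactions, antecedent)
    if base == 0:
        return 0.0
    return count(transactions, antecedent | consequent) / base


def hide_rule(transactions: List[Set[str]], antecedent: Set[str],
              consequent: Set[str], min_confidence: float) -> List[Set[str]]:
    """Return a sanitized copy in which the rule's confidence is below `min_confidence`."""
    sanitized = [set(t) for t in transactions]  # never modify the original data
    both = antecedent | consequent
    for t in sanitized:
        if confidence(sanitized, antecedent, consequent) < min_confidence:
            break
        if both <= t:
            t.difference_update(consequent)  # drop the consequent from this transaction
    return sanitized


orders = [
    {"card", "electronics", "shipping"},
    {"card", "electronics"},
    {"card", "groceries"},
    {"cash", "groceries"},
    {"card", "electronics", "shipping"},
]
cleaned = hide_rule(orders, {"card"}, {"electronics"}, min_confidence=0.5)
```

Each removal also deletes non-sensitive information from a transaction, which is the privacy and utility trade-off that sanitization methods try to balance.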

Certain new methods of data sanitization rely on machine learning and deep learning. There are various weaknesses in the current use of data sanitization. Many methods are not intricate or detailed enough to protect against more specific data attacks. This effort to maintain privacy while mining important data is referred to as privacy-preserving data mining. Machine learning develops methods that are better adapted to different types of attacks and can learn to face a broader range of situations. Deep learning is able to simplify data sanitization methods and run these protective measures in a more efficient and less time-consuming way.

There have also been hybrid models that utilize both rule-based methods and machine learning or deep learning methods to achieve a balance between the two techniques.

Blockchain-based secure information sharing

Browser backed cloud storage systems are heavily reliant on data sanitization and are becoming an increasingly popular route of data storage. Furthermore, the ease of usage is important for enterprises and workplaces that use cloud storage for communication and collaboration.

Blockchain is used to record and transfer information in a secure way and data sanitization techniques are required to ensure that this data is transferred more securely and accurately. It’s especially applicable for those working in supply chain management and may be useful for those looking to optimize the supply chain process. The need to improve blockchain methods is becoming increasingly relevant as the global level of development increases and becomes more electronically dependent.

Risks posed by inadequate sanitization
Inadequate data sanitization methods can result in two main problems: a breach of private information and compromises to the integrity of the original dataset. If data sanitization methods are unsuccessful at removing all sensitive information, there is a risk of leaking this information to attackers. Numerous studies have been conducted to optimize ways of protecting sensitive information. Some methods of data sanitization are highly sensitive to distinct points that are not close to other data points. This type of data sanitization is very precise and can detect anomalies even if the poisoned data point is relatively close to true data. Another method of data sanitization also removes outliers in data, but does so in a more general way. It detects the general trend of the data and discards any data that strays from it, and it is able to target anomalies even when they are inserted as a group. In general, data sanitization techniques use algorithms to detect anomalies and remove any suspicious points that may be poisoned data or sensitive information.
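
The trend-based approach mentioned above, which discards points that stray from the general trend of the data, can be illustrated with a simple median-based filter. This is a minimal sketch rather than any of the specific methods studied in the literature: the function name `remove_outliers` is hypothetical, and the 3.5 cutoff on the modified z-score is a commonly used rule of thumb chosen here as an assumption.

```python
import statistics
from typing import List


def remove_outliers(values: List[float], threshold: float = 3.5) -> List[float]:
    """Keep only the values whose modified z-score is within `threshold`."""
    if not values:
        return []
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return list(values)  # no spread to judge against, so keep everything
    return [v for v in values if abs(0.6745 * (v - median) / mad) <= threshold]


readings = [10.1, 9.8, 10.3, 10.0, 9.9, 55.0, 10.2]
kept = remove_outliers(readings)  # the 55.0 reading is discarded
```

A filter like this handles isolated anomalies; detecting poisoned points that are inserted as a group or that sit close to true data requires the more specialized techniques described above.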

Furthermore, data sanitization methods may remove useful, non-sensitive information, which renders the sanitized dataset less useful and alters it from the original. There have been iterations of common data sanitization techniques that attempt to correct this loss of original dataset integrity. In particular, Liu, Xuan, Wen, and Song offered a new algorithm for data sanitization called the Improved Minimum Sensitive Itemsets Conflict First Algorithm (IMSICF). Much of the emphasis in the field is placed on protecting the privacy of users, so this method brings a new perspective that also focuses on protecting the integrity of the data. It has three main advantages: it optimizes the process of sanitization by only cleaning the item with the highest conflict count, keeps the parts of the dataset with the highest utility, and also analyzes the conflict degree of the sensitive material. Research was conducted on the efficacy and usefulness of this technique to show how it can help maintain the integrity of the dataset. The technique first pinpoints the specific parts of the dataset that are possibly poisoned data, and then uses computer algorithms to weigh how useful each part is before deciding whether it should be removed. This is a new approach to data sanitization that takes into account the utility of the data before it is discarded.
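
The idea of cleaning only the item with the highest conflict count can be illustrated with a small Python sketch. This is not the IMSICF algorithm itself, only a simplified illustration under assumptions: here the conflict count of an item is taken to be the number of sensitive itemsets present in a transaction that contain it, and all function names are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple


def conflict_counts(transaction: Set[str], sensitive_itemsets: List[Set[str]]) -> Counter:
    """Count, per item, how many sensitive itemsets found in the transaction include it."""
    counts: Counter = Counter()
    for itemset in sensitive_itemsets:
        if itemset <= transaction:
            counts.update(itemset)
    return counts


def sanitize_transaction(transaction: Set[str],
                         sensitive_itemsets: List[Set[str]]) -> Tuple[Set[str], List[str]]:
    """Repeatedly remove the item with the highest conflict count until no sensitive itemset remains."""
    remaining = set(transaction)
    removed: List[str] = []
    while True:
        counts = conflict_counts(remaining, sensitive_itemsets)
        if not counts:
            break
        # Ties are broken alphabetically so that the result is deterministic.
        victim = max(sorted(counts), key=counts.__getitem__)
        remaining.discard(victim)
        removed.append(victim)
    return remaining, removed


remaining, removed = sanitize_transaction(
    {"a", "b", "c", "d"},
    [{"a", "b"}, {"b", "c"}],
)
```

In this example the item `b` appears in both sensitive itemsets, so removing it alone hides both of them and leaves `a`, `c`, and `d` intact. A full algorithm would typically lower the support of sensitive itemsets below a threshold across the whole database rather than clearing every transaction.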

Which article are you evaluating?
Information privacy

Why have you chosen this article to evaluate?
I chose this article to read as a summary of informational privacy. Data privacy and protection have become increasingly prevalent concerns. The article provides a summary of the challenges of informational privacy by first outlining the different types of information and then going on to describe legal issues.

Evaluate the article
The lead section was a concise summary of what the article would cover and also addressed why the issue of informational privacy is so important. It presents the challenges associated with data protection and the other relevant fields that are involved in this issue. Furthermore, the content itself is helpful in describing informational privacy. The first part of the article outlines the different types of information and how each relates to privacy issues. This is helpful because it allows readers who don't have as much prior knowledge of informational privacy to understand the basics. The content does not go into too much detail on each source of information, which helps keep the article brief and engaging. The content is up to date and addresses modern issues. For example, the legality section addresses the fact that laws and regulations surrounding informational privacy are constantly changing in different countries. The article has a neutral tone and approaches the topic from multiple perspectives. It addresses the fact that issues with informational privacy, and how they are addressed, vary greatly between countries. The article strikes a balance between defining informational privacy and describing the ways that information is currently being protected. There is a focus on current issues and mechanisms for protecting informational privacy, which makes the article relevant. All facts and sources are properly cited, with more than one citation per paragraph, which allows the reader to clearly see secondary sources. The article is well written and has no obvious grammatical errors. The article talk page was active, with the last comment being just a few months ago. Each comment helped make the article more comprehensive and more neutral.

Which article are you evaluating?
Computer security

Why have you chosen this article to evaluate?
I have chosen this article to evaluate because it's a topic that I'm interested in and I'm curious to see what's on the Wikipedia page for this type of broad subject. I wonder how comprehensive and detailed the page will be and how much I can learn from it.

Evaluate the article
The lead section of this article first provides a clear definition of what cybersecurity is. This is useful because it makes the topic more precise and gives a brief explanation of the term. The next paragraph addresses the significance of cybersecurity and why it is becoming increasingly relevant. I found this useful because it provides a real world application of the issues of cybersecurity. The article is very comprehensive, and the contents range from a detailed description of the ways that cybersecurity can be compromised electronically to attacker motivation and what the data may be used for. This article is very long and provides a lot of detail on what cybersecurity is and why security breaches occur. The first sections of the article describe how cybersecurity can be compromised through various attacks, such as phishing, malware, and tampering. The next section goes into detail on the motivation of attackers and the impact of these breaches. The second major section of the article focuses on ways that cybersecurity attacks can be prevented. I find this section to be very relevant and a natural connector to the first section of the article. The content goes into quite fine detail on the various ways that cybersecurity attacks can happen and on methods of prevention. The article maintains a neutral tone throughout and merely provides facts on cybersecurity issues. The focus is on real world issues, and the article aims to address modern complications that occur with cybersecurity. There are many citations, with a lengthy reference list at the end of the article, which provides a lot of additional information that can be easily accessed. The talk page is active, with recent users pointing out the lack of relevance of a few subtopics.

What I Plan To Contribute
The article that I plan to contribute to is the Data Sanitization page on Wikipedia. There is already a page up on the Wikipedia website, so I will need to work and collaborate with the author and add contributions that can help make it a much more robust and comprehensive overview. I want to expand on the current research that is being conducted on data sanitization and also add more than just the technical aspects that are currently posted on the Wikipedia page. The page right now gives a good definition of the specifics of code injection and the problems that come with it, but I'd also like to add more methods and ways that data sanitization is currently being utilized.

Improving an existing article (Outline)

 * Data sanitization currently redirects to an article on code injection
 * Talks about a specific way that virus code can be injected into vulnerable devices
 * Goes into the details of examples of specific code injection methods
 * Does not specifically cover the purpose of data sanitization or general methods
 * Loosely relates to data sanitization, only addresses ways that electronic privacy can be compromised, doesn't talk about ways to prevent it

New ideas:


 * General use of data sanitization
 * Purpose of data sanitization
 * Methods of data sanitization
 * Applications to real world issues (5G, Google Home, Alexa products etc.)