User:ErinPapadimitriou/sandbox

Data preservation is the act of conserving and maintaining both the safety and integrity of data. It is carried out through formal activities governed by policies, regulations and strategies directed towards protecting and prolonging the existence and authenticity of data and its metadata. Data can be described as the elements or units from which knowledge and information are created, and metadata are the summarizing subsets of those elements, or the data about the data. The main goal of data preservation is to protect data from being lost or destroyed and to contribute to the reuse and progression of the data.

History
Most historical data collected over time has been lost or destroyed. War and natural disasters, combined with a lack of the materials and practices needed to preserve and protect data, have caused this. Usually, only the most important data sets were saved, such as government records and statistics, legal contracts and economic transactions. Data from scientific research and doctoral theses have mostly been destroyed through improper storage and a lack of awareness and execution of data preservation. Over time, data preservation has evolved and gained importance and awareness. There are now many ways to preserve data and many organizations dedicated to doing so.

The first digital data storage solutions appeared in the 1950s and were usually flat or hierarchically structured. While there were still issues with these solutions, they made storing data much cheaper and more easily accessible. In the 1970s, relational databases and spreadsheets appeared. Relational databases structure data into tables that are queried with Structured Query Language (SQL), which made them more efficient than the preceding storage solutions. Spreadsheets hold high volumes of numeric data, which can be applied to relational databases to produce derivative data. More recently, non-relational (NoSQL) databases have appeared as complements to relational databases; they hold high volumes of unstructured data.
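As a rough illustration of how a relational database stores data in a table and produces derivative data from it, the following minimal sketch uses Python's built-in sqlite3 module. The table name, column names and sample readings are hypothetical and chosen only for the example.

```python
import sqlite3

def average_by_station(rows):
    """Load (station, value) rows into a table and derive per-station averages."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    # Hypothetical table layout, used only for this illustration.
    conn.execute("CREATE TABLE readings (station TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    # The SQL query computes derivative data (an average) from the stored rows.
    result = conn.execute(
        "SELECT station, AVG(value) FROM readings "
        "GROUP BY station ORDER BY station"
    ).fetchall()
    conn.close()
    return result

sample = [("A", 10.0), ("A", 14.0), ("B", 7.0)]
print(average_by_station(sample))
```

The stored rows remain unchanged; the averages are computed on request by the query.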

Importance
Preserving data matters because lost data is as though it never existed. Data is the building block of everything, and it is seen on both small and large scales. It can be lost in many different ways, including natural disasters, wars, data breaches, negligence and decay.

Data can be lost on a small or independent scale, as with personal data loss or data loss within businesses and organizations. It can also be lost on a national or global scale, which can negatively and potentially permanently affect environmental protection, medical research, homeland security, public health and safety, economic development and culture.

The U.S. Geological Survey (USGS) illustrates how data collections can be used when they are preserved and stored properly. The USGS stores data collections on natural hazards, natural resources and landscapes, and federal and state land management agencies use this data for land-use planning and management.

In Contrast
In contrast, data holdings are collections of gathered data that are kept informally and are not prepared for long-term preservation, for example a collection or backup of personal files. Data holdings are generally the storage method that was in use in past cases where data was lost to environmental and other historical disasters.

Furthermore, data retention differs from data preservation. By definition, to retain an object (here, data) is to hold or keep possession or use of it, while to preserve an object is to protect, maintain and keep it up for future use.

Thus, data preservation goes beyond having or possessing data or backup copies of data. Data preservation ensures persistent access to data by planning backup and recovery strategies in advance of a disaster or technological change.

Digital
Digital preservation is similar to data preservation but is concerned mainly with technological threats and solely with digital data. Essentially, digital preservation is a set of formal activities that enable ongoing or persistent use of and access to digital data beyond the occurrence of technological malfunction or change. Digital preservation anticipates the inevitable change in technology and protocols and prepares for data that will need to be accessible across new types of technologies and platforms, while conserving the integrity of the data and metadata.
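One common way to check that the integrity of digital data has been conserved is to record a checksum when the data is stored and compare it again later. The sketch below assumes SHA-256 as the checksum and uses hypothetical function names; it illustrates the general idea and is not a description of any particular repository's procedure.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

def is_intact(data: bytes, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at storage time."""
    return sha256_of(data) == recorded_digest

# Record the digest when the data is first stored (hypothetical sample data).
original = b"station,value\nA,10.0\n"
recorded = sha256_of(original)

# Later, recompute the digest to detect corruption or alteration.
print(is_intact(original, recorded))          # unchanged copy
print(is_intact(original + b"x", recorded))   # altered copy
```

A matching digest indicates that the bytes are unchanged; it does not show that the file format can still be opened by current software.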

Technology has brought great progress in conserving data that might not have been possible to conserve in the past, but it is also changing at such a quick rate that digital data may become inaccessible because its format is incompatible with new software. Without data preservation, much of our existing digital data is at risk.

The majority of methods used for data preservation today are digital, and they are so far the most effective methods that exist.

Archives
An archive is a collection of historical documents and records. Archives contribute to the preservation of data by collecting data that is well organized and providing the appropriate metadata to confirm it. An example of an important data archive is the LONI Image Data Archive, which collects data regarding clinical trials and clinical research studies.

Catalogues, Directories and Portals
Catalogues, directories and portals are consolidated resources that are kept by individual institutions and are associated with data archives and holdings. In other words, the data itself is not presented on the site; instead, the site may act as a metadata aggregator and may maintain thorough inventories.

Repositories
Repositories are places where data archives and holdings can be accessed and stored. The goal of a repository is to make sure that all requirements and protocols of archives and holdings are being met and that data is certified to ensure data integrity and user trust.

Single-Site Repositories
A single-site repository holds all of its data sets on a single site. An example of a major single-site repository is Data Archiving and Networked Services (DANS), which provides ongoing access to digital research resources for the Netherlands.

Multi-Site Repositories
A multi-site repository hosts data sets on multiple institutional sites. An example of a well-known multi-site repository is OpenAIRE, which hosts research data and publications in collaboration with all of the EU countries and others. OpenAIRE promotes open scholarship and seeks to improve the discoverability and reusability of data.

Trusted Digital Repository (TDR)
A trusted digital repository seeks to provide reliable, trusted access over a long period of time. The repository can be single-site or multi-site but must conform to the Reference Model for an Open Archival Information System, and must adhere to a set of attributes that contribute to its trust, such as persistent financial responsibility, organizational viability, administrative responsibility, security and safety. An example of a trusted digital repository is the Digital Repository of Ireland (DRI), a multi-site repository that hosts Ireland's humanities and social sciences data sets.

Cyber Infrastructures
Cyber infrastructures consist of archive collections that are made available through a system of hardware, technologies, software, policies, services and tools. Cyber infrastructures are geared towards the sharing of data, supporting peer-to-peer collaboration and a cultural community.

An example of a major cyber infrastructure is the Canadian Geospatial Data Infrastructure, which provides access to spatial data in Canada.

Is everything in the article relevant to the article topic? Is there anything that distracted you? Everything in the article was relevant to the topic of Wikidata; nothing was really off topic or distracting.

Is the article neutral? Are there any claims, or frames, that appear heavily biased toward a particular position? The article is fairly neutral. It outlines Wikidata in a largely unbiased way, although it does express some concerns about some of the sources of the data.

Are there viewpoints that are overrepresented, or underrepresented? The viewpoint that Wikidata may not be credible is the main one represented; however, the article is not very detailed.

Check a few citations. Do the links work? Does the source support the claims in the article? The links I checked worked and supported the claims of the article.

Is each fact referenced with an appropriate, reliable reference? Where does the information come from? Are these neutral sources? If biased, is that bias noted? The facts in this article are referenced with appropriate and reliable references. The information comes from a variety of different sources, mostly ones with seemingly positive feedback. Only one or two are focused on the downfalls of Wikidata.

Is any information out of date? Is anything missing that could be added? The information given seems to be up to date; however, there is much more research that could be done on this topic.

Check out the Talk page of the article. What kinds of conversations, if any, are going on behind the scenes about how to represent this topic? The talk page has a few conversations, mostly about the fact that the article needs to be more thoroughly researched and finished.

How is the article rated? Is it a part of any WikiProjects? The article is rated with an Alexa rating, and is not part of any WikiProjects.

How does the way Wikipedia discusses this topic differ from the way we've talked about it in class? We haven't talked about Wikidata in class as of yet.