User:Serialsgirl/Digitization

Lead
Digitization is the process of converting information into a digital (i.e. computer-readable) format. The result is the representation of an object, image, sound, document or signal (usually an analog signal) obtained by generating a series of numbers that describe a discrete set of points or samples. The result is called a digital representation or, more specifically, a digital image for an object and digital form for a signal. In modern practice, the digitized data is in the form of binary numbers, which facilitates processing by digital computers and other operations, but digitizing simply means converting analog source material into a numerical format; the decimal or any other number system can be used instead.

Digitization is of crucial importance to data processing, storage and transmission, because it "allows information of all kinds in all formats to be carried with the same efficiency and also intermingled". Though analog data is typically more stable, digital data has the potential to be more easily shared and accessed and, in theory, can be propagated indefinitely without generation loss, provided it is migrated to new, stable formats as needed. This potential has led to institutional digitization projects designed to improve access and to the rapid growth of the digital preservation field.

Digitization and digital preservation are sometimes mistaken for the same thing. They are different, although digitization is often a vital first step in digital preservation. Libraries, archives, museums and other memory institutions digitize items to preserve fragile materials and create more access points for patrons. Doing this creates challenges for information professionals, and solutions can be as varied as the institutions that implement them. Some analog materials, such as audio and video tapes, are nearing the end of their life-cycle, and it is important to digitize them before equipment obsolescence and media deterioration make the data irretrievable.

There are challenges and implications surrounding digitization including time, cost, cultural history concerns and creating an equitable platform for historically marginalized voices. Many digitizing institutions develop their own solutions to these challenges.

Mass digitization projects have had mixed results over the years, but some institutions have had success even if not in the traditional Google Books model.

Technological changes can happen often and quickly, so digitization standards are difficult to keep updated. Professionals in the field can attend conferences and join organizations and working groups to keep their knowledge current and add to the conversation.

History

 * 1957 Russell Kirsch used a rotating drum scanner and photomultiplier connected to the Standards Electronic Automatic Computer (SEAC), which had been in operation since 1950, to create the first digital image (176x176 pixels) from a photo of his infant son. This image was stored in SEAC memory via a staticizer and viewed via a cathode ray oscilloscope.
 * 1971 Invention of charge-coupled devices (CCDs), which made conversion from analog data to a digital format easy.
 * 1986 Work started on the JPEG format.
 * 1990s Libraries began scanning collections to provide access via the World Wide Web.

Analog signals to digital
Analog signals are continuous electrical signals; digital signals are non-continuous. Analog signals can be converted to digital signals by using an analog-to-digital converter.

The process of converting analog to digital consists of two parts: sampling and quantizing. Sampling measures the amplitude of the wave at regular intervals of time, which divides the signal along the horizontal (time) axis. Quantizing then assigns each measurement a numerical value: because only a limited set of binary values is available, a measurement that falls between two of them is rounded up or down to the nearest one.
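The two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how any particular analog-to-digital converter works: the function names, the 1 Hz test tone, the 8 samples per second, the 3-bit depth, and the amplitude range of -1.0 to 1.0 are all hypothetical choices made for the example.

```python
import math

def sample(signal, duration_s, sample_rate_hz):
    """Sampling: measure the signal's amplitude at regular time intervals."""
    count = int(duration_s * sample_rate_hz)
    return [signal(i / sample_rate_hz) for i in range(count)]

def quantize(samples, bits, lo=-1.0, hi=1.0):
    """Quantizing: round each measurement to the nearest of 2**bits
    evenly spaced levels and return the integer code of that level."""
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    codes = []
    for value in samples:
        clipped = min(max(value, lo), hi)  # keep the value inside the range
        codes.append(round((clipped - lo) / step))
    return codes

def tone(t):
    """Hypothetical analog source: a 1 Hz sine wave."""
    return math.sin(2 * math.pi * 1.0 * t)

# Assumed settings: one second of signal, 8 samples per second, 3 bits.
samples = sample(tone, duration_s=1.0, sample_rate_hz=8)
codes = quantize(samples, bits=3)
print(codes)  # eight integers, each between 0 and 7
```

Raising the sample rate captures more points along the time axis, while raising the bit depth gives finer steps along the amplitude axis; both increase the amount of data produced.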

Nearly all recorded music has been digitized, and about 12 percent of the 500,000+ movies listed on the Internet Movie Database are digitized and were released on DVD.

Digitization of home movies, slides, and photographs is a popular method of preserving and sharing personal multimedia. Slides and photographs may be scanned quickly using an image scanner, but analog video requires a video tape player to be connected to a computer while the item plays in real time. Slides can be digitized more quickly with a slide scanner such as the Nikon Coolscan 5000ED.

Another example of digitization is the VisualAudio process developed by the Swiss Fonoteca Nazionale in Lugano. By scanning a high-resolution photograph of a record, they are able to extract and reconstruct the sound from the processed image.

Digitization of analog tapes before they degrade, or after damage has already occurred, can rescue the only copies of local and traditional cultural music for future generations to study and enjoy.

Analog texts to digital
Academic and public libraries, foundations, and private companies like Google are scanning older print books and applying optical character recognition (OCR) technologies so they can be keyword searched, but as of 2006, only about 1 in 20 texts had been digitized. Librarians and archivists are working to increase this statistic and in 2019 began digitizing 480,000 books published between 1923 and 1964 that had entered the public domain.

Unpublished manuscripts and other rare papers and documents housed in special collections are being digitized by libraries and archives, but backlogs often slow this process and keep materials with enduring historical and research value hidden from most users (see digital libraries). Digitization has not completely replaced other archival imaging options, such as microfilming, which is still used by institutions such as the National Archives and Records Administration (NARA) to provide preservation of and access to these resources.

Digital versions of analog texts can potentially be accessed from anywhere in the world, but they are not as stable as most print materials or manuscripts and are unlikely to be accessible decades from now without further preservation efforts, whereas many books, manuscripts, and scrolls have already been around for centuries. However, for some materials that have been damaged by water, insects, or catastrophes, digitization might be the only option for continued use.

Digitization versus digital preservation
Digitizing something is not the same as digitally preserving it. To digitize something is to create a digital surrogate (copy or format) of an existing analog item (book, photograph, or record). It is often described as converting the item from analog to digital, but the original remains alongside the copy. An example would be scanning a photograph, which leaves the original in a photo album and a digital copy saved to a computer. Digitization is essentially the first step in digital preservation, which means maintaining the digital copy over a long period of time and making sure it remains authentic and accessible.

Digitization is done once with the technology currently available, while digital preservation is more complicated because technology changes so quickly that a once-popular storage format may become obsolete before it breaks. An example is the 5 1/4" floppy drive: computers are no longer made with them, and obtaining the hardware to convert a file stored on a 5 1/4" floppy disk can be expensive. To combat this risk, equipment must be upgraded as newer technology becomes affordable (about 2 to 5 years), but before older technology becomes unobtainable (about 5 to 10 years).

Digital preservation can also apply to born-digital material, such as a Microsoft Word document or a social media post. In contrast, digitization applies exclusively to analog materials. Born-digital materials present a unique challenge to digital preservation not only because of technological obsolescence but also because of the inherently unstable nature of digital storage and maintenance. Most websites last between 2.5 and 5 years, depending on the purpose for which they were designed.

The Library of Congress provides numerous resources and tips for individuals looking to practice digitization and digital preservation for their personal collections.

Challenges
Many libraries, archives, museums, and other memory institutions struggle to catch up and stay current with digitization and with the expectation that everything should already be online. The most common challenges include the time spent planning, doing the work, and processing the digital files, along with the expense of digitization and the fragility of some materials.

Time spent
Digitization is a time-consuming process, even more so when the condition or format of the analog resources requires special handling. Deciding what part of a collection to digitize can sometimes take longer than digitizing it in its entirety. Each digitization project is unique, and its workflow will differ from that of every other project, so time must be spent thoroughly studying and planning each one to create the best plan for the materials and the intended audience.

Expense
The cost of equipment, staff time, metadata creation, and digital storage media makes large-scale digitization of collections expensive for all types of cultural institutions.

Ideally, all institutions want their digital copies to have the best image quality so a high-quality copy can be maintained over time. However, smaller institutions may not be able to afford the necessary equipment or manpower, which limits how much material can be digitized, so archivists and librarians must know what their patrons need and prioritize digitization of those items. Often the cost of the time and expertise involved in describing materials and adding metadata is greater than the cost of the digitization process itself.

Fragility of materials
Some materials, such as brittle books, are so fragile that undergoing the process of digitization could damage them irreparably. Despite the potential damage, one reason for digitizing fragile materials is that they are so heavily used that creating a digital surrogate will help preserve the original copy long past its expected lifetime and increase access to the item.

Copyright
Copyright is a problem not only for projects like Google Books but also for institutions that may need to contact private citizens or institutions mentioned in archival documents for permission to scan the items for digital collections. It can be time-consuming to make sure all potential copyright holders have given permission, and if copyright cannot be determined or cleared, it may be necessary to restrict even digital materials to in-library use.

Solutions
Institutions can make digitization more cost-effective by planning before a project begins, including outlining what they hope to accomplish and the minimum amount of equipment, time, and effort that can meet those goals. If a budget needs more money to cover the cost of equipment or staff, an institution might investigate if grants are available.

Collaboration
Collaborations between institutions have the potential to save money on equipment, staff, and training as individual members share their equipment, manpower, and skills rather than pay outside organizations to provide these services. Collaborations with donors can build long-term support of current and future digitization projects.

Outsourcing
Outsourcing can be an option if an institution does not want to invest in equipment, but since most vendors require an inventory and basic metadata for materials, it is not an option for institutions hoping to digitize without processing.

Non-traditional staffing
Many institutions have the option of using volunteers, student employees, or temporary employees on projects. While this saves on staffing costs, it can add costs elsewhere such as on training or having to re-scan items due to poor quality.

MPLP
One way to save time and resources is by using the More Product, Less Process (MPLP) method to digitize materials while they are being processed. Since GLAM (Galleries, Libraries, Archives, and Museums) institutions are already committed to preserving analog materials from special collections, digital access copies do not need to be high-resolution preservation copies, just good enough to provide access to rare materials. Sometimes institutions can get by with a 300 dpi JPG rather than a 600 dpi TIFF for images, and a 300 dpi grayscale scan of a document rather than a color one at 600 dpi.
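The storage difference between the two document scans mentioned above can be estimated with simple arithmetic. The sketch below is a rough illustration, not a standard: it assumes an 8.5 x 11 inch page, 1 byte per pixel for grayscale, 3 bytes per pixel for color, and no compression, and the function name is made up for the example. Real JPG and TIFF files will differ in size because of compression and file overhead.

```python
def uncompressed_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Estimate raw image size: pixels across x pixels down x bytes per pixel."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel

# Assumed page size: 8.5 x 11 inches.
access_copy = uncompressed_bytes(8.5, 11, dpi=300, bytes_per_pixel=1)        # grayscale
preservation_copy = uncompressed_bytes(8.5, 11, dpi=600, bytes_per_pixel=3)  # color

print(access_copy)                      # 8415000 bytes, about 8.4 MB
print(preservation_copy)                # 100980000 bytes, about 101 MB
print(preservation_copy / access_copy)  # 12.0
```

Under these assumptions, doubling the resolution quadruples the pixel count and color triples the bytes per pixel, so the 600 dpi color scan needs about twelve times the raw storage of the 300 dpi grayscale scan.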

Mass Digitization
The expectation that everything should be online has led to mass digitization practices, but it is an ongoing process with obstacles that have led to alternatives. As new technology makes automated scanning safer for materials and decreases the need for cropping and de-skewing, mass digitization should be able to increase.

Obstacles
Digitization can be a physically slow process involving selection and preparation of collections that can take years if materials need to be compared for completeness or are vulnerable to damage. Price of specialized equipment, storage costs, website maintenance, quality control, and retrieval system limitations all add to the problems of working on a large scale.

Digitization on demand
Scanning materials as users ask for them provides copies for others to use and cuts down on repeated copying of popular items. If one part of a folder, document, or book is asked for, scanning the entire object can save time in the future because the material is already accessible if someone else needs it. Digitizing on demand can increase volume because time that would have been spent on selection and prep is spent on scanning instead.

Google Books
From the start, Google has concentrated on text rather than images or special collections. Although Google has been criticized in the past for poor image quality, selection practices, and a lack of long-term preservation plans, its focus on quantity over quality has enabled it to digitize more books than other digitizers.

Standards
Digitization is not a static field and standards change with new technology, so it is up to digitization managers to stay current with new developments. Although each digitization project is different, common standards in formats, metadata, quality, naming, and file storage should be used to give the best chance of interoperability and patron access. As digitization is often the first step in digital preservation, questions about how to handle digital files should be addressed in institutional standards.

A standard for still images adapted from the Smithsonian digitization standards might include the following:

Resources to create local standards are available from the Society of American Archivists, the Smithsonian, and the Northeast Document Conservation Center.

Digitizing Marginalized Voices
Digitization can be used to highlight voices of historically marginalized peoples and add them to the greater body of knowledge. Many projects, some of them community archives created by members of those groups, are doing this in a way that supports the people, values their input and collaboration, and gives them a sense of ownership of the collection. Examples of projects are Gi-gikinomaage-min, the South Asian American Digital Archive (SAADA), and Kent State University's Black Campus Movement Collection (BCM).

Gi-gikinomaage-min
Gi-gikinomaage-min is Anishinaabemowin for "We are all teachers" and its main purpose is "to document the history of Native Americans in Grand Rapids, Michigan." It combines new audio and video oral histories with digitized flyers, posters, and newsletters from Grand Valley State University's analog collections. Although it is not entirely a newly digitized collection, the project also added item-level metadata to enhance context. From the start, collaboration between several university departments and the Native American population was deemed important, and it remained strong throughout the project.

SAADA
The South Asian American Digital Archive (SAADA) has no physical building, is entirely digital, and is run by volunteers. This archive was started by Michelle Caswell and Samip Mallick and collects a broad variety of materials "created by or about people residing in the United States who trace their heritage to Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan, Sri Lanka, and the many South Asian diaspora communities across the globe" (Caswell, 2015, 2). The collection of digitized items includes private, government, and university-held materials.

Black Campus Movement Collection (BCM)
Kent State University began its BCM collection when it acquired the papers of African American alumnus Lafayette Tolliver, which included about 1,000 photographs that chronicled the black student experience at Kent State from 1968 to 1971. The collection continues to add materials from the 1960s up to and including the current student body, and several oral histories have been added since it debuted. When digitizing the items, it was necessary to work with alumni to create descriptions for the images. This collaboration led to changes in the local controlled vocabularies the libraries used to create metadata for the images.

Cultural Heritage Concerns
Digitization of community archives by indigenous and other marginalized people has led traditional memory institutions to reassess how they digitize and handle objects in their collections that may have ties to these groups. The topics they are rethinking are varied and include how items are chosen for digitization projects, what metadata to use to convey proper context and make items retrievable by the groups they represent, and whether an item should be accessible to the world or only to those whom the groups originally intended to have access, such as elders. Many institutions navigate these concerns by collaborating with the communities they seek to represent through their digitized collections.