National Software Reference Library

The National Software Reference Library (NSRL) is a project of the National Institute of Standards and Technology (NIST) that maintains a repository of known software, file profiles and file signatures for use by law enforcement and other organizations involved in computer forensic investigations. The project is supported by the United States Department of Justice's National Institute of Justice, the Federal Bureau of Investigation (FBI), the Defense Computer Forensics Laboratory (DCFL), the U.S. Customs Service, software vendors, and state and local law enforcement. It also provides a research environment for computational analysis of large sets of files.

Components
The NSRL is made up of three major elements:
 * 1) A large physical collection of commercial software packages (e.g., operating systems, off-the-shelf application software);
 * 2) A database containing detailed information, or metadata, about each file that makes up each of those software packages;
 * 3) A smaller public dataset, published and updated quarterly, containing the most widely used metadata for each file in the collection. This is called the Reference Data Set.

Reference Data Set
The NSRL collects software from various sources and computes message digests, or cryptographic hash values, of the files it contains. The digests are stored in the Reference Data Set (RDS), which can be used to identify "known" files on digital media. Filtering out these known files reduces much of the effort involved in determining which files are important as evidence on computers or file systems seized as part of criminal investigations. Although the RDS includes hashes of some malicious software (such as steganography and hacking tools), it does not cover illicit material (e.g., indecent images).
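The known-file filtering described above can be sketched in a few lines of Python. This is a minimal illustration, not an NSRL tool: it assumes the hash list has been exported as a CSV file with a header row containing a "SHA-1" column (the legacy RDS `NSRLFile.txt` used a similar quoted layout with SHA-1, MD5 and CRC32 columns, but treat the exact format as an assumption), and the function names `file_digests`, `load_known_hashes` and `partition_files` are hypothetical.

```python
import csv
import hashlib
import zlib
from pathlib import Path


def file_digests(path, chunk_size=1 << 20):
    """Compute SHA-1, MD5 and CRC32 of a file in one streaming pass."""
    sha1, md5, crc = hashlib.sha1(), hashlib.md5(), 0
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files never need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
            md5.update(chunk)
            crc = zlib.crc32(chunk, crc)
    # Upper-case hex, matching how hash lists are commonly published.
    return {
        "SHA-1": sha1.hexdigest().upper(),
        "MD5": md5.hexdigest().upper(),
        "CRC32": f"{crc & 0xFFFFFFFF:08X}",
    }


def load_known_hashes(csv_path, column="SHA-1"):
    """Load one hash column from a CSV hash list into a set.

    Assumption: the file has a header row naming the column (e.g. "SHA-1").
    """
    known = set()
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as f:
        for row in csv.DictReader(f):
            value = (row.get(column) or "").strip().upper()
            if value:
                known.add(value)
    return known


def partition_files(root, known, column="SHA-1"):
    """Split every file under root into (known, unknown) lists by hash lookup."""
    known_files, unknown_files = [], []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = file_digests(path)[column]
            (known_files if digest in known else unknown_files).append(path)
    return known_files, unknown_files
```

Holding the hash list in a `set` gives constant-time membership checks per file, so the same approach scales to lists with tens of millions of entries, at the cost of keeping them in memory. An examiner would then set aside the "known" files and concentrate on the remainder.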

The collection of original software media is maintained in order to provide repeatability of the calculated hash values, ensuring admissibility of this data in court.

In 2004 the NSRL released a set of hashes for verifying eVoting software, as part of the US Election Assistance Commission's Electronic Voting Security Strategy.

As of October 1, 2013, the Reference Data Set was at version 2.42 and contained over 33.9 million unique hash values. The data set is available to the public at no cost.

In addition to operating system and application software, the library has collected numerous popular video game titles, both for use in digital forensics and, in part, for video game preservation.