Crystallographic database

A crystallographic database is a database specifically designed to store information about the structure of molecules and crystals. Crystals are solids having, in all three dimensions of space, a regularly repeating arrangement of atoms, ions, or molecules. They are characterized by symmetry, morphology, and directionally dependent physical properties. A crystal structure describes the arrangement of atoms, ions, or molecules in a crystal. Molecules need to crystallize into solids so that their regularly repeating arrangements can be exploited in X-ray, neutron, and electron diffraction-based crystallography.

Crystal structures are typically determined from X-ray or neutron single-crystal diffraction data and stored in crystal structure databases. Known crystalline materials are routinely identified by comparing reflection intensities and lattice spacings from X-ray powder diffraction data with entries in powder-diffraction fingerprinting databases.

Crystal structures of nanometer sized crystalline samples can be determined via structure factor amplitude information from single-crystal electron diffraction data or structure factor amplitude and phase angle information from Fourier transforms of HRTEM images of crystallites. They are stored in crystal structure databases specializing in nanocrystals and can be identified by comparing zone axis subsets in lattice-fringe fingerprint plots with entries in a lattice-fringe fingerprinting database.

Crystallographic databases differ in access and usage rights and offer varying degrees of search and analysis capacity. Many provide structure visualization capabilities. They can be browser based or installed locally. Newer versions are built on the relational database model and support the Crystallographic Information File (CIF) as a universal data exchange format.

Overview
Crystallographic data are primarily extracted from published scientific articles and supplementary material. Newer versions of crystallographic databases are built on the relational database model, which enables efficient cross-referencing of tables. Cross-referencing serves to derive additional data or enhance the search capacity of the database.

Data exchange among crystallographic databases, structure visualization software, and structure refinement programs has been facilitated by the emergence of the Crystallographic Information File (CIF) format. The CIF format is the standard file format for the exchange and archiving of crystallographic data. It was adopted by the International Union of Crystallography (IUCr), which also provides full specifications of the format. It is supported by all major crystallographic databases.
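As an illustration of how CIF data can be read by software, the following minimal Python sketch collects single-line tag-value items from CIF text. The data names shown (such as _cell_length_a) follow the CIF core dictionary; the function names parse_simple_cif and strip_su, as well as the sample values, are hypothetical and chosen only for this example. Loops and multi-line text fields are deliberately ignored, so this is not a full CIF parser.

```python
import shlex

# Illustrative CIF fragment; the numbers are sample values only.
SAMPLE_CIF = """\
data_example
_chemical_formula_sum 'Na Cl'
_cell_length_a 5.6402(2)
_cell_length_b 5.6402(2)
_cell_length_c 5.6402(2)
_cell_angle_alpha 90
"""

def parse_simple_cif(text):
    """Collect single-line '_tag value' items; loop_ blocks and text fields are skipped."""
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue
        parts = shlex.split(line)  # honours quoted values such as 'Na Cl'
        if len(parts) == 2:
            items[parts[0]] = parts[1]
    return items

def strip_su(value):
    """Drop a standard uncertainty in parentheses, e.g. '5.6402(2)' becomes 5.6402."""
    return float(value.split("(")[0])

cif = parse_simple_cif(SAMPLE_CIF)
print(cif["_chemical_formula_sum"], strip_su(cif["_cell_length_a"]))
```

Production software would normally rely on an established CIF library instead, since the full format includes loops, multi-line values, and dictionary validation.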

The increasing automation of the crystal structure determination process has resulted in ever higher publishing rates of new crystal structures and, consequently, new publishing models. Minimalistic articles contain only crystal structure tables, structure images, and, possibly, an abstract-like structure description. They tend to be published in author-financed or subsidized open-access journals. Acta Crystallographica Section E and Zeitschrift für Kristallographie belong in this category. More elaborate contributions may go to traditional subscriber-financed journals. Hybrid journals, on the other hand, embed individual author-financed open-access articles among subscriber-financed ones. Publishers may also make scientific articles available online, as Portable Document Format (PDF) files.

Crystal structure data in CIF format are linked to scientific articles as supplementary material. CIFs may be accessible directly from the publisher's website, crystallographic databases, or both. In recent years, many publishers of crystallographic journals have come to interpret CIFs as formatted versions of open data, i.e. representing non-copyrightable facts, and therefore tend to make them freely available online, independent of the accessibility status of linked scientific articles.

Trends


As of 2008, more than 700,000 crystal structures had been published and stored in crystal structure databases. The publishing rate has reached more than 50,000 crystal structures per year. These numbers refer to published and republished crystal structures from experimental data. Crystal structures are republished owing to corrections for symmetry errors, improvements of lattice and atomic parameters, and differences in diffraction technique or experimental conditions. As of 2016, there are about 1,000,000 molecule and crystal structures known and published, approximately half of them in open access.

Crystal structures are typically categorized as minerals, metals-alloys, inorganics, organics, nucleic acids, and biological macromolecules. Individual crystal structure databases cater for users in specific chemical, molecular-biological, or related disciplines by covering super- or subsets of these categories. Minerals are a subset of mostly inorganic compounds. The category ‘metals-alloys’ covers metals, alloys, and intermetallics. Metals-alloys and inorganics can be merged into ‘non-organics’. Organic compounds and biological macromolecules are separated according to molecular size. Organic salts, organometallics, and metalloproteins tend to be attributed to organics or biological macromolecules, respectively. Nucleic acids are a subset of biological macromolecules.

Comprehensiveness can refer to the number of entries in a database. On those terms, a crystal structure database can be regarded as comprehensive if it contains a collection of all (re-)published crystal structures in the category of interest and is updated frequently. Searching for structures in such a database can replace more time-consuming scanning of the open literature. Access to crystal structure databases differs widely. It can be divided into reading and writing access. Reading access rights (search, download) affect the number and range of users. Restricted reading access is often coupled with restricted usage rights. Writing access rights (upload, edit, delete), on the other hand, determine the number and range of contributors to the database. Restricted writing access is often coupled with high data integrity.

In terms of user numbers and daily access rates, comprehensive and thoroughly vetted open-access crystal structure databases naturally surpass comparable databases with more restricted access and usage rights. Independent of comprehensiveness, open-access crystal structure databases have spawned open-source software projects, such as search-analysis tools, visualization software, and derivative databases. Scientific progress has been slowed down by restricting access or usage rights as well as limiting comprehensiveness or data integrity. Restricted access or usage rights are commonly associated with commercial crystal structure databases. Lack of comprehensiveness or data integrity, on the other hand, is associated with some of the open-access crystal structure databases other than the Crystallography Open Database (COD) and its "macromolecular open-access counterpart", the Worldwide Protein Data Bank. Apart from that, several crystal structure databases are freely available for primarily educational purposes, in particular mineralogical databases and educational offshoots of the COD.

Crystallographic databases can specialize in crystal structures, crystal phase identification, crystallization, crystal morphology, or various physical properties. More integrative databases combine several categories of compounds or specializations. Structures of incommensurate phases, 2D materials, nanocrystals, thin films on substrates, and predicted crystal structures are collected in tailored special structure databases.

Search
Search capacities of crystallographic databases differ widely. Basic functionality comprises search by keywords, physical properties, and chemical elements. Of particular importance is search by compound name and lattice parameters. Very useful are search options that allow the use of wildcard characters and logical connectives in search strings. If supported, the scope of the search can be constrained by the exclusion of certain chemical elements.
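The basic search options described above can be sketched in a few lines of Python. The entries, field names, and the search function below are hypothetical and only illustrate the idea of combining a wildcard name pattern, required and excluded chemical elements (a logical AND / NOT), and a lattice parameter range; the cell edges are approximate sample values.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory entries; real databases hold far richer records.
ENTRIES = [
    {"name": "sodium chloride", "elements": {"Na", "Cl"}, "a": 5.64},
    {"name": "potassium chloride", "elements": {"K", "Cl"}, "a": 6.29},
    {"name": "sodium fluoride", "elements": {"Na", "F"}, "a": 4.63},
]

def search(entries, name_pattern="*", require=(), exclude=(), a_range=None):
    """Return names of entries that satisfy all given constraints."""
    hits = []
    for entry in entries:
        if not fnmatchcase(entry["name"], name_pattern):
            continue  # wildcard match on the compound name failed
        if not set(require) <= entry["elements"]:
            continue  # a required element is missing
        if set(exclude) & entry["elements"]:
            continue  # an excluded element is present
        if a_range and not (a_range[0] <= entry["a"] <= a_range[1]):
            continue  # lattice parameter outside the requested window
        hits.append(entry["name"])
    return hits

print(search(ENTRIES, "sodium*", exclude=["F"]))
print(search(ENTRIES, require=["Cl"], a_range=(6.0, 6.5)))
```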

More sophisticated algorithms depend on the material type covered. Organic compounds might be searched for on the basis of certain molecular fragments. Inorganic compounds, on the other hand, might be of interest with regard to a certain type of coordination geometry. More advanced algorithms deal with conformation analysis (organics), supramolecular chemistry (organics), interpolyhedral connectivity (‘non-organics’) and higher-order molecular structures (biological macromolecules). Search algorithms used for a more complex analysis of physical properties, e.g. phase transitions or structure-property relationships, might apply group-theoretical concepts.

Modern versions of crystallographic databases are based on the relational database model. Communication with the database usually happens via a dialect of the Structured Query Language (SQL). Web-based databases typically process the search algorithm on the server interpreting supported scripting elements, while desktop-based databases run locally installed and usually precompiled search engines.
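How a relational layout supports cross-referencing can be illustrated with Python's built-in sqlite3 module. The two-table schema, the table and column names, and all rows below are assumptions made up for this sketch; they do not reflect the schema of any particular crystallographic database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reference (id INTEGER PRIMARY KEY, journal TEXT, year INTEGER);
CREATE TABLE structure (id INTEGER PRIMARY KEY, name TEXT, a REAL,
                        reference_id INTEGER REFERENCES reference(id));
INSERT INTO reference VALUES (1, 'Journal A', 2005), (2, 'Journal B', 2012);
INSERT INTO structure VALUES (1, 'phase X', 5.64, 1),
                             (2, 'phase Y', 6.29, 2),
                             (3, 'phase Z', 5.66, 2);
""")

def find_by_cell_edge(connection, a_min, a_max):
    """Join structures with their literature reference, filtered by cell edge a."""
    rows = connection.execute(
        """SELECT s.name, r.journal, r.year
           FROM structure AS s JOIN reference AS r ON s.reference_id = r.id
           WHERE s.a BETWEEN ? AND ?
           ORDER BY s.id""",
        (a_min, a_max),
    )
    return rows.fetchall()

print(find_by_cell_edge(conn, 5.6, 5.7))
```

The join is the cross-referencing step: structural data and bibliographic data live in separate tables and are combined only when a query needs both.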

Crystal phase identification
Crystalline material may be divided into single crystals, twin crystals, polycrystals, and crystal powder. In a single crystal, the arrangement of atoms, ions, or molecules is defined by a single crystal structure in one orientation. Twin crystals, on the other hand, consist of single-crystalline twin domains, which are aligned by twin laws and separated by domain walls.

Polycrystals are made of a large number of small single crystals, or crystallites, held together by thin layers of amorphous solid. Crystal powder is obtained by grinding crystals, resulting in powder particles, made up of one or more crystallites. Both polycrystals and crystal powder consist of many crystallites with varying orientation.

Crystal phases are defined as regions with the same crystal structure, irrespective of orientation or twinning. Single and twinned crystalline specimens therefore constitute individual crystal phases. Polycrystalline or crystal powder samples may consist of more than one crystal phase. Such a phase comprises all the crystallites in the sample with the same crystal structure.

Crystal phases can be identified by successfully matching suitable crystallographic parameters with their counterparts in database entries. Prior knowledge of the chemical composition of the crystal phase can be used to reduce the number of database entries to a small selection of candidate structures and thus simplify the crystal phase identification process considerably.

Powder diffraction fingerprinting (1D)
Applying standard diffraction techniques to crystal powders or polycrystals is tantamount to collapsing the 3D reciprocal space, as obtained via single-crystal diffraction, onto a 1D axis. The resulting partial-to-total overlap of symmetry-independent reflections renders the structure determination process more difficult, if not impossible.

Powder diffraction data can be plotted as diffracted intensity (I) versus reciprocal lattice spacing (1/d). Reflection positions and intensities of known crystal phases, mostly from X-ray diffraction data, are stored, as d-I data pairs, in the Powder Diffraction File (PDF) database. The list of d-I data pairs is highly characteristic of a crystal phase and, thus, suitable for the identification, also called ‘fingerprinting’, of crystal phases.
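The conversion of measured peaks into d-I data pairs can be sketched as follows, assuming that peak positions are given as diffraction angles 2θ and that Bragg's law with first-order reflection applies. The function name to_d_i_pairs and the input peaks are hypothetical; the wavelength is that of Cu Kα1 radiation, and intensities are scaled so that the strongest line is 100.

```python
import math

CU_K_ALPHA1 = 1.5406  # X-ray wavelength in angstroms

def to_d_i_pairs(peaks, wavelength=CU_K_ALPHA1):
    """Convert (two_theta_degrees, raw_intensity) peaks into (d, relative I) pairs."""
    strongest = max(intensity for _, intensity in peaks)
    pairs = []
    for two_theta, intensity in peaks:
        theta = math.radians(two_theta / 2.0)
        d = wavelength / (2.0 * math.sin(theta))  # Bragg's law with n = 1
        pairs.append((round(d, 4), round(100.0 * intensity / strongest, 1)))
    return pairs

# Made-up peak list for demonstration
print(to_d_i_pairs([(60.0, 500), (30.0, 250)]))
```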

Search-match algorithms compare selected test reflections of an unknown crystal phase with entries in the database. Intensity-driven algorithms utilize the three most intense lines (so-called ‘Hanawalt search’), while d-spacing-driven algorithms are based on the eight to ten largest d-spacings (so-called ‘Fink search’).
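An intensity-driven search of this kind can be sketched as below. This is a simplified illustration, not the procedure of any specific search-match program: it only checks whether the three strongest lines of the unknown agree with the three strongest lines of a reference pattern within a d-spacing tolerance. The reference patterns, the phase names, and the tolerance are invented for the example.

```python
def strongest_lines(d_i_pairs, n=3):
    """Return the d-spacings of the n most intense reflections."""
    ranked = sorted(d_i_pairs, key=lambda pair: pair[1], reverse=True)
    return [d for d, _ in ranked[:n]]

def hanawalt_match(unknown, reference_db, tol=0.02):
    """Names of reference phases whose three strongest lines all match within tol (angstroms)."""
    test_lines = strongest_lines(unknown)
    hits = []
    for name, pattern in reference_db.items():
        ref_lines = strongest_lines(pattern)
        if all(any(abs(t - r) <= tol for r in ref_lines) for t in test_lines):
            hits.append(name)
    return hits

# Hypothetical reference patterns as (d in angstroms, relative intensity)
REFERENCE_DB = {
    "phase A": [(2.82, 100), (1.99, 55), (1.63, 15), (3.26, 13)],
    "phase B": [(3.15, 100), (2.22, 37), (1.82, 10)],
}
UNKNOWN = [(2.81, 980), (1.995, 540), (1.628, 160), (3.25, 120)]

print(hanawalt_match(UNKNOWN, REFERENCE_DB))
```

A d-spacing-driven variant would rank the lines by d instead of by intensity before comparing them.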

X-ray powder diffraction fingerprinting has become the standard tool for the identification of single or multiple crystal phases and is widely used in such fields as metallurgy, mineralogy, forensic science, archeology, condensed matter physics, and the biological and pharmaceutical sciences.

Lattice-fringe fingerprinting (2D)
Powder diffraction patterns of very small single crystals, or crystallites, are subject to size-dependent peak broadening, which, below a certain size, renders powder diffraction fingerprinting useless. In this case, peak resolution is only possible in 3D reciprocal space, i.e. by applying single-crystal electron diffraction techniques.

High-Resolution Transmission Electron Microscopy (HRTEM) provides images and diffraction patterns of nanometer sized crystallites. Fourier transforms of HRTEM images and electron diffraction patterns both supply information about the projected reciprocal lattice geometry for a certain crystal orientation, where the projection axis coincides with the optical axis of the microscope.

Projected lattice geometries can be represented by so-called ‘lattice-fringe fingerprint plots’ (LFFPs), also called angular covariance plots. The horizontal axis of such a plot is given in reciprocal lattice length and is limited by the point resolution of the microscope. The vertical axis is defined as the acute angle between Fourier transformed lattice fringes or electron diffraction spots. A 2D data point is defined by the length of a reciprocal lattice vector and its (acute) angle with another reciprocal lattice vector. Sets of 2D data points that obey Weiss's zone law are subsets of the entirety of data points in an LFFP. A suitable search-match algorithm using LFFPs, therefore, tries to find matching zone axis subsets in the database. It is, essentially, a variant of a lattice matching algorithm.

In the case of electron diffraction patterns, structure factor amplitudes can be used, in a later step, to further discern among a selection of candidate structures (so-called 'structure factor fingerprinting'). Structure factor amplitudes from electron diffraction data are far less reliable than their counterparts from X-ray single-crystal and powder diffraction data. Existing precession electron diffraction techniques greatly improve the quality of structure factor amplitudes, increase their number and, thus, make structure factor amplitude information much more useful for the fingerprinting process.

Fourier transforms of HRTEM images, on the other hand, supply information not only about the projected reciprocal lattice geometry and structure factor amplitudes, but also structure factor phase angles. After crystallographic image processing, structure factor phase angles are far more reliable than structure factor amplitudes. Further discernment of candidate structures is then mainly based on structure factor phase angles and, to a lesser extent, structure factor amplitudes (so-called 'structure factor fingerprinting').

Morphological fingerprinting (3D)
The Generalized Steno Law states that the interfacial angles between identical faces of any single crystal of the same material are, by nature, restricted to the same value. This offers the opportunity to fingerprint crystalline materials on the basis of optical goniometry, which is also known as crystallometry. In order to employ this technique successfully, one must consider the observed point group symmetry of the measured faces and creatively apply the rule that "crystal morphologies are often combinations of simple (i.e. low multiplicity) forms where the individual faces have the lowest possible Miller indices for any given zone axis". This ensures that the correct indexing of the crystal faces is obtained for any single crystal.

It is in many cases possible to derive the ratios of the crystal axes for crystals with low symmetry from optical goniometry with high accuracy and precision and to identify a crystalline material on their basis alone employing databases such as 'Crystal Data'. Provided that the crystal faces have been correctly indexed and the interfacial angles were measured to better than a few fractions of a tenth of a degree, a crystalline material can be identified quite unambiguously on the basis of angle comparisons to two rather comprehensive databases: the 'Bestimmungstabellen für Kristalle (Определитель Кристаллов)' and the 'Barker Index of Crystals'.
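The link between axial ratios and interfacial angles can be illustrated for crystals with mutually orthogonal axes (cubic, tetragonal, orthorhombic). In that case the normal of a face (hkl) points along (h/a, k/b, l/c), so angles between face normals depend only on the ratios a:b:c. The function names below are hypothetical, and the restriction to orthogonal axes is an assumption of this sketch.

```python
import math

def face_normal(hkl, axes):
    """Cartesian normal of face (hkl) for orthogonal axes of lengths a, b, c."""
    return tuple(index / length for index, length in zip(hkl, axes))

def interfacial_angle(hkl1, hkl2, axes=(1.0, 1.0, 1.0)):
    """Angle in degrees between the normals of two indexed faces."""
    n1, n2 = face_normal(hkl1, axes), face_normal(hkl2, axes)
    dot = sum(p * q for p, q in zip(n1, n2))
    norm = math.sqrt(sum(p * p for p in n1)) * math.sqrt(sum(q * q for q in n2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

# Cubic case: angle between the normals of (100) and (111)
print(round(interfacial_angle((1, 0, 0), (1, 1, 1)), 2))
# Only the axial ratios matter: scaling all axes leaves the angle unchanged
print(interfacial_angle((1, 1, 0), (0, 1, 1), (1, 2, 3)),
      interfacial_angle((1, 1, 0), (0, 1, 1), (2, 4, 6)))
```

Identification then amounts to comparing a set of such measured angles with tabulated angles for known materials.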

Since Steno's Law can be further generalized for a single crystal of any material to include the angles between either all identically indexed net planes (i.e. vectors of the reciprocal lattice, also known as 'potential reflections in diffraction experiments') or all identically indexed lattice directions (i.e. vectors of the direct lattice, also known as zone axes), opportunities exist for morphological fingerprinting of nanocrystals in the transmission electron microscope (TEM) by means of transmission electron goniometry.

The specimen goniometer of a TEM is thereby employed analogously to the goniometer head of an optical goniometer. The optical axis of the TEM is then analogous to the reference direction of an optical goniometer. While in optical goniometry net-plane normals (reciprocal lattice vectors) need to be successively aligned parallel to the reference direction of an optical goniometer in order to derive measurements of interfacial angles, the corresponding alignment needs to be done for zone axes (direct lattice vectors) in transmission electron goniometry. (Note that such alignments are by their nature quite trivial for nanocrystals in a TEM after the microscope has been aligned by standard procedures.) Since transmission electron goniometry is based on Bragg's Law for the transmission (Laue) case (diffraction of electron waves), interzonal angles (i.e. angles between lattice directions) can be measured by a procedure that is analogous to the measurement of interfacial angles in an optical goniometer on the basis of the law of reflection of light. The complements to interfacial angles of external crystal faces can, on the other hand, be directly measured from a zone-axis diffraction pattern or from the Fourier transform of a high resolution TEM image that shows crossed lattice fringes.

Lattice matching (3D)
Lattice parameters of unknown crystal phases can be obtained from X-ray, neutron, or electron diffraction data. Single-crystal diffraction experiments supply orientation matrices, from which lattice parameters can be deduced. Alternatively, lattice parameters can be obtained from powder or polycrystal diffraction data via profile fitting without a structural model (so-called 'Le Bail method').

Arbitrarily defined unit cells can be transformed to a standard setting and, from there, further reduced to a primitive smallest cell. Sophisticated algorithms compare such reduced cells with corresponding database entries. More powerful algorithms also consider derivative super- and subcells. The lattice-matching process can be further sped up by precalculating and storing reduced cells for all entries. The algorithm searches for matches within a certain range of the lattice parameters. More accurate lattice parameters allow a narrower range and, thus, a better match.
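The final comparison step, matching within a range of lattice parameters, can be sketched as follows. The sketch assumes that the cell of the unknown and all database cells have already been reduced to the same convention and are given as (a, b, c, alpha, beta, gamma); the reduction itself and the treatment of super- and subcells are not shown. The stored cells, entry names, and tolerances are made up.

```python
def cells_match(cell, ref, length_tol=0.05, angle_tol=0.5):
    """True if all edges (angstroms) and angles (degrees) agree within the tolerances."""
    lengths_ok = all(abs(x - y) <= length_tol for x, y in zip(cell[:3], ref[:3]))
    angles_ok = all(abs(x - y) <= angle_tol for x, y in zip(cell[3:], ref[3:]))
    return lengths_ok and angles_ok

def lattice_match(cell, reduced_cells, **tolerances):
    """Names of database entries whose precalculated reduced cell matches."""
    return [name for name, ref in reduced_cells.items()
            if cells_match(cell, ref, **tolerances)]

# Hypothetical precalculated reduced cells
REDUCED_CELLS = {
    "entry 1": (3.99, 3.99, 3.99, 60.0, 60.0, 60.0),
    "entry 2": (4.03, 4.03, 4.03, 60.0, 60.0, 60.0),
    "entry 3": (4.00, 4.00, 6.50, 90.0, 90.0, 90.0),
}
measured = (4.00, 4.00, 4.00, 60.0, 60.0, 60.0)

print(lattice_match(measured, REDUCED_CELLS))                   # wide window
print(lattice_match(measured, REDUCED_CELLS, length_tol=0.02))  # narrow window
```

The second call shows the effect of more accurate lattice parameters: a narrower window leaves fewer candidate entries.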

Lattice matching is useful in identifying crystal phases in the early stages of single-crystal diffraction experiments and, thus, avoiding unnecessary full data collection and structure determination procedures for already known crystal structures. The method is particularly important for single-crystalline samples that need to be preserved. If, on the other hand, some or all of the crystalline sample material can be ground, powder diffraction fingerprinting is usually the better option for crystal phase identification, provided that the peak resolution is good enough. However, lattice matching algorithms are still better at treating derivative super- and subcells.

Visualization
Newer versions of crystal structure databases integrate the visualization of crystal and molecular structures. Specialized or integrative crystallographic databases may provide morphology or tensor visualization output.

Crystal structures
The crystal structure describes the three-dimensional periodic arrangement of atoms, ions, or molecules in a crystal. The unit cell represents the simplest repeating unit of the crystal structure. It is a parallelepiped containing a certain spatial arrangement of atoms, ions, molecules, or molecular fragments. From the unit cell the crystal structure can be fully reconstructed via translations.
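Reconstruction by translation can be sketched in a few lines, assuming that atomic positions are given in fractional coordinates so that a lattice translation is simply the addition of integers. The function name expand_cell and the two-atom example cell are hypothetical.

```python
from itertools import product

def expand_cell(fractional_atoms, repeats=(2, 2, 2)):
    """Copy the unit cell contents into a block of na x nb x nc cells."""
    expanded = []
    for ta, tb, tc in product(*(range(n) for n in repeats)):
        for element, (x, y, z) in fractional_atoms:
            # Integer offsets in fractional coordinates are lattice translations
            expanded.append((element, (x + ta, y + tb, z + tc)))
    return expanded

# Example cell with one atom at the origin and one at the body centre
unit_cell = [("Cs", (0.0, 0.0, 0.0)), ("Cl", (0.5, 0.5, 0.5))]
block = expand_cell(unit_cell)
print(len(block))
```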

The visualization of a crystal structure can be reduced to the arrangement of atoms, ions, or molecules in the unit cell, with or without cell outlines. Structure elements extending beyond single unit cells, such as isolated molecular or polyhedral units as well as chain, net, or framework structures, can often be better understood by extending the structure representation into adjacent cells.

The space group of a crystal is a mathematical description of the symmetry inherent in the structure. The motif of the crystal structure is given by the asymmetric unit, a minimal subset of the unit cell contents. The unit cell contents can be fully reconstructed via the symmetry operations of the space group on the asymmetric unit. Visualization interfaces usually allow for switching between asymmetric unit and full structure representations.
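The step from asymmetric unit to full unit cell contents can be sketched as follows. Each symmetry operation is written as a rotation matrix plus a translation acting on fractional coordinates, and results are wrapped back into the cell. The example uses the two operations of space group P-1 (identity and inversion); the atom labels, coordinates, and function names are hypothetical. Exact fractions are used so that an atom on a special position, which is mapped onto itself, is recognized as a duplicate.

```python
from fractions import Fraction as F

IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
INVERSION = ((-1, 0, 0), (0, -1, 0), (0, 0, -1))
# Symmetry operations of space group P-1 as (rotation, translation) pairs
P_MINUS_1 = [(IDENTITY, (0, 0, 0)), (INVERSION, (0, 0, 0))]

def apply_op(operation, xyz):
    """Apply one operation to fractional coordinates and wrap into [0, 1)."""
    rotation, translation = operation
    return tuple((sum(r * c for r, c in zip(row, xyz)) + t) % 1
                 for row, t in zip(rotation, translation))

def fill_unit_cell(asymmetric_unit, operations):
    """Generate all symmetry-equivalent sites, dropping duplicates."""
    cell = []
    for element, xyz in asymmetric_unit:
        for operation in operations:
            site = (element, apply_op(operation, xyz))
            if site not in cell:
                cell.append(site)
    return cell

asymmetric_unit = [
    ("A", (F(0), F(0), F(0))),            # special position on the inversion centre
    ("B", (F(1, 10), F(1, 5), F(3, 10))), # general position
]
for site in fill_unit_cell(asymmetric_unit, P_MINUS_1):
    print(site)
```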

Bonds between atoms or ions can be identified by characteristic short distances between them. They can be classified as covalent, ionic, hydrogen, or other bonds including hybrid forms. Bond angles can be deduced from the bond vectors in groups of atoms or ions. Bond distances and angles can be made available to the user in tabular form or interactively, by selecting pairs or groups of atoms or ions. In ball-and-stick models of crystal structures, balls represent atoms and sticks represent bonds.
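A distance-based bond search can be sketched as below, assuming Cartesian coordinates in angstroms and a simple criterion: two atoms count as bonded if their distance does not exceed the sum of their covalent radii plus a tolerance. The radii are approximate values, and the tolerance, the function names, and the water-like test molecule are assumptions of this example. The code uses math.dist and multi-argument math.hypot, which require Python 3.8 or later.

```python
import math
from itertools import combinations

COVALENT_RADII = {"H": 0.31, "O": 0.66}  # approximate values in angstroms

def find_bonds(atoms, tolerance=0.4):
    """Return (i, j, distance) for atom pairs closer than the radii sum plus tolerance."""
    bonds = []
    for (i, (el1, p1)), (j, (el2, p2)) in combinations(enumerate(atoms), 2):
        distance = math.dist(p1, p2)
        if distance <= COVALENT_RADII[el1] + COVALENT_RADII[el2] + tolerance:
            bonds.append((i, j, round(distance, 3)))
    return bonds

def bond_angle(atoms, center, i, j):
    """Angle in degrees between the bond vectors center->i and center->j."""
    c = atoms[center][1]
    v1 = [a - b for a, b in zip(atoms[i][1], c)]
    v2 = [a - b for a, b in zip(atoms[j][1], c)]
    cos_angle = sum(p * q for p, q in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(cos_angle))

# Water-like test molecule (illustrative coordinates)
molecule = [
    ("O", (0.0, 0.0, 0.0)),
    ("H", (0.9572, 0.0, 0.0)),
    ("H", (-0.2397, 0.9267, 0.0)),
]
print(find_bonds(molecule))
print(round(bond_angle(molecule, 0, 1, 2), 1))
```

In a crystal, fractional coordinates would first have to be converted to Cartesian coordinates using the lattice parameters, and neighbours in adjacent cells would have to be included.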

Since organic chemists are particularly interested in molecular structures, it might be useful to be able to single out individual molecular units interactively from the drawing. Organic molecular units need to be given both as 2D structural formulae and full 3D molecular structures. Molecules on special-symmetry positions need to be reconstructed from the asymmetric unit. Protein crystallographers are interested in molecular structures of biological macromolecules, so that provisions need to be made to be able to represent molecular subunits as helices, sheets, or coils, respectively.

Crystal structure visualization can be integrated into a crystallographic database. Alternatively, the crystal structure data are exchanged between the database and the visualization software, preferably using the CIF format. Web-based crystallographic databases can integrate crystal structure visualization capability. Depending on the complexity of the structure, lighting, and 3D effects, crystal structure visualization can require a significant amount of processing power, which is why the actual visualization is typically run on the client.

Currently, web-integrated crystal structure visualization is based on Java applets from open-source projects such as Jmol. Web-integrated crystal structure visualization is tailored for examining crystal structures in web browsers, often supporting wide color spectra (up to 32 bit) and window size adaptation. However, web-generated crystal structure images are not always suitable for publishing due to issues such as resolution depth, color choice, grayscale contrast, or labeling (positioning, font type, font size).

Morphology and physical properties
Mineralogists, in particular, are interested in morphological appearances of individual crystals, as defined by the actually formed crystal faces (tracht) and their relative sizes (habit). More advanced visualization capabilities allow for displaying surface characteristics, imperfections inside the crystal, lighting (reflection, shadow, and translucency), and 3D effects (interactive rotatability, perspective, and stereo viewing).

Crystal physicists, in particular, are interested in anisotropic physical properties of crystals. The directional dependence of a crystal's physical property is described by a 3D tensor and depends on the orientation of the crystal. Tensor shapes are more palpable by adding lighting effects (reflection and shadow). 2D sections of interest are selected for display by rotating the tensor interactively around one or more axes.
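The directional dependence can be made concrete for the common case of a symmetric second-rank tensor T (an assumption of this sketch; properties described by higher-rank tensors need more indices). The magnitude of the property along a unit direction n is then the sum over n_i T_ij n_j. The tensor values and the function names below are hypothetical.

```python
import math

def property_along(tensor, direction):
    """Magnitude of a second-rank tensor property along a given direction."""
    norm = math.sqrt(sum(c * c for c in direction))
    n = [c / norm for c in direction]
    return sum(n[i] * tensor[i][j] * n[j] for i in range(3) for j in range(3))

def xz_section(tensor, steps=8):
    """Sample the property in the x-z plane, i.e. one 2D section of the surface."""
    values = []
    for k in range(steps):
        phi = 2.0 * math.pi * k / steps
        values.append(property_along(tensor, (math.sin(phi), 0.0, math.cos(phi))))
    return values

# Hypothetical uniaxial tensor, given in its principal axis system
T = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 4.0]]

print(property_along(T, (0, 0, 1)), property_along(T, (1, 0, 0)))
print([round(v, 2) for v in xz_section(T)])
```

Plotting such values as radial distances over all directions yields the tensor surface that a visualization interface would render and section.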

Crystal morphology or physical property data can be stored in specialized databases or added to more comprehensive crystal structure databases. The Crystal Morphology Database (CMD) is an example of a web-based crystal morphology database with integrated visualization capabilities.

Crystal structures

 * American Mineralogist Crystal Structure Database (AMCSD) (contents: crystal structures of minerals, access: free, size: large)
 * Cambridge Structural Database (CSD) (contents: crystal structures of organics and metal-organics, access: restricted, size: very large)
 * Crystallography Open Database (COD) (contents: crystal structures of organics, metal-organics, minerals, inorganics, metals, alloys, and intermetallics, access: free, size: very large)
 * COD+ (Web Interface for COD) (contents: crystal structures of organics, metal-organics, minerals, inorganics, metals, alloys, and intermetallics, access: free, size: very large)
 * Database of Zeolite Structures (contents: crystal structures of zeolites, access: free, size: small)
 * Incommensurate Structures Database (contents: incommensurate structures, access: free, size: small)
 * Inorganic Crystal Structure Database (ICSD) (contents: crystal structures of minerals and inorganics, access: restricted, size: large)
 * MaterialsProject Database (contents: crystal structures of inorganic compounds, access: free, size: large)
 * Materials Platform for Data Science (MPDS) or PAULING FILE (contents: critically evaluated crystal structures, as well as physical properties and phase diagrams, from the world scientific literature, access: partially free, size: very large)
 * MaterialsWeb Database (contents: crystal structures of inorganic 2D materials and bulk compounds, access: free, size: large)
 * Metals Structure Database (CRYSTMET) (contents: crystal structures of metals, alloys, and intermetallics, access: restricted, size: large)
 * Mineralogy Database (contents: crystal structures of minerals, access: free, size: medium)
 * MinCryst (contents: crystal structures of minerals, access: free, size: medium)
 * NIST Structural Database (contents: crystal structures of metals, alloys, and intermetallics, access: restricted, size: large)
 * NIST Surface Structure Database (contents: surface and interface structures, access: restricted, size: small-medium)
 * Nucleic Acid Database (contents: crystal and molecular structures of nucleic acids, access: free, size: medium)
 * Pearson's Crystal Data (contents: crystal structures of inorganics, minerals, salts, oxides, hydrides, metals, alloys, and intermetallics, access: restricted, size: very large)
 * Worldwide Protein Data Bank (PDB) (contents: crystal and molecular structures of biological macromolecules, access: free, size: very large)
 * Wiki Crystallography Database (WCD) (contents: crystal structures of organics, metal-organics, minerals, inorganics, metals, alloys, and intermetallics, access: free, size: medium)

Crystal phase identification

 * Match! (method: powder diffraction fingerprinting)
 * NIST Crystal Data (method: lattice matching)
 * Powder Diffraction File (PDF) (method: powder diffraction fingerprinting)

Specialized databases

 * Educational Subset of the Crystallography Open Database (EDU-COD) (specialization: crystal and molecule structures for college education, access: free, size: medium)
 * Biological Macromolecule Crystallization Database (BMCD) (specialization: crystallization of biological macromolecules, access: free, size: medium)
 * Crystal Morphology Database (CMD) (specialization: morphology of crystals, access: free, size: very small)
 * Database of Hypothetical Structures (specialization: predicted zeolite-like crystal structures, access: free, size: large)
 * Database of Zeolite Structures (specialization: crystal structures of zeolites, access: free, size: small)
 * Hypothetical MOFs Database (specialization: predicted metal-organic framework crystal structures, access: free, size: large)
 * Incommensurate Structures Database (specialization: incommensurate structures, access: free, size: small)
 * Marseille Protein Crystallization Database (MPCD) (specialization: crystallization of biological macromolecules, access: free, size: medium)
 * MOFomics (specialization: pore structures of metal-organic frameworks, access: free, size: medium)
 * Nano-Crystallography Database (NCD) (specialization: crystal structures of nanometer sized crystallites, access: free, size: small)
 * NIST Surface Structure Database (specialization: surface and interface structures, access: restricted, size: small-medium)
 * Predicted Crystallography Open Database (PCOD) (specialization: predicted crystal structures of organics, metal-organics, metals, alloys, intermetallics, and inorganics, access: free, size: very large)
 * Theoretical Crystallography Open Database (TCOD) (specialization: crystal structures of organics, metal-organics, metals, alloys, intermetallics, and inorganics that were refined or predicted from density functional theory with some experimental input, access: free, size: small)
 * ZEOMICS (specialization: pore structures of zeolites, access: free, size: small)