12 | Crystallographic information#
Databases#
Crystallographic information files (cif files) contain all of the information needed to generate every atomic position within an infinitely repeating crystal structure. They also include important experimental details, statistical information about the data and model quality, chemical information such as the shape, size, color, density, and molecular formula, and bibliographic information about the source of the data. Here we will focus only on those sections that are essential for the complete description of the crystal structure. Essentially, we need to know the space group, the unit cell parameters, and the atomic positions of the symmetry-unique atoms in the unit cell. Applying all of the symmetry operations of the space group to these atom positions generates all of the atoms in the unit cell.
As an example we’ll consider a cif file for sodium chloride that can be accessed through the Inorganic Crystal Structure Database (ICSD).
#(C) 2020 by FIZ Karlsruhe - Leibniz Institute for Information Infrastructure. All rights reserved.
data_165592-ICSD
_database_code_ICSD 165592
_audit_creation_date 2010-02-01
_chemical_name_common 'Sodium chloride'
_chemical_formula_structural 'Na Cl'
_chemical_formula_sum 'Cl1 Na1'
_chemical_name_structure_type NaCl
_exptl_crystal_density_diffrn 2.14
_diffrn_ambient_temperature 290.
_citation_title 'Solubility of Al2 O3 in some chloride-fluoride melts'
This top section contains much of the chemical information about the compound. A cif file is mostly composed of key-value pairs, where the first word on the line is the key, followed by whitespace and then the value. Values that contain multiple words are enclosed in single quotes.
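The key-value layout described above can be read with a few lines of Python. This is a minimal sketch, not a full CIF parser: the function name `parse_cif_pairs` is hypothetical, and it assumes every pair sits on a single line and that quoted values use single quotes, as in the excerpt.

```python
def parse_cif_pairs(text):
    """Collect simple `_key value` pairs from CIF text into a dict."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        # Keys start with an underscore; skip comments, blanks, and data_ headers.
        if not line.startswith("_"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue  # a bare key (such as a loop_ column name) has no value
        key, value = parts[0], parts[1].strip()
        # Multi-word values are wrapped in single quotes; remove them.
        if len(value) >= 2 and value[0] == value[-1] == "'":
            value = value[1:-1]
        pairs[key] = value
    return pairs


header = """\
data_165592-ICSD
_database_code_ICSD 165592
_chemical_name_common 'Sodium chloride'
_chemical_formula_sum 'Cl1 Na1'
_exptl_crystal_density_diffrn 2.14
"""

pairs = parse_cif_pairs(header)
print(pairs["_chemical_name_common"])  # Sodium chloride
print(pairs["_chemical_formula_sum"])  # Cl1 Na1
```

All values are kept as strings here; converting them to numbers is left to the caller because some CIF numbers carry an uncertainty in parentheses.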
loop_
_citation_id
_citation_journal_full
_citation_year
_citation_journal_volume
_citation_page_first
_citation_page_last
_citation_journal_id_ASTM
primary 'Inorganic Chemistry' 2006 45 7367 7371 INOCAJ
loop_
_citation_author_citation_id
_citation_author_name
primary 'Cherginets, V.L.'
primary 'Baumer, V.N.'
primary 'Galkin, S.S.'
primary 'Glushkova, L.V.'
primary 'Rebrova, T.P.'
primary 'Shtitelman, Z.V.'
The next section contains bibliographic information. Here some of the data is structured as a loop_, where each following row that starts with an underscore names one column of a data table. The data of the table then follow one row at a time, with the values in each row separated by spaces.
In this case the first table contains the citation (journal, year, volume, pages) and the second table is a list of the authors.
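A loop_ block of this kind maps naturally onto a list of dictionaries, one per data row. The sketch below is illustrative only: `parse_cif_loop` is a hypothetical helper, it assumes each data row fits on one line, and it borrows Python's `shlex.split` to handle the single-quoted values, which is a simplification of the real CIF quoting rules.

```python
import shlex


def parse_cif_loop(text):
    """Parse one loop_ block into a list of {column_name: value} dicts."""
    columns, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "loop_":
            continue
        if line.startswith("_") and not rows:
            # Column names come first, one per line.
            columns.append(line)
        else:
            # Data rows follow; shlex keeps quoted multi-word values together.
            rows.append(dict(zip(columns, shlex.split(line))))
    return rows


citation_loop = """\
loop_
_citation_id
_citation_journal_full
_citation_year
_citation_journal_volume
_citation_page_first
_citation_page_last
_citation_journal_id_ASTM
primary 'Inorganic Chemistry' 2006 45 7367 7371 INOCAJ
"""

rows = parse_cif_loop(citation_loop)
print(rows[0]["_citation_journal_full"], rows[0]["_citation_year"])
```

The same function applied to the author loop would return six rows, each with a `_citation_author_name` entry.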
_cell_length_a 5.6573(7)
_cell_length_b 5.6573(7)
_cell_length_c 5.6573(7)
_cell_angle_alpha 90.
_cell_angle_beta 90.
_cell_angle_gamma 90.
_cell_volume 181.06
_cell_formula_units_Z 4
_space_group_name_H-M_alt 'F m -3 m'
_space_group_IT_number 225
The next section contains the unit cell parameters, the number of formula units in the unit cell, and the space group. Numbers in parentheses are the standard deviations in the last digit of the values, so 5.6573(7) means 5.6573 ± 0.0007 Å. This unit cell is cubic, so no standard deviations are given for the angles, which are exactly equal to 90° by symmetry. To get the chemical composition of the unit cell we can multiply _chemical_formula_sum by _cell_formula_units_Z, which here gives Na4Cl4.
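The arithmetic in this section can be checked with a short script. The following is a rough sketch: the helper names `parse_su` and `cell_contents` are hypothetical, the atomic masses (Na 22.990, Cl 35.45 g/mol) and Avogadro's number are supplied here rather than taken from the file, and the volume formula a³ only holds because this cell is cubic.

```python
import re


def parse_su(text):
    """Split a CIF number such as '5.6573(7)' into (value, uncertainty)."""
    match = re.fullmatch(r"(-?\d+\.?\d*)(?:\((\d+)\))?", text)
    if match is None:
        raise ValueError(f"not a CIF number: {text!r}")
    value_str, su_digits = match.groups()
    if su_digits is None:
        return float(value_str), None
    # The digits in parentheses apply to the last decimal places of the value.
    decimals = len(value_str.split(".")[1]) if "." in value_str else 0
    return float(value_str), int(su_digits) * 10.0 ** (-decimals)


def cell_contents(formula_sum, z):
    """Multiply each element count in _chemical_formula_sum by Z."""
    contents = {}
    for token in formula_sum.split():
        element, count = re.fullmatch(r"([A-Z][a-z]?)(\d*\.?\d*)", token).groups()
        contents[element] = float(count or 1) * z
    return contents


a, a_su = parse_su("5.6573(7)")        # _cell_length_a
contents = cell_contents("Cl1 Na1", 4)  # _chemical_formula_sum, _cell_formula_units_Z
volume = a ** 3                         # cubic cell only, in cubic angstroms

# Density in g/cm^3: mass of the cell contents divided by the cell volume.
MASSES = {"Na": 22.990, "Cl": 35.45}    # g/mol, assumed values not in the file
AVOGADRO = 6.02214076e23
cell_mass = sum(MASSES[el] * n for el, n in contents.items()) / AVOGADRO
density = cell_mass / (volume * 1e-24)  # 1 cubic angstrom = 1e-24 cm^3

print(contents)
print(round(volume, 2), round(density, 2))
```

Both computed values can be compared against _cell_volume (181.06) and _exptl_crystal_density_diffrn (2.14) in the file as a quick consistency check.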
loop_
_space_group_symop_id
_space_group_symop_operation_xyz
1 'z, y, -x'
2 'y, x, -z'
3 'x, z, -y'
4 'z, x, -y'
5 'y, z, -x'
...
188 'y+1/2, x+1/2, z'
189 'x+1/2, z+1/2, y'
190 'z+1/2, x+1/2, y'
191 'y+1/2, z+1/2, x'
192 'x+1/2, y+1/2, z'
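The next section explicitly lists all of the symmetry operations of the space group. For low-symmetry structures this list can be very short. For the space group \(P1\) the only operation that will appear in this list is the identity, 1 'x, y, z'. The algebraic shorthand used can be expanded to the full transformation \((W, w)\), where \(W\) is the matrix part and \(w\) is the translation part. In the case above the first operation, 'z, y, -x', is a pure rotation:
\[W = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad w = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}\]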
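To illustrate how the shorthand expands, the sketch below turns an operation string into a matrix part \(W\) and a translation part \(w\). The function names `parse_symop`, `apply_symop`, and `det3` are hypothetical, and the parser assumes each term is either a signed axis letter or a signed fraction, which covers the operations shown here but not every string a cif file might contain.

```python
from fractions import Fraction


def parse_symop(op):
    """Convert a string like 'y+1/2, x+1/2, z' into (W, w)."""
    axes = {"x": 0, "y": 1, "z": 2}
    W = [[0, 0, 0] for _ in range(3)]
    w = [Fraction(0)] * 3
    for row, expr in enumerate(op.replace(" ", "").split(",")):
        # Put '+' before every '-' so the expression splits into signed terms.
        for term in expr.replace("-", "+-").split("+"):
            if not term:
                continue
            sign = -1 if term.startswith("-") else 1
            body = term.lstrip("-")
            if body in axes:
                W[row][axes[body]] = sign      # coefficient of x, y, or z
            else:
                w[row] += sign * Fraction(body)  # constant translation
    return W, w


def apply_symop(W, w, xyz):
    """Return W * xyz + w for a point in fractional coordinates."""
    return tuple(sum(W[i][j] * xyz[j] for j in range(3)) + w[i] for i in range(3))


def det3(W):
    """Determinant of a 3x3 matrix: +1 for a rotation, -1 for a rotoinversion."""
    (a, b, c), (d, e, f), (g, h, i) = W
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


W, w = parse_symop("z, y, -x")  # operation 1 in the list above
print(W, w, det3(W))

point = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10))
print(apply_symop(W, w, point))
```

A determinant of +1 with a zero translation is what identifies the first operation as a pure rotation; an operation such as 'x+1/2, y+1/2, z' instead has an identity matrix and a nonzero translation.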
loop_
_atom_type_symbol
_atom_type_oxidation_number
Na1+ 1
Cl1- -1
loop_
_atom_site_label
_atom_site_type_symbol
_atom_site_symmetry_multiplicity
_atom_site_Wyckoff_symbol
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_U_iso_or_equiv
_atom_site_occupancy
Na1 Na1+ 4 a 0. 0. 0. 0.00735 1.
Cl1 Cl1- 4 b 0.5 0.5 0.5 0.00846 1.
#End of TTdata_165592-ICSD
Finally there is a loop that defines all the unique elements or ions in the crystal, followed by a loop of their crystallographic positions. In crystal structures an atom’s position is defined in fractional coordinates of the unit cell’s basis vectors. For NaCl there is a sodium ion at the origin (0, 0, 0), or \(\hat{X} = 0 \times \vec{a} + 0 \times \vec{b} + 0 \times \vec{c}\), and a chloride ion at the center of the cell (0.5, 0.5, 0.5), or \(\hat{X} = 0.5 \times \vec{a} + 0.5 \times \vec{b} + 0.5 \times \vec{c}\). The other data in this table include the site multiplicity for the atom (Na and Cl will each be generated by symmetry at 4 equivalent locations in the unit cell), the Wyckoff letter, the thermal displacement parameter \(U_{iso}\), and the occupancy of the site (which in this case shows that 100% of both the Na and Cl positions are occupied by their respective atom). In more complex structures a site may be only partially occupied (fractional vacancies) or may be occupied by multiple elements (as in solid solutions or substitutional alloys).
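The site multiplicity of 4 can be reproduced with a small sketch. Because the full list of 192 operations is elided in the excerpt, the code below uses only the four face-centring translations of an F lattice, which is an assumption supplied here; for these two special positions they happen to be enough to generate every equivalent site. The names `CENTRING`, `SITES`, `orbit`, and `to_cartesian` are hypothetical, and the Cartesian conversion is valid only for a cubic cell.

```python
import math
from fractions import Fraction

A = 5.6573  # _cell_length_a in angstroms; cubic, so a = b = c

half = Fraction(1, 2)
# Face-centring translations of an F lattice (assumed, not read from the file).
CENTRING = [(0, 0, 0), (0, half, half), (half, 0, half), (half, half, 0)]

# Symmetry-unique sites from the _atom_site loop, as exact fractions.
SITES = {
    "Na1": (Fraction(0), Fraction(0), Fraction(0)),
    "Cl1": (half, half, half),
}


def orbit(xyz):
    """Apply each centring translation and wrap the result into [0, 1)."""
    return sorted({tuple((c + t) % 1 for c, t in zip(xyz, shift)) for shift in CENTRING})


def to_cartesian(frac):
    """Fractional to Cartesian coordinates; this simple scaling is cubic-only."""
    return tuple(A * float(c) for c in frac)


na_sites = orbit(SITES["Na1"])
cl_sites = orbit(SITES["Cl1"])
print(len(na_sites), len(cl_sites))

# Shortest distance from the Na at the origin to any Cl in the cell.
na_cl = min(math.sqrt(sum(c * c for c in to_cartesian(site))) for site in cl_sites)
print(na_cl)
```

For a general position, or for a non-cubic cell, the complete operation list and the full cell matrix built from all six cell parameters would be needed instead.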
Online databases of cif files#
There are several places to obtain cif files. The standard repositories for high-quality, peer-reviewed data are the CSD and ICSD. The CSD specializes in organic and metal-organic compounds and the ICSD in inorganic compounds (no C-H bonds). However, there is some overlap: the CSD has a few inorganic structures and the ICSD has some metal-organic structures. Both are subscription based and available through the university library.
The AMCSD and PDB are also high-quality research databases of crystal structures; they specialize in data from mineralogical samples and biological compounds, respectively.
The COD and the Materials Project are open access and contain a lot of high-quality data, but they should be used with caution because the data are not as rigorously curated as in the other databases. Importantly, the Materials Project contains only computed structures, which are not necessarily experimentally verified and, even when they are, lack the experimental conditions under which such a structure can be isolated or observed. Despite these shortcomings, the Materials Project does highlight which phases they believe are consistent with experimental data and links to their entries in the ICSD. The Materials Project largely focuses on inorganic materials. The COD contains experimentally determined crystal structures, but they are not rigorously reviewed or curated, which makes errors in the records more likely. In my experience it is also much more difficult to search than the CSD or the ICSD. However, the COD is relatively new and rapidly improving.
Materials Project (computed structures with links to experimental data in ICSD)
Protein Data Bank (biological macromolecules)