API reference

This section contains the automatic API reference for Cif and CifEnsemble modules in the cifkit package.



CN_best_methods property

Determines the optimal coordination method for each atomic site.

For each atomic site, the coordination polyhedron is generated for each method in self.CN_max_gap_per_site. The method with the smallest value of polyhedron_metrics["distance_from_avg_point_to_center"], indicating the highest symmetry of the polyhedron, is selected as the "best method" among the four methods used to determine the CN gap in self.CN_max_gap_per_site.
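
As a minimal sketch of the selection rule described above, the function below picks the method whose polyhedron has the smallest distance_from_avg_point_to_center. The function name pick_best_method and the example metric values are hypothetical and serve only as an illustration; this is not cifkit's internal implementation.

```python
def pick_best_method(metrics_by_method):
    """Return (method_name, metrics) for the smallest center offset."""
    best = min(
        metrics_by_method,
        key=lambda m: metrics_by_method[m]["distance_from_avg_point_to_center"],
    )
    return best, metrics_by_method[best]


# Hypothetical metrics for one atomic site (illustrative values only).
example_metrics = {
    "dist_by_shortest_dist": {"distance_from_avg_point_to_center": 0.01},
    "dist_by_CIF_radius_sum": {"distance_from_avg_point_to_center": 0.05},
    "dist_by_CIF_radius_refined_sum": {"distance_from_avg_point_to_center": 0.12},
    "dist_by_Pauling_radius_sum": {"distance_from_avg_point_to_center": 0.05},
}

best_name, best_metrics = pick_best_method(example_metrics)
```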


Type Description
dict[str, dict[str, float | int | str]]

Dictionary where each key represents an atomic site, and the corresponding value is a dictionary containing:

  • volume_of_polyhedron (float): The volume of the polyhedron surrounding the atomic site.
  • distance_from_avg_point_to_center (float): The average distance from the polyhedron's vertices to its geometric center, used as a measure of symmetry.
  • number_of_vertices (int): The number of vertices in the coordination polyhedron.
  • number_of_edges (int): The number of edges connecting vertices in the polyhedron.
  • number_of_faces (int): The number of faces in the coordination polyhedron.
  • shortest_distance_to_face (float): The shortest distance between the atomic site and the nearest face.
  • shortest_distance_to_edge (float): The shortest distance between the atomic site and the nearest edge.
  • volume_of_inscribed_sphere (float): Volume of the largest sphere that can fit inside the polyhedron.
  • packing_efficiency (float): A measure of how efficiently the polyhedron is packed around the atomic site.
  • method_used (str): The name of the chosen method (e.g., dist_by_shortest_dist) providing the highest symmetry based on distance_from_avg_point_to_center.


>>> CN_best_methods = cif_URhIn.CN_best_methods
>>> CN_best_methods["In1"]["number_of_vertices"] == 14
>>> CN_best_methods["Rh2"]["number_of_vertices"] == 9
>>> CN_best_methods["In1"]["method_used"] == "dist_by_shortest_dist"
>>> CN_best_methods["Rh2"]["method_used"] == "dist_by_shortest_dist"

CN_max_gap_per_site property

Determines the maximum gap in coordination number (CN) for each atomic site.

For each atomic site, considers the first 20 nearest neighbors. The distances to these neighbors are normalized based on four methods:

  • dist_by_shortest_dist: Normalization by the shortest distance from the site.
  • dist_by_CIF_radius_sum: Normalization by the sum of CIF radii.
  • dist_by_CIF_radius_refined_sum: Normalization by the sum of refined CIF radii.
  • dist_by_Pauling_radius_sum: Normalization by the sum of Pauling radii.

The radius sums are calculated for each element pair involved. For each normalization method, the maximum gap is determined as the largest difference between consecutive normalized distances (i.e., the difference between the nth and (n-1)th neighbors).

This CN gap provides insight into the bonding relevance for each site.
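
The gap search can be sketched as follows. This is an illustrative example, not the library's code: the function name max_gap_and_cn is hypothetical, a single divisor is used for all neighbors (as in the shortest-distance normalization), and it assumes that the CN is the number of neighbors that come before the largest gap.

```python
def max_gap_and_cn(distances, norm=None):
    """Normalize neighbor distances and locate the largest consecutive gap."""
    if norm is None:
        # Assumption: default to normalization by the shortest distance.
        norm = min(distances)
    normalized = sorted(d / norm for d in distances)
    # Difference between each neighbor and the previous one.
    gaps = [b - a for a, b in zip(normalized, normalized[1:])]
    idx = max(range(len(gaps)), key=gaps.__getitem__)
    # Assumption: CN counts the neighbors before the largest gap.
    return round(gaps[idx], 3), idx + 1


# Hypothetical neighbor distances in Angstroms.
gap, cn = max_gap_and_cn([2.0, 2.0, 2.1, 3.0, 3.1])
```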


Type Description
dict of dict of dict

A dictionary where each key represents an atomic site, mapping to another dictionary with normalization methods as keys. Each normalization method contains a dictionary with:

  • max_gap (float): The maximum gap in the normalized distances.
  • CN (int): Coordination number based on the normalization method.


>>> cif.CN_max_gap_per_site
{
    "In1": {
        "dist_by_shortest_dist": {"max_gap": 0.306, "CN": 14},
        "dist_by_CIF_radius_sum": {"max_gap": 0.39, "CN": 14},
        "dist_by_CIF_radius_refined_sum": {"max_gap": 0.341, "CN": 12},
        "dist_by_Pauling_radius_sum": {"max_gap": 0.398, "CN": 14},
    },
    "U1": {
        "dist_by_shortest_dist": {"max_gap": 0.197, "CN": 11},
        "dist_by_CIF_radius_sum": {"max_gap": 0.312, "CN": 11},
        "dist_by_CIF_radius_refined_sum": {"max_gap": 0.27, "CN": 17},
        "dist_by_Pauling_radius_sum": {"max_gap": 0.256, "CN": 17},
    },
    "Rh1": {
        "dist_by_shortest_dist": {"max_gap": 0.315, "CN": 9},
        "dist_by_CIF_radius_sum": {"max_gap": 0.347, "CN": 9},
        "dist_by_CIF_radius_refined_sum": {"max_gap": 0.418, "CN": 9},
        "dist_by_Pauling_radius_sum": {"max_gap": 0.402, "CN": 9},
    },
    "Rh2": {
        "dist_by_shortest_dist": {"max_gap": 0.31, "CN": 9},
        "dist_by_CIF_radius_sum": {"max_gap": 0.324, "CN": 9},
        "dist_by_CIF_radius_refined_sum": {"max_gap": 0.397, "CN": 9},
        "dist_by_Pauling_radius_sum": {"max_gap": 0.380, "CN": 9},
    },
}

connections_flattened property

Transform site connections into a sorted list of tuples, each containing a pair of alphabetically sorted element symbols and the distance between them.


Type Description
list[tuple[tuple[str, str], float]]

A sorted list of tuples, each containing a pair of alphabetically sorted element symbols and the distance between them.


>>> cif = Cif("path/to/cif/file.cif")
>>> cif.connections_flattened
[(("In", "Rh"), 2.697), (("In", "Rh"), 2.697)]

radius_sum property

Retrieve the sums of the CIF radius, CIF_refined radius, and Pauling CN12 radius for each bonding pair of elements.


Type Description

Dictionary where each key is a radius type and the value is a dictionary with the key being a bond pair of elements and the value being the total radius in Angstroms.
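
A minimal sketch of how such pairwise sums could be built from per-element radii. The helper name radius_sums is hypothetical, and rounding to three decimals is an assumption; the input values are the CIF_radius entries listed under radius_values.

```python
from itertools import combinations_with_replacement


def radius_sums(radius_values, key):
    """Sum one radius type over every unordered element pair."""
    elements = sorted(radius_values)
    return {
        f"{a}-{b}": round(radius_values[a][key] + radius_values[b][key], 3)
        for a, b in combinations_with_replacement(elements, 2)
    }


radii = {
    "In": {"CIF_radius": 1.624},
    "Rh": {"CIF_radius": 1.345},
    "U": {"CIF_radius": 1.377},
}
cif_sums = radius_sums(radii, "CIF_radius")
```

With these inputs, the pair keys are "In-In", "In-Rh", "In-U", "Rh-Rh", "Rh-U", and "U-U".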


>>> cif.radius_sum
{
    "CIF_radius_sum": {
        "In-In": 3.248,
        "In-Rh": 2.969,
        "In-U": 3.001,
        "Rh-Rh": 2.69,
        "Rh-U": 2.722,
        "U-U": 2.754,
    },
    "CIF_radius_refined_sum": {
        "In-In": 2.657,
        "In-Rh": 2.697,
        "In-U": 2.943,
        "Rh-Rh": 2.737,
        "Rh-U": 2.983,
        "U-U": 3.229,
    },
    "Pauling_radius_sum": {
        "In-In": 3.32,
        "In-Rh": 3.002,
        "In-U": 3.176,
        "Rh-Rh": 2.684,
        "Rh-U": 2.858,
        "U-U": 3.032,
    },
}

radius_values property

Retrieve the CIF radius, CIF_refined radius, and Pauling CN12 radius for each element.

This property uses lazy loading to compute or retrieve radius values only when needed, optimizing performance. The CIF radius and Pauling C12 radius are standard values sourced from data/ for each element. In contrast, the CIF_refined radius is calculated based on bonding distances to ensure accuracy across different environments.

  • CIF_radius: The standard radius value commonly determined from elemental .cif files, the approximate size of an atom within a crystal structure.
  • CIF_radius_refined: An optimized radius calculated so that, across all bonding pairs, the sum of the two radii in a bonded pair matches the shortest unique observed bond distance as closely as possible. This refinement is designed to improve packing efficiency within a coordination polyhedron.
  • Pauling_radius_CN12: The Pauling radius of the element, calculated with a coordination number (CN) of 12, providing a basis for comparison with other radius types.


Type Description
dict[str, dict[str, float]]

A dictionary where each key is an atomic label (e.g., "In", "Rh", "U"), and the corresponding value is a dictionary with radius information in Angstroms:

  • CIF_radius (float): The standard CIF radius.
  • CIF_radius_refined (float): The optimized radius based on CIF radius.
  • Pauling_radius_CN12 (float): The Pauling radius with a coordination number of 12, parsed from literature.


>>> cif.radius_values
{
    "In": {
        "CIF_radius": 1.624,
        "CIF_radius_refined": 1.328,
        "Pauling_radius_CN12": 1.66,
    },
    "Rh": {
        "CIF_radius": 1.345,
        "CIF_radius_refined": 1.369,
        "Pauling_radius_CN12": 1.342,
    },
    "U": {
        "CIF_radius": 1.377,
        "CIF_radius_refined": 1.614,
        "Pauling_radius_CN12": 1.516,
    },
}

shortest_bond_pair_distance property

Determine the minimum distance for every unique pair of elements. This property uses lazily loaded connections and computes them if they are not already available.


Type Description
dict[tuple[str, str], float]

Dictionary where each key is a tuple of two element symbols and the value is the shortest distance between that pair of elements in Angstroms.
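
One way to obtain such a mapping is to reduce a list shaped like connections_flattened to its minimum per element pair. The sketch below assumes that input shape; the function name shortest_per_pair is hypothetical, and whether cifkit derives the property this way is not stated in this reference.

```python
def shortest_per_pair(connections_flattened):
    """Keep the smallest distance seen for each element pair."""
    shortest = {}
    for pair, dist in connections_flattened:
        if pair not in shortest or dist < shortest[pair]:
            shortest[pair] = dist
    return shortest


# Illustrative input in the documented connections_flattened format.
flattened = [
    (("In", "Rh"), 2.697),
    (("In", "Rh"), 2.852),
    (("In", "In"), 3.244),
]
pair_distances = shortest_per_pair(flattened)
```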


>>> cif.shortest_bond_pair_distance
{
    ("In", "In"): 3.244,
    ("In", "Rh"): 2.697,
    ("In", "U"): 3.21,
    ("Rh", "Rh"): 3.881,
    ("Rh", "U"): 2.983,
    ("U", "U"): 3.881,
}

shortest_distance property

Retrieve the shortest atomic distance within the crystal structure. This property is lazily loaded; the @ensure_connections decorator ensures that all necessary connections are computed beforehand. The value is the minimum distance between any pair of atoms in the connection data.


Type Description

The shortest distance between any two connected atoms in the crystal structure, in Angstroms.

shortest_site_pair_distance property

Retrieves the shortest distance from each unique atomic site in the crystal structure. This property uses lazily loaded connections to compute these distances if they are not already available.


Type Description
dict[str, tuple[str, float]]

Dictionary where each key is an atomic site label and the value is a tuple containing the label of the closest atomic site and the shortest distance to it in Angstroms.


>>> cif.shortest_site_pair_distance
{
    "In1": ("Rh2", 2.697),
    "Rh1": ("In1", 2.852),
    "Rh2": ("In1", 2.697),
    "U1": ("Rh1", 2.984),
}

__init__(file_path, is_formatted=False, logging_enabled=False)

Initializes an object from a .cif file.


Name Type Description Default
file_path str

Path to the .cif file.

is_formatted bool

If False, preprocess the .cif file to ensure compatibility with the gemmi library. Default is False.

logging_enabled bool

Enables detailed logging during initialization and for distance calculations. Default is False.



Name Type Description
file_path str

Path to the CIF file from which data is loaded.

logging_enabled bool

Enables detailed logging for initialization and distance calculations if set to True.

file_name str

Base name of the CIF file, extracted from file_path.

file_name_without_ext str

File name without its extension, useful for referencing or generating derivative files.

db_source str

Source database (e.g., ICSD, MP, CCDC, PCD) from which the CIF file originates, determined at runtime.

unitcell_lengths list[float]

List of unit cell lengths for the crystal structure, typically in Angstroms.

unitcell_angles list[float]

List of unit cell angles in radians, ordered by alpha, beta, gamma.

site_labels list[str]

Lists all unique atomic site labels.

unique_elements set[str]

Set of unique chemical elements present in the CIF file.

atom_site_info dict[str, any]

Dictionary containing detailed information about each atomic site including element, site occupancy, fractional coordinates, symmetry, and multiplicity.

composition_type int

Number of unique elements present in the .cif file, e.g., 1 for unary, 2 for binary, etc.

tag str

Additional tag associated with the CIF data, parsed from the third line of PCD .cif files.

bond_pairs set[tuple[str, str]]

Set of tuples representing bonded pairs of elements.

site_label_pairs set[tuple[str, str]]

Set of tuples representing pairs of atomic site labels.

bond_pairs_sorted_by_mendeleev set[tuple[str, str]]

Set of bonded pairs sorted according to Mendeleev Numbers.

site_label_pairs_sorted_by_mendeleev set[tuple[str, str]]

Set of site label pairs sorted by Mendeleev Numbers.

site_mixing_type str

Descriptor of the site mixing type, categorized into four types:

  • Full occupancy: a single atomic site occupies the fractional coordinate with an occupancy value of 1.
  • Full occupancy with mixing: multiple atomic sites collectively occupy the fractional coordinate with occupancies that sum to 1.
  • Deficiency without mixing: a single atomic site occupies the fractional coordinate with an occupancy less than 1.
  • Deficiency with atomic mixing: multiple atomic sites occupy the fractional coordinate with occupancies that sum to less than 1.
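
The four categories can be sketched as a small classifier for a single fractional coordinate. This is an illustration under stated assumptions: the function name classify_coordinate and the tolerance are hypothetical; the labels "full_occupancy" and "deficiency_without_atomic_mixing" appear in this reference, while the other two label strings are placeholders chosen for the example. Occupancy sums above 1 are not handled.

```python
def classify_coordinate(occupancies, tol=1e-6):
    """Classify one fractional coordinate from the occupancies of its sites."""
    total = sum(occupancies)
    mixed = len(occupancies) > 1          # more than one site shares the coordinate
    full = abs(total - 1.0) <= tol        # occupancies sum to 1 within tolerance
    if full:
        # Second label is an assumed placeholder name.
        return "full_occupancy_atomic_mixing" if mixed else "full_occupancy"
    # First label is an assumed placeholder name.
    return "deficiency_atomic_mixing" if mixed else "deficiency_without_atomic_mixing"


labels = [
    classify_coordinate([1.0]),
    classify_coordinate([0.5, 0.5]),
    classify_coordinate([0.8]),
    classify_coordinate([0.3, 0.4]),
]
```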

is_radius_data_available bool

Indicates whether Pauling and CIF atomic radii are available for all elements in the .cif file.

mixing_info_per_label_pair dict

Dictionary mapping pairs of labels to their mixing information.

mixing_info_per_label_pair_sorted_by_mendeleev dict

Same as mixing_info_per_label_pair, but sorted according to Mendeleev numbers.

unitcell_points list[list[tuple[float, float, float, str]]]

List of points defining the unit cell; each point contains fractional coordinates and a site label.

supercell_points list[list[tuple[float, float, float, str]]]

List of points defining the supercell. For each .cif file, a unit cell is generated by applying the symmetry operations. A supercell is generated by applying ±1 shifts from the unit cell.

unitcell_atom_count int

Total count of atoms within the unit cell.

supercell_atom_count int

Total count of atoms within the generated supercell incorporating ±1, ±1, ±1 translations.
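
A minimal sketch of the ±1 translation step, assuming points are flat (x, y, z, label) tuples in fractional coordinates and that the unshifted cell is kept, which gives 27 copies in total. The function name make_supercell is hypothetical and the symmetry-operation step is omitted.

```python
from itertools import product


def make_supercell(unitcell_points):
    """Translate every unit cell point by each shift in {-1, 0, 1}^3."""
    shifts = list(product((-1, 0, 1), repeat=3))  # 27 translations
    return [
        (x + dx, y + dy, z + dz, label)
        for dx, dy, dz in shifts
        for x, y, z, label in unitcell_points
    ]


# Two illustrative points; coordinates are made up for the example.
unit = [(0.0, 0.0, 0.0, "In1"), (0.5, 0.5, 0.5, "Rh1")]
supercell = make_supercell(unit)
```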

connections None or dict

Initially None, intended to store connection data related to the crystal structure. Connections are computed lazily and are only calculated when first needed by a method or property requiring them.


Compute the connection network, shortest distances, bond counts, and coordination numbers (CN). These properties are lazily loaded to avoid unnecessary computation during the initialization and pre-processing steps.


Name Type Description Default
cutoff_radius float

The distance threshold in Angstroms used to consider two atoms as connected, by default 10.0


plot_polyhedron(site_label, show_labels=True, is_displayed=False, output_dir=None)

Plot a polyhedron structure and optionally save it.


Name Type Description Default
site_label str

Central site label for the polyhedron

show_labels bool

Whether to display vertex labels, by default True

is_displayed bool

Display plot interactively, by default False

output_dir str

Directory to save the plot, by default None



CN_unique_values_by_best_methods: set[str] property


Type Description

Unique coordination number by best methods from all .cif files.

CN_unique_values_by_min_dist_method: set[str] property


Type Description

Unique coordination number values by minimum distance method from all .cif files.

unique_composition_types: set[int] property

Get unique composition types from all .cif files in the folder.


>>> cif_ensemble.unique_composition_types
{1, 3}

unique_elements: set[str] property

Get unique elements from all .cif files in the folder.


>>> cif_ensemble.unique_elements_stats
{
    "Ce": 1,
    "Eu": 1,
    "Ge": 3,
    "Ir": 1,
    "La": 1,
    "Mo": 3,
    "Ru": 2,
}

unique_formulas: set[str] property

Get unique formulas from all .cif files in the folder.


Type Description

Set of unique formulas.


>>> cif_ensemble.unique_formulas
{"EuIr2Ge2", "CeRu2Ge2", "LaRu2Ge2", "Mo"}

unique_site_mixing_types: set[int] property

Get unique site mixing types from all .cif files in the folder.


>>> cif_ensemble.unique_site_mixing_types
{"deficiency_without_atomic_mixing", "full_occupancy"}

unique_space_group_names: set[str] property

Get unique space groups from all .cif files in the folder.


>>> cif_ensemble.unique_space_group_names
{"I4/mmm", "Im-3m"}

unique_space_group_numbers: set[str] property

Get unique space group numbers from all .cif files in the folder.


>>> cif_ensemble.unique_space_group_numbers
{139, 229}

unique_structures: set[str] property

Get unique structures from all .cif files in the folder.


>>> cif_ensemble.unique_structures
{"CeAl2Ga2", "W"}

unique_tags: set[str] property

Get unique tags from all .cif files in the folder.


>>> cif_ensemble.unique_tags
{"hex", "rt", "rt_hex", ""}

__init__(cif_dir_path, add_nested_files=False, preprocess=True, logging_enabled=False)

Initialize a CifEnsemble object, containing a collection of Cif objects.


Name Type Description Default
cif_dir_path str

Path to the folder containing .cif file(s).

add_nested_files bool

Option to include .cif files contained in sub-directories within cif_dir_path, by default False

preprocess bool

Option to edit .cif files before initializing each into a Cif object, by default True. Preprocess modifies atomic site labels in atom_site_label. Some site labels may contain a comma or a symbol like M due to atomic mixing. It reformats each atom_site_label so it can be parsed into an element type matching atom_site_type_symbol. For PCD databases, addresses in publ_author_address often have an incorrect format requiring manual modifications. It also relocates any ill-formatted files, such as those with duplicate labels in atom_site_label, missing fractional coordinates, or files requiring supercell generation.

logging_enabled bool

Option to log while pre-processing Cif objects, by default False



Name Type Description
dir_path str

Path to the folder containing .cif files

file_paths list[str]

List of file paths to .cif files

cifs list[Cif]

List of Cif objects

file_count int

Number of .cif files in the folder

logging_enabled bool

Option to log while pre-processing Cif objects

copy_cif_files(file_paths, to_directory_path)

Copy a set of CIF files to a destination directory.


Name Type Description Default
file_paths set[str]

Set of file paths to CIF files.

to_directory_path str

Destination directory path.



>>> file_paths = {
>>> dest_dir_path = "tests/data/cif/ensemble_new_dir"
>>> cif_ensemble_test.copy_cif_files(file_paths, dest_dir_path)
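
For illustration, a copy step of this kind can be written with the standard library. The function copy_files below is a hypothetical stand-in, not the cifkit method, and creating the destination directory when it is missing is an assumption.

```python
import shutil
import tempfile
from pathlib import Path


def copy_files(file_paths, to_directory_path):
    """Copy each file into the destination directory and return the new paths."""
    dest = Path(to_directory_path)
    dest.mkdir(parents=True, exist_ok=True)  # assumption: create if missing
    copied = []
    for path in sorted(file_paths):
        copied.append(shutil.copy(path, dest / Path(path).name))
    return copied


# Demonstration with a temporary file instead of real CIF data.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "a.cif"
    src.write_text("data_example\n")
    copied = copy_files({str(src)}, Path(tmp) / "out")
    copied_names = [Path(p).name for p in copied]
    source_still_exists = src.exists()
```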

generate_CN_by_best_methods_histogram(display=False, output_dir=None)

Generate a histogram of the 'CN_by_best_methods' property from CIF files.

This method creates a histogram based on the 'CN_by_best_methods' statistics of the CIF files. If 'output_dir' is specified, the histogram image (.png) will be saved to that directory. If 'output_dir' is not specified, the image will be saved to the directory specified by 'self.dir_path'.


Name Type Description Default
display bool

If True, the plot is displayed. Default is False.

output_dir str

The directory path where the histogram should be saved. If None, the histogram is saved in the directory defined by 'self.dir_path'.
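
The fallback rule for the save location, which applies to every histogram method in this section, can be sketched as below. The helper name resolve_histogram_path and the file name are hypothetical; only the choice between output_dir and dir_path follows the description above.

```python
import os


def resolve_histogram_path(output_dir, dir_path, file_name):
    """Use output_dir when given, otherwise fall back to the ensemble folder."""
    target_dir = output_dir if output_dir is not None else dir_path
    return os.path.join(target_dir, file_name)


default_path = resolve_histogram_path(None, "cif_folder", "histogram.png")
custom_path = resolve_histogram_path("plots", "cif_folder", "histogram.png")
```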


generate_CN_by_min_dist_method_histogram(display=False, output_dir=None)

Generate a histogram of the 'CN_by_min' property from CIF files.

This method creates a histogram based on the 'CN_by_min' statistics of the CIF files. If 'output_dir' is specified, the histogram image (.png) will be saved to that directory. If 'output_dir' is not specified, the image will be saved to the directory specified by 'self.dir_path'.


Name Type Description Default
display bool

If True, the plot is displayed. Default is False.

output_dir str

The directory path where the histogram should be saved. If None, the histogram is saved in the directory defined by 'self.dir_path'.


generate_composition_type_histogram(display=False, output_dir=None)

Generate a histogram of the 'composition_type' property from CIF files.

This method creates a histogram based on the 'composition_type' statistics of the CIF files. If 'output_dir' is specified, the histogram image (.png) will be saved to that directory. If 'output_dir' is not specified, the image will be saved to the directory specified by 'self.dir_path'.


Name Type Description Default
display bool

If True, the plot is displayed. Default is False.

output_dir str

The directory path where the histogram should be saved. If None, the histogram is saved in the directory defined by 'self.dir_path'.


generate_elements_histogram(display=False, output_dir=None)

Generate a histogram of the 'unique_elements' property from CIF files.

This method creates a histogram based on the 'unique_elements' statistics of the CIF files. If 'output_dir' is specified, the histogram image (.png) will be saved to that directory. If 'output_dir' is not specified, the image will be saved to the directory specified by 'self.dir_path'.


Name Type Description Default
display bool

If True, the plot is displayed. Default is False.

output_dir str

The directory path where the histogram should be saved. If None, the histogram is saved in the directory defined by 'self.dir_path'.


generate_formula_histogram(display=False, output_dir=None)

Generate a histogram of the 'formula' property from CIF files.

This method creates a histogram based on the 'formula' statistics of the CIF files. If 'output_dir' is specified, the histogram image (.png) will be saved to that directory. If 'output_dir' is not specified, the image will be saved to the directory specified by 'self.dir_path'.


Name Type Description Default
display bool

If True, the plot is displayed. Default is False.

output_dir str

The directory path where the histogram should be saved. If None, the histogram is saved in the directory defined by 'self.dir_path'.


generate_site_mixing_type_histogram(display=False, output_dir=None)

Generate a histogram of the 'site_mixing_type' property from CIF files.

This method creates a histogram based on the 'site_mixing_type' statistics of the CIF files. If 'output_dir' is specified, the histogram image (.png) will be saved to that directory. If 'output_dir' is not specified, the image will be saved to the directory specified by 'self.dir_path'.


Name Type Description Default
display bool

If True, the plot is displayed. Default is False.

output_dir str

The directory path where the histogram should be saved. If None, the histogram is saved in the directory defined by 'self.dir_path'.


generate_space_group_name_histogram(display=False, output_dir=None)

Generate a histogram of the 'space_group_name' property from CIF files.

This method creates a histogram based on the 'space_group_name' statistics of the CIF files. If 'output_dir' is specified, the histogram image (.png) will be saved to that directory. If 'output_dir' is not specified, the image will be saved to the directory specified by 'self.dir_path'.


Name Type Description Default
display bool

If True, the plot is displayed. Default is False.

output_dir str

The directory path where the histogram should be saved. If None, the histogram is saved in the directory defined by 'self.dir_path'.


generate_space_group_number_histogram(display=False, output_dir=None)

Generate a histogram of the 'space_group_number' property from CIF files.

This method creates a histogram based on the 'space_group_number' statistics of the CIF files. If 'output_dir' is specified, the histogram image (.png) will be saved to that directory. If 'output_dir' is not specified, the image will be saved to the directory specified by 'self.dir_path'.


Name Type Description Default
display bool

If True, the plot is displayed. Default is False.

output_dir str

The directory path where the histogram should be saved. If None, the histogram is saved in the directory defined by 'self.dir_path'.


generate_structure_histogram(display=False, output_dir=None)

Generate a histogram of the 'structure' property from CIF files.

This method creates a histogram based on the 'structure' statistics of the CIF files. If 'output_dir' is specified, the histogram image (.png) will be saved to that directory. If 'output_dir' is not specified, the image will be saved to the directory specified by 'self.dir_path'.


Name Type Description Default
display bool

If True, the plot is displayed. Default is False.

output_dir str

The directory path where the histogram should be saved. If None, the histogram is saved in the directory defined by 'self.dir_path'.


generate_supercell_size_histogram(display=False, output_dir=None)

Generate a histogram of the 'supercell_count' property from CIF files.

This method creates a histogram based on the 'supercell_count' statistics of the CIF files. If 'output_dir' is specified, the histogram image (.png) will be saved to that directory. If 'output_dir' is not specified, the image will be saved to the directory specified by 'self.dir_path'.


Name Type Description Default
display bool

If True, the plot is displayed. Default is False.

output_dir str

The directory path where the histogram should be saved. If None, the histogram is saved in the directory defined by 'self.dir_path'.


generate_tag_histogram(display=False, output_dir=None)

Generate a histogram of the 'tag' property from CIF files.

This method creates a histogram based on the 'tag' statistics of the CIF files. If 'output_dir' is specified, the histogram image (.png) will be saved to that directory. If 'output_dir' is not specified, the image will be saved to the directory specified by 'self.dir_path'.


Name Type Description Default
display bool

If True, the plot is displayed. Default is False.

output_dir str

The directory path where the histogram should be saved. If None, the histogram is saved in the directory defined by 'self.dir_path'.


move_cif_files(file_paths, to_directory_path)

Move a set of CIF files to a destination directory.


Name Type Description Default
file_paths set[str]

Set of file paths to CIF files.

to_directory_path str

Destination directory path.



>>> file_paths = {
>>> dest_dir_path = "tests/data/cif/ensemble_new_dir"
>>> cif_ensemble_test.move_cif_files(file_paths, dest_dir_path)