Structure Analyzer/Featurizer (SAF)
Structure Analyzer/Featurizer (SAF) is a Python package that generates geometric features based on interatomic distances, atomic environment information, and coordination numbers from a folder of CIF (Crystallographic Information File) files.
Citation
If you use SAF in your scientific publication, please cite the following:
Digital Discovery. https://doi.org/10.1039/D4DD00332B
Please also cite the cifkit package, which is the engine of SAF for coordination environment analysis:
Journal of Open Source Software. https://doi.org/10.21105/joss.07205
Publications and scientific utility
Structure features include interatomic distances, information on atomic environments, and coordination numbers:
94 binary structural features
134 ternary structural features
182 quaternary structural features
SAF was originally developed to determine the coordination number and geometry for each crystallographic site in complex structures [1]. Later, we included interactive functionality for experimentalists and data scientists to generate structural features. These features have been used as input data for ML models to predict crystal structures and their properties [2].
In the above Digital Discovery paper, we describe the performance of SAF in combination with CAF for generating compositional and structural numerical features for ML applications in crystal classification of binary compounds. The results are shown in Figures 1 and 2 below, where we compare the performance of our developments (SAF and CAF) with existing feature generation methods such as JARVIS, MAGPIE, mat2vec, and OLED.

Note
Figure 1: PLS-DA latent value plot using the first two latent value dimensions: (a) JARVIS, (b) MAGPIE, (c) mat2vec, (d) OLED (all sets of features were generated with CBFV), and our developments – (e) CAF and (f) SAF.

Note
Figure 2: SAF + CAF PLS-DA plot.
See also
What’s the difference between SAF and CAF? SAF generates structural features based on crystal structures (CIF files), while CAF generates compositional features based on chemical formulas. You can learn more about CAF at https://bobleesj.github.io/composition-analyzer-featurizer/.
Publications using SAF
Here is a list of publications using SAF for materials analysis and data-driven materials synthesis:
Getting started
We have a command-line Python application. Please visit the Getting started page to learn how to generate features from a folder containing .cif files.
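Before generating features, it can be useful to check which files in a folder are binary, ternary, or quaternary. The following is a minimal sketch that uses only the Python standard library. It is not part of SAF or cifkit, and the helper names (read_formula, count_elements, group_cif_files) are hypothetical. It assumes each file stores its formula on the same line as the _chemical_formula_sum tag, so files that use a multi-line value or omit the tag are reported as "unsupported".

```python
import re
from pathlib import Path

# Number of distinct elements -> label (binary, ternary, quaternary).
LABELS = {2: "binary", 3: "ternary", 4: "quaternary"}
FORMULA_TAG = "_chemical_formula_sum"


def read_formula(cif_path):
    """Return the value of _chemical_formula_sum, or None if the tag is absent.

    Assumption: the value sits on the same line as the tag.
    """
    for line in Path(cif_path).read_text(errors="ignore").splitlines():
        stripped = line.strip()
        if stripped.startswith(FORMULA_TAG):
            value = stripped[len(FORMULA_TAG):].strip()
            return value.strip("'\"")
    return None


def count_elements(formula):
    """Count distinct element symbols in a formula such as 'Ce Co2 Ga8'."""
    return len(set(re.findall(r"[A-Z][a-z]?", formula)))


def group_cif_files(folder):
    """Group .cif file names in a folder by their number of elements."""
    groups = {}
    for path in sorted(Path(folder).glob("*.cif")):
        formula = read_formula(path)
        n_elements = count_elements(formula) if formula else 0
        label = LABELS.get(n_elements, "unsupported")
        groups.setdefault(label, []).append(path.name)
    return groups
```

Calling group_cif_files("path/to/cif_folder") returns a dictionary that maps each label to a list of file names. This check only counts element symbols; it does not verify that every element is within the supported set.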
Scope
The current version supports the processing of binary, ternary, and quaternary .cif files containing the following elements:

Note
The Pauling CN 12 radii values for some gases (N, O, F, Cl, Br, and I) as well as Tc and Sm were interpolated using Gaussian Process Regression. The CIF radii for the aforementioned gases were compiled as averages of low-temperature structures from Persson’s CIF database.
How to ask for help
Do you have any feature requests? Please feel free to open an issue on GitHub using the Bug Report or Feature Request template.
Do you have any questions about running the code? Please feel free to reach out to Sangjoon Bob Lee at bobleesj@gmail.com.
Do you want to learn how to publish scientific software?
SAF is developed and maintained using the Level 5 package standards provided in scikit-package.
How you can contribute to SAF
Did you find SAF helpful? You can show support by starring the GitHub repository and recommending it to colleagues.
Did you find any bugs? Please feel free to report them by creating a new issue so that we can fix them as soon as possible.
See also
Do you want to learn how to use GitHub and develop Python packages to reuse your code? Please feel free to reach out to Sangjoon Bob Lee (bobleesj@gmail.com). There are resources you can use to get started, such as scikit-package.

Contributors
Anton Oliynyk - CUNY Hunter College
Arnab Dutta - IIT Kharagpur
Nikhil Kumar Barua - University of Waterloo
Nishant Yadav - IIT Kharagpur
Sangjoon Bob Lee - Columbia University
Siddha Sankalpa Sethi - IIT Kharagpur
Acknowledgements
scikit-package is used to accelerate the maintenance and development of this Python package. cifkit is used to determine the coordination number and environment of each crystallographic site from each .cif file.