Composition Analyzer/Featurizer (CAF)

Software version 0.0.2
Last updated June 22, 2025.

Composition Analyzer/Featurizer (CAF) offers an interactive Python script that generates compositional descriptors of binary, ternary, and quaternary compounds from Excel or .cif files. CAF also offers utility functions to filter and sort chemical formulas and to merge Excel files.

Citation

If you use CAF in your scientific publication, please cite the following:

as well as the Oliynyk elemental property dataset (OLED) used in CAF:

Publications and scientific utility

In the Digital Discovery paper cited above, we describe the performance of CAF in combination with SAF for generating compositional and structural numerical features for machine learning (ML) applications in the crystal structure classification of binary compounds. Figures 1 and 2 below compare the performance of our developments (CAF and SAF) with existing feature generation methods such as JARVIS, MAGPIE, mat2vec, and OLED.

Figure 1: PLS-DA latent value plot using the first two latent value dimensions: (a) JARVIS, (b) MAGPIE, (c) mat2vec, (d) OLED (all sets of features were generated with CBFV), and our developments – (e) CAF and (f) SAF.

Figure 2: SAF + CAF PLS-DA plot.

See also

What’s the difference between CAF and SAF? CAF generates compositional features based on chemical formulas, while SAF generates structural features based on crystal structures (CIF files). You can learn more about SAF at https://bobleesj.github.io/structure-analyzer-featurizer/.

How CAF works

For a given chemical formula, CAF determines the number of unique elements and classifies the compound as binary, ternary, or quaternary. It then generates a set of compositional features whose size depends on that class:

  • 133 binary features

  • 204 ternary features

  • 305 quaternary features
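To make the classification step concrete, here is a minimal sketch, assuming flat formulas without parentheses, hydrate notation, or charges. The names `parse_formula`, `classify_formula`, `TOKEN`, and `CLASS_BY_ELEMENT_COUNT` are hypothetical illustrations, not CAF's API, and the regex-based parsing is an assumption; only the feature counts come from the list above.

```python
import re

# Feature counts per compound class, taken from the list above.
FEATURE_COUNTS = {"binary": 133, "ternary": 204, "quaternary": 305}

# Hypothetical mapping from the number of unique elements to a class name.
CLASS_BY_ELEMENT_COUNT = {2: "binary", 3: "ternary", 4: "quaternary"}

# An element symbol (capital letter, optional lowercase letter) followed by
# an optional integer or decimal stoichiometric index, e.g. "Si2" or "La0.5".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula: str) -> dict[str, float]:
    """Parse a flat formula such as "NdSi2" into {"Nd": 1.0, "Si": 2.0}.

    Parentheses, hydrates, and charges are not handled in this sketch.
    """
    composition: dict[str, float] = {}
    pos = 0
    for match in TOKEN.finditer(formula):
        # Any skipped character means the formula contains unparseable text.
        if match.start() != pos:
            raise ValueError(f"cannot parse formula {formula!r}")
        symbol, index = match.groups()
        composition[symbol] = composition.get(symbol, 0.0) + float(index or 1)
        pos = match.end()
    if pos != len(formula) or not composition:
        raise ValueError(f"cannot parse formula {formula!r}")
    return composition


def classify_formula(formula: str) -> str:
    """Return "binary", "ternary", or "quaternary" from the unique element count."""
    n_elements = len(parse_formula(formula))
    if n_elements not in CLASS_BY_ELEMENT_COUNT:
        raise ValueError(
            f"{formula!r} has {n_elements} unique element(s); 2 to 4 are supported"
        )
    return CLASS_BY_ELEMENT_COUNT[n_elements]


for formula in ["NdSi2", "CeRu2Si2", "La0.5Ce0.5Co2Si2"]:
    compound_class = classify_formula(formula)
    print(formula, compound_class, FEATURE_COUNTS[compound_class])
```

This prints `NdSi2 binary 133`, `CeRu2Si2 ternary 204`, and `La0.5Ce0.5Co2Si2 quaternary 305`. Counting unique symbols rather than tokens means a formula that repeats an element is still classified by its distinct elements.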

The full lists of CAF features are provided on the Features page.

See also

The CAF features are based on the Oliynyk elemental property dataset (OLED). OLED can be accessed through the bobleesj.utils Python package. You can also download the Excel file from GitHub by clicking the Download raw file button.

Publications using CAF or the Oliynyk elemental property dataset

Here is a list of publications using CAF for materials analysis and data-driven materials synthesis:

Getting started

You can generate compositional features without writing any code. Please visit Getting started to learn how to generate features.

Scope

CAF has two constraints. First, only formulas of binary, ternary, or quaternary compounds are supported. Second, formulas may contain only the elements highlighted in blue below:

Elements supported in CAF
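The two constraints can be checked before running CAF. The sketch below is a hypothetical pre-check, not part of CAF: `check_scope` is an illustrative name, element symbols are extracted with a simple regex (an assumption that only holds for flat formulas), and the `SUPPORTED` set is a made-up subset standing in for the full list of highlighted elements.

```python
import re


def check_scope(formula: str, supported: set[str]) -> list[str]:
    """Return the reasons a formula falls outside the stated scope (empty if in scope)."""
    # Element symbols: a capital letter optionally followed by a lowercase letter.
    elements = set(re.findall(r"[A-Z][a-z]?", formula))
    problems = []
    # Constraint 1: binary, ternary, or quaternary compounds only.
    if not 2 <= len(elements) <= 4:
        problems.append(f"found {len(elements)} unique element(s); 2 to 4 are required")
    # Constraint 2: every element must be in the supported set.
    unsupported = sorted(elements - supported)
    if unsupported:
        problems.append("unsupported element(s): " + ", ".join(unsupported))
    return problems


# Hypothetical subset for illustration only; use the elements shown in blue above.
SUPPORTED = {"Ce", "Co", "La", "Nd", "Ru", "Si"}

print(check_scope("NdSi2", SUPPORTED))
print(check_scope("NdSi2Xe", SUPPORTED))
```

Returning a list of reasons rather than a boolean makes it easy to report why a row in a spreadsheet was skipped; here the second call flags `Xe` as unsupported.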

Five options provided in the CAF app

The recommended way to generate features is to use the CAF interactive application. Beyond generating features from formulas listed under the “Formula” column of an Excel file, the app offers utility options to filter, sort, and merge the Excel files used for feature generation and data handling.

process diagram
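The filter, sort, and merge utilities operate on the “Formula” column. The sketch below illustrates the same ideas on plain Python lists; it is not CAF's implementation, and the helper names (`merge_formulas`, `filter_formulas`, `sort_formulas`) as well as the sort order (by unique element count, then alphabetically) are assumptions. In practice such a list might be loaded with `pandas.read_excel(path)["Formula"].tolist()`, but plain lists keep this example free of pandas and Excel files.

```python
import re


def n_unique_elements(formula: str) -> int:
    """Count distinct element symbols in a flat formula such as "CeRu2Si2"."""
    return len(set(re.findall(r"[A-Z][a-z]?", formula)))


def merge_formulas(*formula_lists: list[str]) -> list[str]:
    """Concatenate lists, dropping repeated formulas but keeping first-seen order."""
    return list(dict.fromkeys(f for formulas in formula_lists for f in formulas))


def filter_formulas(formulas: list[str], n_elements: int) -> list[str]:
    """Keep only formulas with exactly n_elements unique elements."""
    return [f for f in formulas if n_unique_elements(f) == n_elements]


def sort_formulas(formulas: list[str]) -> list[str]:
    """Sort by number of unique elements, then alphabetically."""
    return sorted(formulas, key=lambda f: (n_unique_elements(f), f))


# Two hypothetical "Formula" columns, e.g. read from two Excel files.
file_a = ["LaCoGaSi", "NdSi2", "CeRu2Si2"]
file_b = ["CeRu2Si2", "FeSi"]

merged = merge_formulas(file_a, file_b)
print(merged)                      # ['LaCoGaSi', 'NdSi2', 'CeRu2Si2', 'FeSi']
print(filter_formulas(merged, 2))  # ['NdSi2', 'FeSi']
print(sort_formulas(merged))       # ['FeSi', 'NdSi2', 'CeRu2Si2', 'LaCoGaSi']
```

`dict.fromkeys` is used for merging because it removes duplicates while preserving the order in which formulas first appear.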

How to ask for help

  • Do you have any feature requests? Please feel free to open an issue on GitHub using the Bug Report or Feature Request template.

  • Do you have any questions about running the code? Please feel free to reach out to Sangjoon Bob Lee at bobleesj@gmail.com.

  • Do you want to learn how to publish scientific software? CAF is developed and maintained using the Level 5 package standards provided in scikit-package.

How you can contribute to CAF

  • Did you find CAF helpful? You can show support by starring the GitHub repository and recommending it to colleagues.

  • Did you find any bugs? Please report them by creating a new issue so that we can fix them as soon as possible.

See also

Do you want to learn how to use GitHub and develop a Python package to reuse your code? Please feel free to reach out to Sangjoon Bob Lee (bobleesj@gmail.com). Resources such as scikit-package can help you get started.


Authors

Acknowledgements

CAF is built and maintained with scikit-package.