How to use CAF App
The CAF App (GitHub) offers a total of 5 options to help you analyze and manipulate chemical formulas. These options are designed to prepare a dataset for ML applications.
Note
The installation guide is provided in Method 1. Using CAF Application.
Option 1. Filter - analyze chemical formulas in Excel or a folder of .cif
files. Counts unique elements, detects errors, generates a periodic table heatmap, and provides filtering methods.

Option 2. Sort - rearrange chemical formulas in Excel by:
Label (pre-configured for binary/ternary compounds, editable in
data/sort/custom-labels.xlsx
)Index (stoichiometric ratio, then Mendeleev number)
Property (27 elemental properties from the Oliynyk database)
Available columns for sorting:
1. Atomic weight
2. Atomic number
3. Period
4. Group
5. Mendeleev number
6. valence e total
7. unpaired electrons
8. Gilman no. of valence electrons
9. Zeff
10. Ionization energy (eV)
11. CN
12. ratio n closest/CN
13. polyhedron distortion (dmin/dn)
14. CIF radius element
15. Pauling, R(CN12)
16. Pauling EN
17. Martynov Batsanov EN
18. Melting point, K
19. Density, g/mL
20. Specific heat, J/g K
21. Cohesive energy
22. Bulk modulus, GPa
23. Abundance in Earth's crust
24. Abundance in solar system (log)
25. HHI production
26. HHI reserve
27. cost, pure ($/100g)
Option 3. Features - generate compositional features for formulas in Excel, including a composition-normalized vector using hot encoding. The database is based on the Oliynyk (OLED) data (DOI).
133 binary features (
features/binary.py
)204 ternary features (
features/ternary.py
)305 quaternary features (
features/quaternary.py
)Universal set of 112 sorted and 156 unsorted features (
feature/universal.py
)(Optional) Extended features (thousands of columns possible)
Example of feature_binary.xlsx
:
Formula |
index_A |
index_B |
normalized_index_A |
normalized_index_B |
largest_index |
smallest_index |
avg_index |
atomic_weight_weighted_A+B |
---|---|---|---|---|---|---|---|---|
NdSi2 |
1 |
2 |
0.333 |
0.667 |
2 |
1 |
1.5 |
144.242 |
Th2Os |
2 |
1 |
0.667 |
0.333 |
2 |
1 |
1.5 |
464.076 |
Sn5Co2 |
5 |
2 |
0.714 |
0.286 |
5 |
2 |
3.5 |
593.55 |
Option 4. Match - match .cif
files in a folder against an Excel file by the “Entry” column.
Option 5. Merge - combine two Excel/CSV files based on the “Entry” column.
This is useful when you want to enrich composition-based features from the CAF App output with structure-based features from the SAF App.