6. Formation Energy Evaluation

Implementation

This stage is implemented in:

src/dopingflow/formation.py

The public entry point is:

run_formation(...)

Purpose

This stage computes the formation energy of relaxed doped structures using reference energies constructed in Step 00.

It combines:

the relaxed total energy of each doped candidate structure (\(E_{\mathrm{doped}}\))
the relaxed total energy of the pristine supercell (\(E_{\mathrm{pristine}}\))
elemental chemical potentials (\(\mu_i\)) for host and dopant species

Formation energies are written per structure folder to:

formation_energies.csv (summary table)
candidate_*/04_formation/meta.json (per-candidate provenance)

Inputs

This stage uses settings from the following sections of input.toml:

[structure]: provides the output directory (outdir) that contains the structure folders.
[doping]: defines the substitution host species (host_species).
[scan]: provides the anion species list (anion_species), which is excluded when identifying dopants.
[formation]: controls skipping (skip_if_done) and the normalization convention (normalize).

It also requires the reference-energy JSON from Step 00:

reference_structures/reference_energies.json

Formation Energy Framework

Substitutional doping model

The workflow assumes substitutional doping on a host sublattice. Dopants are identified as all species that are:

not equal to the host species ([doping].host_species), and
not in the anion list ([scan].anion_species)

The dopant counts \(n_i\) are extracted from each candidate's relaxed POSCAR (candidate_*/02_relax/POSCAR).
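
The identification rule above can be sketched in a few lines. This is a minimal illustration, not the code in formation.py: the function name infer_dopant_counts and the example species (Sn host, O anion, Sb and Zr dopants) are assumptions chosen for the example, and the species list is passed in directly instead of being parsed from a POSCAR.

```python
from collections import Counter


def infer_dopant_counts(species, host_species, anion_species):
    """Count atoms that are neither the host species nor an anion.

    `species` is a flat list of element symbols, one entry per atom.
    """
    excluded = {host_species, *anion_species}
    return dict(Counter(s for s in species if s not in excluded))


# Hypothetical cell: Sn host, O anion, one Sb and one Zr substituted for Sn.
species = ["Sn", "Sn", "Sb", "Zr", "O", "O", "O", "O", "O", "O", "O", "O"]
dopants = infer_dopant_counts(species, host_species="Sn", anion_species=["O"])
print(dopants)
```

Because the rule is purely species-based, any element that is not the host and not listed as an anion is counted as a dopant.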

Formation energy definition

The formation energy is defined as:

\[E_{\mathrm{form}} = E_{\mathrm{doped}} - E_{\mathrm{pristine}} + \sum_i n_i \left( \mu_{\mathrm{host}} - \mu_i \right)\]

where:

\(E_{\mathrm{doped}}\) is the relaxed total energy of the doped supercell
\(E_{\mathrm{pristine}}\) is the relaxed total energy of the pristine supercell
\(\mu_{\mathrm{host}}\) is the host chemical potential (per atom)
\(\mu_i\) is the dopant chemical potential (per atom)
\(n_i\) is the number of dopant atoms of species \(i\) in the supercell

This corresponds to replacing \(n_i\) host atoms by \(n_i\) dopant atoms for each dopant species \(i\), while keeping the same supercell size.

Reference energies are taken from Step 00 and must be consistent with the supercell size and host species used here.
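
As a worked illustration of the definition above, the following sketch evaluates \(E_{\mathrm{form}}\) for invented numbers. The function name formation_energy, the layout of the mu dictionary, and every energy value are assumptions made for the example; they are not taken from the reference JSON or from formation.py.

```python
def formation_energy(e_doped, e_pristine, mu, host_species, dopant_counts):
    """Total formation energy in eV for substitutional doping.

    E_form = E_doped - E_pristine + sum_i n_i * (mu_host - mu_i)
    """
    mu_host = mu[host_species]
    exchange = sum(n * (mu_host - mu[sp]) for sp, n in dopant_counts.items())
    return e_doped - e_pristine + exchange


# Invented per-atom chemical potentials in eV (illustrative only).
mu = {"Sn": -4.0, "Sb": -4.5, "Zr": -8.5}

e_form = formation_energy(
    e_doped=-500.0,
    e_pristine=-498.0,
    mu=mu,
    host_species="Sn",
    dopant_counts={"Sb": 2, "Zr": 1},
)
print(e_form)  # (-500 + 498) + 2*(0.5) + 1*(4.5) = 3.5
```

Each dopant atom contributes \(\mu_{\mathrm{host}} - \mu_i\): the host atom it replaces is returned to its reservoir and the dopant atom is taken from its own.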

Method Summary

For each structure folder inside [structure].outdir:

Load the reference data from:
```
reference_structures/reference_energies.json
```
extracting \(E_{\mathrm{pristine}}\), the chemical potentials \(\mu_i\), and the pristine-supercell atom count used by per_host normalization.
Determine which candidates to evaluate:
1. If selected_candidates.txt exists, only those candidates are used.
2. Otherwise, all candidate_*/02_relax/POSCAR files are used.
For each selected candidate:
1. Read the relaxed energy \(E_{\mathrm{doped}}\) from candidate_*/02_relax/meta.json.
2. Read species counts from candidate_*/02_relax/POSCAR and infer dopant counts under the substitutional model.
3. Evaluate \(E_{\mathrm{form}}\) using the equation above.
4. Apply the requested normalization (see below).
5. Write candidate_*/04_formation/meta.json.
Write formation_energies.csv in the folder, sorted by total formation energy in ascending order.
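
The candidate-selection step above can be sketched as follows. This is a hedged illustration: the function name select_candidates is hypothetical, and the format of selected_candidates.txt (one candidate directory name per line) is an assumption, since it is not specified here.

```python
from pathlib import Path
import tempfile


def select_candidates(folder):
    """Return the candidate directories to evaluate in one structure folder."""
    folder = Path(folder)
    selection = folder / "selected_candidates.txt"
    if selection.exists():
        # Assumption: one candidate directory name per line, blank lines ignored.
        names = [ln.strip() for ln in selection.read_text().splitlines() if ln.strip()]
        return [folder / name for name in names]
    # Fallback: every candidate that has a relaxed POSCAR.
    return sorted(p.parent.parent for p in folder.glob("candidate_*/02_relax/POSCAR"))


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("candidate_001", "candidate_002"):
        relax = root / name / "02_relax"
        relax.mkdir(parents=True)
        (relax / "POSCAR").write_text("placeholder\n")

    all_names = [p.name for p in select_candidates(root)]

    (root / "selected_candidates.txt").write_text("candidate_002\n")
    picked_names = [p.name for p in select_candidates(root)]

print(all_names, picked_names)
```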

Normalization Options

This stage supports three reporting modes controlled by:

[formation]
normalize = "per_dopant"  # one of "total", "per_dopant", "per_host"

The internal formation energy is always computed as a total supercell energy (\(E_{\mathrm{form}}\) in eV). The reported value can be:

total: report \(E_{\mathrm{form}}\) in eV (no normalization)
per_dopant (default): report \(E_{\mathrm{form}} / N_{\mathrm{dop}}\) in eV per dopant atom, where \(N_{\mathrm{dop}} = \sum_i n_i\) is the total number of dopant atoms
per_host: report \(E_{\mathrm{form}} / N_{\mathrm{atoms}}\) in eV per atom, where \(N_{\mathrm{atoms}}\) is the total number of atoms in the pristine supercell (as stored in the reference JSON)

Note: despite its name, per_host currently divides by the total number of atoms in the pristine supercell, not by the number of host-sublattice sites. If normalization per host-sublattice site is needed later, that site count can be stored explicitly in the reference JSON and used here.
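
The three modes reduce to a single division, as in the sketch below. The function name normalize_formation_energy and the example numbers are assumptions; in particular, raising an error when there are no dopant atoms is a choice made for this example and is not documented behavior of the stage.

```python
def normalize_formation_energy(e_form_total, mode, n_dopant_atoms, n_atoms_pristine):
    """Convert a total formation energy (eV) to the requested reporting mode."""
    if mode == "total":
        return e_form_total
    if mode == "per_dopant":
        if n_dopant_atoms <= 0:
            # Assumption: treat a dopant-free structure as an error here.
            raise ValueError("per_dopant requires at least one dopant atom")
        return e_form_total / n_dopant_atoms
    if mode == "per_host":
        # Divides by all atoms of the pristine supercell, as described above.
        return e_form_total / n_atoms_pristine
    raise ValueError(f"unknown normalize mode: {mode!r}")


# Invented example: 3.5 eV total, 3 dopant atoms, 96-atom pristine supercell.
total = normalize_formation_energy(3.5, "total", 3, 96)
per_dopant = normalize_formation_energy(3.5, "per_dopant", 3, 96)
per_host = normalize_formation_energy(3.5, "per_host", 3, 96)
print(total, per_dopant, per_host)
```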

Outputs

Per-folder summary

For each structure folder, this stage writes:

formation_energies.csv

Columns:

candidate: candidate directory name
E_doped_eV: relaxed total energy of the doped candidate
E_form_eV_total: total formation energy in eV
E_form_<normalize>: formation energy normalized according to [formation].normalize
n_dopant_atoms: total dopant atoms \(N_{\mathrm{dop}}\)
dopant_counts: compact dopant count string (e.g. Sb:2;Zr:1)

Rows are sorted by E_form_eV_total (ascending).
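
A summary table with these columns could be produced along the following lines. This is a sketch under stated assumptions: the helper names are hypothetical, the normalized column is assumed to be named by substituting the mode into E_form_<normalize>, dopant species are assumed to be listed alphabetically in dopant_counts, and all numeric values are invented.

```python
import csv
import io


def format_dopant_counts(counts):
    """Build a compact string such as 'Sb:2;Zr:1' (species sorted by name)."""
    return ";".join(f"{sp}:{n}" for sp, n in sorted(counts.items()))


def write_summary(rows, normalize, stream):
    """Write summary rows sorted by total formation energy (ascending)."""
    norm_col = f"E_form_{normalize}"  # assumed expansion of E_form_<normalize>
    fields = ["candidate", "E_doped_eV", "E_form_eV_total", norm_col,
              "n_dopant_atoms", "dopant_counts"]
    writer = csv.DictWriter(stream, fieldnames=fields)
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["E_form_eV_total"]):
        counts = row["dopant_counts"]
        writer.writerow({
            "candidate": row["candidate"],
            "E_doped_eV": row["E_doped_eV"],
            "E_form_eV_total": row["E_form_eV_total"],
            norm_col: row["E_form_reported"],
            "n_dopant_atoms": sum(counts.values()),
            "dopant_counts": format_dopant_counts(counts),
        })


# Invented rows, deliberately out of order.
rows = [
    {"candidate": "candidate_002", "E_doped_eV": -500.0, "E_form_eV_total": 3.5,
     "E_form_reported": 3.5 / 3, "dopant_counts": {"Zr": 1, "Sb": 2}},
    {"candidate": "candidate_001", "E_doped_eV": -501.0, "E_form_eV_total": 2.5,
     "E_form_reported": 2.5 / 3, "dopant_counts": {"Sb": 2, "Zr": 1}},
]

buffer = io.StringIO()
write_summary(rows, "per_dopant", buffer)
lines = buffer.getvalue().splitlines()
print("\n".join(lines))
```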

Per-candidate metadata

For each evaluated candidate, this stage writes:

candidate_XXX/04_formation/meta.json

This file includes:

full formation-energy definition string from the reference JSON
\(E_{\mathrm{doped}}\), \(E_{\mathrm{pristine}}\)
chemical potentials used for the involved species
inferred dopant counts
total formation energy and the reported normalized value

Reproducibility and Skipping

If:

[formation].skip_if_done = true

and formation_energies.csv already exists for a folder, that folder is skipped.

Given unchanged relaxed energies, POSCARs, reference JSON, and configuration, this stage is deterministic.
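
The skip rule described above amounts to a file-existence check per structure folder. The sketch below uses a hypothetical helper name, should_skip, and is only meant to make the condition explicit.

```python
from pathlib import Path
import tempfile


def should_skip(folder, skip_if_done):
    """Skip a folder only if skipping is enabled and the summary CSV exists."""
    return bool(skip_if_done) and (Path(folder) / "formation_energies.csv").exists()


# Demonstration on a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    before = should_skip(tmp, skip_if_done=True)
    (Path(tmp) / "formation_energies.csv").write_text("candidate\n")
    after = should_skip(tmp, skip_if_done=True)
    disabled = should_skip(tmp, skip_if_done=False)

print(before, after, disabled)
```

Under this rule a folder with an existing formation_energies.csv is not re-evaluated, so the file has to be removed (or skip_if_done disabled) for changed inputs to take effect.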

Notes and Limitations

This stage assumes substitutional doping and uses a simple species-based dopant identification rule (host vs anions vs dopants).
No charged-defect corrections, finite-size corrections, entropy terms, or competing-phase chemical potential bounds are included.
The absolute formation energies depend on the reference energies and on the bulk phases chosen to define \(\mu_i\).