6. Formation Energy Evaluation
Implementation
This stage is implemented in:
src/dopingflow/formation.py
The public entry point is:
run_formation(...)
Purpose
This stage computes the formation energy of relaxed doped structures using reference energies constructed in Step 00.
It combines:
the relaxed total energy of each doped candidate structure (\(E_{\mathrm{doped}}\))
the relaxed total energy of the pristine supercell (\(E_{\mathrm{pristine}}\))
elemental chemical potentials (\(\mu_i\)) for host and dopant species
Formation energies are written per composition folder to:
formation_energies.csv (summary table)
candidate_*/04_formation/meta.json (per-candidate provenance)
Inputs
This stage uses settings from the following sections of input.toml:
[structure]: provides the output directory containing structure folders.
[doping]: defines the substitution host species.
[scan]: provides the anion species list used to identify dopants.
[formation]: controls skipping and the normalization convention.
It also requires the reference-energy JSON from Step 00:
reference_structures/reference_energies.json
Formation Energy Framework
Substitutional doping model
The workflow assumes substitutional doping on a host sublattice. Dopants are identified as all species that are:
not equal to the host species ([doping].host_species), and
not in the anion list ([scan].anion_species)
The set of dopant counts \(n_i\) is extracted from each candidate POSCAR.
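The identification rule above can be sketched in a few lines. This is a minimal illustration, not the code in formation.py: the function name infer_dopant_counts, the flat species list, and the Sn/O host and anion choice are assumptions made for the example.

```python
from collections import Counter


def infer_dopant_counts(species, host_species, anion_species):
    """Count every atom that is neither the host species nor an anion.

    Hypothetical helper: `species` is a flat list of element symbols,
    one per atom, as it might be read from a candidate POSCAR.
    """
    excluded = {host_species, *anion_species}
    counts = Counter(s for s in species if s not in excluded)
    return dict(sorted(counts.items()))


# Assumed example: a 48-atom cell with Sn as host and O as the only anion.
species = ["Sn"] * 13 + ["Sb"] * 2 + ["Zr"] + ["O"] * 32
dopants = infer_dopant_counts(species, host_species="Sn", anion_species=["O"])
print(dopants)  # {'Sb': 2, 'Zr': 1}
```

Because the rule is purely species-based, any element that is not listed as host or anion is counted as a dopant, including unintended species.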
Formation energy definition
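The formation energy is defined as:
\[
E_{\mathrm{form}} = E_{\mathrm{doped}} - E_{\mathrm{pristine}} + \sum_i n_i \, \mu_{\mathrm{host}} - \sum_i n_i \, \mu_i
\]
where: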
\(E_{\mathrm{doped}}\) is the relaxed total energy of the doped supercell
\(E_{\mathrm{pristine}}\) is the relaxed total energy of the pristine supercell
\(\mu_{\mathrm{host}}\) is the host chemical potential (per atom)
\(\mu_i\) is the dopant chemical potential (per atom)
\(n_i\) is the number of dopant atoms of species \(i\) in the supercell
This corresponds to replacing \(n_i\) host atoms by \(n_i\) dopant atoms for each dopant species \(i\), while keeping the same supercell size.
Reference energies are taken from Step 00 and must be consistent with the supercell size and host species used here.
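As a worked illustration, the sketch below evaluates the total formation energy assuming the standard substitutional form \(E_{\mathrm{form}} = E_{\mathrm{doped}} - E_{\mathrm{pristine}} + \sum_i n_i \mu_{\mathrm{host}} - \sum_i n_i \mu_i\). The function name and all numerical values are invented for the example and are not taken from the workflow or from any calculation.

```python
def formation_energy_total(e_doped, e_pristine, mu, host_species, dopant_counts):
    """Total formation energy (eV) for substitution on the host sublattice.

    Hypothetical helper. Each dopant atom replaces one host atom, so the
    number of removed host atoms equals the total number of dopant atoms.
    """
    n_removed_host = sum(dopant_counts.values())
    mu_added = sum(n * mu[sp] for sp, n in dopant_counts.items())
    return e_doped - e_pristine + n_removed_host * mu[host_species] - mu_added


# Illustrative numbers only (eV for energies, eV/atom for chemical potentials).
mu = {"Sn": -4.0, "Sb": -4.5, "Zr": -8.5}
e_form = formation_energy_total(
    e_doped=-500.0,
    e_pristine=-498.0,
    mu=mu,
    host_species="Sn",
    dopant_counts={"Sb": 2, "Zr": 1},
)
print(e_form)  # 3.5
```

Here the three removed Sn atoms contribute 3 × (−4.0) eV and the added dopants contribute −(2 × (−4.5) + 1 × (−8.5)) eV, giving −2.0 − 12.0 + 17.5 = 3.5 eV.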
Method Summary
For each structure folder inside [structure].outdir:
Load the reference data from:
reference_structures/reference_energies.json
extracting \(E_{\mathrm{pristine}}\) and chemical potentials \(\mu_i\).
Determine which candidates to evaluate:
If selected_candidates.txt exists, only those candidates are used.
Otherwise, all candidate_*/02_relax/POSCAR files are used.
For each selected candidate:
Read the relaxed energy \(E_{\mathrm{doped}}\) from candidate_*/02_relax/meta.json.
Read species counts from candidate_*/02_relax/POSCAR and infer dopant counts under the substitutional model.
Evaluate \(E_{\mathrm{form}}\) using the equation above.
Apply the requested normalization (see below).
Write candidate_*/04_formation/meta.json.
Write formation_energies.csv in the folder, sorted by total formation energy.
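The candidate-selection step can be sketched as follows. The file and directory names come from this page, but the function name select_candidates and the assumed layout of selected_candidates.txt (one candidate directory name per line) are illustrative guesses rather than the documented format.

```python
import tempfile
from pathlib import Path


def select_candidates(folder):
    """Return the candidate directories to evaluate in one structure folder.

    Hypothetical helper. Assumes selected_candidates.txt, when present,
    lists one candidate directory name per line.
    """
    folder = Path(folder)
    selection = folder / "selected_candidates.txt"
    if selection.exists():
        lines = selection.read_text().splitlines()
        return [folder / ln.strip() for ln in lines if ln.strip()]
    # Fallback: every candidate that has a relaxed POSCAR.
    return sorted(p.parent.parent for p in folder.glob("candidate_*/02_relax/POSCAR"))


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("candidate_001", "candidate_002"):
        relax = root / name / "02_relax"
        relax.mkdir(parents=True)
        (relax / "POSCAR").write_text("placeholder\n")
    all_names = [p.name for p in select_candidates(root)]
    (root / "selected_candidates.txt").write_text("candidate_002\n")
    chosen_names = [p.name for p in select_candidates(root)]

print(all_names)     # ['candidate_001', 'candidate_002']
print(chosen_names)  # ['candidate_002']
```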
Normalization Options
This stage supports three reporting modes controlled by:
[formation]
normalize = "total" | "per_dopant" | "per_host"
The internal formation energy is always computed as a total supercell energy (\(E_{\mathrm{form}}\) in eV). The reported value can be:
total: report \(E_{\mathrm{form}}\) in eV (no normalization)
per_dopant (default): report \(E_{\mathrm{form}} / N_{\mathrm{dop}}\), where \(N_{\mathrm{dop}} = \sum_i n_i\) is the total number of dopant atoms
per_host: report \(E_{\mathrm{form}} / N_{\mathrm{atoms}}\), where \(N_{\mathrm{atoms}}\) is the total number of atoms in the pristine supercell (as stored in the reference JSON)
Note:
per_host currently uses the total number of atoms in the pristine supercell.
If you later want normalization per host-sublattice site, that quantity can be
stored explicitly in the reference JSON and used here.
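The three reporting modes amount to a simple dispatch on the normalize setting. The sketch below is illustrative only: the function name is hypothetical, and raising an error for a candidate with zero dopant atoms is an assumption, since this page does not say how that case is handled.

```python
def normalize_formation_energy(e_form_total, mode, n_dopant_atoms, n_atoms_pristine):
    """Convert a total formation energy (eV) into the requested reporting mode.

    Hypothetical helper mirroring the three documented modes.
    """
    if mode == "total":
        return e_form_total
    if mode == "per_dopant":
        if n_dopant_atoms <= 0:
            # Assumed behaviour: per-dopant values are undefined without dopants.
            raise ValueError("per_dopant normalization needs at least one dopant atom")
        return e_form_total / n_dopant_atoms
    if mode == "per_host":
        # Divides by the total atom count of the pristine supercell.
        return e_form_total / n_atoms_pristine
    raise ValueError(f"unknown normalize mode: {mode!r}")


# Illustrative numbers: 3.0 eV total, 3 dopant atoms, 48-atom pristine cell.
for mode in ("total", "per_dopant", "per_host"):
    print(mode, normalize_formation_energy(3.0, mode, 3, 48))
# total 3.0
# per_dopant 1.0
# per_host 0.0625
```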
Outputs
Per-folder summary
For each structure folder, this stage writes:
formation_energies.csv
Columns:
candidate: candidate directory name
E_doped_eV: relaxed total energy of the doped candidate
E_form_eV_total: total formation energy in eV
E_form_<normalize>: normalized formation energy (according to config)
n_dopant_atoms: total dopant atoms \(N_{\mathrm{dop}}\)
dopant_counts: compact dopant count string (e.g. Sb:2;Zr:1)
Rows are sorted by E_form_eV_total (ascending).
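A table with this layout could be produced with the standard csv module, as in the sketch below. The helper names are hypothetical, the normalized column name is formed by substituting the mode literally into E_form_<normalize> (the real header may differ), and the alphabetical ordering inside dopant_counts is an assumption based on the Sb:2;Zr:1 example. All values are invented.

```python
import csv
import io


def format_dopant_counts(counts):
    """Build a compact string such as 'Sb:2;Zr:1' (alphabetical order assumed)."""
    return ";".join(f"{sp}:{n}" for sp, n in sorted(counts.items()))


def write_formation_csv(rows, normalize, stream):
    """Write rows sorted by total formation energy, lowest first."""
    fields = [
        "candidate",
        "E_doped_eV",
        "E_form_eV_total",
        f"E_form_{normalize}",  # assumed literal substitution of the mode
        "n_dopant_atoms",
        "dopant_counts",
    ]
    writer = csv.DictWriter(stream, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["E_form_eV_total"]):
        writer.writerow(row)


# Invented example rows, deliberately given out of order.
rows = [
    {"candidate": "candidate_002", "E_doped_eV": -499.0, "E_form_eV_total": 4.5,
     "E_form_per_dopant": 1.5, "n_dopant_atoms": 3,
     "dopant_counts": format_dopant_counts({"Zr": 1, "Sb": 2})},
    {"candidate": "candidate_001", "E_doped_eV": -500.0, "E_form_eV_total": 3.0,
     "E_form_per_dopant": 1.0, "n_dopant_atoms": 3,
     "dopant_counts": format_dopant_counts({"Sb": 2, "Zr": 1})},
]
buffer = io.StringIO()
write_formation_csv(rows, "per_dopant", buffer)
lines = buffer.getvalue().splitlines()
print(lines[1])  # candidate_001,-500.0,3.0,1.0,3,Sb:2;Zr:1
```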
Per-candidate metadata
For each evaluated candidate, this stage writes:
candidate_XXX/04_formation/meta.json
This file includes:
full formation-energy definition string from the reference JSON
\(E_{\mathrm{doped}}\), \(E_{\mathrm{pristine}}\)
chemical potentials used for the involved species
inferred dopant counts
total formation energy and the reported normalized value
Reproducibility and Skipping
If:
[formation].skip_if_done = true
and formation_energies.csv already exists for a folder, that folder is
skipped.
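The skip rule reduces to a single existence check per folder. A minimal sketch, with a hypothetical function name:

```python
import tempfile
from pathlib import Path


def should_skip_folder(folder, skip_if_done):
    """Skip only when skipping is enabled and the summary CSV already exists."""
    return bool(skip_if_done) and (Path(folder) / "formation_energies.csv").exists()


# Demonstration on a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    before = should_skip_folder(tmp, skip_if_done=True)
    (Path(tmp) / "formation_energies.csv").write_text("candidate\n")
    after = should_skip_folder(tmp, skip_if_done=True)
    forced = should_skip_folder(tmp, skip_if_done=False)

print(before, after, forced)  # False True False
```

Because the check looks only at the existence of the CSV, a folder whose inputs changed after a previous run is still skipped unless the file is removed or skip_if_done is set to false.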
Given unchanged relaxed energies, POSCARs, reference JSON, and configuration, this stage is deterministic.
Notes and Limitations
This stage assumes substitutional doping and uses a simple species-based dopant identification rule (host vs anions vs dopants).
No charged-defect corrections, finite-size corrections, entropy terms, or competing-phase chemical potential bounds are included.
The absolute values depend on the reference energies and the chosen bulk phases used to define \(\mu_i\).