1. Structure Generation
Implementation
This stage is implemented in:
src/dopingflow/generate.py
The public entry point is:
run_generate(...)
Purpose
This stage generates an initial set of doped structures starting from a pristine
unit cell. The output is a directory of subfolders, each containing a VASP
POSCAR and a small metadata file describing the effective composition and
the random seed used for site selection.
The generation step supports two workflows:
Explicit compositions: the user provides exact dopant percentages.
Enumerated compositions: the workflow constructs a systematic set of dopant combinations and doping levels under user-defined constraints.
Inputs
This stage uses settings from three sections of input.toml:
[structure]: provides the pristine structure file and the supercell size.
[doping]: defines the doping mode and composition rules.
[generate]: controls structure-writing details and reproducible randomness.
Method Summary
Read the pristine structure from [structure].base_poscar.
Build the supercell using [structure].supercell.
Identify all sites matching the substitution host species ([doping].host_species).
For each requested composition:
Convert requested dopant percentages to integer substitution counts by rounding.
Randomly choose host sites to substitute, using a deterministic seed.
Optionally reorder species in the written POSCAR.
Write POSCAR and metadata.json to an output subdirectory.
The output directory is [structure].outdir (default: random_structures).
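The per-composition substitution step can be sketched in plain Python, with the supercell reduced to a list of element symbols. This is an illustration only: the function name `substitute_sites`, the use of `random.Random`, and the example elements (Sn, O, Sb, Zn) are assumptions that do not come from generate.py, and no files are read or written.

```python
import random


def substitute_sites(species, host, counts, seed):
    """Return a copy of `species` with randomly chosen host sites replaced.

    species: list of element symbols, one per site (stand-in for a supercell)
    host:    symbol of the host species to substitute
    counts:  mapping dopant symbol -> integer number of substitutions
    seed:    integer seed, so that the selection is repeatable
    """
    host_indices = [i for i, s in enumerate(species) if s == host]
    total = sum(counts.values())
    if total > len(host_indices):
        raise ValueError("more substitutions requested than host sites available")

    rng = random.Random(seed)
    chosen = rng.sample(host_indices, total)  # distinct host sites

    doped = list(species)
    start = 0
    # Hand out consecutive slices of the chosen sites, one slice per dopant.
    for dopant, n in sorted(counts.items()):
        for idx in chosen[start:start + n]:
            doped[idx] = dopant
        start += n
    return doped


# Toy "supercell": 8 host sites (Sn) and 16 non-host sites (O).
cell = ["Sn"] * 8 + ["O"] * 16
doped = substitute_sites(cell, host="Sn", counts={"Sb": 2, "Zn": 1}, seed=42)
```

Only host sites are ever modified, and calling the function again with the same seed returns the same arrangement.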
Composition Handling
Requested vs effective composition
Doping levels are provided as percentages relative to the number of host sites in the generated supercell. Since the number of sites is discrete, requested percentages are converted to integer substitution counts by rounding.
The workflow therefore distinguishes:
requested composition: the percentages from the input
effective composition: the percentages implied by the rounded integer counts
If rounding changes any dopant level, warnings are reported and the effective composition is stored in the metadata.
Two basic consistency checks are applied:
the total number of substituted atoms may not exceed the number of host sites
the total requested dopant percentage may not exceed 100%
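The conversion from requested to effective composition, together with the two checks, might look like the sketch below. The function name `percent_to_counts` and its return values are hypothetical. The text above only says that counts are obtained "by rounding", so the use of Python's built-in `round` (which rounds exact ties to the nearest even integer) is an assumption.

```python
def percent_to_counts(requested, n_host):
    """Convert dopant percentages (of host sites) into integer counts.

    Returns (counts, effective, changed), where `changed` lists the
    dopants whose effective percentage differs from the requested one.
    """
    if sum(requested.values()) > 100:
        raise ValueError("total requested dopant percentage exceeds 100%")

    counts = {el: round(pct / 100 * n_host) for el, pct in requested.items()}
    if sum(counts.values()) > n_host:
        raise ValueError("substitution counts exceed the number of host sites")

    effective = {el: 100 * n / n_host for el, n in counts.items()}
    changed = [el for el in requested if abs(effective[el] - requested[el]) > 1e-9]
    return counts, effective, changed


# 16 host sites: 5% corresponds to 0.8 sites and is rounded up to 1 site (6.25%).
counts, effective, changed = percent_to_counts({"Sb": 5.0, "Zn": 12.5}, n_host=16)
```

In this example the Zn level is reproduced exactly (2 of 16 sites), while Sb shifts from 5% to 6.25% and would therefore trigger a warning.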
Explicit mode
In explicit mode, each composition is provided directly by the user as a mapping
element -> percent. Each composition produces exactly one structure.
This mode is recommended when:
specific compositions are already known or desired
only a small number of target compositions are needed
Enumerate mode
In enumerate mode, the workflow generates composition dictionaries automatically from:
a list of possible dopant species
an optional list of dopants that must appear
a set of allowed total dopant levels
a set of discrete per-dopant levels
The workflow enumerates combinations of distinct dopants of size
\[ k = 1, 2, \dots, k_{\max} \]
where \(k_{\max}\) is controlled by the input parameter [doping].max_dopants_total.
Interpretation:
you may provide many candidate dopant elements in the input
but each generated structure contains at most max_dopants_total distinct dopant species when using enumerate mode
(Explicit mode is not inherently limited unless the user adopts the same constraint.)
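One way to picture the enumeration is the following sketch based on `itertools`. All argument names except the idea behind `k_max` are hypothetical, and the filtering rule (keep an assignment only if its per-dopant levels sum to one of the allowed totals) is an assumption about how the constraints combine. Integer levels are used so that the sum can be compared exactly.

```python
from itertools import combinations, product


def enumerate_compositions(dopants, must_include, levels, totals, k_max):
    """Yield dicts element -> percent for every allowed combination."""
    required = set(must_include)
    for k in range(1, k_max + 1):
        for combo in combinations(sorted(dopants), k):
            if not required.issubset(combo):
                continue  # a mandatory dopant is missing
            for assignment in product(levels, repeat=k):
                if sum(assignment) in totals:
                    yield dict(zip(combo, assignment))


comps = list(enumerate_compositions(
    dopants=["Sb", "Zn", "Ti"],   # candidate dopants
    must_include=["Sb"],          # dopants that must appear
    levels=[5, 10],               # discrete per-dopant levels (percent)
    totals=[10, 15],              # allowed total dopant levels (percent)
    k_max=2,                      # at most two distinct dopants
))
```

Even this small input yields seven compositions, which illustrates how quickly the count grows as candidates and levels are added.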
Reproducible Random Substitution
The substitution sites are selected pseudo-randomly but reproducibly:
a composition tag is constructed from the effective composition
a stable seed is derived from the tag and a base seed ([generate].seed_base)
This ensures that rerunning the workflow with unchanged input produces identical structures, while still providing randomized site selection.
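A minimal sketch of such a seed derivation is shown below. The tag format and the SHA-256 hashing scheme are assumptions chosen for illustration; this page does not specify how generate.py builds either. A cryptographic digest is used because Python's built-in `hash()` for strings is randomized per process and would not be stable across runs.

```python
import hashlib
import random


def composition_tag(effective):
    # Hypothetical tag format: elements sorted alphabetically, joined by "_".
    return "_".join(f"{el}{pct:g}" for el, pct in sorted(effective.items()))


def stable_seed(tag, seed_base):
    # Hash the base seed and the tag together; keep the first 8 bytes as an int.
    digest = hashlib.sha256(f"{seed_base}:{tag}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


tag = composition_tag({"Zn": 12.5, "Sb": 6.25})
seed = stable_seed(tag, seed_base=1234)

# The same seed always reproduces the same site selection.
pick_a = random.Random(seed).sample(range(16), 3)
pick_b = random.Random(seed).sample(range(16), 3)
```

Changing either the composition or the base seed gives a different seed, so each composition receives its own reproducible random arrangement.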
Directory Naming and Collision Handling
Each generated structure is written to:
<outdir>/<composition_tag>/POSCAR
<outdir>/<composition_tag>/metadata.json
The folder tag is constructed from the effective composition. If two different requested compositions round to the same effective composition, a suffix is added:
<tag>__v2, <tag>__v3, ...
This guarantees that no output folder is overwritten and that all generated structures are preserved.
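The suffix scheme can be illustrated with a small helper. The name `unique_tag` and the in-memory `used` set are hypothetical; the actual code may check for existing folders instead.

```python
def unique_tag(tag, used):
    """Return `tag`, or `tag__v2`, `tag__v3`, ... if it is already taken."""
    candidate = tag
    version = 2
    while candidate in used:
        candidate = f"{tag}__v{version}"
        version += 1
    used.add(candidate)
    return candidate


used = set()
# Three requested compositions that all round to the same effective tag.
names = [unique_tag("Sb6.25", used) for _ in range(3)]
```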
POSCAR Species Ordering
To control the species order in the written POSCAR, you may provide:
[generate]
poscar_order = [...]
If this list is empty, the structure is written using pymatgen’s default ordering. If non-empty, sites are reordered to match the given preference (and any remaining species are appended).
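The reordering rule can be sketched on a plain list of element symbols. The function name `reorder_species` is hypothetical, and appending the remaining species in order of first appearance is an assumption; the text above only states that they are appended.

```python
def reorder_species(species, poscar_order):
    """Group sites by element, following `poscar_order` where given."""
    if not poscar_order:
        return list(species)  # empty preference: leave the order untouched

    order = list(poscar_order)
    for el in species:
        if el not in order:
            order.append(el)  # remaining species, in order of first appearance

    rank = {el: i for i, el in enumerate(order)}
    # sorted() is stable, so sites of the same element keep their relative order.
    return sorted(species, key=rank.__getitem__)


sites = ["Sn", "O", "Sb", "O", "Sn", "Zn"]
ordered = reorder_species(sites, poscar_order=["Sb", "Sn"])
```

Here Sb and Sn come first as requested, followed by O and Zn, which were not listed.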
Outputs
For each generated structure:
POSCAR: doped supercell structure
metadata.json: provenance information, including:
host species and number of host sites
requested and effective compositions
rounded substitution counts
seed used for deterministic substitution
composition tag and input file name
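As an illustration of the kind of record listed above, the snippet below writes and re-reads a metadata file. Every key name and value here is hypothetical; the actual schema of metadata.json is defined by generate.py and may differ.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical metadata record; key names are illustrative only.
metadata = {
    "host_species": "Sn",
    "n_host_sites": 16,
    "requested_percent": {"Sb": 5.0},
    "effective_percent": {"Sb": 6.25},
    "counts": {"Sb": 1},
    "seed": 123456789,
    "tag": "Sb6.25",
    "input_file": "input.toml",
}

# Write to <outdir>/<composition_tag>/metadata.json inside a temporary directory.
outdir = Path(tempfile.mkdtemp()) / "random_structures" / metadata["tag"]
outdir.mkdir(parents=True)
path = outdir / "metadata.json"
path.write_text(json.dumps(metadata, indent=2))

loaded = json.loads(path.read_text())
```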
Notes and Limitations
This stage performs structure generation only and does not evaluate stability.
Rounding is unavoidable for small supercells: with N host sites, only multiples of 100/N percent can be realized exactly. Larger supercells reduce the rounding error.
Enumerated composition counts can grow combinatorially with the number of dopants and levels; users should choose constraints accordingly.