1. Structure Generation

Implementation

This stage is implemented in:

src/dopingflow/generate.py

The public entry point is:

run_generate(...)

Purpose

This stage generates an initial set of doped structures starting from a pristine unit cell. The output is a directory of subfolders, each containing a VASP POSCAR and a small metadata file describing the effective composition and the random seed used for site selection.

The generation step supports two workflows:

  • Explicit compositions: the user provides exact dopant percentages.

  • Enumerated compositions: the workflow constructs a systematic set of dopant combinations and doping levels under user-defined constraints.

Inputs

This stage uses settings from three sections of input.toml:

  • [structure]: provides the pristine structure file and the supercell size.

  • [doping]: defines the doping mode and composition rules.

  • [generate]: controls structure-writing details and reproducible randomness.

Method Summary

  1. Read the pristine structure from [structure].base_poscar.

  2. Build the supercell using [structure].supercell.

  3. Identify all sites matching the substitution host species ([doping].host_species).

  4. For each requested composition:

    1. Convert requested dopant percentages to integer substitution counts by rounding.

    2. Randomly choose host sites to substitute, using a deterministic seed.

    3. Optionally reorder species in the written POSCAR.

    4. Write POSCAR and metadata.json to an output subdirectory.

The output directory is [structure].outdir (default: random_structures).
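The substitution at the core of steps 3 and 4 can be sketched in plain Python. This is a minimal illustration, not the code in generate.py: the supercell is reduced to a list of element symbols, and the function name substitute_hosts and its arguments are hypothetical.

```python
import random


def substitute_hosts(species, host, counts, seed):
    """Return a copy of `species` with some host sites replaced by dopants.

    species: list of element symbols, one per site in the supercell
    host:    symbol of the species being substituted
    counts:  mapping dopant symbol -> integer number of substitutions
    seed:    integer seed, so the choice of sites is repeatable
    """
    # Step 3: collect the indices of all host sites.
    host_sites = [i for i, sym in enumerate(species) if sym == host]
    total = sum(counts.values())
    if total > len(host_sites):
        raise ValueError("more substitutions requested than host sites available")

    # Step 4.2: draw distinct host sites from a seeded generator.
    rng = random.Random(seed)
    chosen = rng.sample(host_sites, total)

    doped = list(species)
    start = 0
    for dopant in sorted(counts):  # fixed iteration order keeps runs repeatable
        for idx in chosen[start:start + counts[dopant]]:
            doped[idx] = dopant
        start += counts[dopant]
    return doped


# Toy supercell with 8 host sites (Ti) and 16 oxygen sites.
supercell = ["Ti"] * 8 + ["O"] * 16
doped = substitute_hosts(supercell, host="Ti", counts={"Nb": 2, "Al": 1}, seed=42)
```

In the real stage the sites carry coordinates and the result is written as a POSCAR; the symbol list here only shows which sites change.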

Composition Handling

Requested vs effective composition

Doping levels are provided as percentages relative to the number of host sites in the generated supercell. Since the number of sites is discrete, requested percentages are converted to integer substitution counts by rounding.

The workflow therefore distinguishes:

  • requested composition: the percentages from the input

  • effective composition: the percentages implied by the rounded integer counts

If rounding changes any dopant level, warnings are reported and the effective composition is stored in the metadata.
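The conversion from requested to effective composition can be illustrated as follows. This is a sketch under assumptions: the helper name to_counts is hypothetical, and ties are rounded half up here, whereas the tie-breaking rule actually used by the workflow is not stated in this document.

```python
import math


def to_counts(requested, n_host):
    """Convert dopant percentages into integer counts and effective percentages."""
    counts = {}
    effective = {}
    for dopant, percent in requested.items():
        # Round half up; the real tie-breaking rule is an assumption here.
        n = math.floor(percent / 100.0 * n_host + 0.5)
        counts[dopant] = n
        effective[dopant] = 100.0 * n / n_host
    return counts, effective


counts, effective = to_counts({"Nb": 5.0, "Al": 10.0}, n_host=32)
# 5% of 32 sites is 1.6, rounded to 2 atoms (6.25% effective)
# 10% of 32 sites is 3.2, rounded to 3 atoms (9.375% effective)
```

Both requested levels differ from their effective values in this example, which is the situation in which a warning would be reported.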

Two basic consistency checks are applied:

  • the total number of substituted atoms may not exceed the number of host sites

  • the total requested dopant percentage may not exceed 100%
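The two conditions are not redundant: rounding can push the atom count past the number of host sites even when the requested percentages sum to less than 100%. A minimal sketch, with a hypothetical function name and error messages:

```python
def check_composition(requested, counts, n_host):
    """Raise ValueError if the composition cannot fit on the host sublattice."""
    total_percent = sum(requested.values())
    if total_percent > 100.0:
        raise ValueError(f"requested dopant total {total_percent}% exceeds 100%")

    total_atoms = sum(counts.values())
    if total_atoms > n_host:
        raise ValueError(
            f"{total_atoms} substitutions requested but only {n_host} host sites exist"
        )


# Passes: 5 substituted atoms on 32 host sites, 15% requested in total.
check_composition({"Nb": 5.0, "Al": 10.0}, {"Nb": 2, "Al": 3}, n_host=32)
```

For example, three dopants at 25% each on a cell with only 2 host sites correspond to 0.5 atoms each; if every count were rounded up to 1, the second check would fail with 3 atoms on 2 sites although the requested total is only 75%.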

Explicit mode

In explicit mode, each composition is provided directly by the user as a mapping element -> percent. Each composition produces exactly one structure.

This mode is recommended when:

  • specific compositions are already known or desired

  • only a small number of target compositions are needed

Enumerate mode

In enumerate mode, the workflow generates composition dictionaries automatically from:

  • a list of possible dopant species

  • an optional list of dopants that must appear

  • a set of allowed total dopant levels

  • a set of discrete per-dopant levels

The workflow enumerates combinations of distinct dopants of size:

\[k \in \{1, \dots, k_{\max}\}\]

where \(k_{\max}\) is controlled by the input parameter [doping].max_dopants_total.

Interpretation:

  • you may provide many candidate dopant elements in the input

  • in enumerate mode, each generated structure nevertheless contains at most max_dopants_total distinct dopant species

(Explicit mode imposes no such limit; the user decides how many dopant species each composition contains.)
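One way to picture the enumeration is a nested loop over dopant subsets and per-dopant levels. The sketch below is an assumption about the general approach, not the workflow's code: apart from max_dopants_total, all parameter names are hypothetical, and levels are integers so that the membership test on totals is exact.

```python
from itertools import combinations, product


def enumerate_compositions(dopants, must_include, levels, allowed_totals, max_dopants_total):
    """Yield composition dicts (element -> percent) that satisfy the constraints."""
    required = set(must_include)
    for k in range(1, max_dopants_total + 1):
        # All subsets of k distinct dopants.
        for combo in combinations(dopants, k):
            if not required.issubset(combo):
                continue
            # Every assignment of a discrete level to each dopant in the subset.
            for assigned in product(levels, repeat=k):
                if sum(assigned) in allowed_totals:
                    yield dict(zip(combo, assigned))


compositions = list(
    enumerate_compositions(
        dopants=["Nb", "Al", "Ga"],
        must_include=["Nb"],
        levels=[5, 10],
        allowed_totals=[10, 15],
        max_dopants_total=2,
    )
)
# 7 compositions: {"Nb": 10}, plus three each for the pairs Nb+Al and Nb+Ga
```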

Reproducible Random Substitution

Substitution sites are chosen pseudo-randomly, with a seed that is fully determined by the input:

  • a composition tag is constructed from the effective composition

  • a stable seed is derived from the tag and a base seed ([generate].seed_base)

This ensures that rerunning the workflow with unchanged input produces identical structures, while still providing randomized site selection.
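A stable seed needs a hash that does not change between runs. Python's built-in hash() is unsuitable for strings because it is salted per process, so the sketch below uses SHA-256 instead. The function name, the way the tag and base seed are combined, and the tag format are all assumptions made for illustration.

```python
import hashlib
import random


def stable_seed(tag, seed_base):
    """Derive an integer seed that is identical across runs and machines."""
    digest = hashlib.sha256(f"{seed_base}:{tag}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")  # first 8 bytes as an unsigned integer


seed = stable_seed("Al9.375_Nb6.25", seed_base=12345)

# The same seed always selects the same host sites.
host_sites = list(range(32))
picked = random.Random(seed).sample(host_sites, 5)
```

Because the seed depends on the composition tag, different compositions get different site selections, while a given composition is reproduced exactly on every rerun.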

Directory Naming and Collision Handling

Each generated structure is written to:

<outdir>/<composition_tag>/POSCAR
<outdir>/<composition_tag>/metadata.json

The folder tag is constructed from the effective composition. If two different requested compositions round to the same effective composition, a suffix is added:

<tag>__v2, <tag>__v3, ...

This ensures that no generated structure overwrites another.
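The suffix scheme can be sketched with a set of tags already in use. Whether the workflow tracks used tags in memory or checks for existing directories is not specified here; the helper unique_tag and the tag format are hypothetical.

```python
def unique_tag(tag, used):
    """Return `tag`, or `tag__v2`, `tag__v3`, ... if it is already taken."""
    candidate = tag
    version = 2
    while candidate in used:
        candidate = f"{tag}__v{version}"
        version += 1
    used.add(candidate)
    return candidate


used = set()
names = [unique_tag("Nb6.25", used) for _ in range(3)]
# names == ["Nb6.25", "Nb6.25__v2", "Nb6.25__v3"]
```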

POSCAR Species Ordering

To control the species order in the written POSCAR, you may provide:

[generate]
poscar_order = [...]

If this list is empty, the structure is written using pymatgen’s default ordering. If non-empty, sites are reordered to match the given preference (and any remaining species are appended).
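The reordering rule can be illustrated on a list of element symbols. This is a sketch only: the function name is hypothetical, the empty-list case simply returns the input unchanged rather than reproducing pymatgen's default ordering, and placing unlisted species in first-seen order is an assumption.

```python
def reorder_species(symbols, poscar_order):
    """Sort site symbols by `poscar_order`; unlisted species follow in first-seen order."""
    if not poscar_order:
        return list(symbols)

    # Preferred species get the lowest ranks.
    rank = {sym: i for i, sym in enumerate(poscar_order)}
    # Any remaining species are ranked after them, in order of first appearance.
    for sym in symbols:
        rank.setdefault(sym, len(rank))

    # sorted() is stable, so sites of the same species keep their relative order.
    return sorted(symbols, key=rank.__getitem__)


ordered = reorder_species(["Ti", "Nb", "O", "Ti", "O", "Al"], ["O", "Ti"])
# ordered == ["O", "O", "Ti", "Ti", "Nb", "Al"]
```

In a real structure the same sort key would be applied to whole sites, so that coordinates move together with their species.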

Outputs

For each generated structure:

  • POSCAR: doped supercell structure

  • metadata.json: provenance information, including:

    • host species and number of host sites

    • requested and effective compositions

    • rounded substitution counts

    • seed used for deterministic substitution

    • composition tag and input file name
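As an illustration of the kind of record described above, the snippet below builds and serializes a metadata dictionary. All key names and values are hypothetical placeholders chosen for this example; the actual schema written by the workflow may differ.

```python
import json

# Hypothetical field names and illustrative values; not the workflow's actual schema.
metadata = {
    "host_species": "Ti",
    "n_host_sites": 32,
    "requested_composition": {"Nb": 5.0, "Al": 10.0},
    "effective_composition": {"Nb": 6.25, "Al": 9.375},
    "substitution_counts": {"Nb": 2, "Al": 3},
    "seed": 987654321,
    "composition_tag": "Al9.375_Nb6.25",
    "input_file": "input.toml",
}

text = json.dumps(metadata, indent=2, sort_keys=True)
```

Storing both the requested and the effective composition makes it possible to trace a structure back to the input line that produced it, even after rounding.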

Notes and Limitations

  • This stage performs structure generation only and does not evaluate stability.

  • Rounding is unavoidable for small supercells: with N host sites, each dopant level can only change in steps of 100/N percent, so larger supercells reduce rounding error.

  • Enumerated composition counts can grow combinatorially with the number of dopants and levels; users should choose constraints accordingly.
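To get a feel for the combinatorial growth, an upper bound on the number of enumerated compositions can be computed before any filtering by allowed totals or required dopants. The function below is a back-of-the-envelope sketch with hypothetical names: it counts every subset of k dopants times every assignment of levels to that subset.

```python
from math import comb


def max_compositions(n_dopants, n_levels, max_dopants_total):
    """Upper bound: sum over k of C(n_dopants, k) * n_levels**k."""
    return sum(
        comb(n_dopants, k) * n_levels ** k
        for k in range(1, max_dopants_total + 1)
    )


small = max_compositions(n_dopants=3, n_levels=2, max_dopants_total=2)   # 18
large = max_compositions(n_dopants=10, n_levels=4, max_dopants_total=3)  # 8440
```

Going from 3 candidate dopants with 2 levels to 10 candidates with 4 levels raises the bound from 18 to 8440, which is why tight constraints on totals and on max_dopants_total matter.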