1. Structure Generation ======================== Implementation -------------- This stage is implemented in: :: src/dopingflow/generate.py The public entry point is: :: run_generate(...) Purpose ------- This stage generates an initial set of doped structures starting from a pristine unit cell. The output is a directory of subfolders, each containing a VASP ``POSCAR`` and a small metadata file describing the effective composition and the random seed used for site selection. The generation step supports two workflows: - **Explicit compositions**: the user provides exact dopant percentages. - **Enumerated compositions**: the workflow constructs a systematic set of dopant combinations and doping levels under user-defined constraints. Inputs ------ This stage uses settings from three sections of ``input.toml``: - ``[structure]``: provides the pristine structure file and the supercell size. - ``[doping]``: defines the doping mode and composition rules. - ``[generate]``: controls structure-writing details and reproducible randomness. Method Summary -------------- 1. Read the pristine structure from ``[structure].base_poscar``. 2. Build the supercell using ``[structure].supercell``. 3. Identify all sites matching the substitution host species (``[doping].host_species``). 4. For each requested composition: a. Convert requested dopant percentages to integer substitution counts by rounding. b. Randomly choose host sites to substitute, using a deterministic seed. c. Optionally reorder species in the written POSCAR. d. Write ``POSCAR`` and ``metadata.json`` to an output subdirectory. The output directory is ``[structure].outdir`` (default: ``random_structures``). Composition Handling -------------------- Requested vs effective composition ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Doping levels are provided as percentages *relative to the number of host sites* in the generated supercell. Since the number of sites is discrete, requested percentages are converted to integer substitution counts by rounding. The workflow therefore distinguishes: - **requested composition**: the percentages from the input - **effective composition**: the percentages implied by the rounded integer counts If rounding changes any dopant level, warnings are reported and the **effective** composition is stored in the metadata. A basic consistency check is applied: - the total number of substituted atoms may not exceed the number of host sites - the total requested dopant percentage may not exceed 100% Explicit mode ~~~~~~~~~~~~~ In explicit mode, each composition is provided directly by the user as a mapping ``element -> percent``. Each composition produces exactly one structure. This mode is recommended when: - specific compositions are already known or desired - only a small number of target compositions are needed Enumerate mode ~~~~~~~~~~~~~~ In enumerate mode, the workflow generates composition dictionaries automatically from: - a list of possible dopant species - an optional list of dopants that must appear - a set of allowed total dopant levels - a set of discrete per-dopant levels The workflow enumerates combinations of distinct dopants of size: .. math:: k \in \{1, \dots, k_{\max}\} where :math:`k_{\max}` is controlled by the input parameter ``[doping].max_dopants_total``. **Interpretation**: - you may provide many candidate dopant elements in the input - but each generated structure contains at most ``max_dopants_total`` distinct dopant species *when using enumerate mode* (Explicit mode is not inherently limited unless the user adopts the same constraint.) Reproducible Random Substitution -------------------------------- The actual substitutional sites are selected randomly, but deterministically: - a **composition tag** is constructed from the effective composition - a stable seed is derived from the tag and a base seed (``[generate].seed_base``) This ensures that rerunning the workflow with unchanged input produces identical structures, while still providing randomized site selection. Directory Naming and Collision Handling --------------------------------------- Each generated structure is written to: :: //POSCAR //metadata.json The folder tag is constructed from the **effective** composition. If two different requested compositions round to the same effective composition, a suffix is added: :: __v2, __v3, ... This guarantees that all generated structures are preserved. POSCAR Species Ordering ----------------------- To control the species order in the written POSCAR, you may provide: :: [generate] poscar_order = [...] If this list is empty, the structure is written using pymatgen’s default ordering. If non-empty, sites are reordered to match the given preference (and any remaining species are appended). Outputs ------- For each generated structure: - ``POSCAR``: doped supercell structure - ``metadata.json``: provenance information, including: - host species and number of host sites - requested and effective compositions - rounded substitution counts - seed used for deterministic substitution - composition tag and input file name Notes and Limitations --------------------- - This stage performs **structure generation only** and does not evaluate stability. - Rounding is unavoidable for small supercells; larger supercells reduce rounding error. - Enumerated composition counts can grow combinatorially with the number of dopants and levels; users should choose constraints accordingly.