Installation, Usage, and Outputs

This page explains how to install dopingflow, how to run the workflow either step by step or with the single orchestration command, and which outputs each stage produces.

Installation

Clone the repository and install in editable mode:

```
git clone KazemZh/dopingflow
cd dopingflow
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e .
```

Verify the CLI is available:

```
dopingflow --help
```

Required Inputs

Refer to the Required Input Files page.

Running the Workflow

All commands accept -c/--config to specify the TOML file. If omitted, input.toml in the current directory is used.

Run the full pipeline with one command

To run the complete workflow in order:

```
dopingflow run-all -c input.toml
```

This executes:

refs -> generate -> scan -> relax -> filter -> bandgap -> formation -> collect

The surface stage is not included in run-all and must be run separately:

```
dopingflow surface -c input.toml
```

This lets you inspect and validate the final database (results_database.csv, written by collect) before generating surface structures.

Resuming and partial runs (run-all)

You can resume from a given stage:

```
dopingflow run-all -c input.toml --from relax
```

You can stop after a given stage (inclusive), which is useful if you do not want to run bandgap yet:

```
dopingflow run-all -c input.toml --until filter
```

You can print the planned steps without running them:

```
dopingflow run-all -c input.toml --dry-run
```

You can run only a subset of steps within a selected range by passing a comma-separated list to --only:

```
dopingflow run-all -c input.toml --from refs --until collect --only refs,generate,scan
```
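The way --from, --until, and --only combine can be sketched in a few lines of POSIX shell. The plan_steps function and the STAGES variable below are hypothetical illustrations built only from the stage order listed earlier. They are not dopingflow code. `dopingflow run-all --dry-run` remains the authoritative way to see what will run.

```shell
# Hypothetical illustration (not dopingflow code): stage order used by run-all.
# surface is deliberately absent, as it is from run-all.
STAGES="refs generate scan relax filter bandgap formation collect"

# plan_steps FIRST LAST [ONLY]
#   FIRST, LAST: stage names bounding an inclusive range, in pipeline order
#   ONLY:        optional comma-separated subset applied inside that range
plan_steps() {
    first=$1; last=$2; only=${3:-}
    in_range=0
    for s in $STAGES; do
        [ "$s" = "$first" ] && in_range=1
        if [ "$in_range" -eq 1 ]; then
            # Keep the stage if no --only list was given, or if it is in the list
            if [ -z "$only" ] || printf ',%s,' "$only" | grep -q ",$s,"; then
                echo "$s"
            fi
        fi
        # Stop after the inclusive upper bound
        [ "$s" = "$last" ] && break
    done
    return 0
}
```

For example, `plan_steps refs collect refs,generate,scan` prints refs, generate, and scan, one per line. `plan_steps relax filter` prints relax and filter.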

Filtering controls inside run-all

The filter stage supports optional overrides (passed through by run-all):

Restrict filtering to a single composition folder:

```
dopingflow run-all -c input.toml --from relax --until filter --filter-only Sb5_Zr5
```

Force re-filtering even if outputs exist:

```
dopingflow run-all -c input.toml --from filter --until filter --force
```

Override the filtering mode by passing either an energy window in meV (--window-mev) or a top-N cutoff (--topn):

```
dopingflow run-all -c input.toml --from filter --until filter --window-mev 50
dopingflow run-all -c input.toml --from filter --until filter --topn 12
```
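To revisit the filter for a single composition, the three overrides can be wrapped together. The refilter_one helper is hypothetical and not part of dopingflow. It assumes that --filter-only, --force, and one of --window-mev/--topn can be combined in a single run-all call. The page implies this but does not show it.

```shell
# Hypothetical wrapper (not part of dopingflow): force re-filtering of one
# composition with exactly one filtering mode. ASSUMPTION: --filter-only,
# --force, and --window-mev/--topn can be combined in one run-all call.

# refilter_one TAG MODE VALUE   (MODE is "window" or "topn")
refilter_one() {
    tag=$1; mode=$2; value=$3
    case $mode in
        window) mode_flag=--window-mev ;;
        topn)   mode_flag=--topn ;;
        *) echo "mode must be 'window' or 'topn'" >&2; return 2 ;;
    esac
    dopingflow run-all -c input.toml --from filter --until filter \
        --filter-only "$tag" --force "$mode_flag" "$value"
}
```

For example, `refilter_one Sb5_Zr5 topn 12` re-runs only the filter stage for Sb5_Zr5 with --topn 12. Restricting the mode argument to two values enforces the "one or the other" choice before dopingflow is called.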

Step-by-step execution

Step 00: build and relax thermodynamic reference structures:

```
dopingflow refs-build -c input.toml
```

Step 01: structure generation:

```
dopingflow generate -c input.toml
```

Step 02: scan (symmetry-unique enumeration / sampling + ML-based single-point energies):

```
dopingflow scan -c input.toml
```

Step 03: relax scanned candidates (ML backend + ASE optimizer):

```
dopingflow relax -c input.toml
```

Step 04: filter relaxed candidates:

```
dopingflow filter -c input.toml
```

Optional Step 05: predict bandgap (ALIGNN):

Before running bandgap, set the model path:

```
export ALIGNN_MODEL_DIR=/path/to/your/alignn/model_root
dopingflow bandgap -c input.toml
```
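Because bandgap depends on ALIGNN_MODEL_DIR, a small guard can keep a script from failing late when the variable is missing. The run_bandgap_if_configured function below is a hypothetical sketch and not part of dopingflow. It only checks that the variable points to an existing directory. It does not verify that the directory contains a usable ALIGNN model.

```shell
# Hypothetical guard (not part of dopingflow): run bandgap only if
# ALIGNN_MODEL_DIR points to an existing directory. The model contents
# themselves are NOT checked here.
# run_bandgap_if_configured [CONFIG]   (default: input.toml)
run_bandgap_if_configured() {
    if [ -z "${ALIGNN_MODEL_DIR:-}" ] || [ ! -d "$ALIGNN_MODEL_DIR" ]; then
        echo "ALIGNN_MODEL_DIR not set or not a directory; skipping bandgap" >&2
        return 1
    fi
    dopingflow bandgap -c "${1:-input.toml}"
}
```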

Step 06: formation energies:

```
dopingflow formation -c input.toml
```

Step 07: collect results into one CSV database:

```
dopingflow collect -c input.toml
```

Step 08: generate surfaces and optionally relax slabs:

```
dopingflow surface -c input.toml
```
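The step-by-step commands above can be wrapped in a small shell function that stops at the first failing step. This is a hypothetical sketch and not part of dopingflow: run_stepwise is an illustrative name. It skips bandgap when ALIGNN_MODEL_DIR is unset. This assumes later steps tolerate a missing bandgap, as the "Optional" label and collect's "where available" wording suggest. Like run-all, it leaves surface for a separate run.

```shell
# Hypothetical stepwise runner (not part of dopingflow): runs Steps 00-07 in
# order and stops at the first failure. Bandgap is skipped when
# ALIGNN_MODEL_DIR is unset (ASSUMPTION: later steps tolerate that).
# Surface (Step 08) is intentionally not run here.
# run_stepwise [CONFIG]   (default: input.toml)
run_stepwise() {
    cfg=${1:-input.toml}
    for step in refs-build generate scan relax filter bandgap formation collect; do
        if [ "$step" = bandgap ] && [ -z "${ALIGNN_MODEL_DIR:-}" ]; then
            echo "skipping bandgap (ALIGNN_MODEL_DIR unset)" >&2
            continue
        fi
        echo ">> dopingflow $step -c $cfg" >&2
        dopingflow "$step" -c "$cfg" || { echo "step $step failed" >&2; return 1; }
    done
}
```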

Outputs Overview

This section summarizes the main outputs created by each stage.

Step 00 (refs-build)

Writes:

reference_structures/reference_energies.json

This file contains:

relaxed host unit-cell and supercell energies
relaxed reference structure energies
metadata about backend, optimizer, device, and convergence settings
reference information used for formation energy evaluation

Additional outputs:

reference_structures/relaxed/host_unit_relaxed.POSCAR
reference_structures/relaxed/host_supercell_<a>x<b>x<c>_relaxed.POSCAR
reference_structures/relaxed/refs/<name>_relaxed.POSCAR
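A quick post-check after refs-build can confirm that the files listed above were written before you move on. check_refs is a hypothetical helper and not part of dopingflow. It takes the workflow root as an argument (default: the current directory). It prints the name of each relaxed reference structure it finds under reference_structures/relaxed/refs/.

```shell
# Hypothetical post-check (not part of dopingflow) for refs-build outputs.
# check_refs [ROOT]   (ROOT = workflow root, default ".")
check_refs() {
    rs="${1:-.}/reference_structures"
    status=0
    for f in "$rs/reference_energies.json" "$rs/relaxed/host_unit_relaxed.POSCAR"; do
        [ -s "$f" ] || { echo "missing or empty: $f" >&2; status=1; }
    done
    # The supercell file name depends on the configured a x b x c, so glob for it
    set -- "$rs"/relaxed/host_supercell_*_relaxed.POSCAR
    [ -e "$1" ] || { echo "missing: host supercell POSCAR in $rs/relaxed" >&2; status=1; }
    # Print each relaxed reference name (file name minus _relaxed.POSCAR)
    for p in "$rs"/relaxed/refs/*_relaxed.POSCAR; do
        [ -e "$p" ] && basename "$p" _relaxed.POSCAR
    done
    return "$status"
}
```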

Step 01 (generate)

Writes a structure folder per composition under [structure].outdir (default: random_structures):

<outdir>/<composition_tag>/POSCAR
<outdir>/<composition_tag>/metadata.json
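To see which compositions were generated, you can list the folders that contain both of the files above. list_compositions is a hypothetical helper and not part of dopingflow. Pass your [structure].outdir if it differs from the default random_structures.

```shell
# Hypothetical helper (not part of dopingflow): list composition tags whose
# folder contains both POSCAR and metadata.json from the generate stage.
# list_compositions [OUTDIR]   (default: random_structures)
list_compositions() {
    outdir=${1:-random_structures}
    for d in "$outdir"/*/; do
        [ -f "${d}POSCAR" ] && [ -f "${d}metadata.json" ] && basename "$d"
    done
    return 0
}
```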

Step 02 (scan)

Inside each <composition_tag>/ folder, writes:

ranking_scan.csv (top-k single-point energies)
scan_summary.txt (human-readable summary)

Candidate structures:

<composition_tag>/candidate_###/01_scan/POSCAR
<composition_tag>/candidate_###/01_scan/meta.json

The scan stage evaluates structures using a selected ML backend (e.g. M3GNet, UMA, MACE, GRACE).

Step 03 (relax)

For each candidate:

candidate_###/02_relax/POSCAR
candidate_###/02_relax/meta.json

Also writes per composition folder:

ranking_relax.csv

The relaxation stage uses:

ML interatomic potentials for forces
ASE optimizers for structural relaxation
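Before resuming with --from relax, it can help to see how far relaxation has progressed. pending_relax is a hypothetical helper and not part of dopingflow. It compares the 01_scan and 02_relax folders described above and prints candidates that were scanned but have no relaxed POSCAR yet. It does not reflect how dopingflow itself treats such folders on a rerun.

```shell
# Hypothetical progress check (not part of dopingflow): print candidate folders
# that have scan output (01_scan/POSCAR) but no relax output (02_relax/POSCAR).
# pending_relax [OUTDIR]   (default: random_structures)
pending_relax() {
    outdir=${1:-random_structures}
    for c in "$outdir"/*/candidate_*/; do
        [ -f "${c}01_scan/POSCAR" ] || continue
        [ -f "${c}02_relax/POSCAR" ] || echo "${c%/}"
    done
    return 0
}
```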

Step 04 (filter)

Writes per composition folder:

ranking_relax_filtered.csv (filtered candidate table)
selected_candidates.txt (names of kept candidates)
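To review the outcome of filtering across all compositions at once, you can print each kept candidate next to its composition tag. show_selected is a hypothetical helper and not part of dopingflow. It assumes selected_candidates.txt lists one candidate name per line.

```shell
# Hypothetical helper (not part of dopingflow): print "<composition> <candidate>"
# for every kept candidate. ASSUMPTION: one candidate name per line in
# selected_candidates.txt.
# show_selected [OUTDIR]   (default: random_structures)
show_selected() {
    outdir=${1:-random_structures}
    for f in "$outdir"/*/selected_candidates.txt; do
        [ -f "$f" ] || continue
        tag=$(basename "$(dirname "$f")")
        # "|| [ -n ... ]" also handles a last line without a trailing newline
        while IFS= read -r name || [ -n "$name" ]; do
            [ -n "$name" ] && printf '%s %s\n' "$tag" "$name"
        done < "$f"
    done
    return 0
}
```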

Step 05 (bandgap)

Writes per composition folder:

bandgap_alignn_summary.csv

Writes per candidate:

candidate_###/03_band/meta.json

Step 06 (formation)

Writes per composition folder:

formation_energies.csv

Writes per candidate:

candidate_###/04_formation/meta.json

Step 07 (collect)

Writes one flat CSV in the workflow root:

results_database.csv

This file is a compact “database view” across compositions and selected candidates, combining scan/relax/filter/bandgap/formation results where available.

Step 08 (surface)

Generates slab structures from selected candidates and optionally relaxes them.

Input:

results_database.csv (from Step 07)

Writes per candidate:

<outdir>/<composition_tag>/candidate_###/hkl_h_k_l/term_###/

Files per slab:

POSCAR (generated slab)
CONTCAR (relaxed slab, if enabled)
meta.json (slab metadata)

Optional relaxation outputs:

surface_relax.log
surface_relax.traj
surface_relax.json

Global output:

<outdir>/surface_summary.csv

The surface stage uses:

pymatgen for slab generation
the same ML backend abstraction as Step 03 for relaxation
ASE optimizers for slab relaxation
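To check which slabs were actually relaxed, you can walk the surface output tree described above. slab_status is a hypothetical helper and not part of dopingflow. It takes the surface output directory (the <outdir> above) as its argument. It treats the presence of CONTCAR as meaning the slab was relaxed.

```shell
# Hypothetical helper (not part of dopingflow): walk
# <outdir>/<composition_tag>/candidate_###/hkl_h_k_l/term_###/ and report
# "generated" (POSCAR only) or "relaxed" (CONTCAR present) for each slab.
# slab_status OUTDIR
slab_status() {
    outdir=$1
    for t in "$outdir"/*/candidate_*/hkl_*/term_*/; do
        [ -f "${t}POSCAR" ] || continue
        if [ -f "${t}CONTCAR" ]; then state=relaxed; else state=generated; fi
        printf '%s %s\n' "${t%/}" "$state"
    done
    return 0
}
```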

Tips

Use --verbose with any command for more detailed logs:
```
dopingflow run-all -c input.toml --verbose
```
If the bandgap stage is not configured yet (ALIGNN_MODEL_DIR is not set), stop before it:
```
dopingflow run-all -c input.toml --until filter
```

Run surface generation only after verifying final candidates:

```
dopingflow collect -c input.toml
dopingflow surface -c input.toml
```
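If you script this, you can gate the surface stage on the database actually containing candidates. surface_if_ready is a hypothetical helper and not part of dopingflow. It assumes you run it from the workflow root and that results_database.csv has a single header line. It only checks that at least one data row exists, so it does not replace inspecting the file yourself.

```shell
# Hypothetical gate (not part of dopingflow): run surface only when
# results_database.csv exists and has at least one line after the header.
# ASSUMPTION: single header line; run from the workflow root.
# surface_if_ready [CONFIG]   (default: input.toml)
surface_if_ready() {
    cfg=${1:-input.toml}
    db=results_database.csv
    # awk exits 1 when the file has fewer than 2 lines (header only or empty)
    if [ ! -f "$db" ] || ! awk 'END { exit (NR < 2) }' "$db"; then
        echo "no candidates in $db; run 'dopingflow collect' and inspect it first" >&2
        return 1
    fi
    dopingflow surface -c "$cfg"
}
```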

Start with a single composition and one candidate to validate surface settings:
```
composition_tag = "Sb50"
selection_mode = "id"
candidate_id = 1
```