Workflow Overview

Conceptual Pipeline

The ML Doping Workflow implements a fully automated, multi-stage surrogate pipeline for the exploration of doped crystalline materials.

It combines symmetry-aware structure generation with machine-learned interatomic potentials to efficiently screen large configurational spaces.

The workflow is designed to first identify promising bulk candidates and then optionally extend the analysis to surface structures.

Reference Construction → Enumeration → Screening → Relaxation → Filtering → Band Gap → Formation Energy → Database

Optional Post-Processing:

Database → Surface Generation → Surface Relaxation

0. Reference construction and relaxation
   - Relax host structure (unit cell and supercell)
   - Relax reference phases (metal or oxide mode)
   - Build thermodynamic reference dataset
1. Symmetry-reduced dopant enumeration
   - Generate substitutional doped configurations
   - Identify symmetry-unique arrangements on the cation sublattice
2. ML-based energy screening
   - Evaluate single-point energies using a selected ML backend
   - Supports: M3GNet, UMA, MACE, GRACE
   - Exact enumeration or stochastic sampling
3. Structure relaxation
   - Relax candidate structures using ML forces
   - Uses ASE optimizers (e.g. BFGS, FIRE, LBFGS)
   - CPU or GPU execution
4. Energy-based filtering
   - Select low-energy candidates
   - Window-based or top-N selection strategies
5. Band gap prediction
   - Predict electronic band gaps using ALIGNN
6. Formation energy evaluation
   - Compute formation energies using reference structures
   - Supports metal and oxide reference schemes
7. Database assembly
   - Aggregate results across all stages
   - Export a unified CSV database

Optional post-processing stages:

- Surface generation (optional)
  - Select candidates from the database
  - Generate slab structures for chosen Miller indices
  - Enumerate surface terminations
  - Optionally fix atoms in the slab
- Surface relaxation (optional)
  - Relax slab structures using ML interatomic potentials
  - Apply atom constraints (e.g. fixed bottom layers)
  - Use the same backend abstraction as bulk relaxation
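The two selection strategies named in the energy-based filtering stage can be illustrated with a short sketch. This is not the workflow's own implementation: the function names, the candidate labels, and the energy values below are hypothetical and only show what "window-based" and "top-N" selection typically mean.

```python
from typing import Dict, List


def filter_by_window(energies: Dict[str, float], window: float) -> List[str]:
    """Keep every candidate whose energy lies within `window` of the minimum.

    Returns candidate names ordered from lowest to highest energy.
    """
    if not energies:
        return []
    e_min = min(energies.values())
    kept = [name for name, e in energies.items() if e - e_min <= window]
    return sorted(kept, key=energies.__getitem__)


def filter_top_n(energies: Dict[str, float], n: int) -> List[str]:
    """Keep the `n` lowest-energy candidates, ordered by energy."""
    ranked = sorted(energies, key=energies.__getitem__)
    return ranked[: max(n, 0)]


# Hypothetical relaxed energies (eV per atom) for four doped configurations.
energies = {
    "cfg_a": -5.120,
    "cfg_b": -5.095,
    "cfg_c": -5.010,
    "cfg_d": -5.118,
}

print(filter_by_window(energies, window=0.05))
print(filter_top_n(energies, n=2))
```

A window keeps a variable number of candidates that depends on how densely the low-energy region is populated, whereas top-N fixes the number of candidates passed to later stages regardless of the energy spread.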

The core workflow (Stages 0–7) focuses on bulk screening and database generation.
Surface generation is intentionally decoupled from the main pipeline and is executed separately.
This design allows users to:

- inspect and validate bulk candidates before surface modeling
- control the number of generated slabs
- avoid combinatorial explosion of surface structures

A typical workflow consists of:

1. Running the full bulk pipeline:
```
dopingflow run-all -c input.toml
```
2. Inspecting the resulting database:
```
results_database.csv
```
3. Generating and optionally relaxing surfaces:
```
dopingflow surface -c input.toml
```
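For scripted use, the two commands above could be chained from Python with a manual inspection gate between them. The following is a minimal sketch, assuming the `dopingflow` executable is on `PATH`. The helper names (`bulk_command`, `surface_command`, `run_stage`, `run_workflow`) are hypothetical and not part of the package; only the two command lines themselves are taken from this page.

```python
import shutil
import subprocess
from typing import List


def bulk_command(config: str) -> List[str]:
    """Command line for the bulk pipeline, as shown in this section."""
    return ["dopingflow", "run-all", "-c", config]


def surface_command(config: str) -> List[str]:
    """Command line for the optional surface stage, as shown in this section."""
    return ["dopingflow", "surface", "-c", config]


def run_stage(command: List[str]) -> None:
    """Run one stage and raise if the executable is missing or exits non-zero."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]!r} was not found on PATH")
    subprocess.run(command, check=True)


def run_workflow(config: str = "input.toml") -> None:
    """Run the bulk pipeline, pause for inspection, then optionally run surfaces."""
    run_stage(bulk_command(config))
    answer = input("Inspect results_database.csv, then type 'yes' to continue: ")
    if answer.strip().lower() == "yes":
        run_stage(surface_command(config))
```

Calling `run_workflow("input.toml")` would run the bulk pipeline first and start the surface stage only after explicit confirmation, which mirrors the decoupled design described above.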