Workflow Overview
Conceptual Pipeline
The ML Doping Workflow implements a fully automated, multi-stage surrogate pipeline for the exploration of doped crystalline materials.
It combines symmetry-aware structure generation with machine-learned interatomic potentials to efficiently screen large configurational spaces.
The workflow is designed to first identify promising bulk candidates and then optionally extend the analysis to surface structures.
Pipeline Structure
Reference Construction → Enumeration → Screening → Relaxation → Filtering → Band Gap → Formation Energy → Database
Optional Post-Processing:
Database → Surface Generation → Surface Relaxation
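The stage ordering above can be sketched as a plain list of names with a small runner. This is a minimal illustration, not the workflow's actual implementation: the stage identifiers, the `run_stages` helper, and the shared state dictionary are all hypothetical names chosen for the example.

```python
# Stage names mirror the pipeline diagram; the identifiers are illustrative only.
BULK_STAGES = [
    "reference_construction", "enumeration", "screening", "relaxation",
    "filtering", "band_gap", "formation_energy", "database",
]
SURFACE_STAGES = ["surface_generation", "surface_relaxation"]


def run_stages(names, handlers, state=None):
    """Run the named stages in order and record which ones completed."""
    state = dict(state or {})
    completed = list(state.get("completed", []))  # copy, so the input is not mutated
    for name in names:
        state = handlers[name](state)
        completed.append(name)
    state["completed"] = completed
    return state


# Placeholder handlers that pass the state through unchanged (hypothetical).
handlers = {name: (lambda s: s) for name in BULK_STAGES + SURFACE_STAGES}

full_run = run_stages(BULK_STAGES, handlers)                     # whole bulk pipeline
partial_run = run_stages(["relaxation", "filtering"], handlers)  # two stages only
surface_run = run_stages(SURFACE_STAGES, handlers, state=full_run)  # optional post-processing
```

Keeping the surface stages in a separate list reflects the decoupling described in this section: they run only when requested, starting from the results of the bulk pipeline.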
Stages
Reference construction and relaxation
- Relax host structure (unit cell and supercell)
- Relax reference phases (metal or oxide mode)
- Build thermodynamic reference dataset
Symmetry-reduced dopant enumeration
- Generate substitutional doped configurations
- Identify symmetry-unique arrangements on the cation sublattice
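The idea of symmetry reduction can be shown on a toy model. The sketch below assumes a six-site ring whose symmetry operations are its rotations and reflections, expressed as site permutations; a real crystal would use the space-group operations of the supercell instead. All function names are hypothetical and are not taken from the workflow.

```python
from itertools import combinations


def ring_symmetry_ops(n_sites):
    """Rotations and reflections of an n-site ring, each as a site permutation."""
    ops = []
    for shift in range(n_sites):
        ops.append([(i + shift) % n_sites for i in range(n_sites)])  # rotation
        ops.append([(shift - i) % n_sites for i in range(n_sites)])  # reflection
    return ops


def canonical_form(dopant_sites, ops):
    """Smallest image of a dopant arrangement under all symmetry operations."""
    return min(tuple(sorted(op[s] for s in dopant_sites)) for op in ops)


def unique_configurations(n_sites, n_dopants, ops):
    """Map each symmetry-unique arrangement to its degeneracy."""
    unique = {}
    for sites in combinations(range(n_sites), n_dopants):
        key = canonical_form(sites, ops)
        unique[key] = unique.get(key, 0) + 1
    return unique


ops = ring_symmetry_ops(6)
unique = unique_configurations(6, 2, ops)
# 15 raw arrangements collapse to 3 unique ones (nearest, second-nearest, opposite).
```

Only one representative per equivalence class needs to be evaluated; the degeneracy records how many raw arrangements it stands for.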
ML-based energy screening
- Evaluate single-point energies using a selected ML backend
- Supported backends: M3GNet, UMA, MACE, GRACE
- Exact enumeration or stochastic sampling of configurations
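The choice between exact enumeration and stochastic sampling can be sketched as a threshold on the number of possible configurations. The threshold `max_exact`, the sample count `n_samples`, and the function name are assumptions made for this example; the workflow's actual switching criterion is not specified here.

```python
import math
import random
from itertools import combinations


def choose_configurations(n_sites, n_dopants, max_exact=1000, n_samples=100, seed=0):
    """Enumerate all dopant placements if feasible, otherwise draw a random subset."""
    total = math.comb(n_sites, n_dopants)
    if total <= max_exact:
        return "exact", list(combinations(range(n_sites), n_dopants))
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    seen = set()
    while len(seen) < min(n_samples, total):
        seen.add(tuple(sorted(rng.sample(range(n_sites), n_dopants))))
    return "sampled", sorted(seen)


# 8 sites, 2 dopants: 28 placements, small enough to enumerate.
small_mode, small_configs = choose_configurations(8, 2)
# 64 sites, 8 dopants: billions of placements, so a sample is drawn instead.
large_mode, large_configs = choose_configurations(64, 8, n_samples=50)
```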
Structure relaxation
- Relax candidate structures using ML forces
- Uses ASE optimizers (e.g. BFGS, FIRE, LBFGS)
- CPU or GPU execution
Energy-based filtering
- Select low-energy candidates
- Window-based or top-N selection strategies
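The two selection strategies can be illustrated as follows. This is a sketch under stated assumptions: the function names, the candidate labels, and the energy values are invented for the example, and the energies are taken to be in a common unit relative to which the window is defined.

```python
def select_by_window(energies, window):
    """Keep candidates whose energy lies within `window` of the lowest energy."""
    e_min = min(energies.values())
    kept = [name for name, e in energies.items() if e - e_min <= window]
    return sorted(kept, key=energies.get)


def select_top_n(energies, n):
    """Keep the n lowest-energy candidates."""
    return sorted(energies, key=energies.get)[:n]


# Made-up energies for four hypothetical candidates.
energies = {"cfg_a": -10.00, "cfg_b": -9.98, "cfg_c": -9.90, "cfg_d": -9.50}

window_selection = select_by_window(energies, window=0.05)
top_selection = select_top_n(energies, n=3)
```

A window adapts to the energy landscape (many near-degenerate candidates are all kept), while top-N fixes the cost of the later stages regardless of how the energies are spread.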
Band gap prediction
- Predict electronic band gaps using ALIGNN
Formation energy evaluation
- Compute formation energies using reference structures
- Supports metal and oxide reference schemes
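The exact expression used by the workflow is not given here. The sketch below assumes the common convention E_f = E_doped - E_host - Σ Δn_i μ_i, where Δn_i is the change in the number of atoms of species i and μ_i is its chemical potential taken from a reference phase. The two helper functions show how μ could be derived in a metal or an oxide reference scheme; all names and numbers are hypothetical.

```python
def mu_from_metal(e_metal_total, n_atoms):
    """Chemical potential from an elemental metal reference: energy per atom."""
    return e_metal_total / n_atoms


def mu_from_oxide(e_oxide_per_fu, x, y, mu_oxygen):
    """Cation chemical potential from an M_x O_y reference at a given oxygen potential."""
    return (e_oxide_per_fu - y * mu_oxygen) / x


def formation_energy(e_doped, e_host, delta_n, mu):
    """E_f = E_doped - E_host - sum_i delta_n[i] * mu[i] (assumed convention)."""
    return e_doped - e_host - sum(dn * mu[el] for el, dn in delta_n.items())


# Made-up numbers: one host cation "A" is replaced by one dopant "D".
mu = {"A": mu_from_metal(-10.0, 2), "D": mu_from_metal(-7.0, 2)}
e_form = formation_energy(
    e_doped=-99.0, e_host=-100.0, delta_n={"D": +1, "A": -1}, mu=mu
)
mu_oxide_example = mu_from_oxide(e_oxide_per_fu=-20.0, x=1, y=2, mu_oxygen=-5.0)
```

Switching between the metal and oxide schemes changes only how the entries of `mu` are obtained; the formation energy expression itself stays the same.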
Database assembly
- Aggregate results across all stages
- Export a unified CSV database
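Aggregation into a single CSV can be sketched with the standard library alone. The stage names, column names, and values below are placeholders; the actual schema of the exported database is not specified here.

```python
import csv
import io


def assemble_database(stage_results):
    """Merge per-stage results (stage -> candidate -> fields) into one row per candidate."""
    rows = {}
    for per_candidate in stage_results.values():
        for candidate, fields in per_candidate.items():
            rows.setdefault(candidate, {"candidate": candidate}).update(fields)
    other_columns = sorted({key for row in rows.values() for key in row} - {"candidate"})
    return ["candidate"] + other_columns, [rows[c] for c in sorted(rows)]


def write_csv(columns, rows, handle):
    """Write the merged rows; values missing for a candidate are left empty."""
    writer = csv.DictWriter(handle, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical results: cfg_b has no band gap entry yet.
stage_results = {
    "relaxation": {"cfg_a": {"energy_relaxed": -10.0}, "cfg_b": {"energy_relaxed": -9.9}},
    "band_gap": {"cfg_a": {"band_gap": 1.2}},
}
columns, rows = assemble_database(stage_results)
buffer = io.StringIO()
write_csv(columns, rows, buffer)
csv_lines = buffer.getvalue().splitlines()
```

Writing to an in-memory buffer keeps the example self-contained; passing an open file handle instead would produce a file on disk.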
Surface generation (optional)
- Select candidates from the database
- Generate slab structures for chosen Miller indices
- Enumerate surface terminations
- Optionally fix atoms in the slab
Surface relaxation (optional)
- Relax slab structures using ML interatomic potentials
- Apply atom constraints (e.g. fixed bottom layers)
- Use the same backend abstraction as bulk relaxation
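Selecting the atoms of the bottom layers can be sketched by grouping atoms by their height in the slab. The grouping tolerance `tol`, the function name, and the coordinates are assumptions for this example; how the workflow identifies layers internally is not stated here.

```python
def bottom_layer_indices(z_coords, n_layers, tol=0.1):
    """Indices of atoms in the lowest n_layers layers of a slab.

    Atoms are sorted by z; an atom joins the current layer if it lies
    within tol of the previous atom, otherwise it starts a new layer.
    """
    order = sorted(range(len(z_coords)), key=lambda i: z_coords[i])
    layers = []
    for i in order:
        if layers and z_coords[i] - z_coords[layers[-1][-1]] <= tol:
            layers[-1].append(i)
        else:
            layers.append([i])
    return sorted(i for layer in layers[:n_layers] for i in layer)


# Made-up z coordinates for a six-atom slab with three layers.
z_coords = [0.00, 0.02, 2.00, 2.01, 4.00, 4.03]
fixed = bottom_layer_indices(z_coords, n_layers=2)
```

In ASE, a list of indices like this is the kind of input a `FixAtoms` constraint accepts; whether the workflow builds its constraints this way is an assumption.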
Design Principles
Modular: Each stage can be executed independently
Backend-agnostic: Multiple ML potentials are supported
Reproducible: Fully controlled via input.toml
Scalable: Supports multiprocessing and GPU execution
Extensible: New models and stages can be added easily
Notes
The core workflow (Stages 0–7, from reference construction through database assembly) focuses on bulk screening and database generation.
Surface generation is intentionally decoupled from the main pipeline and is executed separately.
This design allows users to:
- inspect and validate bulk candidates before surface modeling
- control the number of generated slabs
- avoid combinatorial explosion of surface structures
Typical Usage
A typical workflow consists of:
Running the full bulk pipeline:
dopingflow run-all -c input.toml
Inspecting the resulting database:
results_database.csv
Generating and optionally relaxing surfaces:
dopingflow surface -c input.toml