5. Bandgap Prediction

Implementation

This stage is implemented in:

src/dopingflow/bandgap.py

The public entry point is:

run_bandgap(...)

Purpose

This stage predicts the electronic band gap of relaxed candidate structures using a locally available ALIGNN model.

Band gaps are evaluated for candidates produced by the previous steps (typically after relaxation, and optionally after filtering) and written to:

a per-folder summary CSV
a per-candidate metadata record

This is a machine-learning inference step; no electronic-structure calculation is performed here.

Inputs

This stage uses settings from the following sections of input.toml:

[structure]: provides the output directory containing structure folders.
[bandgap]: defines graph-construction parameters and skipping behavior.

In addition, this stage requires an environment variable pointing to a local ALIGNN model directory:

ALIGNN_MODEL_DIR=/path/to/your/alignn_model_folder
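A caller could validate this variable early, before any model code runs. The sketch below is one possible check; get_model_dir is a hypothetical helper name, and the exact error handling in bandgap.py may differ.

```python
import os
from pathlib import Path


def get_model_dir(env=None) -> Path:
    """Return the directory named by ALIGNN_MODEL_DIR (hypothetical helper).

    Raises a clear error if the variable is unset or is not a directory.
    """
    env = os.environ if env is None else env
    value = env.get("ALIGNN_MODEL_DIR")
    if not value:
        raise RuntimeError("ALIGNN_MODEL_DIR is not set")
    model_dir = Path(value).expanduser()
    if not model_dir.is_dir():
        raise FileNotFoundError(f"ALIGNN_MODEL_DIR is not a directory: {model_dir}")
    return model_dir
```

Passing a plain dict as env makes the check easy to exercise without touching the real environment.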

Method Summary

For each structure folder inside [structure].outdir:

Determine the list of candidate structures to evaluate:
   1. If selected_candidates.txt exists, only those candidates are used.
   2. Otherwise, all candidate_*/02_relax/POSCAR files are used.
For each selected candidate:
   1. Read the relaxed POSCAR (from Step 03).
   2. Convert the structure to a graph representation (DGL graph) using a neighbor cutoff and maximum neighbor count.
   3. Run a forward pass of the ALIGNN model to obtain the predicted band gap.
   4. Write a per-candidate metadata file.
Write a per-folder summary CSV listing band gaps for all evaluated candidates.

Selection of Candidates

This stage supports two selection modes:

Filtered selection (recommended): If selected_candidates.txt exists in the folder, the band gap is computed only for those candidates. This integrates naturally with Step 04 filtering.
Fallback selection: If no selection list is present, all relaxed candidates are used.

The relaxed structures are always taken from:

candidate_*/02_relax/POSCAR
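The two selection modes can be sketched as a single function. This is an illustration only: select_candidates is a hypothetical name, and the format of selected_candidates.txt (one candidate directory name per line) is an assumption, since this page does not specify it.

```python
from pathlib import Path


def select_candidates(folder: Path) -> list[Path]:
    """Return the relaxed POSCAR paths to evaluate in one structure folder."""
    selection_file = folder / "selected_candidates.txt"
    if selection_file.exists():
        # Assumption: one candidate directory name per line; blank lines ignored.
        names = [
            line.strip()
            for line in selection_file.read_text().splitlines()
            if line.strip()
        ]
        return [folder / name / "02_relax" / "POSCAR" for name in names]
    # Fallback: every relaxed candidate found in the folder.
    return sorted(folder.glob("candidate_*/02_relax/POSCAR"))
```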

Model and Graph Construction

ALIGNN model loading

The model is loaded from a local directory under ALIGNN_MODEL_DIR. The code searches for a folder containing a config.json and a checkpoint file (e.g. checkpoint_*.pt or *.pt), then loads:

the model configuration (from config.json)
the checkpoint weights
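The file discovery described above might look like the following sketch. The search order and tie-breaking are assumptions (first matching folder in sorted order, checkpoint_*.pt preferred over other *.pt files), and find_model_files is a hypothetical name; loading the weights themselves is left out here.

```python
from pathlib import Path


def find_model_files(model_root: Path) -> tuple[Path, Path]:
    """Locate (config.json, checkpoint file) under model_root.

    Assumption: the first folder in sorted order that holds both files is
    used, and checkpoint_*.pt is preferred over any other *.pt file.
    """
    folders = [model_root] + sorted(p for p in model_root.rglob("*") if p.is_dir())
    for folder in folders:
        config = folder / "config.json"
        if not config.is_file():
            continue
        checkpoints = sorted(folder.glob("checkpoint_*.pt")) or sorted(folder.glob("*.pt"))
        if checkpoints:
            return config, checkpoints[0]
    raise FileNotFoundError(f"No config.json with a .pt checkpoint under {model_root}")
```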

Graph parameters

For each structure, a graph is constructed using:

[bandgap].cutoff: neighbor cutoff distance
[bandgap].max_neighbors: maximum neighbors per atom

These parameters control how local environments are encoded for inference.

This stage sets the DGL backend to PyTorch through the DGLBACKEND=pytorch environment variable. The variable must be set before DGL or ALIGNN is imported, because the backend is chosen at import time.
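The ordering constraint and the graph parameters can be illustrated as follows. This is a sketch under assumptions, not the code in bandgap.py: build_graph is a hypothetical helper, and it assumes the jarvis-tools Atoms.from_poscar reader and ALIGNN's Graph.atom_dgl_multigraph builder are what gets used. The exact return value of the graph builder, and the inputs the model expects, can differ between ALIGNN releases, so the result is returned unchanged.

```python
import os

# Must run before the first import of dgl or alignn anywhere in the process.
os.environ["DGLBACKEND"] = "pytorch"


def build_graph(poscar_path: str, cutoff: float, max_neighbors: int):
    """Build the graph representation for one relaxed structure (sketch).

    Imports are deferred so the backend variable above is already in place
    when dgl is first loaded.
    """
    from jarvis.core.atoms import Atoms  # requires jarvis-tools
    from alignn.graphs import Graph      # requires alignn and dgl

    atoms = Atoms.from_poscar(poscar_path)
    # Typically returns the atom graph together with its line graph; the
    # result is passed through as-is because this varies by ALIGNN version.
    return Graph.atom_dgl_multigraph(
        atoms, cutoff=float(cutoff), max_neighbors=int(max_neighbors)
    )
```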

Outputs

Per-folder summary

For each structure folder, this stage writes:

bandgap_alignn_summary.csv

This file contains:

candidate: candidate directory name
bandgap_eV_ALIGNN_MBJ: predicted band gap (eV)

The CSV is sorted by band gap (ascending) for convenience.
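A minimal sketch of writing such a summary with the standard csv module is shown below. The file name and column names are taken from this page; write_summary is a hypothetical helper, and the number formatting is an assumption.

```python
import csv
from pathlib import Path


def write_summary(folder: Path, bandgaps: dict[str, float]) -> Path:
    """Write bandgap_alignn_summary.csv sorted by ascending band gap."""
    out = folder / "bandgap_alignn_summary.csv"
    rows = sorted(bandgaps.items(), key=lambda item: item[1])
    with out.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["candidate", "bandgap_eV_ALIGNN_MBJ"])
        for name, gap in rows:
            # Assumption: six decimal places; the real precision may differ.
            writer.writerow([name, f"{gap:.6f}"])
    return out
```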

Per-candidate metadata

For each evaluated candidate, this stage writes:

candidate_XXX/03_band/meta.json

This metadata includes:

predicted band gap value
model provenance (model directory, config path, checkpoint path)
input structure path
structure size and composition
graph parameters used for inference
walltime for prediction
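For orientation, a record of this kind could be assembled and written as in the sketch below. All key names and values are illustrative placeholders chosen to mirror the list above; the actual schema of meta.json is defined by bandgap.py and may differ. write_meta is a hypothetical helper.

```python
import json
from pathlib import Path


def write_meta(candidate_dir: Path, record: dict) -> Path:
    """Write the record to <candidate_dir>/03_band/meta.json."""
    out_dir = candidate_dir / "03_band"
    out_dir.mkdir(parents=True, exist_ok=True)
    out = out_dir / "meta.json"
    out.write_text(json.dumps(record, indent=2))
    return out


# Illustrative record only: key names and values are assumptions.
example_record = {
    "bandgap_eV": 1.23,
    "model": {
        "model_dir": "/path/to/your/alignn_model_folder",
        "config": "config.json",
        "checkpoint": "checkpoint.pt",
    },
    "input_structure": "candidate_001/02_relax/POSCAR",
    "n_atoms": 48,
    "composition": {"Ti": 16, "O": 32},
    "graph": {"cutoff": 8.0, "max_neighbors": 12},
    "walltime_s": 0.42,
}
```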

Reproducibility and Skipping

If:

[bandgap].skip_if_done = true

and bandgap_alignn_summary.csv already exists for a folder, that folder is skipped.
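The skip condition amounts to a simple existence check, sketched here with a hypothetical should_skip helper:

```python
from pathlib import Path


def should_skip(folder: Path, skip_if_done: bool) -> bool:
    """Return True when the folder already has a summary and skipping is enabled."""
    return bool(skip_if_done) and (folder / "bandgap_alignn_summary.csv").exists()
```

Under this sketch, a folder whose summary exists is recomputed only when skip_if_done is false. To force a rerun with skipping enabled, the summary file would need to be removed first.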

Given the same relaxed POSCARs, model checkpoint, and graph parameters, this stage is deterministic: rerunning it reproduces the same predicted band gaps.

Notes and Limitations

Predicted band gaps depend on the chosen ALIGNN model and its training domain.
Band gaps are ML predictions and should not be interpreted as DFT-quality values unless the chosen model has been validated for the target material class.
This stage assumes candidate structures are already relaxed; it does not perform geometry optimization.