
Quick Start Workflow

This is the shortest maintained path from a fresh clone to a trained model and an evaluated checkpoint.

If the environment is not ready yet, start with the installation guide.

1. Prepare the processed split

Build the canonical full scaffold split under notebooks/data/processed/:

python scripts/data/prepare_data.py \
    --output-dir notebooks/data/processed \
    --split-mode solute_scaffold \
    --seed 42

This writes:

  • train.csv
  • val.csv
  • test.csv
  • additional _solute and _solvent split variants
  • split_manifest.json
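
A quick way to sanity-check any grouped split like this is to confirm the partitions are disjoint on the grouping key. The snippet below is a generic sketch on synthetic keys, not this repo's loader; with the real files you would read the solute or scaffold column from train.csv, val.csv, and test.csv instead.

```python
def split_is_disjoint(train_keys, val_keys, test_keys):
    """Return True if no grouping key (e.g. a solute scaffold) leaks across splits."""
    train, val, test = set(train_keys), set(val_keys), set(test_keys)
    return not (train & val) and not (train & test) and not (val & test)

# Synthetic example: scaffold identifiers standing in for real SMILES scaffolds.
train = ["scaffold_a", "scaffold_b", "scaffold_a"]
val = ["scaffold_c"]
test = ["scaffold_d", "scaffold_e"]
print(split_is_disjoint(train, val, test))  # True for this toy split
```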

2. Train the maintained tuned TGNN baseline

Use the tuned TGNN config for the current architecture-comparison baseline:

python scripts/training/train.py \
    --config configs/paper_config_tuned.yaml \
    --train-data notebooks/data/processed/train.csv \
    --val-data notebooks/data/processed/val.csv \
    --test-data notebooks/data/processed/test.csv \
    --checkpoint checkpoints/tgnn_solv_tuned.pt \
    --device cuda

If CUDA is unavailable, replace --device cuda with --device mps or --device cpu.
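
If you want one launcher command that works on every machine, a small helper can pick the device before calling the trainer. The selection logic below is a hedged sketch: it mirrors the standard PyTorch checks (`torch.cuda.is_available()`, `torch.backends.mps.is_available()`) but takes the availability flags as arguments so it stays framework-agnostic here.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer cuda, then mps, then cpu -- the same fallback order as the note above."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# With PyTorch installed, the flags would come from:
#   torch.cuda.is_available() and torch.backends.mps.is_available()
print(pick_device(cuda_available=False, mps_available=True))  # mps
```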

Resume-safe variant

For long or preemptible runs:

python scripts/training/train.py \
    --config configs/paper_config_tuned.yaml \
    --train-data notebooks/data/processed/train.csv \
    --val-data notebooks/data/processed/val.csv \
    --test-data notebooks/data/processed/test.csv \
    --checkpoint checkpoints/tgnn_solv_tuned.pt \
    --checkpoint-every 5 \
    --device cuda

Resume later with:

python scripts/training/train.py \
    --resume checkpoints/tgnn_solv_tuned.pt \
    --checkpoint checkpoints/tgnn_solv_tuned.pt \
    --device cuda
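
The checkpoint format itself is internal to train.py, but the resume-safe pattern these flags imply is the usual one: periodically persist everything needed to continue, then restore and pick up from the saved epoch. A minimal stand-in, using a plain dict and pickle rather than the repo's real model and optimizer state:

```python
import os
import pickle
import tempfile

def save_checkpoint(path, state):
    # Write atomically so a preempted run never sees a half-written file.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

ckpt = os.path.join(tempfile.mkdtemp(), "toy.pt")
checkpoint_every = 5  # same cadence as --checkpoint-every 5 above

for epoch in range(1, 13):
    # ... one epoch of training would happen here ...
    if epoch % checkpoint_every == 0:
        save_checkpoint(ckpt, {"epoch": epoch, "weights": [0.0] * 3})

with open(ckpt, "rb") as f:
    resumed = pickle.load(f)
print(resumed["epoch"])  # 10 -- a resumed run continues after the last save
```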

3. Run one inference query

from tgnn_solv.inference import load_model, predict_solubility

model, cfg = load_model("checkpoints/tgnn_solv_tuned.pt")
result = predict_solubility(
    model,
    solute_smiles="CC(=O)Nc1ccc(O)cc1",
    solvent_smiles="CCO",
    T=298.15,
)
print(result["ln_x2"], result["T_m"], result["tau_12"])
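
Assuming ln_x2 is the natural-log mole-fraction solubility, as the key name suggests (check the repo's inference docs to confirm), converting a prediction back to a mole fraction is a one-liner:

```python
import math

ln_x2 = -4.6  # illustrative value; real predictions come from result["ln_x2"]
x2 = math.exp(ln_x2)  # mole fraction of solute
print(round(x2, 4))  # 0.0101
```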

See the full maintained inference surface in Evaluation & Inference.

4. Evaluate the checkpoint

Use the lightweight maintained evaluation CLI:

python scripts/evaluation/evaluate_complete.py \
    --test-data notebooks/data/processed/test.csv \
    --tgnn-checkpoint checkpoints/tgnn_solv_tuned.pt \
    --output results/full_evaluation.json \
    --verbose

This gives you:

  • test-set regression metrics
  • figure-ready arrays
  • error slices such as aqueous and top-solvent subsets
  • a canonical report payload that can be opened directly in Results & Plots and Benchmark Studio inside the lab
  • report sidecars:
      • full_evaluation.manifest.json
      • full_evaluation.card.json
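
The test-set regression metrics in the report are the standard ones; if you want to recompute them from the figure-ready arrays, the formulas are straightforward. A sketch with toy arrays (field names inside full_evaluation.json may differ):

```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, RMSE, and R^2 from paired true/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}

# Toy arrays standing in for the report's true/predicted values.
m = regression_metrics([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
print({k: round(v, 4) for k, v in m.items()})
```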

5. Run a matched no-physics baseline

To compare TGNN-Solv against the maintained matched backbone:

python scripts/training/train_directgnn.py \
    --config configs/paper_config_directgnn_tuned.yaml \
    --train-data notebooks/data/processed/train.csv \
    --val-data notebooks/data/processed/val.csv \
    --test-data notebooks/data/processed/test.csv \
    --checkpoint checkpoints/directgnn_tuned.pt \
    --device cuda

This is the main ablation for testing whether the explicit physics bottleneck helps.
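
Once both checkpoints have been evaluated, the ablation reduces to diffing two metric payloads. A hedged sketch (the flat dict layout and metric names here are illustrative, not the real JSON schema):

```python
def compare_models(metrics_a, metrics_b, lower_is_better=("rmse", "mae")):
    """Per metric, report which model wins; ties go to model b here."""
    verdict = {}
    for key in metrics_a:
        a, b = metrics_a[key], metrics_b[key]
        if key in lower_is_better:
            verdict[key] = "tgnn" if a < b else "directgnn"
        else:  # e.g. r2, where higher is better
            verdict[key] = "tgnn" if a > b else "directgnn"
    return verdict

# Illustrative numbers only -- run the evaluation CLI on both checkpoints
# to get real values.
tgnn = {"rmse": 0.42, "mae": 0.30, "r2": 0.91}
directgnn = {"rmse": 0.48, "mae": 0.33, "r2": 0.88}
print(compare_models(tgnn, directgnn))
```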

6. Optional: launch the GUI workbench

python scripts/launch_lab.py

Useful first places in the UI:

  • Inference: draw/edit structures, run TGNN or DirectGNN inference, inspect uncertainty and OOD
  • Results & Plots -> Benchmark Studio: compare canonical benchmark bundles, including external, custom, and adapter-based models
  • Reproduce: launch the maintained core, article, or full reproduction profile

7. Go deeper

After the first end-to-end run, the most useful next pages are:

  • Experiment Lab: launch the maintained GUI for visual training, inference, uncertainty, and lineage workflows
  • Architecture: understand TGNN-Solv, DirectGNN, GC priors, and Stage 0 pretraining
  • Training: curriculum phases, pair-aware batching, oracle injection, and resume
  • Experiments & Benchmarks: medium-budget comparison, full-budget diagnostic run, split studies, and external baselines
  • Reproducing the Paper: structured core, article, and full workflows
  • Notebooks & Tutorials: interactive walk-throughs that mirror the maintained code paths