Quick Start Workflow¶
This is the shortest maintained path from a fresh clone to a trained model and one evaluated checkpoint.
If the environment is not ready yet, start with the installation guide.
1. Prepare the processed split¶
Build the canonical full scaffold split under notebooks/data/processed/:
python scripts/data/prepare_data.py \
--output-dir notebooks/data/processed \
--split-mode solute_scaffold \
--seed 42
This writes:
- train.csv, val.csv, test.csv
- additional _solute and _solvent split variants
- split_manifest.json
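The idea behind a seeded scaffold split can be pictured with a minimal sketch: every row sharing a solute scaffold must land in the same split, deterministically for a given seed. The function below is illustrative only; the actual grouping and fractions used by prepare_data.py may differ.

```python
import hashlib

def assign_split(scaffold: str, seed: int = 42,
                 fractions=(0.8, 0.1, 0.1)) -> str:
    """Deterministically map a scaffold group to train/val/test.

    Sketch only: hashing the (seed, scaffold) pair keeps every row
    that shares a solute scaffold in the same split, independent of
    row order. The real script's logic may differ.
    """
    digest = hashlib.sha256(f"{seed}:{scaffold}".encode()).hexdigest()
    u = int(digest[:8], 16) / 0xFFFFFFFF  # pseudo-uniform value in [0, 1]
    if u < fractions[0]:
        return "train"
    if u < fractions[0] + fractions[1]:
        return "val"
    return "test"

# All rows with the same scaffold land in the same split.
print(assign_split("c1ccccc1"))  # benzene scaffold
```

Because the assignment depends only on the seed and the scaffold string, rerunning with --seed 42 reproduces the same partition.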
2. Train the maintained tuned TGNN baseline¶
Use the tuned TGNN config for the current architecture-comparison baseline:
python scripts/training/train.py \
--config configs/paper_config_tuned.yaml \
--train-data notebooks/data/processed/train.csv \
--val-data notebooks/data/processed/val.csv \
--test-data notebooks/data/processed/test.csv \
--checkpoint checkpoints/tgnn_solv_tuned.pt \
--device cuda
If CUDA is unavailable, replace --device cuda with --device mps or
--device cpu.
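The fallback order in that note (cuda, then mps, then cpu) can be wrapped in a small helper. This sketch is hypothetical and takes availability as plain booleans so it runs without torch; a real script would pass torch.cuda.is_available() and torch.backends.mps.is_available() instead.

```python
def resolve_device(preferred: str, cuda_ok: bool, mps_ok: bool) -> str:
    """Pick the best available device, falling back cuda -> mps -> cpu.

    Illustrative helper, not part of the training CLI.
    """
    order = ["cuda", "mps", "cpu"]
    available = {"cuda": cuda_ok, "mps": mps_ok, "cpu": True}
    # Start from the preferred device, then walk down the fallback chain.
    start = order.index(preferred) if preferred in order else 0
    for device in order[start:]:
        if available[device]:
            return device
    return "cpu"

print(resolve_device("cuda", cuda_ok=False, mps_ok=True))  # -> mps
```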
Resume-safe variant¶
For long or preemptible runs:
python scripts/training/train.py \
--config configs/paper_config_tuned.yaml \
--train-data notebooks/data/processed/train.csv \
--val-data notebooks/data/processed/val.csv \
--test-data notebooks/data/processed/test.csv \
--checkpoint checkpoints/tgnn_solv_tuned.pt \
--checkpoint-every 5 \
--device cuda
Resume later with:
python scripts/training/train.py \
--resume checkpoints/tgnn_solv_tuned.pt \
--checkpoint checkpoints/tgnn_solv_tuned.pt \
--device cuda
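The resume flags follow the usual checkpoint pattern: persist the epoch counter alongside the model state every N epochs, and on restart continue from whatever was last written. A minimal stand-in using json (the real checkpoint is a .pt file written with torch.save, and the key names here are illustrative):

```python
import json
import os
import tempfile

def save_checkpoint(path, epoch, state):
    # Real training would call torch.save({"epoch": ..., "model": ...}, path).
    with open(path, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)

def train(path, total_epochs=10, checkpoint_every=5):
    # Resume from the last checkpoint if one exists, else start fresh.
    start = 0
    if os.path.exists(path):
        with open(path) as f:
            start = json.load(f)["epoch"] + 1
    for epoch in range(start, total_epochs):
        state = {"loss": 1.0 / (epoch + 1)}  # stand-in for model weights
        if (epoch + 1) % checkpoint_every == 0 or epoch == total_epochs - 1:
            save_checkpoint(path, epoch, state)
    return start

path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
first = train(path, total_epochs=4, checkpoint_every=5)   # fresh run
second = train(path, total_epochs=10, checkpoint_every=5) # resumes after epoch 3
print(first, second)  # -> 0 4
```

Writing the final epoch unconditionally (the `epoch == total_epochs - 1` branch) is what makes a later `--resume` pick up exactly where a completed or preempted run stopped.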
3. Run one inference query¶
from tgnn_solv.inference import load_model, predict_solubility
model, cfg = load_model("checkpoints/tgnn_solv_tuned.pt")
result = predict_solubility(
model,
solute_smiles="CC(=O)Nc1ccc(O)cc1",
solvent_smiles="CCO",
T=298.15,
)
print(result["ln_x2"], result["T_m"], result["tau_12"])
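Assuming ln_x2 is the natural log of the predicted mole-fraction solubility, as the key name suggests (check Evaluation & Inference for the authoritative definition), converting it back to a mole fraction is a single exponential:

```python
import math

def ln_x2_to_mole_fraction(ln_x2: float) -> float:
    """Convert a log mole-fraction prediction to a mole fraction.

    Assumes ln_x2 = ln(x2), with x2 the solute mole fraction.
    """
    return math.exp(ln_x2)

print(round(ln_x2_to_mole_fraction(-4.0), 6))  # -> 0.018316
```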
See the full maintained inference surface in Evaluation & Inference.
4. Evaluate the checkpoint¶
Use the lightweight maintained evaluation CLI:
python scripts/evaluation/evaluate_complete.py \
--test-data notebooks/data/processed/test.csv \
--tgnn-checkpoint checkpoints/tgnn_solv_tuned.pt \
--output results/full_evaluation.json \
--verbose
This gives you:
- test-set regression metrics
- figure-ready arrays
- error slices such as aqueous and top-solvent subsets
- a canonical report payload that can be opened directly in Results & Plots and Benchmark Studio inside the lab
- report sidecars: full_evaluation.manifest.json and full_evaluation.card.json
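The test-set regression metrics in that report follow the standard definitions. A minimal sketch of RMSE, MAE, and R² over paired predictions (the evaluator's exact metric set and key names may differ):

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, and R^2 for paired true/predicted values."""
    n = len(y_true)
    errs = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mae = sum(abs(e) for e in errs) / n
    mean = sum(y_true) / n
    ss_res = sum(e * e for e in errs)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "r2": r2}

m = regression_metrics([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
print({k: round(v, 4) for k, v in m.items()})
```

The same functions apply unchanged to error slices such as the aqueous or top-solvent subsets: filter the pairs first, then recompute.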
5. Run a matched no-physics baseline¶
To compare TGNN-Solv against the maintained matched backbone:
python scripts/training/train_directgnn.py \
--config configs/paper_config_directgnn_tuned.yaml \
--train-data notebooks/data/processed/train.csv \
--val-data notebooks/data/processed/val.csv \
--test-data notebooks/data/processed/test.csv \
--checkpoint checkpoints/directgnn_tuned.pt \
--device cuda
That is the main ablation used to test whether the explicit physics bottleneck is helping.
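Once both checkpoints have been evaluated, the ablation reduces to comparing the two metric dicts. A hypothetical helper (the metric names and numbers below are placeholders, not results from this project):

```python
def ablation_delta(tgnn: dict, baseline: dict) -> dict:
    """Per-metric difference (baseline - TGNN).

    For error metrics such as RMSE and MAE, a positive delta means the
    physics-informed model did better. Illustrative only: real values
    come from the evaluation JSON outputs.
    """
    return {k: baseline[k] - tgnn[k] for k in tgnn if k in baseline}

# Hypothetical metric values for illustration.
delta = ablation_delta({"rmse": 0.42, "mae": 0.30},
                       {"rmse": 0.55, "mae": 0.41})
print({k: round(v, 2) for k, v in delta.items()})  # -> {'rmse': 0.13, 'mae': 0.11}
```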
6. Optional: launch the GUI workbench¶
python scripts/launch_lab.py
Useful first places in the UI:
- Inference: draw/edit structures, run TGNN or DirectGNN inference, inspect uncertainty and OOD
- Results & Plots -> Benchmark Studio: compare canonical benchmark bundles, including external, custom, and adapter-based models
- Reproduce: launch the maintained core, article, or full reproduction profile
7. Go deeper¶
After the first end-to-end run, the most useful next pages are:
- Experiment Lab: launch the maintained GUI for visual training, inference, uncertainty, and lineage workflows
- Architecture: understand TGNN-Solv, DirectGNN, GC priors, and Stage 0 pretraining
- Training: curriculum phases, pair-aware batching, oracle injection, and resume
- Experiments & Benchmarks: medium-budget comparison, full-budget diagnostic run, split studies, and external baselines
- Reproducing the Paper: structured core, article, and full workflows
- Notebooks & Tutorials: interactive walk-throughs that mirror the maintained code paths