Script Reference
This document maps scripts and notebooks to their intended role and current
stability.
Maturity Legend
Canonical
- expected reproducible workflow entry point
Stable utility
- maintained and useful, but not necessarily part of the default article profile
Research
- useful experiment runner or analysis tool, but more likely to evolve
Optional
- depends on external stacks such as FastSolv or SolProp
Infrastructure
- internal helper, not a user-facing workflow
Preferred CLI Layout
The preferred human-facing CLI surface is now grouped by purpose:
- scripts/data/
- scripts/training/
- scripts/evaluation/
- scripts/experiments/
- scripts/external/
Legacy top-level scripts/*.py entry points remain available as compatibility
wrappers because tests, script-to-script imports, and shell drivers such as
reproduce.sh still rely on them.
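A compatibility wrapper of this kind can stay very small. The sketch below is one possible way to delegate from a legacy entry point to its grouped counterpart; it is not the repository's actual wrapper code, and the helper name run_grouped_script and its arguments are hypothetical.

```python
import runpy
import sys
from pathlib import Path


def run_grouped_script(legacy_file: str, grouped_relpath: str, argv: list[str]) -> dict:
    """Execute a grouped script as if it had been invoked directly.

    legacy_file: path of the legacy wrapper (normally __file__).
    grouped_relpath: location of the real CLI relative to the wrapper's
        directory (hypothetical layout, e.g. "training/train.py").
    argv: command-line arguments to forward unchanged.
    """
    target = Path(legacy_file).resolve().parent / grouped_relpath
    if not target.is_file():
        raise FileNotFoundError(f"grouped script not found: {target}")

    old_argv = sys.argv
    sys.argv = [str(target), *argv]
    try:
        # run_name="__main__" triggers the target's main guard, if it has one.
        return runpy.run_path(str(target), run_name="__main__")
    finally:
        # Restore the caller's argv even if the target raises.
        sys.argv = old_argv
```

Under these assumptions, a legacy scripts/train.py would only need to call run_grouped_script(__file__, "training/train.py", sys.argv[1:]), which keeps existing imports and shell drivers working while the real logic lives in the grouped directory.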
Canonical Workflow
| Entry point | Role | Status | Notes |
| --- | --- | --- | --- |
| scripts/data/prepare_data.py | Build processed splits from raw sources | Canonical | Writes all supported split families |
| scripts/training/train.py | Train one TGNN-Solv model | Canonical | Three-phase curriculum |
| scripts/experiments/run_seeds.py | Multi-seed wrapper | Canonical | Can call other train scripts too |
| scripts/evaluation/evaluate_complete.py | Quick checkpoint evaluation | Canonical | Figure-ready arrays |
| scripts/experiments/run_split_comparisons.py | Fair split-wise comparison | Canonical | TGNN, DirectGNN, RF modes |
| scripts/experiments/reproduce_paper.py | Structured article-reproduction runner | Canonical | Supports core, article, and full profiles |
| scripts/experiments/generate_paper_figures.py | Figure generation | Canonical | Consumes result JSONs |
| reproduce.sh | Compatibility shell driver | Canonical | Delegates to scripts/experiments/reproduce_paper.py --profile article |
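For callers that prefer Python over reproduce.sh, the delegation can be sketched as below. Only the script path and the --profile article invocation come from the table above; treating core and full as further values of the same flag is an assumption based on the "Supports core, article, and full profiles" note, and the helper names are hypothetical.

```python
import subprocess
import sys

# Profile names as listed for reproduce_paper.py in the table above.
PROFILES = ("core", "article", "full")


def build_reproduce_command(profile: str = "article") -> list[str]:
    """Return the argv list for one reproduction run (paths relative to the repo root)."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile {profile!r}; expected one of {PROFILES}")
    return [
        sys.executable,
        "scripts/experiments/reproduce_paper.py",
        "--profile",
        profile,
    ]


def run_reproduction(profile: str = "article") -> int:
    """Run the reproduction script from the repo root and return its exit code."""
    completed = subprocess.run(build_reproduce_command(profile), check=False)
    return completed.returncode
```

Building the command as a list avoids shell quoting issues, and validating the profile name up front fails fast before any long-running work starts.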
Stable Utilities
| Entry point | Role | Status | Notes |
| --- | --- | --- | --- |
| scripts/training/train_directgnn.py | Train DirectGNN baseline | Stable utility | Supports descriptor augmentation |
| scripts/training/train_with_pretrain.py | Train TGNN-Solv with Stage 0 enabled by default | Stable utility | Thin wrapper over train.py --pretrain --run-descriptor-probe; useful for GPS and descriptor-augmented TGNN warm starts too |
| scripts/training/run_resume_safe_train.sh | Resume-safe TGNN wrapper for cloud sessions | Stable utility | Wraps train.py --resume |
| scripts/evaluation/benchmark_tgnn_solv.py | Rich benchmark via Evaluator | Stable utility | Use when you want more than quick eval |
| scripts/evaluation/benchmark_adapter_model.py | Benchmark a formal Python adapter | Stable utility | Preferred custom-model path when you want fit/predict/report in one contract |
| scripts/evaluation/analyze_benchmark.py | Text summary of benchmark JSON | Stable utility | Lightweight reporting helper |
| scripts/evaluation/compare_models.py | Compare multiple TGNN checkpoints | Stable utility | Wraps benchmark logic |
| scripts/training/diagnose_training.py | Dataset stats and overfit sanity check | Stable utility | Good pre-flight tool |
| scripts/evaluation/probe_gsol_descriptor_recovery.py | Ridge linear probe from g_sol to RDKit descriptors | Stable utility | Useful for encoder-capacity diagnostics |
| scripts/evaluation/run_thermo_stress_suite.py | Stress slices on canonical prediction bundles | Stable utility | Reads predictions.csv, writes slice metrics JSON |
| scripts/experiments/run_optuna.py | Hyperparameter tuning | Stable utility | Supports TGNN, GPS TGNN, descriptor-augmented TGNN, and DirectGNN families |
| scripts/launch_lab.py | Launch the maintained Streamlit control surface | Stable utility | Preferred GUI entry point |
| scripts/gui/launch_lab.py | Namespaced launcher for the same lab | Stable utility | Same behavior, alternate path |
Research Experiment Runners
| Entry point | Role | Status | Notes |
| --- | --- | --- | --- |
| scripts/experiments/run_ablation.py | Multi-seed ablation sweeps | Research | Includes fixed_group_priors and direct_gnn |
| scripts/experiments/run_full_budget_experiment.py | Full-budget TGNN-vs-DirectGNN diagnostic study | Research | Exports TGNN intermediates and oracle diagnostics |
| scripts/experiments/run_medium_budget_comparison.py | Full-split medium-budget architecture comparison | Research | 4 TGNN variants, 2 DirectGNN variants, RF baseline |
| scripts/evaluation/validate_physics.py | Physics-parameter diagnostics | Research | Useful for TGNN checkpoint inspection |
| scripts/evaluation/error_analysis.py | Detailed residual analysis | Research | Consumes evaluation JSON |
| scripts/experiments/learning_curves.py | Data-efficiency study | Research | Multi-fraction, multi-seed |
| scripts/experiments/temperature_extrapolation.py | Temperature extrapolation study | Research | Uses a combined dataset CSV |
| scripts/experiments/statistical_tests.py | Paired significance testing | Research | Used by the full reproduction profile, but still analysis-oriented |
| scripts/experiments/generate_supplementary.py | Supplementary table generation | Research | Consumes produced result JSONs |
| scripts/experiments/build_benchmark_release.py | Freeze a checksum-based benchmark release manifest | Research | Best when preparing a paper-ready artifact snapshot |
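To make "checksum-based release manifest" concrete, here is a minimal sketch of the general technique. It is not the logic of build_benchmark_release.py: the choice of SHA-256, the manifest layout, and all function names are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map every file under root (POSIX-style relative path) to its checksum."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return {"files": {p.relative_to(root).as_posix(): sha256_of(p) for p in files}}


def write_manifest(root: Path, out_path: Path) -> None:
    """Write the manifest as stable, diff-friendly JSON."""
    out_path.write_text(json.dumps(build_manifest(root), indent=2, sort_keys=True))
```

Writing the manifest outside the hashed directory keeps it from appearing in its own file list on a later rebuild. Comparing two manifests then shows exactly which artifacts changed between snapshots.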
Optional External Baseline Wrappers
| Entry point | Role | Status | Notes |
| --- | --- | --- | --- |
| scripts/external/run_fastsolv.py | Predict, train, or compare FastSolv | Optional | Preferred FastSolv wrapper |
| scripts/external/compare_fastsolv_tgnn.py | Lightweight TGNN-vs-FastSolv comparison | Optional | Older convenience wrapper |
| scripts/external/run_solprop.py | Zero-shot, calibrated, or native-retrained SolProp | Optional | Usually run in a separate environment |
Infrastructure
| Entry point | Role | Status | Notes |
| --- | --- | --- | --- |
| scripts/_bootstrap.py | Adds repo src/ to sys.path for CLIs | Infrastructure | Imported by most scripts |
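The idea behind a bootstrap helper like this can be sketched in a few lines. The version below is illustrative only and is not the contents of scripts/_bootstrap.py; the function name add_src_to_path and the levels_up parameter are hypothetical.

```python
import sys
from pathlib import Path


def add_src_to_path(script_file: str, levels_up: int = 1) -> Path:
    """Put <repo>/src at the front of sys.path and return that directory.

    levels_up is an index into Path.parents: 1 for a file directly inside
    scripts/, 2 for a file inside scripts/<group>/ (assumed layout).
    """
    repo_root = Path(script_file).resolve().parents[levels_up]
    src = repo_root / "src"
    if not src.is_dir():
        raise FileNotFoundError(f"expected a src/ directory at {src}")

    src_str = str(src)
    if src_str not in sys.path:
        # Insert at position 0 so the checkout wins over any installed copy.
        sys.path.insert(0, src_str)
    return src
```

A CLI would call this before importing the package, for example add_src_to_path(__file__, levels_up=2) from a script two directories below the repo root, so the script runs from a plain checkout without an editable install.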
Maintained Library Utilities
Some important maintained surfaces are not exposed as standalone CLIs today.
They are available through the Python API and are demonstrated in notebooks.
| Module / API | Role | Notes |
| --- | --- | --- |
| tgnn_solv.pretrain.Pretrainer | Stage 0 encoder/readout pretraining core | Used by train.py --pretrain, train_with_pretrain.py, and notebooks/02_train.ipynb |
| tgnn_solv.pretrain_pipeline | Stage 0 checkpoint save/load helpers | Used by the maintained TGNN training CLI |
| tgnn_solv.pretrain.download_zinc250k | Pretraining SMILES acquisition with fallback | Falls back to BigSolDB SMILES if needed |
| tgnn_solv.inference.load_model | Checkpoint loading | Reconstructs config and compatible weights |
| tgnn_solv.inference.predict_solubility | Single-system inference | Returns intermediates, not only final ln(x2) |
| tgnn_solv.inference.temperature_scan | Multi-temperature inference | Useful for van't Hoff style inspection |
| tgnn_solv.inference.interpret_prediction | Human-readable prediction report | Good for manual case review |
| tgnn_solv.uncertainty.MCDropoutPredictor | Single-checkpoint uncertainty | Covered in notebooks/04_evaluation.ipynb |
| tgnn_solv.uncertainty.EnsemblePredictor | Multi-checkpoint uncertainty | Now works for both TGNN-Solv and DirectGNN families |
| tgnn_solv.uncertainty.calibration_report | Interval calibration summary | Accepts MC-dropout or ensemble outputs |
| tgnn_solv.domain.ApplicabilityDomain | Inference-time OOD / AD scoring | Covered in notebooks/03_inference.ipynb and notebooks/04_evaluation.ipynb |
| tgnn_solv.benchmark_adapters | Formal custom-model adapter contract | Lets arbitrary models participate in canonical benchmark bundles |
| tgnn_solv.artifacts | Run manifests and benchmark/model cards | Supplies machine-readable provenance sidecars |
| tgnn_solv.stress.build_stress_suite | Thermodynamic stress slices for benchmark bundles | Used after predictions.csv already exists |
The same maintained surfaces are also exposed together through
tools/experiment_lab/app.py, but the GUI is an orchestration layer rather
than a separate model implementation.
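The "van't Hoff style inspection" mentioned for tgnn_solv.inference.temperature_scan can be illustrated without touching the library. The sketch below makes no tgnn_solv calls, because this document does not give their signatures; it assumes you already hold a list of temperatures in kelvin and the matching predicted ln(x2) values, and it fits the standard linear form ln(x2) = -ΔH/(R·T) + ΔS/R by ordinary least squares. The function name is hypothetical.

```python
R_GAS = 8.314462618  # molar gas constant, J / (mol K)


def vant_hoff_fit(temps_k: list[float], ln_x2: list[float]) -> dict[str, float]:
    """Fit ln(x2) against 1/T and return apparent enthalpy and entropy terms."""
    if len(temps_k) != len(ln_x2) or len(temps_k) < 2:
        raise ValueError("need at least two (T, ln x2) pairs of equal length")

    inv_t = [1.0 / t for t in temps_k]
    n = len(inv_t)
    mean_x = sum(inv_t) / n
    mean_y = sum(ln_x2) / n
    sxx = sum((x - mean_x) ** 2 for x in inv_t)
    if sxx == 0.0:
        raise ValueError("temperatures must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inv_t, ln_x2))

    slope = sxy / sxx                     # equals -dH / R
    intercept = mean_y - slope * mean_x   # equals  dS / R
    return {
        "delta_h_j_per_mol": -slope * R_GAS,
        "delta_s_j_per_mol_k": intercept * R_GAS,
    }
```

A scan that deviates strongly from a straight line in 1/T is worth a manual look, since the linear form only holds when the apparent enthalpy is roughly constant over the scanned range.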
High-Signal Usage Notes
scripts/experiments/run_seeds.py
- default train script is scripts/training/train.py
- can also launch scripts/training/train_directgnn.py
- aggregates mae, rmse, r2, and pearson_r
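The aggregation step for run_seeds.py can be pictured as follows. The four metric names come from the list above; the choice of mean and sample standard deviation as summary statistics, the output layout, and the function name are assumptions, not a description of what run_seeds.py actually writes.

```python
from statistics import mean, stdev

# Metric names as listed for run_seeds.py.
METRICS = ("mae", "rmse", "r2", "pearson_r")


def aggregate_seed_metrics(per_seed: list[dict[str, float]]) -> dict[str, dict[str, float]]:
    """Summarize each metric across seeds as mean, sample std, and count."""
    if not per_seed:
        raise ValueError("need metrics from at least one seed")

    summary = {}
    for name in METRICS:
        values = [run[name] for run in per_seed]
        summary[name] = {
            "mean": mean(values),
            # Sample standard deviation is undefined for a single seed.
            "std": stdev(values) if len(values) > 1 else 0.0,
            "n": len(values),
        }
    return summary
```

Reporting the seed count next to each mean makes it obvious when a summary was built from fewer runs than intended, for example after a crashed seed.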
scripts/training/train_directgnn.py
- computes descriptor normalization stats automatically when use_descriptor_augmentation=True
- saves descriptor_mean and descriptor_std into the checkpoint
- supports --checkpoint-every and --resume
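Descriptor normalization of the kind described for train_directgnn.py usually means fitting per-column statistics on the training split and reusing them at inference time. The sketch below shows that pattern in plain Python. The keys descriptor_mean and descriptor_std are from the list above; the use of population standard deviation, the handling of constant columns, and the function names are assumptions.

```python
from math import sqrt


def fit_descriptor_stats(rows: list[list[float]], eps: float = 1e-8):
    """Return per-column (means, stds) computed from training rows only."""
    if not rows:
        raise ValueError("need at least one descriptor row")
    n = len(rows)
    n_cols = len(rows[0])

    means = [sum(r[j] for r in rows) / n for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        std = sqrt(var)
        # Constant columns would divide by zero; fall back to 1.0 instead.
        stds.append(std if std > eps else 1.0)
    return means, stds


def normalize(row: list[float], means: list[float], stds: list[float]) -> list[float]:
    """Apply the stored statistics to one descriptor row."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


# Shape of what a checkpoint might carry under the documented key names.
means, stds = fit_descriptor_stats([[1.0, 5.0], [3.0, 5.0]])
checkpoint_extras = {"descriptor_mean": means, "descriptor_std": stds}
```

Storing the statistics in the checkpoint means inference applies exactly the transform seen during training, rather than recomputing statistics on whatever data is being scored.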
scripts/training/train.py
- supports --checkpoint-every and --resume
- optionally runs Stage 0 with --pretrain
- can warm-start from --pretrain-checkpoint
- can launch the existing descriptor-recovery probe with --run-descriptor-probe
- saves reusable Stage 0 encoder/readout checkpoints through tgnn_solv.pretrain_pipeline
- stores TGNN descriptor normalization stats in the checkpoint when use_descriptor_augmentation=True
- fits gc_prior_tm_scale / gc_prior_tm_bias on the training split when use_gc_priors_crystal=True
- preserves those calibrated GC settings inside the saved config
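The names gc_prior_tm_scale and gc_prior_tm_bias suggest an affine calibration of a group-contribution prior. One plausible reading, which this document does not confirm, is an ordinary least-squares fit of target ≈ scale · prior + bias using training rows only. The sketch below shows that technique; the function name and the example values are hypothetical.

```python
def fit_scale_bias(prior: list[float], target: list[float]) -> tuple[float, float]:
    """Least-squares fit of target ~= scale * prior + bias."""
    n = len(prior)
    if n != len(target) or n < 2:
        raise ValueError("need at least two paired values")

    mean_p = sum(prior) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in prior)
    if var_p == 0.0:
        raise ValueError("prior values must not all be identical")
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(prior, target))

    scale = cov / var_p
    bias = mean_t - scale * mean_p
    return scale, bias


# Illustrative values only: fit on training rows, then store the result
# under the documented config keys so it travels with the checkpoint.
scale, bias = fit_scale_bias([300.0, 400.0, 500.0], [290.0, 380.0, 470.0])
config_update = {"gc_prior_tm_scale": scale, "gc_prior_tm_bias": bias}
```

Fitting on the training split alone avoids leaking validation or test information into the calibration, and saving the fitted values in the config lets a reloaded model reproduce the same prior.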
scripts/experiments/run_ablation.py
- resolves canonical variant aliases
- automatically enables any optional dataset feature paths required by the selected variants
scripts/experiments/run_full_budget_experiment.py
- trains TGNN-Solv and DirectGNN on matched budgets
- exports metrics.json, diagnostics.json, and tgnn_intermediates.csv
- passes --checkpoint-every through to the training CLIs
- resumes from existing per-seed checkpoints when available
scripts/experiments/run_medium_budget_comparison.py
- runs the medium-budget full-scaffold comparison under results/medium_budget
- derives a no-oracle training config from paper_config_combined.yaml
- writes summary.json, comparison_table.md, and per-model artifacts
Notebook Reference
| Notebook | Role | Recommended usage |
| --- | --- | --- |
| notebooks/01_prepare_data.ipynb | Data preparation | Canonical interactive equivalent of prepare_data.py |
| notebooks/02_train.ipynb | TGNN training walkthrough | Interactive training plus optional Stage 0 pretraining |
| notebooks/03_inference.ipynb | Inference examples | Manual inspection, temperature scans, and single-query AD checks |
| notebooks/04_evaluation.ipynb | Evaluation workflow | Stratified metrics, MC-dropout, calibration, and AD analysis |
| notebooks/05_baselines.ipynb | Baseline experiments | Exploratory DirectGNN, descriptor, RF, and external-baseline work |
| notebooks/06_ablations.ipynb | Ablation experiments | Exploratory ablations including maintained split-late comparison |
| notebooks/07_temperature.ipynb | Temperature analysis | Research notebook for van't Hoff and multi-temperature behavior |
| notebooks/08_optuna_tuning.ipynb | Optuna tuning | Interactive tuning |