# TGNN-Solv
Physics-informed graph learning for solid-liquid equilibrium solubility prediction.
TGNN-Solv does not predict solubility directly by default. It predicts crystal
and interaction parameters, solves the SLE equation with an NRTL activity
model, and only then applies a bounded correction. The central question across
this repository is whether that explicit thermodynamic bottleneck helps
relative to the same graph backbone trained directly on `ln(x2)`.
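For orientation, the standard form of the solid-liquid equilibrium relation that such a solver inverts (a textbook form; the repository's exact symbols and sign conventions may differ) is:

$$
\ln\left(x_2\,\gamma_2\right)
  = -\frac{\Delta H_{\mathrm{fus}}}{R}\left(\frac{1}{T}-\frac{1}{T_m}\right)
  + \frac{\Delta C_{p,\mathrm{fus}}}{R}\left(\frac{T_m}{T}-1-\ln\frac{T_m}{T}\right)
$$

where `x2` is the solute mole fraction, `γ2` its activity coefficient from the NRTL model, `T_m` the melting temperature, and `ΔH_fus`, `ΔC_p,fus` the fusion enthalpy and heat-capacity change.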
- Maintained TGNN baseline: `paper_config_tuned.yaml`
- Encoder options: `mpnn` or `gps`
- Optional TGNN descriptor augmentation available
- Matched no-physics baseline: DirectGNN
- Structured reproduction: core / article / full
- Optional Stage 0 pretraining supported
- Resume-safe training supported
- Interactive Experiment Lab available
## Site Overview

### Core models
The maintained comparison is:
- TGNN-Solv
- TGNN-Solv + descriptors
- TGNN-Solv + GPS encoder
- TGNN-Solv + Stage 0 pretraining
- DirectGNN
- DirectGNN + descriptors
- descriptor and Morgan RF baselines
### Practical workflows
This site documents the maintained paths for:
- data preparation
- training and resume support
- Stage 0 encoder warm starts and GPS / descriptor-augmented TGNN variants
- inference, uncertainty, and OOD checks
- benchmark and experiment runners
- external FastSolv / SolProp comparison
- Benchmark Studio, benchmark cards/manifests, and structured reproduction profiles
### Interactive tutorials
The repository includes notebook walkthroughs for:
- data preparation
- training
- inference
- evaluation
- baselines, ablations, temperature analysis, and tuning
## Research Question
The repository is organized around one high-level question:
Does an explicit thermodynamic bottleneck help out-of-split solubility prediction relative to a matched graph backbone trained directly on `ln(x2)`?
That is why the site consistently presents TGNN-Solv together with:
- the matched no-physics `DirectGNN` baseline
- the descriptor-augmented `DirectGNN` variant
- RF descriptor baselines
- optional external baselines such as FastSolv and SolProp
## Choose a Path
Start here if you want a working environment and one end-to-end example.
- Installation: environment setup, dependencies, sanity checks
- Quick Start Workflow: prepare data, train one tuned TGNN model, evaluate a checkpoint
- Notebooks & Tutorials: interactive walkthroughs aligned with the maintained code paths
- Troubleshooting: the fastest way to debug setup, device, or resume issues
Start here if you want fair comparisons and reproducible result bundles.
- Config Cookbook: pick the right config for TGNN, DirectGNN, or ablation work
- Results: understand canonical benchmark bundles, reproduction outputs, and provisional artifacts
- Experiments & Benchmarks: medium-budget, full-budget, split-comparison, external-baseline, and reproduction workflows
- Reproducing the Paper: choose between the `core`, `article`, and `full` maintained profiles
- Model Zoo: checkpoint conventions and current public-model status
Start here if you want to understand the architecture and its failure modes.
- Architecture: the TGNN forward path, DirectGNN, GC priors, and Stage 0 pretraining
- Training: curriculum phases, pair-aware batching, oracle injection, resume
- Evaluation & Inference: prediction APIs, uncertainty, calibration, and applicability domain
- Applications: synthesis-route solvent screening, formulation proxies, and solvent-swap use cases
- Experiment Lab: visual orchestration, DAGs, model editing, planner, lineage, docs, and Benchmark Studio
- Baselines: what each baseline tests and how to run it
- FAQ: common conceptual and practical questions
Start here if you want to change code, docs, or experiment scripts.
- Script Reference: maturity map for scripts and notebooks
- Contributing: contributor workflow and doc/update policy
- Repository Audit: current strengths, caveats, and structural risks
- Free GPU / Preemptible Training: resume-safe execution in cloud notebook environments
## Documentation Hub

### Start Here
Use these pages to get from clone to first result:
### Guides
Use these pages to understand the maintained implementation:
### Workflows
Use these pages to run benchmark and reproduction paths:
### Reference and Project Notes
Use these pages when you need targeted answers:
## Model Families

### TGNN-Solv

- predicts `T_m`, `dH_fus`, `dCp_fus`, and the NRTL state
- solves the SLE equation explicitly
- applies a bounded correction only after the solver
- supports GC crystal priors, descriptor augmentation, GPS encoder variants, and Stage 0 warm starts
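To make the bottleneck concrete, here is a schematic sketch of the solve step: given predicted crystal parameters and binary NRTL parameters, the solubility `x2` follows from a fixed-point solve of the SLE relation. All function names, the damping scheme, and parameter values below are illustrative, not the repository's actual API.

```python
import math

R = 8.314  # gas constant, J/(mol K)

def nrtl_gamma2(x2, tau12, tau21, alpha=0.3):
    """NRTL activity coefficient of the solute (component 2) in a binary mixture."""
    x1 = 1.0 - x2
    G12 = math.exp(-alpha * tau12)
    G21 = math.exp(-alpha * tau21)
    ln_g2 = x1 ** 2 * (
        tau12 * (G12 / (x2 + x1 * G12)) ** 2
        + tau21 * G21 / (x1 + x2 * G21) ** 2
    )
    return math.exp(ln_g2)

def solve_sle(T, Tm, dHfus, dCp, tau12, tau21, alpha=0.3, iters=200):
    """Damped fixed-point solve of ln(x2 * gamma2) = f(T) for solubility x2."""
    # Crystal-side (temperature-only) right-hand side of the SLE equation.
    rhs = (-dHfus / R * (1.0 / T - 1.0 / Tm)
           + dCp / R * ((Tm / T - 1.0) - math.log(Tm / T)))
    x2 = min(math.exp(rhs), 0.5)  # ideal-solution initial guess
    for _ in range(iters):
        x2_new = math.exp(rhs) / nrtl_gamma2(x2, tau12, tau21, alpha)
        x2 = 0.5 * (x2 + min(max(x2_new, 1e-12), 1.0 - 1e-9))  # damped update
    return x2
```

With `tau12 = tau21 = 0` the activity coefficient is 1 and the solver recovers the ideal solubility; positive interaction parameters give `gamma2 > 1` and push `x2` below the ideal value, which is the behavior the physics head is meant to encode.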
### DirectGNN

- reuses the same backbone and interaction stack
- predicts `ln(x2)` directly
- acts as the matched no-physics control
- has a stronger descriptor-augmented variant for baseline pressure testing
### Descriptor-augmented baselines

- DirectGNN + descriptors
- RF on descriptors
- RF on Morgan fingerprints
- RF hybrid features
Use these to test whether hand-crafted chemistry closes the gap without the TGNN physics bottleneck.
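A minimal sketch of an RF-on-descriptors control, assuming precomputed solute/solvent descriptor vectors stacked into a feature matrix; the synthetic data and scikit-learn setup here are purely illustrative and do not reflect the repository's featurization:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Stand-in feature matrix: rows = (solute, solvent, T) systems,
# columns = hand-crafted descriptors plus temperature. Purely synthetic.
X = rng.normal(size=(200, 16))
# Mock ln(x2) target with a simple dependence on two descriptors.
y = 0.8 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.1, size=200)

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X[:150], y[:150])          # train on the first 150 systems
preds = rf.predict(X[150:])       # predict the held-out 50
```

The point of this control is that it sees only tabular chemistry, no graph structure and no thermodynamic solve, so any gap it closes is attributable to hand-crafted features alone.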
### External baselines
- FastSolv
- SolProp
These remain optional and environment-sensitive, so they are documented honestly as external comparison surfaces rather than core repo dependencies.
## Recommended Reading Sequences

### Learn the project

### Run serious comparisons

### Work interactively
- Notebooks & Tutorials
    - `01_prepare_data.ipynb`
    - `02_train.ipynb`
    - `03_inference.ipynb`
    - `04_evaluation.ipynb`
## Notebook-First Readers
If you prefer to learn the repository through interactive walkthroughs, start with the numbered notebooks listed above.
The site pages and notebooks are intentionally aligned, so the conceptual documentation and the runnable examples describe the same maintained surfaces.
## Continue With
- New to the repo: Installation → Quick Start Workflow
- Choosing configs or runs: Config Cookbook → Experiments & Benchmarks
- Need metrics and artifacts: Results → Model Zoo