Physics-Informed Solubility Modeling

TGNN-Solv

Physics-informed graph learning for solid-liquid equilibrium solubility prediction.

TGNN-Solv does not predict solubility directly by default. It predicts crystal and interaction parameters, solves the SLE equation with an NRTL activity model, and only then applies a bounded correction. The central question across this repository is whether that explicit thermodynamic bottleneck helps relative to the same graph backbone trained directly on ln(x2).
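For reference, below is a standard form of the SLE relation that such a solver inverts. It assumes the common melting-property form with a constant heat-capacity-of-fusion term and a binary NRTL model for the solute activity coefficient; the exact form used in the code may differ. Here x_2 is the solute mole fraction, x_1 = 1 - x_2 is the solvent mole fraction, and R is the gas constant.

```latex
\begin{aligned}
\ln x_2 &= -\ln\gamma_2(x_2, T)
  - \frac{\Delta H_{\mathrm{fus}}}{R}\left(\frac{1}{T} - \frac{1}{T_m}\right)
  + \frac{\Delta C_{p,\mathrm{fus}}}{R}\left(\frac{T_m}{T} - 1 - \ln\frac{T_m}{T}\right) \\
\ln\gamma_2 &= x_1^2\left[\tau_{12}\left(\frac{G_{12}}{x_2 + x_1 G_{12}}\right)^2
  + \frac{\tau_{21}\,G_{21}}{(x_1 + x_2 G_{21})^2}\right],
  \qquad G_{ij} = \exp(-\alpha\,\tau_{ij})
\end{aligned}
```

Because \gamma_2 depends on x_2, the equation is implicit in x_2 and has to be solved numerically. This is why the pipeline contains an explicit solver step rather than a closed-form output.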


Default strict split: solute_scaffold
Maintained TGNN baseline: paper_config_tuned.yaml
Encoder options: mpnn or gps
Optional TGNN descriptor augmentation available
Matched no-physics baseline: DirectGNN
Structured reproduction: core / article / full
Optional Stage 0 pretraining supported
Resume-safe training supported
Interactive Experiment Lab available
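Assuming solute_scaffold means what its name suggests, every solute scaffold sits entirely on one side of the train/test boundary, so the test set probes unseen solute chemotypes. Below is a minimal sketch of that kind of group split. It assumes each row already carries a precomputed scaffold key (for example a Bemis-Murcko scaffold string). The function name and row format are hypothetical and are not the repository's implementation.

```python
import random
from collections import defaultdict


def solute_scaffold_split(rows, test_fraction=0.2, seed=0):
    """Hypothetical group split: all rows sharing a solute scaffold key land in
    the same partition, so no test scaffold is ever seen during training.

    rows: list of dicts with an assumed precomputed "solute_scaffold" key.
    Returns (train_indices, test_indices), both sorted.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row["solute_scaffold"]].append(i)

    scaffolds = sorted(groups)  # sort first so the seeded shuffle is reproducible
    random.Random(seed).shuffle(scaffolds)

    n_test_target = int(round(test_fraction * len(rows)))
    train_idx, test_idx = [], []
    for scaffold in scaffolds:
        # Whole scaffold groups go to test until the target size is reached.
        if len(test_idx) < n_test_target:
            test_idx.extend(groups[scaffold])
        else:
            train_idx.extend(groups[scaffold])
    return sorted(train_idx), sorted(test_idx)
```

Splitting by whole groups means the realized test fraction can overshoot the target slightly. This is the usual price of keeping scaffolds disjoint.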

Site Overview

Core models

The maintained comparison includes:

TGNN-Solv
TGNN-Solv + descriptors
TGNN-Solv + GPS encoder
TGNN-Solv + Stage 0 pretraining
DirectGNN
DirectGNN + descriptors
random forest (RF) baselines on descriptors and Morgan fingerprints

Practical workflows

This site documents the maintained paths for:

data preparation
training and resume support
Stage 0 encoder warm starts and GPS / descriptor-augmented TGNN variants
inference, uncertainty, and OOD checks
benchmark and experiment runners
external FastSolv / SolProp comparison
Benchmark Studio, benchmark cards/manifests, and structured reproduction profiles
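Resume-safe training usually rests on one invariant: the checkpoint on disk is always either the previous complete state or the new complete state, never a half-written file. Below is a minimal standard-library sketch of that pattern. It assumes a JSON-serializable state dict. The function names and checkpoint format are hypothetical and this is not the repository's resume code; a real PyTorch run would store model and optimizer state (for example with torch.save) instead of JSON.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Atomically write a JSON-serializable state (hypothetical format)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes reach disk before the rename
        os.replace(tmp_path, path)  # atomic on POSIX: readers see old or new file, never partial
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_or_init(path, init_state):
    """Resume from the last complete checkpoint, or start from init_state."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return dict(init_state)


def run_training(path, total_epochs):
    """Hypothetical loop: picks up at the saved epoch after a preemption."""
    state = load_or_init(path, {"epoch": 0, "history": []})
    for epoch in range(state["epoch"], total_epochs):
        state["history"].append(epoch)  # placeholder for one epoch of real work
        state["epoch"] = epoch + 1
        save_checkpoint(path, state)  # checkpoint after every completed epoch
    return state
```

Calling run_training(path, 3) and later run_training(path, 5) continues from epoch 3 instead of restarting. That is the behavior preemptible notebook sessions rely on.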

Interactive tutorials

The repository includes notebook walkthroughs for:

data preparation
training
inference
evaluation
baselines, ablations, temperature analysis, and tuning

Research Question

The repository is organized around one high-level question:

Does an explicit thermodynamic bottleneck help out-of-split solubility prediction relative to a matched graph backbone trained directly on ln(x2)?

That is why the site consistently presents TGNN-Solv together with:

the matched no-physics DirectGNN baseline
the descriptor-augmented DirectGNN variant
RF descriptor baselines
optional external baselines such as FastSolv and SolProp

Choose a Path

First Run

Start here if you want a working environment and one end-to-end example.

Installation
environment setup, dependencies, sanity checks
Quick Start Workflow
prepare data, train one tuned TGNN model, evaluate a checkpoint
Notebooks & Tutorials
interactive walkthroughs aligned with the maintained code paths
Troubleshooting
the fastest way to debug setup, device, or resume issues

Benchmarking

Start here if you want fair comparisons and reproducible result bundles.

Config Cookbook
pick the right config for TGNN, DirectGNN, or ablation work
Results
understand canonical benchmark bundles, reproduction outputs, and provisional artifacts
Experiments & Benchmarks
medium-budget, full-budget, split-comparison, external-baseline, and reproduction workflows
Reproducing the Paper
choose between core, article, and full maintained profiles
Model Zoo
checkpoint conventions and current public-model status

Deep Dive

Start here if you want to understand the architecture and its failure modes.

Architecture
the TGNN forward path, DirectGNN, GC priors, and Stage 0 pretraining
Training
curriculum phases, pair-aware batching, oracle injection, resume
Evaluation & Inference
prediction APIs, uncertainty, calibration, and applicability domain
Applications
synthesis-route solvent screening, formulation proxies, and solvent-swap use cases
Experiment Lab
visual orchestration, DAGs, model editing, planner, lineage, docs, and Benchmark Studio
Baselines
what each baseline tests and how to run it
FAQ
common conceptual and practical questions

Contributing

Start here if you want to change code, docs, or experiment scripts.

Script Reference
maturity map for scripts and notebooks
Contributing
contributor workflow and doc/update policy
Repository Audit
current strengths, caveats, and structural risks
Free GPU / Preemptible Training
resume-safe execution in cloud notebook environments
Documentation Hub

Start Here

Use these pages to get from clone to first result:

Guides

Use these pages to understand the maintained implementation:

Workflows

Use these pages to run benchmark and reproduction paths:

Reference and Project Notes

Use these pages when you need targeted answers:

Model Families

`TGNN-Solv`

predicts the melting temperature T_m, enthalpy of fusion dH_fus, heat-capacity change on fusion dCp_fus, and the NRTL interaction parameters
solves the SLE equation explicitly
applies a bounded correction only after the solver
supports GC crystal priors, descriptor augmentation, GPS encoder variants, and Stage 0 warm starts
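The forward path above can be sketched in plain Python as below. This is a minimal, non-differentiable illustration under these assumptions:

- a binary NRTL model with parameters tau12, tau21 and alpha;
- the SLE form with a constant dCp_fus term, with T and T_m in K, dH_fus in J/mol and dCp_fus in J/(mol K);
- bisection in ln(x2) as the solver;
- a tanh-bounded residual as the correction.

All names and the correction form are hypothetical. The repository's batched solver and its correction bound may differ.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def ln_gamma2_nrtl(x2, tau12, tau21, alpha):
    """Binary NRTL ln(gamma) of the solute (component 2); component 1 is the solvent."""
    x1 = 1.0 - x2
    g12 = math.exp(-alpha * tau12)
    g21 = math.exp(-alpha * tau21)
    return x1 ** 2 * (
        tau12 * (g12 / (x2 + x1 * g12)) ** 2
        + tau21 * g21 / (x1 + x2 * g21) ** 2
    )


def ln_x_ideal(T, Tm, dH_fus, dCp_fus):
    """Ideal-solubility part of the SLE equation (melting properties only)."""
    ratio = Tm / T
    return (-dH_fus / R * (1.0 / T - 1.0 / Tm)
            + dCp_fus / R * (ratio - 1.0 - math.log(ratio)))


def solve_sle(T, Tm, dH_fus, dCp_fus, tau12, tau21, alpha, iters=100):
    """Bisection on ln(x2) for f = ln x2 + ln gamma2(x2) - ln x_ideal = 0."""
    target = ln_x_ideal(T, Tm, dH_fus, dCp_fus)
    lo, hi = -50.0, 0.0  # bracket in ln(x2); f(0) = -target > 0 when target < 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f = mid + ln_gamma2_nrtl(math.exp(mid), tau12, tau21, alpha) - target
        if f > 0:
            hi = mid  # root lies below mid
        else:
            lo = mid  # root lies at or above mid
    return 0.5 * (lo + hi)


def bounded_correction(ln_x2_sle, raw, max_shift=1.0):
    """Hypothetical post-solver correction: shifts ln(x2) by at most +/- max_shift."""
    return ln_x2_sle + max_shift * math.tanh(raw)
```

Bisection is used here because, for typical parameter values, the residual changes sign across the bracket whenever the ideal-solubility term is negative. If the NRTL parameters imply liquid-liquid demixing, the equation can have several roots, and this sketch simply returns one of them. With tau12 = tau21 = 0 the activity term vanishes, and the solver returns the ideal solubility.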

`DirectGNN`

reuses the same backbone and interaction stack
predicts ln(x2) directly
acts as the matched no-physics control
has a stronger descriptor-augmented variant for baseline pressure testing

Descriptor-augmented baselines

DirectGNN + descriptors
RF on descriptors
RF on Morgan fingerprints
RF hybrid features

Use these to test whether hand-crafted chemistry closes the gap without the TGNN physics bottleneck.
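Whether a baseline "closes the gap" comes down to comparing errors on identical test rows. Below is a minimal sketch, assuming ln(x2) predictions for the same split are already available as plain lists. The model labels, the function names, and the choice of RMSE are illustrative and are not the repository's benchmark code.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error on ln(x2)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equal-length, non-empty sequences")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def gap_report(y_true, predictions, reference="TGNN-Solv"):
    """RMSE of each model minus the reference model's RMSE on the same rows.

    Positive values mean the model is worse than the reference;
    values near zero mean the baseline has closed the gap.
    """
    ref = rmse(y_true, predictions[reference])
    return {
        name: rmse(y_true, pred) - ref
        for name, pred in predictions.items()
        if name != reference
    }
```

Reporting the difference rather than two separate numbers keeps the comparison tied to one split. A gap measured across different splits would not answer the question.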

External baselines

FastSolv
SolProp

These remain optional and environment-sensitive, so they are documented as external comparison points rather than as core dependencies of the repository.

Notebook-First Readers

If you prefer to learn the repository through interactive walkthroughs, start with:

The site pages and notebooks are intentionally aligned, so the conceptual documentation and the runnable examples describe the same maintained surfaces.

Continue With

New to the repo: Installation → Quick Start Workflow
Choosing configs or runs: Config Cookbook → Experiments & Benchmarks
Need metrics and artifacts: Results → Model Zoo
