Physics-Informed Solubility Modeling

TGNN-Solv

Physics-informed graph learning for solid-liquid equilibrium solubility prediction.

TGNN-Solv does not predict solubility directly by default. It predicts crystal and interaction parameters, solves the SLE equation with an NRTL activity model, and only then applies a bounded correction. The central question across this repository is whether that explicit thermodynamic bottleneck helps relative to the same graph backbone trained directly on ln(x2).
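For orientation, the SLE relation referred to above is commonly written in the following textbook form. This is a sketch of the standard equation, not a statement of the exact expression implemented in this repository; the symbol names mirror the predicted quantities (T_m, dH_fus, dCp_fus), and the activity coefficient is assumed to come from the NRTL model.

```latex
\ln\!\left(x_2\,\gamma_2\right)
  = -\frac{\Delta H_{\mathrm{fus}}}{R}\left(\frac{1}{T} - \frac{1}{T_m}\right)
    + \frac{\Delta C_{p,\mathrm{fus}}}{R}\left(\frac{T_m}{T} - 1 - \ln\frac{T_m}{T}\right)
```

Here x_2 is the solute mole fraction, \gamma_2 its activity coefficient in the solvent, T the temperature, and R the gas constant. Because \gamma_2 itself depends on x_2, the equation is implicit and has to be solved numerically.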

  • Default strict split: solute_scaffold
  • Maintained TGNN baseline: paper_config_tuned.yaml
  • Encoder options: mpnn or gps
  • Optional TGNN descriptor augmentation available
  • Matched no-physics baseline: DirectGNN
  • Structured reproduction: core / article / full
  • Optional Stage 0 pretraining supported
  • Resume-safe training supported
  • Interactive Experiment Lab available

Site Overview

Core models

The maintained comparison is:

  • TGNN-Solv
  • TGNN-Solv + descriptors
  • TGNN-Solv + GPS encoder
  • TGNN-Solv + Stage 0 pretraining
  • DirectGNN
  • DirectGNN + descriptors
  • descriptor and Morgan RF baselines

Practical workflows

This site documents the maintained paths for:

  • data preparation
  • training and resume support
  • Stage 0 encoder warm starts and GPS / descriptor-augmented TGNN variants
  • inference, uncertainty, and OOD checks
  • benchmark and experiment runners
  • external FastSolv / SolProp comparison
  • Benchmark Studio, benchmark cards/manifests, and structured reproduction profiles

Interactive tutorials

The repository includes notebook walkthroughs for:

  • data preparation
  • training
  • inference
  • evaluation
  • baselines, ablations, temperature analysis, and tuning

Research Question

The repository is organized around one high-level question:

Does an explicit thermodynamic bottleneck help out-of-split solubility prediction relative to a matched graph backbone trained directly on ln(x2)?

That is why the site consistently presents TGNN-Solv together with:

  • the matched no-physics DirectGNN baseline
  • the descriptor-augmented DirectGNN variant
  • RF descriptor baselines
  • optional external baselines such as FastSolv and SolProp
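The "out-of-split" framing in the research question depends on how the held-out set is built. The sketch below illustrates the general idea behind a scaffold-grouped split: every record sharing a solute scaffold lands on the same side. It is an illustrative assumption, not the repository's solute_scaffold implementation; `group_split`, the record fields, and the precomputed `scaffold` strings are hypothetical (in practice a scaffold key might come from a Bemis-Murcko scaffold computed with RDKit).

```python
from collections import defaultdict


def group_split(records, key, test_fraction=0.2):
    """Assign whole groups to train or test so no group appears in both.

    Hypothetical helper: groups are filled largest-first into the training
    set until its size budget is reached; the remainder goes to test.
    """
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[record[key]].append(index)

    # Largest groups first, with a deterministic tie-break on the group key.
    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), str(item[0])))

    train_budget = (1.0 - test_fraction) * len(records)
    train, test = [], []
    for _, indices in ordered:
        if len(train) + len(indices) <= train_budget:
            train.extend(indices)
        else:
            test.extend(indices)
    return train, test


# Toy records with made-up scaffold keys (assumed precomputed elsewhere).
records = [
    {"solute": "Oc1ccccc1", "scaffold": "c1ccccc1", "solvent": "O"},
    {"solute": "Nc1ccccc1", "scaffold": "c1ccccc1", "solvent": "CCO"},
    {"solute": "Cc1ccccc1", "scaffold": "c1ccccc1", "solvent": "O"},
    {"solute": "OC1CCCCC1", "scaffold": "C1CCCCC1", "solvent": "O"},
    {"solute": "Cc1ccncc1", "scaffold": "c1ccncc1", "solvent": "CCO"},
]
train_idx, test_idx = group_split(records, key="scaffold", test_fraction=0.4)
train_scaffolds = {records[i]["scaffold"] for i in train_idx}
test_scaffolds = {records[i]["scaffold"] for i in test_idx}
```

Under a split like this, a model is evaluated only on solute scaffolds it never saw during training, which is the setting in which the TGNN-Solv versus DirectGNN comparison is meant to be read.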

Choose a Path

Start here if you want a working environment and one end-to-end example.

Start here if you want fair comparisons and reproducible result bundles.

  • Config Cookbook: pick the right config for TGNN, DirectGNN, or ablation work
  • Results: understand canonical benchmark bundles, reproduction outputs, and provisional artifacts
  • Experiments & Benchmarks: medium-budget, full-budget, split-comparison, external-baseline, and reproduction workflows
  • Reproducing the Paper: choose between core, article, and full maintained profiles
  • Model Zoo: checkpoint conventions and current public-model status

Start here if you want to understand the architecture and its failure modes.

  • Architecture: the TGNN forward path, DirectGNN, GC priors, and Stage 0 pretraining
  • Training: curriculum phases, pair-aware batching, oracle injection, resume
  • Evaluation & Inference: prediction APIs, uncertainty, calibration, and applicability domain
  • Applications: synthesis-route solvent screening, formulation proxies, and solvent-swap use cases
  • Experiment Lab: visual orchestration, DAGs, model editing, planner, lineage, docs, and Benchmark Studio
  • Baselines: what each baseline tests and how to run it
  • FAQ: common conceptual and practical questions

Start here if you want to change code, docs, or experiment scripts.

Documentation Hub

Start Here

Use these pages to get from clone to first result:

Guides

Use these pages to understand the maintained implementation:

Workflows

Use these pages to run benchmark and reproduction paths:

Reference and Project Notes

Use these pages when you need targeted answers:

Model Families

TGNN-Solv

  • predicts T_m, dH_fus, dCp_fus, and NRTL state
  • solves the SLE equation explicitly
  • applies a bounded correction only after the solver
  • supports GC crystal priors, descriptor augmentation, GPS encoder variants, and Stage 0 warm starts
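The forward path listed above (predict parameters, solve the SLE equation, then correct) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the repository's solver: it uses the textbook binary NRTL expression, a plain bisection in ln(x2), and a tanh-bounded shift as one possible form of "bounded correction". All function names, the `alpha=0.3` default, and `max_shift` are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def ideal_ln_activity(T, T_m, dH_fus, dCp_fus=0.0):
    """Right-hand side of the SLE equation, i.e. ln(x2 * gamma2)."""
    enthalpy_term = -(dH_fus / R) * (1.0 / T - 1.0 / T_m)
    heat_capacity_term = (dCp_fus / R) * (T_m / T - 1.0 - math.log(T_m / T))
    return enthalpy_term + heat_capacity_term


def nrtl_ln_gamma2(x2, tau12, tau21, alpha=0.3):
    """Binary NRTL ln(gamma) for the solute (component 2) in solvent (1)."""
    x1 = 1.0 - x2
    G12 = math.exp(-alpha * tau12)
    G21 = math.exp(-alpha * tau21)
    term_a = tau12 * (G12 / (x2 + x1 * G12)) ** 2
    term_b = tau21 * G21 / (x1 + x2 * G21) ** 2
    return x1 * x1 * (term_a + term_b)


def solve_sle_ln_x2(T, T_m, dH_fus, dCp_fus, tau12, tau21, alpha=0.3, iters=200):
    """Solve ln(x2) + ln(gamma2(x2)) = ideal_ln_activity by bisection."""
    target = ideal_ln_activity(T, T_m, dH_fus, dCp_fus)
    if target >= 0.0:
        return 0.0  # at or above the melting point: treat as x2 = 1

    def residual(ln_x2):
        return ln_x2 + nrtl_ln_gamma2(math.exp(ln_x2), tau12, tau21, alpha) - target

    lo, hi = -60.0, 0.0  # residual(hi) > 0; residual(lo) < 0 for typical inputs
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bounded_correction(ln_x2, raw_correction, max_shift=1.0):
    """Shift ln(x2) by at most max_shift (assumed tanh bounding)."""
    return ln_x2 + max_shift * math.tanh(raw_correction)


# Illustrative numbers only, not taken from any dataset.
ln_x2_sle = solve_sle_ln_x2(T=298.15, T_m=400.0, dH_fus=20000.0, dCp_fus=0.0,
                            tau12=1.0, tau21=0.5)
ln_x2_final = bounded_correction(ln_x2_sle, raw_correction=0.2, max_shift=0.5)
```

With both tau values set to zero the activity coefficient is 1 and the solver reduces to ideal solubility, which is a convenient sanity check. Bisection finds one root of the residual; strongly non-ideal parameters can produce several, and a production solver would need to handle that case explicitly.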

DirectGNN

  • reuses the same backbone and interaction stack
  • predicts ln(x2) directly
  • acts as the matched no-physics control
  • has a stronger descriptor-augmented variant that stress-tests the baseline comparison

Descriptor-augmented baselines

  • DirectGNN + descriptors
  • RF on descriptors
  • RF on Morgan fingerprints
  • RF hybrid features

Use these to test whether hand-crafted chemistry closes the gap without the TGNN physics bottleneck.

External baselines

  • FastSolv
  • SolProp

These remain optional and environment-sensitive, so they are documented as external comparison surfaces rather than core repo dependencies.

Notebook-First Readers

If you prefer to learn the repository through interactive walkthroughs, start with:

  1. 01_prepare_data.ipynb
  2. 02_train.ipynb
  3. 03_inference.ipynb
  4. 04_evaluation.ipynb

The site pages and notebooks are intentionally aligned, so the conceptual documentation and the runnable examples describe the same maintained surfaces.

Continue With