Sisyphus

Graph-based whole-body PBPK simulation with native uncertainty propagation

CI

Methodology · Quickstart · Validation · Architecture · Limitations


Sisyphus is a physiologically based pharmacokinetic (PBPK) platform that represents the human body as a typed directed multi-graph, derives ordinary differential equation (ODE) systems from graph topology, and propagates parameter uncertainty natively through Monte Carlo sampling.

The platform accepts a SMILES string and dosing regimen as input and produces PK endpoints (Cmax, Tmax, AUC, t1/2) with 90% prediction intervals. Beyond single-dose prediction, Sisyphus supports multi-dose regimen simulation with steady-state detection, Bayesian therapeutic drug monitoring (TDM) via dispatched simulation-based / importance / iterative Bayesian methods, model-informed precision dosing (MIPD), drug-drug interaction (DDI) modeling, pharmacogenomic phenotype-aware prediction (SLCO1B1, NAT2, UGT1A1), and PK/PD effect estimation. OATP1B1-mediated hepatic uptake is modeled via an extended clearance model (ECM) with closed-form QSSA hepatocyte kinetics.

$ sisyphus predict --smiles "Cn1c(=O)c2c(ncn2C)n(C)c1=O" --dose 100

Drug: Cn1c(=O)c2c(ncn2C)n(C)c1=O
Method: hybrid
Confidence: high
Cmax: 1.1792 mg/L
Tmax: 0.63 h
t½: 2.07 h

Methodology

Physiological model

The body is represented as a 34-compartment directed multi-graph comprising blood pools (arterial, venous, portal vein), perfusion-limited organs (11), permeability-limited organs (4, each split into vascular and extravascular sub-compartments), GI lumen segments (8, compartmental absorption and transit model; Yu & Amidon, 1999), and mass-balance sinks (4). Physiological parameters follow the ICRP Reference Man (ICRP, 2002). Tissue compositions for partition coefficient estimation are taken from Rodgers & Rowland (2005). CYP enzyme abundances follow Shimada et al. (1994).

                      ┌─────────────────────────────────────────────┐
                      │                                             │
   ┌──────┐    ┌──────┴───┐                                   ┌─────┴────┐
   │ lung │───►│ arterial │─► brain ─────────────────────────►│  venous  │
   └──┬───┘    │  blood   │─► heart ─────────────────────────►│  blood   │
      │        │          │─► kidney ────────────────────────►│          │
      │        │          │                                   │          │
      │        │          │─► gut wall ──┐                    │          │
      │        │          │─► spleen  ───┼─► liver ──────────►│          │
      │        │          │─► pancreas ──┘   (portal,CYP450)  │          │
      │        │          │                                   │          │
      │        │          │─► muscle, adipose, skin, bone ───►│          │
      │        └──────────┘                                   └─────┬────┘
      │                                                             │
      └─────────────────────────────────────────────────────────────┘

   stomach ──► duodenum ──► jejunum ──► ileum ──► colon ──► fecal excretion
                  │            │          │
                  └────────────┴──────────┘
                       absorption ──► gut wall

ODE formulation

The ODE system is derived automatically from graph topology. Each edge type dispatches a flux function:

Perfusion-limited transport (FlowFluxSpec):

\[\frac{dA_i}{dt} = Q_i \cdot C_{in} - Q_i \cdot \frac{A_i \cdot R_{B:P}}{V_i \cdot K_{p,i}}\]

Hepatic clearance — Extended Clearance Model (ECM, default) (ClearanceFluxSpec, model="extended"):

The QSSA-derived effective intrinsic clearance, with $PS_{inf,total} = PS_{inf} + J_{max} \cdot f_u / (K_m + f_u \cdot C_{u,hep})$:

\[CL_{int,eff} = \frac{PS_{inf,total} \cdot CL_{int,h}}{PS_{eff} + CL_{int,h}}\]

is embedded in the well-stirred form:

\[CL_{h} = \frac{Q \cdot f_u \cdot CL_{int,eff}}{Q + f_u \cdot CL_{int,eff}}\]

where $PS_{inf}$ is passive sinusoidal influx, $PS_{eff}$ is sinusoidal efflux, $J_{max}/K_m$ are active uptake (OATP1B1, …) Michaelis–Menten parameters, and $CL_{int,h}$ is the sum of metabolic + biliary intrinsic clearance. Derivation: hepatocyte mass balance $PS_{inf,total} \cdot C_{u,blood} = (PS_{eff} + CL_{int,h}) \cdot C_{u,hep}$ at steady state (QSSA). For drugs without active transporter kinetics, the model reduces to the classical well-stirred form:

\[CL_{int,organ} = \sum_j \left( E_j \cdot k_j \right) \cdot S_{IVIVE}\] \[CL_{organ} = \frac{Q \cdot f_u \cdot CL_{int,organ}}{Q + f_u \cdot CL_{int,organ}}\]

where $E_j$ is the enzyme abundance (total pmol in organ), $k_j$ is the per-enzyme intrinsic clearance (μL/min/pmol), and $S_{IVIVE}$ converts units (60/106, μL/min → L/h). This formulation is enzyme-level: clearance at any organ is computed from its local enzyme profile, not from organ identity (Houston, 1994; Yang et al., 2007; Shitara et al., 2013 / Yoshikado et al., 2017 for the ECM/QSSA hepatocyte form implemented here).

Permeability-limited distribution (DiffusionFluxSpec):

\[\frac{dA_{vasc}}{dt} = Q \cdot C_{art} - Q \cdot C_{vasc} - PS \cdot (C_{u,vasc} - C_{u,tissue})\] \[\frac{dA_{tissue}}{dt} = PS \cdot (C_{u,vasc} - C_{u,tissue})\]

GI absorption (AbsorptionFluxSpec):

\[k_a = \frac{2.88 \cdot P_{eff} \cdot f_{ka}}{r}\]

where $P_{eff}$ is effective permeability (×10−4 cm/s), $f_{ka}$ is the segment-specific absorption fraction, and $r$ is particle radius (μm).

Tissue:plasma partition coefficients are computed via the Rodgers & Rowland method (Rodgers & Rowland, 2005, 2006), with the Berezhkovskiy correction for acids (Berezhkovskiy, 2004).

Prediction pipeline

The full pipeline combines mechanistic simulation with data-driven prediction:

  1. SMILES → molecular profile: RDKit descriptors, structural pKa classification, applicability domain assessment
  2. ADME prediction: Pre-trained XGBoost models for fu,p, CLint, RB:P, VDss (trained on TDC datasets; Huang et al., 2021), with DrugBank experimental fu,p enrichment where available
  3. IVIVE: CLint decomposition into per-enzyme affinities, Kp calculation
  4. PBPK simulation: 34-state ODE system solved via LSODA (Petzold, 1983)
  5. ML direct prediction: XGBoost Cmax model (trained on 1,128 drugs from multi-source clinical PK data)
  6. CL/F analytical track: closed-form 1-compartment Cmax estimate using XGBoost CL/F + Vd predictions and ka from Engine Tmax / Peff. Decorrelates with Engine+ML residuals via different input channels.
  7. VDss analytical track: 1-compartment Cmax using XGBoost VDss (volume-of-distribution-at-steady-state) predictor. Conditional activation based on applicability. Added 2026-04 as the orthogonal fourth track.
  8. Meta-learner: Compound-type-adaptive geometric blend of all four tracks via LOOCV-calibrated weights. Base compounds: engine 0.60 / ML 0.40 / CLF 0.00; non-base: engine 0.35 / ML 0.50 / CLF 0.15. VDss track weight 0.20 when activated; other weights scaled by ×0.80 so the four-track sum remains unity.

Uncertainty propagation

All parameters are represented as Distribution(mean, cv, dist_type). Monte Carlo sampling draws N realizations from all parameter distributions simultaneously, solves the ODE for each, and aggregates the resulting PK endpoints into distributional summaries with prediction intervals. The graph topology is compiled once; only parameter values change across MC iterations (“compile once, parameterize many”).

Multi-dose regimen simulation

Multi-dose pharmacokinetics are computed by an event-driven solver that wraps the single-dose ODE engine. Dose events are injected into the state vector between integration segments; the ODE right-hand side is not modified. This preserves the identity-blind engine invariant while supporting arbitrary dosing schedules (repeated oral, IV infusion, mixed regimens).

Steady-state detection applies a trough variation criterion (<5%) across the last three dosing intervals. The solver reports Css,max, Css,min, accumulation ratio (AR = Css,max / Cmax,first), and dose number at which steady state is reached.

Therapeutic drug monitoring

Given observed plasma concentrations, Sisyphus refines population-level parameter distributions into individual posteriors using a dispatched Bayesian router (data/sbi/method_routing.json) that selects one of three methods per drug:

Effective sample size (ESS = $1/\sum w_i^2$) is monitored for importance-based methods. Morphine routes SBI with a likelihood reweighting kernel (P6, 2026-04-19) to compensate for hierarchical variance deficiency. All three methods share a unified output contract: per-PK-parameter posterior Distribution. This mechanism bypasses the CLint prediction ceiling (R² ≈ 0.24): observed drug levels directly correct inaccurate population priors, reducing posterior CV by >50% in validation (see TDM validation).

Model-informed precision dosing

MIPD recommends adjusted doses to achieve a target steady-state concentration:

  1. Bayesian update at the observed dose yields a posterior Css distribution.
  2. Linear dose scaling: $dose_{new} = dose_{current} \times (C_{ss,target} / C_{ss,posterior})$
  3. Clamp to clinical dose range and round to a practical increment (default 25 mg).

Linear scaling assumes non-saturable metabolism, which holds for most drugs at therapeutic concentrations. For drugs with known nonlinear pharmacokinetics (e.g., phenytoin), this approximation should be used with caution.

Drug-drug interactions

DDI is modeled by adjusting enzyme abundances in the body graph prior to ODE compilation. The engine sees modified abundances and computes clearance as usual — no engine modifications are required.

Competitive inhibition:

\[E_{eff} = \frac{E_{base}}{1 + [I] / K_i}\]

Enzyme induction (Emax model):

\[E_{eff} = E_{base} \cdot \left(1 + \frac{E_{max} \cdot [I]}{EC_{50} + [I]}\right)\]

where $E_{base}$ is baseline enzyme abundance (pmol), $[I]$ is perpetrator plasma concentration, $K_i$ is the inhibition constant, and $E_{max}$/$EC_{50}$ are induction parameters. Preset perpetrators: ketoconazole, fluconazole, quinidine (CYP inhibitors) and rifampin (CYP3A4 inducer).

Pharmacodynamic effects are computed from the concentration-time profile via an effect compartment with sigmoid Emax response:

Effect-site equilibration:

\[\frac{dC_e}{dt} = k_{e0} \cdot (C_p - C_e)\]

Sigmoid Emax response:

\[E = E_0 + \frac{E_{max} \cdot C_e^n}{EC_{50}^n + C_e^n}\]

where $k_{e0}$ is the effect-site equilibration rate constant (h−1), $E_0$ is the baseline effect, and $n$ is the Hill coefficient. This is implemented as analytical post-processing on the PK solution, not as additional ODE states, preserving engine isolation. Preset PD models are provided for midazolam (sedation) and warfarin (INR response).

Quickstart

Installation

pip install -e ".[dev,ml,chem]"

Pre-trained XGBoost models (fu,p, CLint, RB:P, VDss, Cmax) are required in models/adme/ and models/direct_pk/. Re-training scripts are provided in scripts/.

CLI

Six commands cover the clinical pharmacology workflow:

# Single-dose PK prediction
sisyphus predict --smiles "Cn1c(=O)c2c(ncn2C)n(C)c1=O" --dose 100

# Multi-dose regimen simulation (atorvastatin 40 mg QD × 14 days)
sisyphus simulate --smiles "CC(C)c1n(CC(O)CC(O)CC(=O)O)c(=O)..." \
    --dose 40 --interval 24 --doses 14

# TDM Bayesian update (midazolam 5 mg, observed 0.015 mg/L at t=1 h)
sisyphus tdm --smiles "c1ccc2c(c1)C(=NC(=O)N2)c1ccccc1F" \
    --dose 5 --obs "1.0:0.015"

# DDI prediction (midazolam + ketoconazole inhibition)
sisyphus ddi --smiles "c1ccc2c(c1)C(=NC(=O)N2)c1ccccc1F" \
    --dose 5 --inhibitor ketoconazole

# MIPD dose recommendation (target Css,max = 0.02 mg/L)
sisyphus dose-adjust --smiles "c1ccc2c(c1)C(=NC(=O)N2)c1ccccc1F" \
    --dose 5 --obs "1.0:0.015" --target-css 0.02

# Holdout benchmark (add --compute-pi for empirical 90% PI coverage; diagnostic only)
sisyphus benchmark --holdout

All commands accept --verbose for debug-level logging.

Python API

from sisyphus.pipeline.predict import predict

result = predict("Cn1c(=O)c2c(ncn2C)n(C)c1=O", dose_mg=100.0)

print(result.pk.cmax.mean)    # 1.18 mg/L
print(result.method)          # "hybrid"
print(result.confidence)      # "high"

Engine-only mode (known compound parameters)

For validation or mechanistic studies, the engine can be driven directly from compound YAML files, bypassing ADME prediction:

from pathlib import Path
import numpy as np
from sisyphus.graph.builder import build_from_yaml
from sisyphus.compounds import load_compound
from sisyphus.engine.compiler import ODECompiler, ResolvedParams
from sisyphus.engine.solver import solve
from sisyphus.pk.endpoints import compute_endpoints
import sisyphus.engine.flux  # register flux functions

graph = build_from_yaml(Path("data/physiology/reference_man.yaml"))
drug = load_compound(Path("data/compounds/midazolam.yaml"))

compiled = ODECompiler().compile(graph)
rng = np.random.default_rng(42)
params = ResolvedParams(graph.sample(rng), drug.sample(rng))

y0 = np.zeros(compiled.n_states)
y0[compiled.state_index[drug.administration_node]] = drug.dose_mg

result = solve(compiled, params, y0, t_span=(0, 24))
pk = compute_endpoints(result)

print(f"Cmax: {pk.cmax.mean:.4f} mg/L")  # 0.0069 mg/L (matches Omega ±0.5%)

Monte Carlo uncertainty

from sisyphus.engine.uncertainty import UncertaintyEngine

ue = UncertaintyEngine()
mc = ue.propagate_fast(compiled, graph, drug, n_samples=1000)

print(mc.pk.cmax)                # Distribution(mean≈1.18, cv≈0.3)
print(mc.cmax_90ci)              # (0.57, 1.59) mg/L
print(len(mc.cmax_samples))      # 1000 individual realizations

Validation

Engine validation against Omega PBPK

Four drugs with known compound parameters were simulated and compared against the Omega PBPK ODE engine (35-state hardcoded model):

Drug Dose Sisyphus Cmax (mg/L) Omega Cmax (mg/L) Relative error
Midazolam 2 mg PO 0.006911 0.006943 0.5%
Caffeine 100 mg PO 1.7151 1.7139 0.1%
Warfarin 10 mg PO 0.4917 0.4922 0.1%
Propranolol 80 mg PO 0.1353 0.1355 0.1%

Mass balance error < 10−12 for all simulations.

Holdout benchmark (SMILES → Cmax)

External validation on a Murcko scaffold-stratified holdout set (N=107, seed=42, never used in training or model selection). The holdout set integrates observed concentration–time profiles from the Open Systems Pharmacology (OSP) repository, curated literature PK data, and FDA DailyMed labels. Performance is reported using AAFE (Absolute Average Fold Error; Obach et al., 1997) with bootstrap 95% confidence intervals (10,000 resamples on log10(fold error) ):
\[AAFE = 10^{\operatorname{mean}\left(\left|\log_{10}\frac{C_{max,pred}}{C_{max,obs}}\right|\right)}\]
Track AAFE 95% CI %2-fold %3-fold N
Meta-learner (production) 2.784 [2.41, 3.25] 43.9% 62.6% 107
Engine only 4.458 [3.69, 5.43] 27.1% 41.1% 107
ML only 3.010 [2.56, 3.56] 43.0% 57.9% 107
Meta, in-domain 2.833 [2.39, 3.38] 42.0% 60.5% 81
Reproducibility note (2026-05-09 audit-driven update; B-03 refresh 2026-05-20; B-03.x literature-IVIVE 2026-05-25; B-02 Phase 2 UGT registry 2026-05-27). These numbers reflect a public-clone deterministic state: a fresh git clone followed by pip install -r requirements-lock.txt reproduces the headline AAFE to 3 sig figs. Per-drug Cmax bit-identity additionally requires matching the exact numerics stack (Python minor version, BLAS implementation, libomp build) used to generate the canonical data/training/4track_holdout_predictions.json; on differing environments per-drug Cmax can drift up to ~12% on individual drugs (largest observed: trazodone engine 0.291 → 0.256) without materially shifting aggregate AAFE. The previous cache (Meta 2.679 [2.30, 3.14], In-domain 2.733) was generated on a local-developer environment that conditionally loaded two artifacts not present in this repository: a proprietary DrugBank export (data/drugbank/, academic license required, gitignored) and a residual-correction logP XGBoost model (models/adme/logp_correction.json, gitignored). Both shifted Cmax predictions silently for the drugs they covered. Removing the silent shift moves Meta AAFE from local-artifact values near 2.68 to the public-clone 2.75–2.77 range. The 2026-05-20 B-03 refresh adds clopidogrel to the prodrug registry (parent-observation scoring + per-enzyme CES1/CYP yields) and fixes a parallel double-counting bug in cyp_clearance_overrides.lookup_metabolic_fraction (full InChIKey missed against the non-isomeric clinical_pk.json SMILES, leaving the default XGBoost-derived hepatic CL running alongside the explicit ProdrugActivationEdges); the fix shifts Meta AAFE 2.751 → 2.772 (+0.7%) and the In-domain Meta from 2.837 → 2.862, both well inside the bootstrap CI. The 2026-05-25 B-03.x refresh replaces the B-03 placeholder enzyme affinities (0.030 each for CES1/CYP3A4/CYP2C9) with literature-IVIVE values derived from Subash 2025 (Mol Pharm PMC12673578) rCES1 Vmax/Km + Boberg 2017 (PMC5267516) CES1 hepatic abundance + Kazui 2010 (DMD 38:92–99) 85/15 inactive/active fate split partition — CES1 0.0586, CYP3A4 0.0322, CYP2C9 0.0817 μL/min/pmol, preserving the literature 85/15 fate split and yielding a 1.92× total CLint scale-up over the placeholder; the resulting clopidogrel parent Meta fold tightens 5.15× → 4.67×, aggregate Meta AAFE shifts 2.772 → 2.769 (Δ=−0.0025, within bootstrap CI noise), and the clopidogrel registry disposition flips from ceiling_accepted to literature_applied. Bootstrap 95% CIs (10,000 resamples on log10(fold) , seed=20260422, computed 2026-05-20 after the B-03 fix) are above; the refreshed Meta CI [2.37, 3.26] overlaps the previous [2.30, 3.14], confirming the cumulative artifact + double-count shift remains within sampling uncertainty. The 2026-05-27 B-02 Phase 2 refresh activates the previously-disabled UGT path via 2 literature-curated substrate registries (data/enzymes/{ugt2b7,ugt1a9}_substrates.json, 8 seed drugs: morphine, codeine, ketorolac, indomethacin via UGT2B7; dapagliflozin, etodolac, bexagliflozin, glasdegib via UGT1A9) plus 2 abundance entries in data/physiology/reference_man.yaml (UGT2B7 2.43e6 pmol, UGT1A9 8.10e5 pmol; Achour 2017 PMC5328673 / Margaillan 2015 reference range). The cycle ships with no DrugBank dependency for the UGT path — closes the DE-36 reproducibility blocker. Gate-D 99-of-107 bit-identical verified (only the 8 seeds shift on the same numerics stack). Meta AAFE shifts 2.692 → 2.698 (Δ = +0.0067, 1.6% of the new bootstrap CI half-width [2.32, 3.17] — well within sampling noise; see data/validation/4track_ci_2026-05-27_B02.json). 6 of 8 seeds improve (under-predicted drugs move toward observation); 2 of 8 (morphine, codeine) worsen because their pre-B-02 over-prediction relied on CYP-default over-extraction that the correct UGT path now partly displaces — a secondary diagnostic finding logged as DE-38, with docs/_internal/backlog.md §B-13 (local-only) scoping the Phase 2.x abundance/IVIVE recalibration to address it (subsequently shipped as PR #49, metric-neutral). The headline table values above reflect the post-B-02 same-numerics-stack state and supersede the prior 2.772 headline (which was tied to a different numerics stack — Python/BLAS — and produced an aggregate-AAFE drift of ~0.08 unrelated to any cycle change, per the established ~12% per-drug numerics-stack drift). Prospective slice refresh subsequently shipped (N=14, PR #56) and was then expanded + decontaminated to N=28 (2026-06-01); the delta was not small — the expanded set reverses the earlier favorable reading (see Prospective validation below). Artifacts: data/validation/4track_ci_2026-05-12_v0.4.json (B-03 era), data/validation/4track_ci_2026-05-27_B02.json (current). Subsequent metric-neutral cycles — B-13 gut-UGT correction (2026-05-29, ΔMeta AAFE ≈ −3×10⁻⁵) and B-14 hepatic-UGT IVIVE (2026-05-30, no-op / DE-40) — left the headline at 2.698. FLUX-1 (2026-06-04, PR #65) then moved the headline to the table’s current 2.784: the engine fix corrected a flow-limitation double-count that capped hepatic/gut extraction at E→0.5 (a real, triple-verified bug), but fixing it regresses the benchmark (the wrong formula was load-bearing as calibration). Of the 2.698→2.784 move, ~+0.8% is the FLUX-1 effect itself and the rest is a stack refresh (the cache was regenerated on the current CI stack; the prior 2.698 predated it). This is a correctness-first change — correct physics over a higher-but-wrong number. Cache regenerated on the canonical CI stack via .github/workflows/flux1-regen.yml; CIs data/validation/4track_ci_2026-06-04_flux1.json.

The 4-track meta-learner combines mechanistic PBPK (Engine), data-driven XGBoost Cmax (ML), a closed-form CL/F analytical (CLF), and a conditional VDss analytical track. Weights are compound-type-adaptive and LOOCV-calibrated: base compounds blend Engine 0.60 / ML 0.40; other compounds use Engine 0.35 / ML 0.50 / CLF 0.15, with VDss 0.20 added when applicability criteria are satisfied (and the remaining tracks re-scaled by ×0.80 so the total is 1.0). In-domain AAFE (N=79) excludes 28 drugs flagged as out-of-applicability-domain (prodrugs, high-MW, extreme lipophilicity, extended-release formulation mismatch). The current public-clone cache (data/training/4track_holdout_predictions.json) carries in-domain N=79; an earlier numerics stack briefly reported N=81 after the logp residual correction pushed two borderline drugs across the high-lipophilicity AD threshold, but that correction is no longer applied in the public-clone state.

Prospective validation (FDA NMEs approved 2024–2025, single-active-ingredient oral small molecules, production-clean — none appears in any shipped-model training input or the engine reference; all re-scored on one numerics stack under public-clone state, 2026-06-01, N=28):

Slice AAFE 95% CI %2-fold %3-fold N
All 3.21 [2.42, 4.37] 25.0% 53.6% 28
In-domain 3.20 [2.06, 5.23] 37.5% 56.2% 16
— existing 12 (decontaminated, rescored) 2.52 50.0% 12
— new 16 (2024–2025 NMEs) 3.84 6.2% 16

Prospective generalization is worse than retrospective (3.21 > 2.698 overall), reversing the earlier “favorable direction” reading. That reading (N=15, 2.402 < 2.698) was a small-sample / curation artifact — the exact under-powering the cherry-picking audit flagged. The expanded set was built two ways:

The new 2024–2025 NMEs are markedly harder for the engine (AAFE 3.84, only 6% within 2-fold; worst: mirdametinib 30× and sevabertinib 18× under-prediction, both FDA-label-verified). The reversal is robust: dropping the two worst folds still leaves overall 2.76 (> 2.698) and the median fold is 2.72. The N=28 CI [2.42, 4.37] is wide and still overlaps the retrospective in-domain Meta CI, so the gap is directional, not yet statistically separated. Bootstrap CIs 2026-06-01 (10,000 resamples on log10(fold) , seed=20260422) via scripts/bootstrap_4track_ci.py. Artifacts: data/validation/prospective_N28_public_only_2026-06-01.json (per-drug folds + full methodology/exclusion record), data/validation/prospective_ci_2026-06-01_N28.json; scored via scripts/score_prospective_candidates.py. Superseded N=14/N=15 caches retained for audit trail.

Cherry-picking caveat. The 107-holdout has been used for ~47 configuration feedback cycles (track weights, routing, meta-learner variants). A quantitative audit (docs/claude/cherry_picking_audit_2026-04-22.md) scores aggregate risk 4.65/10 (moderate). The retrospective-contamination estimate (2.85–3.10 from the audit) overlaps the upper tail of the current public-clone Meta bootstrap CI ([2.32, 3.17], point estimate 2.698), meaning the headline cannot statistically reject the null hypothesis that tuning inflated AAFE. A secondary permanent holdout (N50) is planned per docs/claude/cherry_picking_process_v1.md.

Multi-dose regimen validation

Three drugs were simulated at clinical dosing regimens and compared against FDA-label steady-state Cmax values:

Drug Regimen Predicted Css,max (mg/L) FDA label Css,max (mg/L) Fold error
Atorvastatin 40 mg QD 0.027 0.029 0.93
Metformin 500 mg BID 0.55 1.0 0.55
Warfarin 5 mg QD 0.34 1.4 0.24

Atorvastatin (CYP3A4-mediated, moderate protein binding) was predicted within 7% of the label value. Metformin (renal elimination-dominant, fu ≈ 1.0) and warfarin (fu = 0.01, highly protein-bound) show expected under-prediction consistent with the model’s known limitations in renal clearance modeling and CLint over-prediction for highly bound compounds. Accumulation ratio direction was correct for all three drugs. Steady-state detection operated correctly in all cases.

TDM validation

Bayesian update was validated in two stages: a single-drug functional test, then a multi-drug benchmark across diverse pharmacokinetic profiles. The tables below report the Importance Sampling baseline (legacy); production now uses a dispatched SBI/IS/IBIS router (data/sbi/method_routing.json) with 12/13 production drugs routing to SBI after SBC-gate validation. SBI provides millisecond-scale inference with equivalent or better CV reduction; detailed SBC + per-drug coverage reports are tracked separately.

Single-drug validation (midazolam, 5 mg PO, one observation at t = 1 h, 10% assay CV):

Metric Prior Posterior
Cmax CV 44.3% 19.8%
ESS 586.6 (29.3% of 2,000)
CV reduction 55.3%

Multi-drug benchmark (5 holdout drugs, synthetic patient observations scaled from engine C(t) profiles to observed Cmax, 10% log-normal assay noise, seed = 42):

Drug Type 1-obs CV reduction 2-obs CV reduction 3-obs CV reduction 1-obs ESS
Morphine base 76.3% 77.0% 74.6% 428
Amantadine base 74.0% 74.5% 74.7% 514
Ketorolac acid 87.7% 92.6% 90.1% 2.8
Clozapine neutral 68.7% 76.2% 76.9% 482
Rivaroxaban neutral 83.8% 93.3% 98.3% 7.1

Across all 15 runs (5 drugs × 3 observation scenarios): mean CV reduction 81.2%, mean error reduction 79.8%, 90% CI coverage 67%. A single observation suffices to reduce Cmax CV by 70–88% for drugs where the population prior fold error is below 2.5×. For drugs with larger prior errors (ketorolac, fold error 3.25×) or multi-observation scenarios, effective sample size degrades below 10, indicating particle degeneracy in the importance sampler. Sequential Bayesian methods (ensemble Kalman filter, particle filter) would be required for these cases.

Timepoint sensitivity analysis (morphine, single observation): t = 1.0 h (near Tmax) yielded maximal CV reduction (76.3%); observations beyond 4 h post-dose provided diminishing information (CV reduction 34%). Seed sensitivity across three random seeds was 0.8%, confirming robustness at N = 2,000 prior samples.

Performance

Operation Time Configuration
Full prediction (SMILES → Cmax) 414 ms Deterministic, single core
ODE solve (full fidelity) 106 ms LSODA, rtol=10−8, atol=10−10
ODE solve (MC fast path) 33 ms LSODA, rtol=10−4, atol=10−6
MC N=1,000 33.5 s Pure Python RHS (no JIT compilation)
RHS evaluation 31 μs 54 flux specs per call

Single-patient deterministic prediction completes in <500 ms, compatible with interactive clinical decision support workflows. MC propagation at N=1,000 requires ~34 s due to pure Python ODE evaluation; JIT compilation (e.g., via Numba) is an optimization path not yet pursued.

Test suite

871 passed / 0 failed / 27 skipped / 3 xfailed (901 outcomes, full sweep 2026-06-01, 3:34; skip count includes artifact-conditional markers per tests/_artifact_helpers.py) covering graph construction, ODE compilation, flux functions (including ECM + V3 windowed IV-Cmax + ProdrugActivation + OneCompartmentElimination), solver correctness, mass balance (incl. two-species analytical 2C cascade), ADME prediction, meta-learner calibration, multi-dose regimen, TDM Bayesian update via SBI/IBIS/IS dispatch, MIPD dose recommendation, DDI, PK/PD, applicability-domain detection, prodrug-activation registry + pipeline integration, pharmacogenomic phenotype scaling (SLCO1B1, NAT2, UGT1A1), UGT2B7/UGT1A9 public substrate registry (B-02 Phase 2), gut-UGT abundance + hepatic-UGT IVIVE regressions (B-13/B-14), MMPK holdout-leak and JAX-RHS flux-drop guards (PR #53), and holdout benchmark reproducibility.

Persistent xfails (3): 3 statin Cmax tests under ECM (rosuvastatin, atorvastatin Peff over-prediction; fluvastatin under-prediction FE 4.79 in the ECM-forced gate / FE 1.54 in the production no-ECM predict() path — issue #21 closed post-PR #29 as not-ECM-applicable per Niemi 2009 CYP2C9-dominance; the gate failure is the intended signal that ECM should not be activated for fluvastatin). The 3 prodrug 3-fold clinical validation gates (sepiapterin, tebipenem-pivoxil, fostamatinib) that were previously xfailed per spec § 3.3 mechanistic-A doctrine are now pytest.skip-gated under the public-clone state (DrugBank-conditional disposition data not present in this clone), preserving the mechanistic-A semantics without polluting the xfail count. remdesivir was promoted out of xfail in PR #43 (fold 2.78 < 3.0 gate).

No failing tests. A previously documented “known failing” entry for test_irinotecan_returns_active_sn38_cmax (claimed SN-38 Cmax 9.71 mg/L vs gate < 1.0 mg/L) was a documentation error introduced 2026-05-08 in commit 71be8d0 — the test has always passed since its introduction in PR #34 (verified 2026-05-16: SN-38 Cmax 0.0466 mg/L, within Slatter 2000 clinical 0.03-0.10 mg/L range; the same value is recorded in docs/claude/experiment-log.md under the 2026-05-08 entry). The previously listed cached-AAFE assertion and v3 enzyme-leak audit failures were resolved in PR #43 (cached test refreshed to 2.751 under public-clone state; subsequently advanced to 2.772 via the B-03 clopidogrel registry fix, 2.769 via B-03.x literature-IVIVE, and 2.698 via the B-02 Phase 2 UGT public registry activation, then 2.784 via the FLUX-1 flow-limitation fix — current pinned test is test_cached_holdout_aafe_is_2p784; v3 leak audit now passes). Headline AAFE (cache 2.784) is re-runnable via scripts/run_engine_benchmark.py.

Architecture

SMILES + dose
    │
    ▼
 predict ──► DrugOnGraph (enzyme-level, all values are Distribution)
                  │
                  ▼
             engine ◄── BodyGraph (from YAML)
             (compile graph → ODE → solve → MC propagate)
                  │
                  ▼
               pk (Cmax, AUC, t½ from SimResult)
                  │
    ml ───────────┤
    (direct PK)   │
                  ▼
             pipeline (meta-learner → final PredictionResult)
                  │
       ┌──────────┼──────────┐
       ▼          ▼          ▼
    regimen      ddi       pkpd
   (multi-dose, (enzyme    (effect
    TDM, MIPD)  adj.)     compartment)
Layer Responsibility Depends on
graph/ BodyGraph, node/edge types, YAML builder core
engine/ ODE compiler, flux registry (including ECM), solver, MC core, graph
predict/ SMILES → chemistry → ADME → DrugOnGraph + transporter DB core
ml/ XGBoost Cmax, CL/F, VDss predictors, 4-track meta-learner core
pk/ SimResult → PKEndpoints (route-aware) core
regimen/ Multi-dose solver, TDM method dispatch (SBI/IS/IBIS), MIPD core, engine, graph, sbi
sbi/ Simulation-based inference training + amortized posterior core, engine
pipeline/ Orchestrator wiring all layers all layers
ddi.py Drug-drug interactions (competitive inhibition, Emax induction) core, graph
pkpd.py PK/PD effect modeling (effect compartment, sigmoid Emax) core

Layer isolation. No cross-layer imports outside designated dependencies. predict does not import engine. engine does not import predict. regimen wraps engine without modifying it. Shared data types live in core.py.

Design principles

  1. Identity-blind engine. The ODE compiler and flux functions operate on node/edge types, never on identities. No string matching on organ names, enzyme names, or drug names exists in engine/. Replacing all organ names with random strings produces identical numerical results.

  2. Distribution-native. All physiological and drug parameters are Distribution objects. Point estimates are represented as Distribution(mean=x, cv=0). The uncertainty system is not an add-on; it is the system’s native representation.

  3. Compile once, parameterize many. Graph topology is compiled into an ODE skeleton once. MC iterations change only parameter values, not structure. 1,000 MC samples = 1 compilation + 1,000 ODE solves.

Extending the Model

The architecture is designed so that new compartments, routes, populations, interaction models, and clinical workflows require zero changes to the ODE engine. This was validated empirically: subcutaneous injection, pediatric physiology, tumor compartment, DDI, PK/PD, multi-dose regimen, and TDM were each implemented with 0 lines changed in src/sisyphus/engine/.

New organ (tumor compartment)

nodes:
  - name: tumor
    type: organ
    volume: 0.05
    composition: {fn: 0.013, fp: 0.010, fw: 0.700, pH: 6.8}
edges:
  - {source: arterial_blood, target: tumor, type: flow, flow_fraction: 0.005}
  - {source: tumor, target: venous_blood, type: flow}

New route (subcutaneous injection)

graph.add_node(Node(name="sc_depot", node_type="lumen", volume=Distribution(0.01)))
graph.add_edge(AbsorptionEdge(source="sc_depot", target="venous_blood",
                               ka_fraction=Distribution(1.0)))

New population (pediatric)

Allometrically scaled physiology (cardiac output ∝ BW0.75) with ontogeny-adjusted enzyme abundances (e.g., CYP3A4 at 50% of adult at age 5). Same graph structure, different YAML parameters.

Drug-drug interactions

Competitive CYP inhibition via pre-simulation enzyme abundance adjustment:

from sisyphus.ddi import apply_inhibition, KETOCONAZOLE

inhibited_graph = apply_inhibition(graph, KETOCONAZOLE)
# Midazolam AUC increases 12x (clinical reference: ~15x)

PK/PD modeling

Effect compartment with sigmoid Emax response, computed as post-processing on the concentration-time profile:

from sisyphus.pkpd import compute_effect, PDModel

pd = PDModel(ke0=0.5, emax=100.0, ec50=0.05, hill=2.0)
effect = compute_effect(sim_result, pd)

Limitations

Project Structure

src/sisyphus/
├── core.py              # Distribution, TissueComposition, data contracts
├── descriptors.py       # Morgan fingerprints + RDKit descriptors
├── compounds.py         # Compound YAML → DrugOnGraph loader
├── ddi.py               # Drug-drug interaction modeling
├── pkpd.py              # PK/PD effect compartment + Emax
├── cli.py               # Command-line interface (6 commands)
│
├── graph/               # Body graph definition and construction
│   ├── types.py         # Node, Edge type hierarchy (frozen dataclasses)
│   ├── body.py          # BodyGraph (add/remove/validate/sample)
│   ├── builder.py       # YAML → BodyGraph with flow conservation check
│   └── presets.py       # reference_man() (reference_woman YAML not shipped)
│
├── engine/              # ODE compilation and solving (identity-blind)
│   ├── compiler.py      # ODECompiler, CompiledODE, ResolvedParams
│   ├── flux.py          # FluxSpec implementations (8 transport types,
│   │                    #   incl. ECM ActiveTransport, ProdrugActivation,
│   │                    #   OneCompartmentElimination)
│   ├── params_jax.py    # JAX parameter resolution (experimental)
│   ├── result.py        # SimResult dataclass
│   ├── rhs_jax.py       # JAX right-hand side (experimental)
│   ├── solver.py        # LSODA wrapper (solve, solve_mc)
│   ├── solver_jax.py    # JAX solver (experimental)
│   └── uncertainty.py   # Monte Carlo propagation
│
├── predict/             # SMILES → drug parameterization
│   ├── chemistry.py     # Molecular profiling, pKa, AD assessment
│   ├── adme.py          # XGBoost ADME property prediction
│   ├── ivive.py         # In vitro → in vivo extrapolation, Kp
│   ├── hepatic_fu_correction.py # hepatic intracellular fu correction registry
│   ├── drugbank.py      # DrugBank experimental enrichment (fup, logP)
│   ├── phenotype.py     # Pharmacogenomic phenotype (e.g., SLCO1B1)
│   ├── registry.py      # Prodrug activation registry (SMILES-keyed)
│   ├── cyp_clearance_overrides.py # metabolic_fraction registry (OATP1B1 substrates, ECM)
│   ├── non_cyp_substrates.py # NAT2 / UGT1A1 substrate-class registry (SMILES-keyed)
│   └── transporter_db.py # OATP1B1 + hepatic ECM kinetic parameters
│
├── ml/                  # Data-driven PK prediction
│   ├── features.py      # Feature vector construction
│   ├── models.py        # XGBoost Cmax predictor
│   ├── clf_predictor.py # CL/F analytical track
│   ├── registry.py      # Model manifest + feature-schema hash (H2)
│   ├── surrogate.py     # Optional ML surrogate solver (experimental; relocated from engine/ in PR #39)
│   └── ensemble.py      # 4-track meta-learner (_W_VDSS=0.20)
│
├── pk/                  # PK endpoint extraction
│   ├── endpoints.py     # SimResult → PKEndpoints (route-aware t_min_h)
│   ├── nca.py           # Non-compartmental analysis (AUC, t½)
│   └── analytical.py    # Closed-form 1-cpt and 2-cpt solutions
│
├── regimen/             # Multi-dose and clinical pharmacology
│   ├── types.py         # DosingEvent, DosingRegimen (frozen dataclasses)
│   ├── solver.py        # Event-driven multi-dose solver
│   ├── profile.py       # Steady-state detection, Css metrics
│   ├── tdm.py           # Bayesian TDM dispatcher (SBI/IS/IBIS routing)
│   ├── tdm_sbi.py       # SBI (neural posterior) TDM method
│   ├── tdm_ibis.py      # Iterative Bayesian Importance Sampling
│   ├── tdm_enkf.py      # Ensemble Kalman filter (sequential alternative)
│   └── dosing.py        # MIPD dose recommendation
│
├── sbi/                 # Amortized neural posterior estimation
│   ├── priors.py        # Drug/physiology prior distributions
│   ├── simulator.py     # Forward model for SBI training
│   ├── amortizer.py     # Neural posterior trainer (NSF)
│   ├── multi_drug.py    # Multi-drug amortizer + continuous (bw, age)
│   ├── physiology_generator.py  # Achour correlated physiology sampler
│   └── sbc.py           # Simulation-Based Calibration gate
│
├── physiology/          # Achour correlated-physiology registry (source package)
│   └── correlation_registry.py  # log-space correlation matrices + correlated-lognormal sampling
│
├── validation/          # Benchmarking infrastructure
│   ├── reference.py     # Clinical PK reference loader (331 drugs)
│   ├── benchmark.py     # Holdout benchmark runner (--compute-pi from H3)
│   ├── metrics.py       # AAFE, fold error, PI coverage
│   └── oatp_generalization.py  # ECM substrate generalization classifier
│
└── pipeline/            # End-to-end orchestration
    ├── predict.py       # SMILES → PredictionResult (route-conditional V3)
    └── config.py        # Pipeline configuration

data/
├── physiology/          # BodyGraph YAML definitions
├── compounds/           # Curated compound configurations
└── reference/           # Clinical PK reference data, holdout split

models/                  # Pre-trained XGBoost models (committed; ~31MB)

Predecessor

Sisyphus inherits validated data assets from Omega PBPK (591 commits) but not its architecture:

Inherited (data) Not inherited (architecture)
331-drug clinical reference 35-state hardcoded ODE system
Scaffold-stratified holdout split Organ-specific CLint fields
ICRP physiology values Sequential ADME → IVIVE chain
Pre-trained XGBoost models Point-estimate pipeline
Rodgers & Rowland tissue compositions Post-hoc hybrid selector

Key empirical findings from Omega that informed Sisyphus:

References

How to Cite

If you use Sisyphus in your research, please cite:

Yoon, J. M. (2026). Sisyphus (0.1.0): Graph-based whole-body PBPK
simulation with native uncertainty propagation.
https://github.com/jam-sudo/Sisyphus

The git tag v1.0.0 (commit d6ee9a6) records an earlier feature-branch milestone that has been superseded by the current main line under a more conservative measurement regime. See CHANGELOG.md for details.

Requirements

License

MIT