The first fully differentiable tumor digital twin

A neuro-symbolic architecture that bridges abstract biology and continuous physics. One shared encoder, two paradigms: mechanistic interpretation (V1) and gradient-based treatment optimization (V2).


15 Provisional Patents Filed (63/967,576 · 63/974,083 · 63/974,099 · 63/988,460 · 63/988,475 · 63/988,480 · 63/991,254 · 63/991,263 · 64/029,329 · 64/029,334 · 64/029,335 · 64/029,336 · 64/029,337 · 64/036,627 · 64/036,630)

Human C-index: 0.704
PDX C-index: 0.687
Uncertainty-Gated C-index: 0.74
PDX Trajectory R²: 0.91

The Data Paradox

Why neither data source can work alone

Building a tumor digital twin requires both biological depth and temporal density. No single data source has both — each fills the other's fatal gap.

Human Clinical Data

Strength: Biological Depth

RNA · DNA · Methylation · CNV · Histology · Clinical

6 complete modalities per patient

Fatal Gap: Temporal Sparsity

Measurements every 6-12 weeks. Only 3-4 data points per patient. Cannot learn continuous tumor dynamics from snapshots.

PDX Mouse Data

Strength: Temporal Density

Measurements every 2-3 days. Dense longitudinal time-series that reveal growth dynamics, drug response curves, and resistance emergence.

Fatal Gaps: Fidelity & Completeness

Mouse stroma replaces human stroma — contaminates signal
Typically RNA-only (no methylation, no CNV)
Immunodeficient host — no immune dynamics

Human data teaches the biology

Feature completeness across 6 modalities. The biological target for domain alignment.

Each fills the other's fatal gap

Clinical predictions rely on PDX-learned physics. PDX relies on human-learned feature imputation.

PDX data teaches the physics

Dense time-series reveal growth dynamics needed to train neural ODEs.

Dual-Path Architecture

Two paths, one foundation

Patient Data

RNA · DNA · CNV · Meth · WSI

H-BDVAE v5.10 (frozen)

4 modality encoders → Product-of-Experts → z_full (328d)

PATH A · Human / Clinical

z_full (328d)

Full multi-modal latent — direct input

+ WSI (1536d)

UNI2-h histopathology via late gated fusion

Specialist Hypernet v3.2

ConditionedGatingModelV3 + FiLM + physics bottleneck

C-index 0.704 · Strat C 0.670 · omics:WSI 77:23

Best for: Clinical decision support, tumor boards

PATH B · PDX / Mouse

z_rna (201d)

RNA-derived portion of VAE latent

Domain Separation Network

SharedEncoder → z_shared (201d) — keeps biology

PrivateEncoder → z_private — species stroma, discarded

GRL + Discriminator — adversarial confusion

ConditionalPrior → z_meth (48d) + z_cnv (32d) — imputation

281d + WSI → Translator Hypernet

DSNHypernetwork + FiLM + BatchNorm

C-index 0.687 · Strat C 0.654 · omics:WSI 68:32

Best for: Preclinical drug development, PDX translation

Physics-Constrained ODE Parameters

ρ ∈ [0, 0.3] · β ∈ [0, 1] · ω > 0 · ΣN₀ = 1 · σ > 0

0.00% violation rate · Triangulated validation (physics + fidelity + utility)

Emulator (5ms) | Neural ODE (45ms)

V1: Static Analysis

Drivers, Drug Sensitivity, Pathway Reports

V2: Dynamic Simulation

Trajectories, Dose Optimization, Resistance

Two paradigms, one foundation

Both paths share the same VAE encoder, serving different clinical and research needs

Version 1 — Modular Analyst

"Tell me what is happening"

CausalDriver-GAT

Per-gene driver probabilities via GATv2 (AUROC 0.93)

TxResponse

Drug sensitivity with 50 interpretable pathway concepts

Best for

Interpretable reports · Driver discovery · Pathway insights · Fast inference

Version 2 — Neuro-Symbolic Simulator

"Tell me when progression occurs"

Hypernetwork v3.2

Latent → physics parameters (C: 0.704 / 0.687)

Neural ODE + Emulator

Continuous trajectories, 5ms inference (R² 0.997)

EvoSim

Stochastic clonal evolution with resistance modeling

Best for

Trajectory prediction · Dose optimization · Resistance modeling · Clinical trials

FOUNDATION

Multi-Modal VAE v5.10 “TME Boost”

The H-BDVAE compresses all available tumor data into a unified biological latent state (328d) that captures underlying disease biology while factoring out technical artifacts. Uses Product-of-Experts fusion for graceful handling of missing modalities.
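The missing-modality behavior described above can be illustrated with a minimal sketch. It assumes each modality encoder emits a Gaussian expert (mean, variance) per latent dimension and that a standard-normal prior acts as an always-present expert. Both are assumptions made for illustration, and the function name `product_of_experts` is hypothetical, not the H-BDVAE's actual code.

```python
def product_of_experts(experts):
    """Fuse 1-D Gaussian experts given as (mean, variance), or None if missing.

    Assumption: a standard-normal prior N(0, 1) is always included, so the
    fused posterior stays defined even when every modality is absent.
    """
    precision = 1.0       # precision (1 / variance) of the N(0, 1) prior
    weighted_mean = 0.0   # the prior mean of 0 contributes nothing
    for expert in experts:
        if expert is None:
            continue      # a missing modality simply drops out of the product
        mean, var = expert
        precision += 1.0 / var
        weighted_mean += mean / var
    fused_var = 1.0 / precision
    return weighted_mean * fused_var, fused_var


# RNA present, methylation missing, CNV present (toy numbers).
mean, var = product_of_experts([(2.0, 1.0), None, (4.0, 0.5)])
```

Because precisions add, each extra modality can only tighten the fused variance, and dropping one widens it rather than breaking the encoder.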

Supported modalities

RNA expression: 2,579 genes, log1p-transformed
DNA mutations: 500 genes via DeepSomatic + CADD scoring
Copy number variation: 1,886 genes, z-score standardized
Methylation: 1,000 probes, beta values
Histology (WSI): UNI2-h embeddings; enters via Hypernet late fusion, not the VAE

Key innovations

Additive Decoder Architecture — interpretable reconstruction
Pathway-guided factorization — 50 MSigDB Hallmark gene sets
Solved latent collapse — z_meth variance 0.607 (vs ~0 baseline)

Latent Space Structure (328-dim)

z_prolif (proliferation): 1-dim
z_pathway (50 pathways × 4): 200-dim
z_ctx_clean (context): 31-dim
z_residual (non-pathway): 16-dim
z_meth (epigenetic): 48-dim
z_cnv_spatial (chromosomal): 32-dim
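The six block widths above add up to the stated 328 dimensions. As a minimal sketch, assuming the blocks are stored contiguously in the order listed (the ordering and the helper name `block_slices` are illustrative assumptions, not confirmed by this page), the layout can be expressed as named slices:

```python
# Block widths as listed on this page; the contiguous ordering is an assumption.
LATENT_BLOCKS = [
    ("z_prolif", 1),
    ("z_pathway", 200),      # 50 pathways x 4 dimensions each
    ("z_ctx_clean", 31),
    ("z_residual", 16),
    ("z_meth", 48),
    ("z_cnv_spatial", 32),
]


def block_slices(blocks):
    """Map each block name to its slice of the flat latent vector."""
    slices, start = {}, 0
    for name, width in blocks:
        slices[name] = slice(start, start + width)
        start += width
    return slices, start


SLICES, TOTAL_DIM = block_slices(LATENT_BLOCKS)

# Path B arithmetic from the architecture section:
# z_shared (201d) + imputed z_meth (48d) + imputed z_cnv (32d).
PATH_B_DIM = 201 + 48 + 32

z_full = [0.0] * TOTAL_DIM
z_meth = z_full[SLICES["z_meth"]]  # the 48 epigenetic dimensions
```

Naming the slices keeps downstream code readable: a module that needs only the epigenetic block can request it by name instead of hard-coding offsets.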

Orthogonality (R²): < 0.001
Prolif Correlation: 0.96

PATENT PENDING

Safety & Robustness

PESD

Missing Modality Imputation

Coming Soon

Probabilistic Encoder Self-Distillation will train a student encoder to match a teacher that sees all modalities. Currently, missing modalities are handled via Product-of-Experts zero-masking.

Meth Imputation: r = 0.862
CNV Imputation: r = 0.967

CALIBRATED

Uncertainty Quantification

MC Dropout + Isotonic Calibration

Per-horizon risk calibration (ICI < 0.01)

Information Sufficiency Score

Continuous [0,1] abstention gate per cancer type

OOD Detection

PCA + Ledoit-Wolf Mahalanobis distance (1.1% flagged)
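The effect of an abstention gate on a concordance metric can be sketched as follows. This uses Harrell's C-index and a single hypothetical threshold of 0.5 on a per-patient sufficiency score. The threshold, the toy data, and the function names are assumptions for illustration only; the page describes a per-cancer-type gate whose exact form is not given here.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs whose risk ordering is correct."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is anchored on a patient with an observed event
        for j in range(len(times)):
            if times[i] < times[j]:          # j outlived i, so pair is comparable
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0        # earlier event had the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5        # tied risks count as half
    return concordant / comparable if comparable else float("nan")


def gated_concordance(times, events, risks, sufficiency, threshold=0.5):
    """Score only patients whose sufficiency passes the (hypothetical) gate."""
    keep = [k for k, s in enumerate(sufficiency) if s >= threshold]
    c = concordance_index([times[k] for k in keep],
                          [events[k] for k in keep],
                          [risks[k] for k in keep])
    return c, len(keep) / len(times)


# Toy cohort: four patients, the third is censored.
times, events = [2, 4, 6, 8], [1, 1, 0, 1]
risks = [0.9, 0.3, 0.5, 0.1]
sufficiency = [0.9, 0.2, 0.8, 0.7]
full_c = concordance_index(times, events, risks)
gated_c, coverage = gated_concordance(times, events, risks, sufficiency)
```

Reporting coverage next to the gated score matters: a gate raises concordance only by declining to predict for some patients.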

Physics Constraints (Blocker)

All ODE parameters satisfy biological constraints: ρ ∈ [0, 0.3], β ∈ [0, 1], ω > 0, with a 0.00% violation rate. Any violation fails validation regardless of C-index.
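One common way to guarantee bounds like these by construction is to pass unconstrained network outputs through bounded activations. The sketch below assumes that approach: a scaled sigmoid for ρ, a sigmoid for β, softplus for ω and σ, and softmax for N₀. Whether the hypernetwork works this way is not stated on this page, and all names here are hypothetical.

```python
import math

EPS = 1e-6  # floor that keeps strictly positive parameters above zero


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def constrain(raw_rho, raw_beta, raw_omega, raw_sigma, raw_n0):
    """Map unconstrained outputs into the ranges listed on this page."""
    return {
        "rho": 0.3 * sigmoid(raw_rho),        # rho in [0, 0.3]
        "beta": sigmoid(raw_beta),            # beta in [0, 1]
        "omega": softplus(raw_omega) + EPS,   # omega > 0
        "sigma": softplus(raw_sigma) + EPS,   # sigma > 0
        "n0": softmax(raw_n0),                # N0 fractions sum to 1
    }


def violations(p, tol=1e-9):
    """Return the names of parameters that break a constraint."""
    bad = []
    if not 0.0 <= p["rho"] <= 0.3:
        bad.append("rho")
    if not 0.0 <= p["beta"] <= 1.0:
        bad.append("beta")
    if not p["omega"] > 0.0:
        bad.append("omega")
    if not p["sigma"] > 0.0:
        bad.append("sigma")
    if min(p["n0"]) < 0.0 or abs(sum(p["n0"]) - 1.0) > tol:
        bad.append("n0")
    return bad


params = constrain(-50.0, 50.0, -1000.0, 3.0, [0.1, 2.0, -1.0])
```

A separate `violations` check is still worth running as a blocker, since it also catches parameters that arrive from a checkpoint, an emulator, or manual overrides rather than from the constrained head.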

Triangulated Validation

Three independent checks — physics compliance, fidelity to data, and clinical utility — must all pass before any prediction is served.

The Model Pipeline

Thirteen interconnected models from data encoding to treatment design

Foundation

H-BDVAE v5.10

Multi-modal encoder with Product-of-Experts fusion

328-dim latent · R² < 0.001

Transfer

Domain Separation Network

Strips mouse stroma, imputes missing epigenetics from RNA

201d → 281d · Meth r=0.86

V1 Static

CausalDriver-GAT

Context-aware driver gene identification via GATv2

AUROC 0.93 · AUPRC 0.99

V1 Static

TxResponse

Drug sensitivity with concept bottleneck architecture

50 concepts · 3-phase trained

V2 Dynamic

Hypernetwork v3.2

Dual-path physics-informed parameter generation

Path A: C 0.704 · Path B: C 0.687

V2 Dynamic

Neural ODE

Continuous tumor dynamics with Dopri5 solver

PDX R² 0.91 · Emulator 5ms

V2 Dynamic

EvoSim

ODE-coupled stochastic ensemble with outcome distributions

100-run ensemble · Resistance timing

Treatment Design Layer

Additive modules that leverage the foundation pipeline for actionable treatment insights

Treatment Design

Treatment Optimizer

Counterfactual treatment ranking with TARNet + GDSC

ρ 0.727 · Shadow mode

Treatment Design

Combination Discovery

Zero-shot combo prediction via orthogonal clonal targeting

ρ 0.800 · 1,209 pairs

Treatment Design

Schedule Optimization

PK/PD-constrained dosing with toxicity-aware scheduling

42% dose reduction vs concurrent

Treatment Design

Synthetic Lethality

28 curated pairs + trained ML classifier (ρ=0.776)

28 pairs + ML v2 · 37/37 tests

Treatment Design

Immunogenic Variants

TME-aware variant prioritization for immunotherapy

11 hallmarks · 34/34 tests

Treatment Design

Methylation Decoder

Epigenetic reconstruction and TSG silencing detection

R² 0.51919 TSGs


Glass Box, Not Black Box

Every DNAI prediction decomposes into a chain of inspectable, biologically-named computations. From raw gene expression through 328 named latent dimensions, six physics parameters with physiological units, to time-resolved trajectories with calibrated uncertainty — nothing is opaque.


See the evidence

Review validation metrics, benchmark comparisons, and model performance
