The first fully differentiable tumor digital twin

A neuro-symbolic architecture that bridges abstract biology and continuous physics. One shared encoder, two paradigms: mechanistic interpretation (V1) and gradient-based treatment optimization (V2).

15 Provisional Patents Filed (63/967,576 · 63/974,083 · 63/974,099 · 63/988,460 · 63/988,475 · 63/988,480 · 63/991,254 · 63/991,263 · 64/029,329 · 64/029,334 · 64/029,335 · 64/029,336 · 64/029,337 · 64/036,627 · 64/036,630)
Human C-index: 0.704
PDX C-index: 0.687
Uncertainty-Gated C-index: 0.74
PDX Trajectory R²: 0.91
Data Modalities: 5
Cancer Types: 33
The Data Paradox

Why neither data source can work alone

Building a tumor digital twin requires both biological depth and temporal density. No single data source has both — each fills the other's fatal gap.

Human Clinical Data

Strength: Biological Depth
RNA · DNA · Methylation · CNV · Histology · Clinical

6 complete modalities per patient

Fatal Gap: Temporal Sparsity

Measurements every 6-12 weeks. Only 3-4 data points per patient. Cannot learn continuous tumor dynamics from snapshots.

PDX Mouse Data

Strength: Temporal Density

Measurements every 2-3 days. Dense longitudinal time-series that reveal growth dynamics, drug response curves, and resistance emergence.

Fatal Gaps: Fidelity & Completeness
  • Mouse stroma replaces human stroma — contaminates signal
  • Typically RNA-only (no methylation, no CNV)
  • Immunodeficient host — no immune dynamics
Human data teaches the biology

Feature completeness across 6 modalities. The biological target for domain alignment.

Each fills the other's fatal gap

Clinical predictions rely on PDX-learned physics. PDX relies on human-learned feature imputation.

PDX data teaches the physics

Dense time-series reveal growth dynamics needed to train neural ODEs.

Dual-Path Architecture

Two paths, one foundation

Patient Data
RNA · DNA · CNV · Meth · WSI
H-BDVAE v5.10 (frozen)
4 modality encoders → Product-of-Experts → z_full (328d)
PATH A: Human / Clinical
z_full (328d)

Full multi-modal latent — direct input

+ WSI (1536d)

UNI2-h histopathology via late gated fusion

Specialist Hypernet v3.2

ConditionedGatingModelV3 + FiLM + physics bottleneck

C-index: 0.704
Strat C: 0.670
omics:WSI ratio: 77:23
Best for: Clinical decision support, tumor boards
PATH B: PDX / Mouse
z_rna (201d)

RNA-derived portion of VAE latent

Domain Separation Network
SharedEncoder → z_shared (201d) — keeps biology
PrivateEncoder → z_private — species stroma, discarded
GRL + Discriminator — adversarial confusion
ConditionalPrior → z_meth (48d) + z_cnv (32d) — imputation
281d + WSI → Translator Hypernet

DSNHypernetwork + FiLM + BatchNorm

C-index: 0.687
Strat C: 0.654
omics:WSI ratio: 68:32
Best for: Preclinical drug development, PDX translation
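
The "GRL + Discriminator" step in Path B refers to a gradient reversal layer: the forward pass is the identity, and the backward pass flips the sign of the gradient, so the shared encoder is pushed to make human and mouse samples harder to tell apart. The sketch below is a minimal NumPy illustration with hand-written gradients, not the production DSN. The class name, the one-layer logistic discriminator, and the toy numbers are all assumptions made for the example.

```python
import numpy as np

class GradientReversal:
    """Identity on the forward pass, sign-flipped gradient on the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # reversal strength (hypothetical default)

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.lam * grad_output


def discriminator_loss(z, w, species_label):
    """Binary cross-entropy of a toy logistic species discriminator."""
    p = 1.0 / (1.0 + np.exp(-(z @ w)))
    return -(species_label * np.log(p) + (1 - species_label) * np.log(1 - p))


def discriminator_grad_wrt_z(z, w, species_label):
    """Analytic gradient of the loss above with respect to the features z."""
    p = 1.0 / (1.0 + np.exp(-(z @ w)))
    return (p - species_label) * w


# Toy shared features and discriminator weights (illustrative values only).
z_shared = np.array([0.5, -1.0])
w = np.array([1.0, 2.0])
label = 1.0  # 1 = human, 0 = mouse (assumed encoding)

grl = GradientReversal(lam=1.0)
features = grl.forward(z_shared)
grad_disc = discriminator_grad_wrt_z(features, w, label)
grad_to_encoder = grl.backward(grad_disc)

# A descent step on the reversed gradient moves the features so that the
# discriminator does worse, which is the adversarial confusion effect.
loss_before = discriminator_loss(z_shared, w, label)
z_updated = z_shared - 0.1 * grad_to_encoder
loss_after = discriminator_loss(z_updated, w, label)
```

In a real training loop the discriminator itself still descends its own loss; only the encoder sees the reversed signal.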
Physics-Constrained ODE Parameters
ρ ∈ [0, 0.3] · β ∈ [0, 1] · ω > 0 · N₀ (Σ = 1) · σ > 0
0.00% violation rate · Triangulated validation (physics + fidelity + utility)
Emulator (5ms) | Neural ODE (45ms)
V1: Static Analysis
Drivers, Drug Sensitivity, Pathway Reports
V2: Dynamic Simulation
Trajectories, Dose Optimization, Resistance

Two paradigms, one foundation

Both paths share the same VAE encoder, serving different clinical and research needs

Version 1 — Modular Analyst

"Tell me what is happening"

CausalDriver-GAT
Per-gene driver probabilities via GATv2 (AUROC 0.93)
TxResponse
Drug sensitivity with 50 interpretable pathway concepts
Best for: Interpretable reports · Driver discovery · Pathway insights · Fast inference
Version 2 — Neuro-Symbolic Simulator

"Tell me when progression occurs"

Hypernetwork v3.2
Latent → physics parameters (C: 0.704 / 0.687)
Neural ODE + Emulator
Continuous trajectories, 5ms inference (R² 0.997)
EvoSim
Stochastic clonal evolution with resistance modeling
Best for: Trajectory prediction · Dose optimization · Resistance modeling · Clinical trials
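
To make "continuous trajectories" concrete, the sketch below integrates a growth ODE with a fixed-step Runge-Kutta solver and checks it against the closed-form solution. The page does not state the actual governing equations, so logistic growth is used purely as a placeholder. The function names, the per-day unit for ρ, and the starting burden are assumptions for illustration.

```python
import math

def logistic_rhs(n, rho):
    """Placeholder dynamics dN/dt = rho * N * (1 - N); N is a fraction of capacity."""
    return rho * n * (1.0 - n)


def rk4_trajectory(n0, rho, t_end, dt):
    """Integrate the placeholder ODE with classical fourth-order Runge-Kutta."""
    steps = int(round(t_end / dt))
    n = n0
    trajectory = [n]
    for _ in range(steps):
        k1 = logistic_rhs(n, rho)
        k2 = logistic_rhs(n + 0.5 * dt * k1, rho)
        k3 = logistic_rhs(n + 0.5 * dt * k2, rho)
        k4 = logistic_rhs(n + dt * k3, rho)
        n += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        trajectory.append(n)
    return trajectory


def logistic_exact(n0, rho, t):
    """Closed-form logistic solution, used only to check the integrator."""
    growth = math.exp(rho * t)
    return n0 * growth / (1.0 - n0 + n0 * growth)


rho = 0.2   # inside the stated [0, 0.3] range; per-day unit is assumed
dt = 0.1    # days
trajectory = rk4_trajectory(n0=0.05, rho=rho, t_end=60.0, dt=dt)
max_error = max(
    abs(n - logistic_exact(0.05, rho, i * dt)) for i, n in enumerate(trajectory)
)
```

Because the state is defined at every time step rather than only at clinic visits, the same trajectory can be queried at any horizon.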
FOUNDATION

Multi-Modal VAE v5.10 “TME Boost”

The H-BDVAE compresses all available tumor data into a unified biological latent state (328d) that captures underlying disease biology while factoring out technical artifacts. Uses Product-of-Experts fusion for graceful handling of missing modalities.
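
As a rough illustration of how Product-of-Experts fusion tolerates missing modalities, the sketch below combines per-modality Gaussian posteriors by summing precisions and gives a missing modality zero precision. It assumes diagonal Gaussian experts plus a standard-normal prior expert, which is the common textbook formulation. The function name and toy values are hypothetical, and the H-BDVAE's actual fusion may differ.

```python
import numpy as np

def product_of_experts(mus, logvars, present):
    """Fuse Gaussian experts; masked (missing) experts contribute nothing.

    mus, logvars: shape (n_modalities, latent_dim)
    present:      shape (n_modalities,), True where the modality was measured
    A standard-normal prior expert is always included, so the result is
    defined even when every modality is missing.
    """
    mus = np.asarray(mus, dtype=float)
    logvars = np.asarray(logvars, dtype=float)
    mask = np.asarray(present, dtype=float)[:, None]

    precisions = mask * np.exp(-logvars)              # zero for missing modalities
    total_precision = 1.0 + precisions.sum(axis=0)    # the 1.0 is the N(0, I) prior
    fused_var = 1.0 / total_precision
    fused_mu = fused_var * (precisions * mus).sum(axis=0)
    return fused_mu, fused_var


# Two toy experts over a 2-dim latent, both with unit variance.
mus = [[2.0, 0.0], [4.0, 1.0]]
logvars = [[0.0, 0.0], [0.0, 0.0]]

mu_all, var_all = product_of_experts(mus, logvars, [True, True])
mu_first, var_first = product_of_experts(mus, logvars, [True, False])
mu_none, var_none = product_of_experts(mus, logvars, [False, False])
```

Dropping a modality widens the fused variance instead of breaking the encoder, and with no modalities at all the posterior falls back to the prior.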

Supported modalities

  • RNA expression: 2,579 genes, log1p-transformed
  • DNA mutations: 500 genes via DeepSomatic + CADD scoring
  • Copy number variation: 1,886 genes, z-score standardized
  • Methylation: 1,000 probes, beta values
  • Histology (WSI): UNI2-h embeddings; enters via Hypernet late fusion, not the VAE

Key innovations

  • Additive Decoder Architecture — interpretable reconstruction
  • Pathway-guided factorization — 50 MSigDB Hallmark gene sets
  • Solved latent collapse — z_meth variance 0.607 (vs ~0 baseline)

Latent Space Structure (328-dim)

z_prolif (proliferation): 1-dim
z_pathway (50 pathways × 4): 200-dim
z_ctx_clean (context): 31-dim
z_residual (non-pathway): 16-dim
z_meth (epigenetic): 48-dim
z_cnv_spatial (chromosomal): 32-dim
Orthogonality (R²): < 0.001
Prolif Correlation: 0.96
PATENT PENDING

Safety & Robustness

PESD

Missing Modality Imputation

Coming Soon

Probabilistic Encoder Self-Distillation will train a student encoder to match a teacher that sees all modalities. Currently, missing modalities are handled via Product-of-Experts zero-masking.

Meth Imputation: r = 0.862
CNV Imputation: r = 0.967
CALIBRATED

Uncertainty Quantification

1. MC Dropout + Isotonic Calibration: per-horizon risk calibration (ICI < 0.01)
2. Information Sufficiency Score: continuous [0,1] abstention gate per cancer type
3. OOD Detection: PCA + Ledoit-Wolf Mahalanobis distance (1.1% flagged)
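
One way to read the abstention gate is that concordance is scored only on patients whose sufficiency score clears a threshold, with the remainder deferred. The sketch below pairs a plain Harrell's C-index with such a gate. The threshold of 0.5, the function names, and the toy cohort are assumptions for illustration, not the deployed gating rule or real data.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event
    before patient j's time. Ties in predicted risk count as half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients cannot anchor a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def gated_concordance(times, events, risks, sufficiency, threshold=0.5):
    """C-index on patients passing the gate, plus the fraction retained."""
    keep = [k for k, s in enumerate(sufficiency) if s >= threshold]
    c_index = concordance_index(
        [times[k] for k in keep],
        [events[k] for k in keep],
        [risks[k] for k in keep],
    )
    return c_index, len(keep) / len(times)


# Toy cohort (made-up numbers): follow-up time, event flag, predicted risk,
# and a hypothetical information sufficiency score per patient.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.5, 0.1]
sufficiency = [0.9, 0.2, 0.8, 0.7]

c_all = concordance_index(times, events, risks)
c_gated, coverage = gated_concordance(times, events, risks, sufficiency)
```

Reporting coverage alongside the gated score matters, because a gate that abstains on most patients can look accurate while helping few.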

Physics Constraints (Blocker)

All ODE parameters satisfy biological constraints: ρ∈[0,0.3], β∈[0,1], ω>0. 0.00% violation rate. Violations fail regardless of C-index.
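
A common way to reach a zero violation rate by construction is to pass unconstrained network outputs through bounded activations. The sketch below does this for the listed ranges, together with the N₀ (Σ = 1) and σ > 0 constraints from the architecture diagram. It is an assumption about how such constraints could be enforced, not the documented implementation. The function names, the epsilon floor, and the reading of N₀ as a vector of fractions are all hypothetical.

```python
import numpy as np

EPS = 1e-6  # assumed floor that keeps strictly positive parameters above zero


def sigmoid(x):
    """Numerically stable logistic function."""
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def softplus(x):
    """Smooth positive map, log(1 + exp(x)), computed stably."""
    return np.logaddexp(0.0, x)


def softmax(x):
    """Non-negative weights that sum to one."""
    shifted = np.exp(x - np.max(x))
    return shifted / shifted.sum()


def constrain(raw_rho, raw_beta, raw_omega, raw_sigma, raw_n0):
    """Map unconstrained values into the stated parameter ranges."""
    return {
        "rho": 0.3 * sigmoid(raw_rho),           # rho in [0, 0.3]
        "beta": sigmoid(raw_beta),               # beta in [0, 1]
        "omega": softplus(raw_omega) + EPS,      # omega > 0
        "sigma": softplus(raw_sigma) + EPS,      # sigma > 0
        "n0": softmax(np.asarray(raw_n0, dtype=float)),  # fractions, sum = 1
    }


def violates(p):
    """True if any parameter falls outside its allowed range."""
    ok = (
        0.0 <= p["rho"] <= 0.3
        and 0.0 <= p["beta"] <= 1.0
        and p["omega"] > 0.0
        and p["sigma"] > 0.0
        and np.all(p["n0"] >= 0.0)
        and np.isclose(p["n0"].sum(), 1.0)
    )
    return not ok


# Stress the mapping with extreme raw values; no draw should violate a bound.
rng = np.random.default_rng(0)
params = [
    constrain(*rng.normal(scale=50.0, size=4), rng.normal(scale=50.0, size=3))
    for _ in range(1000)
]
violation_rate = sum(violates(p) for p in params) / len(params)
```

With this style of parameterization the violation check becomes a regression test on the mapping rather than a property the optimizer has to learn.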

Triangulated Validation

Three independent checks — physics compliance, fidelity to data, and clinical utility — must all pass before any prediction is served.

The Model Pipeline

Thirteen interconnected models from data encoding to treatment design

Treatment Design Layer

Additive modules that leverage the foundation pipeline for actionable treatment insights

Glass Box, Not Black Box

Every DNAI prediction decomposes into a chain of inspectable, biologically-named computations. From raw gene expression through 328 named latent dimensions, six physics parameters with physiological units, to time-resolved trajectories with calibrated uncertainty — nothing is opaque.

Explore Prediction Traceability

See the evidence

Review validation metrics, benchmark comparisons, and model performance