The first fully differentiable tumor digital twin

A neuro-symbolic architecture that bridges abstract biology and continuous physics. One shared encoder, two paradigms: mechanistic interpretation (V1) and gradient-based treatment optimization (V2).

8 Provisional Patents Filed (63/967,576 · 63/974,083 · 63/974,099 · 63/988,460 · 63/988,475 · 63/988,480 · 63/991,254 · 63/991,263)
Human C-index: 0.704
PDX C-index: 0.687
Uncertainty-Gated C-index: 0.74
PDX Trajectory R²: 0.91
Data Modalities: 5
Cancer Types: 33
The Data Paradox

Why neither data source can work alone

Building a tumor digital twin requires both biological depth and temporal density. No single data source has both — each fills the other's fatal gap.

Human Clinical Data

Strength: Biological Depth
RNA · DNA · Methylation · CNV · Histology · Clinical

6 complete modalities per patient

Fatal Gap: Temporal Sparsity

Measurements every 6-12 weeks. Only 3-4 data points per patient. Cannot learn continuous tumor dynamics from snapshots.

PDX Mouse Data

Strength: Temporal Density

Measurements every 2-3 days. Dense longitudinal time-series that reveal growth dynamics, drug response curves, and resistance emergence.

Fatal Gaps: Fidelity & Completeness
  • Mouse stroma replaces human stroma — contaminates signal
  • Typically RNA-only (no methylation, no CNV)
  • Immunodeficient host — no immune dynamics
Human data teaches the biology

Feature completeness across 6 modalities. The biological target for domain alignment.

Each fills the other's fatal gap

Clinical predictions rely on PDX-learned physics. PDX relies on human-learned feature imputation.

PDX data teaches the physics

Dense time-series reveal growth dynamics needed to train neural ODEs.

Dual-Path Architecture

Two paths, one foundation

Patient Data
RNA · DNA · CNV · Meth · WSI
H-BDVAE v5.10 (frozen)
4 modality encoders → Product-of-Experts → z_full (328d)
PATH A: Human / Clinical
z_full (328d)

Full multi-modal latent — direct input

+ WSI (1536d)

UNI2-h histopathology via late gated fusion

Specialist Hypernet v3.2

ConditionedGatingModelV3 + FiLM + physics bottleneck

C-index: 0.704
Strat C: 0.670
omics:WSI: 77:23
Best for: Clinical decision support, tumor boards
PATH B: PDX / Mouse
z_rna (201d)

RNA-derived portion of VAE latent

Domain Separation Network
SharedEncoder → z_shared (201d) — keeps biology
PrivateEncoder → z_private — species stroma, discarded
GRL + Discriminator — adversarial confusion
ConditionalPrior → z_meth (48d) + z_cnv (32d) — imputation
281d + WSI → Translator Hypernet

DSNHypernetwork + FiLM + BatchNorm

C-index: 0.687
Strat C: 0.654
omics:WSI: 68:32
Best for: Preclinical drug development, PDX translation
Physics-Constrained ODE Parameters
ρ ∈ [0, 0.3] · β ∈ [0, 1] · ω > 0 · N₀ (Σ = 1) · σ > 0
0.00% violation rate · Triangulated validation (physics + fidelity + utility)
Emulator (5 ms) | Neural ODE (45 ms)
V1: Static Analysis
Drivers, Drug Sensitivity, Pathway Reports
V2: Dynamic Simulation
Trajectories, Dose Optimization, Resistance

Two paradigms, one foundation

Both paths share the same VAE encoder, serving different clinical and research needs

Version 1 — Modular Analyst

"Tell me what is happening"

CausalDriver-GAT
Per-gene driver probabilities via GATv2 (AUROC 0.93)
TxResponse
Drug sensitivity with 50 interpretable pathway concepts
Best for: Interpretable reports · Driver discovery · Pathway insights · Fast inference
Version 2 — Neuro-Symbolic Simulator

"Tell me when progression occurs"

Hypernetwork v3.2
Latent → physics parameters (C: 0.704 / 0.687)
Neural ODE + Emulator
Continuous trajectories, 5ms inference (R² 0.997)
EvoSim
Stochastic clonal evolution with resistance modeling
Best for: Trajectory prediction · Dose optimization · Resistance modeling · Clinical trials
FOUNDATION

Multi-Modal VAE v5.10 “TME Boost”

The H-BDVAE compresses all available tumor data into a unified 328-dimensional biological latent state that captures underlying disease biology while factoring out technical artifacts. It uses Product-of-Experts fusion, so a missing modality simply drops out of the fused posterior instead of breaking the encoder.
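As a rough illustration of how Product-of-Experts fusion can tolerate missing modalities, the sketch below fuses per-modality Gaussian posteriors by precision weighting. It is a generic textbook formulation, not the H-BDVAE code: the function name `product_of_experts`, the standard-normal prior expert, and the boolean `present` mask are all assumptions made for this example.

```python
import numpy as np


def product_of_experts(mus, logvars, present):
    """Fuse per-modality Gaussian posteriors into a single Gaussian.

    mus, logvars: array-likes of shape (n_modalities, latent_dim).
    present: booleans of shape (n_modalities,); False marks a missing modality.
    A standard-normal prior expert is always included (an assumption of this
    sketch), so the result is defined even when every modality is missing.
    """
    mask = np.asarray(present, dtype=bool)[:, None]
    # Zero out missing rows so placeholder values (even NaN) cannot leak through.
    mus = np.where(mask, np.asarray(mus, dtype=float), 0.0)
    logvars = np.where(mask, np.asarray(logvars, dtype=float), 0.0)

    precisions = np.exp(-logvars) * mask          # missing experts contribute 0
    total_precision = 1.0 + precisions.sum(axis=0)  # 1.0 is the prior N(0, I)
    fused_var = 1.0 / total_precision
    fused_mu = fused_var * (precisions * mus).sum(axis=0)
    return fused_mu, fused_var


# Two experts agree loosely; dropping one widens the fused variance.
mu_all, var_all = product_of_experts([[2.0], [4.0]], [[0.0], [0.0]], [True, True])
mu_one, var_one = product_of_experts([[2.0], [4.0]], [[0.0], [0.0]], [True, False])
```

With a modality masked out, the fused variance grows and the mean shrinks toward the prior, which is the "graceful" behavior the text refers to.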

Supported modalities

  • RNA expression: 2,579 genes, log1p-transformed
  • DNA mutations: 500 genes via DeepSomatic + CADD scoring
  • Copy number variation: 1,886 genes, z-score standardized
  • Methylation: 1,000 probes, beta values
  • Histology (WSI): UNI2-h embeddings; enters via Hypernet late fusion, not the VAE

Key innovations

  • Additive Decoder Architecture — interpretable reconstruction
  • Pathway-guided factorization — 50 MSigDB Hallmark gene sets
  • Solved latent collapse — z_meth variance 0.607 (vs ~0 baseline)

Latent Space Structure (328-dim)

z_prolif (proliferation): 1-dim
z_pathway (50 pathways × 4): 200-dim
z_ctx_clean (context): 31-dim
z_residual (non-pathway): 16-dim
z_meth (epigenetic): 48-dim
z_cnv_spatial (chromosomal): 32-dim
Orthogonality (R²): < 0.001
Prolif Correlation: 0.96
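To make the block structure concrete, here is a minimal sketch that splits a 328-dimensional latent vector into the named blocks listed above. The block sizes come from this page; the ordering of blocks inside the vector, the pathway-major reshape to 50 × 4, and the helper names `build_slices` and `split_latent` are assumptions for illustration only.

```python
import numpy as np

# Block sizes as listed on this page; their order in the vector is assumed.
LATENT_BLOCKS = [
    ("z_prolif", 1),
    ("z_pathway", 200),
    ("z_ctx_clean", 31),
    ("z_residual", 16),
    ("z_meth", 48),
    ("z_cnv_spatial", 32),
]


def build_slices(blocks):
    """Return ({name: slice}, total_dim) for consecutive named blocks."""
    slices, start = {}, 0
    for name, size in blocks:
        slices[name] = slice(start, start + size)
        start += size
    return slices, start


SLICES, TOTAL_DIM = build_slices(LATENT_BLOCKS)


def split_latent(z_full):
    """Split a (..., 328) latent array into its named blocks."""
    z_full = np.asarray(z_full)
    if z_full.shape[-1] != TOTAL_DIM:
        raise ValueError(f"expected last dim {TOTAL_DIM}, got {z_full.shape[-1]}")
    return {name: z_full[..., sl] for name, sl in SLICES.items()}


parts = split_latent(np.zeros(TOTAL_DIM))
# Assumed layout: 50 pathways, 4 latent dimensions each.
pathway_grid = parts["z_pathway"].reshape(50, 4)
```

The block sizes sum to 328, matching the stated latent width. Read this way, z_prolif plus z_pathway gives 201 dimensions, and adding z_meth and z_cnv_spatial gives 281, which lines up with the 201d and 281d figures quoted for Path B; that correspondence is an inference from the numbers, not something the page states.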
PATENT PENDING

Safety & Robustness

PESD

Missing Modality Imputation

Coming Soon

Probabilistic Encoder Self-Distillation will train a student encoder to match a teacher that sees all modalities. Currently, missing modalities are handled via Product-of-Experts zero-masking.

Meth Imputation: r = 0.862
CNV Imputation: r = 0.967
CALIBRATED

Uncertainty Quantification

1. MC Dropout + Isotonic Calibration: per-horizon risk calibration (ICI < 0.01)
2. Information Sufficiency Score: continuous [0,1] abstention gate per cancer type
3. OOD Detection: PCA + Ledoit-Wolf Mahalanobis distance (1.1% flagged)
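The OOD step can be sketched with scikit-learn's `PCA` and `LedoitWolf` estimators. This is a minimal illustration under stated assumptions, not the production detector: the class name `MahalanobisOOD`, the number of components, and the choice of a training-set quantile as the flagging threshold are all hypothetical, and the page does not say how its 1.1% figure was obtained.

```python
import numpy as np
from sklearn.covariance import LedoitWolf
from sklearn.decomposition import PCA


class MahalanobisOOD:
    """Flag latents that lie far from the training distribution.

    n_components and quantile are illustrative defaults, not values
    taken from the page.
    """

    def __init__(self, n_components=8, quantile=0.99):
        self.pca = PCA(n_components=n_components)
        self.cov = LedoitWolf()
        self.quantile = quantile
        self.threshold_ = None

    def fit(self, z_train):
        reduced = self.pca.fit_transform(z_train)
        self.cov.fit(reduced)
        # LedoitWolf.mahalanobis returns squared Mahalanobis distances.
        d2 = self.cov.mahalanobis(reduced)
        self.threshold_ = float(np.quantile(d2, self.quantile))
        return self

    def score(self, z):
        """Squared Mahalanobis distance in the PCA-reduced space."""
        return self.cov.mahalanobis(self.pca.transform(np.atleast_2d(z)))

    def is_ood(self, z):
        return self.score(z) > self.threshold_


rng = np.random.default_rng(0)
z_train = rng.normal(size=(500, 32))      # stand-in for training latents
detector = MahalanobisOOD().fit(z_train)
flags = detector.is_ood(z_train.mean(axis=0) + 50.0)  # a far-away sample
```

Shrinkage covariance is a sensible fit here because the latent dimension can be large relative to the number of samples per cancer type, where a plain empirical covariance tends to be poorly conditioned.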

Physics Constraints (Blocker)

All ODE parameters satisfy biological constraints: ρ∈[0,0.3], β∈[0,1], ω>0. 0.00% violation rate. Violations fail regardless of C-index.
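One common way to get a zero violation rate by construction is to squash unconstrained network outputs through bounded activations. The sketch below illustrates that idea and a matching blocker check; the activation choices (scaled sigmoid, softplus, softmax), the small positive floor, and the names `constrain` and `violates` are assumptions, since the page does not describe how the bounds are enforced.

```python
import numpy as np

EPS = 1e-8  # assumed floor that keeps omega and sigma strictly positive


def _sigmoid(x):
    # tanh form avoids overflow for large |x|
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(x, dtype=float)))


def _softplus(x):
    return np.logaddexp(0.0, np.asarray(x, dtype=float))


def constrain(raw):
    """Map unconstrained outputs to the ranges quoted on this page.

    raw: dict with scalar 'rho', 'beta', 'omega', 'sigma' and vector 'n0'.
    """
    n0 = np.asarray(raw["n0"], dtype=float)
    exp_n0 = np.exp(n0 - n0.max())  # numerically stable softmax
    return {
        "rho": 0.3 * _sigmoid(raw["rho"]),       # rho in [0, 0.3]
        "beta": _sigmoid(raw["beta"]),           # beta in [0, 1]
        "omega": _softplus(raw["omega"]) + EPS,  # omega > 0
        "sigma": _softplus(raw["sigma"]) + EPS,  # sigma > 0
        "n0": exp_n0 / exp_n0.sum(),             # components sum to 1
    }


def violates(params, tol=1e-6):
    """Blocker check: True if any constraint is broken."""
    ok = (
        0.0 <= params["rho"] <= 0.3
        and 0.0 <= params["beta"] <= 1.0
        and params["omega"] > 0.0
        and params["sigma"] > 0.0
        and abs(float(np.sum(params["n0"])) - 1.0) <= tol
    )
    return not ok


params = constrain(
    {"rho": 5.0, "beta": -3.0, "omega": -2.0, "sigma": 0.5, "n0": [1.0, 2.0, 3.0]}
)
blocked = violates(params)  # False: constrained outputs always pass
```

Keeping `violates` as a separate check still matters even when the mapping guarantees the bounds, because parameters may also arrive from an emulator or a cached result that bypasses the mapping.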

Triangulated Validation

Three independent checks — physics compliance, fidelity to data, and clinical utility — must all pass before any prediction is served.

Glass Box, Not Black Box

Every DNAI prediction decomposes into a chain of inspectable, biologically-named computations. From raw gene expression through 328 named latent dimensions, six physics parameters with physiological units, to time-resolved trajectories with calibrated uncertainty — nothing is opaque.

Explore Prediction Traceability

See the evidence

Review validation metrics, benchmark comparisons, and model performance