Every cancer is unique. Our platform turns raw molecular and imaging data into a living simulation of that specific tumor — step by step, fully transparent.
It begins with data of the kind generated by modern sequencing labs: gene expression (which genes are turned on and how loudly), DNA mutations (which genes are broken), copy number variation (which genes have been duplicated or deleted), and methylation (which genes have been chemically silenced). For a single patient, this amounts to roughly 6,000 individual measurements across these four data types.
This is too much for any human to synthesize. It's also too noisy and high-dimensional for most AI systems to handle well. So the first thing we do is compress it.
Our foundation model — a hierarchical Bayesian VAE trained on 9,415 patients across 33 cancer types — takes all four data streams and distills them into a structured biological fingerprint of 328 numbers. But these aren't arbitrary numbers. Each one has a specific biological meaning.
One dimension captures how fast the tumor is growing, correlated at 0.96 with standard proliferation markers. Two hundred dimensions encode activity across 50 known cancer pathways — inflammation, immune response, DNA repair, metabolic reprogramming — with four dimensions per pathway. Forty-eight dimensions capture epigenetic patterns. Thirty-two encode chromosomal instability.
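To make the layout concrete, the sketch below builds a hypothetical index map for the 328-dimensional fingerprint. It uses only the block sizes stated above: 1 proliferation dimension, 4 dimensions for each of 50 pathways, 48 epigenetic dimensions, and 32 chromosomal-instability dimensions. The ordering, the block names, and the `Block`, `build_layout` and `read_block` helpers are assumptions for illustration. The named blocks cover 281 dimensions, so the remaining 47, which this paragraph does not describe, are kept together as one unnamed block.

```python
from dataclasses import dataclass

LATENT_DIM = 328  # total fingerprint size stated in the text


@dataclass(frozen=True)
class Block:
    name: str
    start: int  # inclusive
    stop: int   # exclusive


def build_layout(pathways: list[str]) -> dict[str, Block]:
    """Hypothetical ordering of the named latent blocks (an assumption)."""
    if len(pathways) != 50:
        raise ValueError("the text describes exactly 50 pathways")
    sizes = [("proliferation", 1)]
    sizes += [(f"pathway:{p}", 4) for p in pathways]  # 4 dims per pathway
    sizes += [("epigenetic", 48), ("chromosomal_instability", 32)]
    layout, cursor = {}, 0
    for name, size in sizes:
        layout[name] = Block(name, cursor, cursor + size)
        cursor += size
    # Dimensions not described in the text are kept as one unnamed block.
    layout["other"] = Block("other", cursor, LATENT_DIM)
    return layout


def read_block(z: list[float], layout: dict[str, Block], name: str) -> list[float]:
    """Slice the values of one named block out of a fingerprint vector."""
    block = layout[name]
    return z[block.start:block.stop]
```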
The key property: similar tumors end up near each other in this space, regardless of superficial differences. Two patients with completely different mutation profiles but the same underlying disease mechanism will occupy neighboring positions. And if a hospital has only RNA sequencing and nothing else, the model still produces a reliable fingerprint, with appropriately wider uncertainty. Foundation model distillation captures biological signal beyond known pathways, and a regional methylation density decoder (R²=0.762) reproduces epigenetic patterns more faithfully than reconstructing individual probes.
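A multimodal VAE can encode whichever modalities are present, and report wider uncertainty when some are missing, by using a product of Gaussian experts. Each available modality contributes a per-dimension mean and variance, and their precisions add. The source does not say this is the fusion rule its model uses. The sketch below is an assumption that only shows why dropping a modality widens the fused posterior. The toy RNA and methylation experts are made-up numbers.

```python
def product_of_experts(experts, dim, prior_var=1.0):
    """Fuse diagonal-Gaussian experts with a N(0, prior_var) prior expert.

    experts: list of (means, variances) pairs, one per AVAILABLE modality;
             a missing modality is simply left out of the list.
    Returns (fused_means, fused_variances). This fusion rule is an assumption.
    """
    precision = [1.0 / prior_var] * dim  # the prior's precision
    weighted = [0.0] * dim               # precision-weighted mean; prior mean is 0
    for means, variances in experts:
        for i in range(dim):
            precision[i] += 1.0 / variances[i]
            weighted[i] += means[i] / variances[i]
    fused_var = [1.0 / p for p in precision]
    fused_mean = [w * v for w, v in zip(weighted, fused_var)]
    return fused_mean, fused_var


rna = ([1.0, -0.5], [0.5, 0.5])           # toy RNA expert
methylation = ([0.8, -0.4], [0.25, 1.0])  # toy methylation expert
_, var_rna_only = product_of_experts([rna], dim=2)
_, var_both = product_of_experts([rna, methylation], dim=2)
```

With RNA alone, the fused variance in every dimension is larger than with RNA plus methylation. Each added expert can only add precision.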
This 328-dimensional fingerprint is the foundation everything else builds on.
Molecular data tells us what's happening inside cells. But pathology slides and radiology scans reveal something sequencing cannot — the tumor's physical architecture. How immune cells surround or infiltrate the tumor mass. Whether it's invading blood vessels. How different subpopulations are spatially arranged.
We integrate histopathology (whole-slide images) and radiology (CT/MRI) through late gated fusion. Rather than forcing imaging through the same encoder as molecular data — which degrades both signals — each is processed by a specialist model, then a learned gate decides how much to trust each source for each individual patient.
We're transparent about the limits. Histopathology embeddings encode scanner and staining protocols, so cross-institution transfer is unreliable. Our production system suppresses imaging from unvalidated sites by default.
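Here is a minimal sketch of per-patient late gated fusion with site suppression. It assumes each specialist model has already produced an embedding of the same length, and that the learned gate outputs one logit per source for the patient. The softmax normalization, the source names, and the `validated_sites` allowlist are hypothetical. The point is that imaging from an unvalidated site is removed before the weights are computed, so the remaining sources absorb its share.

```python
import math

IMAGING_SOURCES = {"histopathology", "radiology"}  # hypothetical source names


def gated_fusion(embeddings, gate_logits, site, validated_sites):
    """Late gated fusion of per-source embeddings for one patient.

    embeddings:      dict source -> list[float], all the same length
    gate_logits:     dict source -> float from a learned gate (assumed given)
    site:            the patient's institution
    validated_sites: institutions whose imaging has been validated
    """
    # Suppress imaging from unvalidated sites before weighting.
    active = {
        s: g for s, g in gate_logits.items()
        if s in embeddings and (s not in IMAGING_SOURCES or site in validated_sites)
    }
    # Softmax over the surviving sources; subtracting the max avoids overflow.
    top = max(active.values())
    exp = {s: math.exp(g - top) for s, g in active.items()}
    total = sum(exp.values())
    weights = {s: e / total for s, e in exp.items()}
    dim = len(next(iter(embeddings.values())))
    fused = [sum(w * embeddings[s][i] for s, w in weights.items()) for i in range(dim)]
    return fused, weights
```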
The biological fingerprint then feeds two complementary paths: a static path for driver identification and drug sensitivity analysis, and a dynamic path for tumor simulation, with the static path's results feeding into the dynamic one.
Driver identification matches patient mutations against 633 known drivers from IntOGen and 95 COSMIC Cancer Gene Census genes, then determines which are actively driving THIS patient's cancer using pathway context and expression evidence. Drug sensitivity prediction shows which pathways mediate the response — so clinicians can evaluate whether the recommendation makes biological sense.
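The two-step logic (catalog match, then patient-specific evidence) might look roughly like this. The evidence rules, thresholds, and field names are hypothetical stand-ins, as are the toy inputs. They replace the IntOGen and COSMIC Cancer Gene Census catalogs and the pathway and expression evidence described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    gene: str
    protein_change: str


def active_drivers(mutations, driver_catalog, expression_z, pathway_activity,
                   gene_to_pathway, min_expression_z=-1.0, min_pathway_score=1.0):
    """Return (gene, change, pathway) for catalog drivers with evidence of activity.

    Step 1: keep only mutations in known driver genes (driver_catalog).
    Step 2 (hypothetical rules): require that the gene is not silenced
    (expression z-score above min_expression_z) and that its pathway is active.
    """
    calls = []
    for m in mutations:
        if m.gene not in driver_catalog:
            continue
        expressed = expression_z.get(m.gene, 0.0) > min_expression_z
        pathway = gene_to_pathway.get(m.gene)
        pathway_on = (pathway is not None
                      and pathway_activity.get(pathway, 0.0) >= min_pathway_score)
        if expressed and pathway_on:
            calls.append((m.gene, m.protein_change, pathway))
    return calls
```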
Tumor subpopulations are identified directly from sequencing data through clonal deconvolution. These are not abstract clones but real subpopulations inferred from variant allele frequencies. A Resistance Sentinel preserves minor resistant subclones that would otherwise be lost, and each clone is annotated with its driver mutations and knowledge-grounded drug sensitivity.
A hypernetwork generates personalized physics parameters, and a neural ODE simulates treatment response. A hybrid stochastic simulator switches automatically from continuous SDE dynamics to exact Gillespie SSA when clone populations are small. The result is a distribution of possible evolutionary outcomes, including the timing of resistance emergence and clone fate probabilities.
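The switching idea can be illustrated on a single clone under a linear birth-death model, with per-cell birth rate `birth` and death rate `death`. Below an assumed population threshold, the sketch takes exact Gillespie steps. At or above it, the sketch takes Euler-Maruyama steps of the chemical Langevin approximation dN = (b - d)N dt + sqrt((b + d)N) dW. The threshold, step size, and single-clone setup are assumptions. The simulator described above is multi-clone and takes its parameters from the hypernetwork.

```python
import math
import random


def simulate_clone(n0, birth, death, t_end, threshold=100, dt=0.01, seed=None):
    """Hybrid simulation of one clone's size under a linear birth-death model.

    Below `threshold` cells: exact Gillespie SSA on integer counts.
    At or above it: Euler-Maruyama steps of the chemical Langevin SDE
        dN = (birth - death) * N dt + sqrt((birth + death) * N) dW.
    Returns a list of (time, population) samples; a final 0 means extinction.
    """
    if birth < 0 or death < 0 or birth + death <= 0 or dt <= 0:
        raise ValueError("rates must be non-negative with a positive sum; dt > 0")
    rng = random.Random(seed)
    t, n = 0.0, n0
    path = [(t, n)]
    while t < t_end and n > 0:
        if n < threshold:
            n = round(n)  # SSA works on whole cells
            if n == 0:    # a tiny SDE value rounds to extinction
                path.append((t, 0))
                break
            tau = rng.expovariate((birth + death) * n)  # waiting time to next event
            if t + tau >= t_end:
                t = t_end  # no event before the horizon
            else:
                t += tau
                n += 1 if rng.random() < birth / (birth + death) else -1
        else:
            remaining = t_end - t
            h = min(dt, remaining)
            drift = (birth - death) * n * h
            noise = math.sqrt((birth + death) * n * h) * rng.gauss(0.0, 1.0)
            n = max(n + drift + noise, 0.0)  # population cannot go negative
            t = t_end if h == remaining else min(t + h, t_end)
        path.append((t, n))
    return path
```

Near extinction, the discrete SSA tracks the chance that a small resistant clone dies out by chance. The continuous approximation handles that regime poorly.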
Our domain separation network strips mouse-specific artifacts from preclinical data, retaining only tumor biology that transfers to humans. It fills in missing data types (like methylation) by learning statistical relationships, allowing drug responses observed in mice to directly inform patient predictions.
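The original domain separation network formulation (Bousmalis et al., 2016) encodes each sample into a shared code and a domain-private code. A difference loss pushes the two apart by penalizing the squared Frobenius norm of H_shared^T H_private. The sketch below computes only that term, in plain Python, for one batch. The encoders, the reconstruction and similarity losses, and the mouse-to-human imputation are omitted. Nothing here is taken from the platform's code.

```python
def difference_loss(shared, private):
    """Squared Frobenius norm of shared^T @ private for one batch.

    shared:  batch x k matrix (list of rows) from the shared encoder
    private: batch x m matrix from the private encoder, same samples and order
    The loss is 0 exactly when every shared feature column is orthogonal
    to every private feature column across the batch.
    """
    k, m = len(shared[0]), len(private[0])
    loss = 0.0
    for i in range(k):
        for j in range(m):
            # Entry (i, j) of shared^T @ private: sum over samples in the batch.
            c = sum(s_row[i] * p_row[j] for s_row, p_row in zip(shared, private))
            loss += c * c
    return loss
```

During training, this term is minimized alongside the task, reconstruction, and domain-similarity losses. That pushes domain-specific variation (here, mouse-specific artifacts) into the private code.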
Knowing which genes are mutated isn't enough — we need to know which biological pathways those mutations are actually activating. A KRAS mutation only matters if the downstream MAPK signaling cascade is actually firing. Our Mechanistic Evidence Engine runs parallel to the VAE, analyzing raw gene expression to determine exactly which pathways are driving the cancer.
The engine integrates 50 MSigDB Hallmark, 68 KEGG cancer/signaling, and 51 Reactome pathways (169 in total), with robust scaling against a reference cohort of 9,415 patients. It determines which biological programs are actively signaling, not merely expressed. In validation, KRAS signaling is significantly higher in KRAS-mutant patients (p=8.5×10⁻²⁹).
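Here is a minimal sketch of robust scaling followed by a simple pathway score. It assumes that robust scaling means centering each gene on its reference-cohort median and dividing by its interquartile range. It also assumes that a pathway's score is the mean scaled expression of its member genes. The source specifies neither choice. This simple average measures expression only and does not implement the engine's distinction between signaling and mere expression.

```python
import statistics


def robust_scale(value, reference):
    """Center on the reference median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(reference, n=4)
    iqr = q3 - q1
    return (value - median) / iqr if iqr > 0 else 0.0


def pathway_score(patient_expr, reference_expr, gene_set):
    """Mean robust-scaled expression over a pathway's genes (assumed scoring rule).

    patient_expr:   gene -> this patient's expression value
    reference_expr: gene -> list of that gene's values across the reference cohort
    Genes missing from either input are skipped.
    """
    scaled = [robust_scale(patient_expr[g], reference_expr[g])
              for g in gene_set if g in patient_expr and g in reference_expr]
    return sum(scaled) / len(scaled) if scaled else float("nan")
```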
Starting from each mutated driver, the engine traces downstream through 1,743 directed causal edges (SIGNOR database) to map the full signaling cascade: KRAS → RAF → MEK → ERK. Each node in the chain is checked for druggability — identifying exactly where to intervene.
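Tracing a cascade is a graph traversal. The sketch below runs a breadth-first search over a directed edge list. It returns each druggable node reachable from a mutated driver, together with the shortest path to it. The toy edges and the druggable set are illustrative assumptions, not SIGNOR content. The sketch also ignores edge signs (activation versus inhibition).

```python
from collections import deque


def druggable_downstream(driver, edges, druggable):
    """Breadth-first search from `driver` along directed (source, target) edges.

    Returns {node: path_from_driver} for every reachable node in `druggable`,
    where each path is a shortest path (fewest edges) found by BFS.
    """
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    paths = {driver: [driver]}
    queue = deque([driver])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in paths:  # first visit in BFS order is a shortest path
                paths[nxt] = paths[node] + [nxt]
                queue.append(nxt)
    return {node: path for node, path in paths.items() if node in druggable}
```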
Active pathways and druggable nodes are matched to 130 curated variant-drug associations covering 75 genes and 114 drugs from OncoKB and CIViC evidence. Ranked by evidence tier (Level 1 = FDA-approved). Known resistance mutations automatically override sensitivity predictions. The engine abstains when evidence is insufficient.
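The ranking, override, and abstention rules can be sketched as a filter over association records. Several details are assumptions, not OncoKB or CIViC entries:

- the record fields;
- the integer tier encoding (1 is strongest, matching the text's Level 1 = FDA-approved);
- the `max_level` cutoff that decides when to abstain;
- the placeholder gene, variant, and drug names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    gene: str
    variant: str
    drug: str
    level: int   # 1 = strongest evidence (FDA-approved); larger = weaker
    effect: str  # "sensitive" or "resistant"


def recommend(patient_variants, associations, max_level=3):
    """Rank drugs for a patient; resistance evidence overrides sensitivity.

    patient_variants: set of (gene, variant) pairs found in the patient
    Returns [(drug, level), ...] sorted by evidence level, or None to abstain.
    """
    matched = [a for a in associations if (a.gene, a.variant) in patient_variants]
    resistant = {a.drug for a in matched if a.effect == "resistant"}
    best = {}
    for a in matched:
        if a.effect == "sensitive" and a.drug not in resistant and a.level <= max_level:
            best[a.drug] = min(a.level, best.get(a.drug, a.level))
    if not best:
        return None  # abstain: no sufficiently supported, non-overridden option
    return sorted(best.items(), key=lambda item: (item[1], item[0]))
```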
Beyond predicting what will happen, the platform helps identify what to do about it. Six specialized modules work as an additive layer on top of the core pipeline, with knowledge-grounded drug sensitivity from OncoKB and CIViC databases. Treatment labels extracted for 9,415 patients enable causal treatment effect estimation across regimens.
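As one illustration of what treatment labels make possible, the sketch below computes an inverse-propensity-weighted (IPW) estimate of the average effect of one regimen versus another. The source does not say which causal estimator the platform uses. IPW, and the assumption that propensity scores are already known, are choices made for this example.

```python
def ipw_ate(treated, outcome, propensity):
    """Inverse-propensity-weighted estimate of E[Y(1)] - E[Y(0)].

    treated:    0/1 indicator per patient (1 = regimen A, 0 = regimen B)
    outcome:    observed outcome per patient
    propensity: P(treated = 1 | covariates) per patient, strictly between 0 and 1
    """
    if not all(0.0 < e < 1.0 for e in propensity):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    n = len(treated)
    mean_treated = sum(t * y / e for t, y, e in zip(treated, outcome, propensity)) / n
    mean_control = sum((1 - t) * y / (1 - e)
                       for t, y, e in zip(treated, outcome, propensity)) / n
    return mean_treated - mean_control
```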
What emerges is not a single prediction but a comprehensive computational model of an individual patient's cancer. It knows which mutations are driving the disease, which pathways are active, how the tumor microenvironment is configured, how fast it's growing, how it will respond to specific treatments over time, where resistance is likely to emerge, and which therapeutic vulnerabilities it has created for itself.
Every prediction decomposes into an inspectable chain of biologically named computations. From raw gene expression through 328 named latent dimensions, through physics parameters with physiological units, to time-resolved trajectories with calibrated uncertainty — nothing is opaque.
This is what we mean by a digital twin. Not a metaphor. A simulation.
We publish our metrics honestly, including where models fail. Treatment optimization runs in shadow mode until externally validated. Cross-site imaging is suppressed by default. Every prediction carries calibrated uncertainty, and the system abstains rather than guessing when evidence is insufficient. ISS-driven expert routing assesses data quality before any prediction is generated. Shift-aware conformal prediction provides honest uncertainty bounds under distribution shift, and GroupDRO training is used by default to keep performance robust across hospitals.
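Shift-aware conformal prediction is often implemented as weighted split conformal prediction (Tibshirani et al., 2019). Calibration scores are reweighted by an estimated likelihood ratio between the deployment and calibration distributions before the quantile is taken. The sketch below assumes those weights are already available; how the platform estimates them is not described here. With equal weights, it reduces to ordinary split conformal prediction.

```python
import math


def weighted_conformal_radius(scores, weights, test_weight, alpha=0.1):
    """Half-width of a weighted split-conformal interval.

    scores:      calibration nonconformity scores, e.g. |y - y_hat|
    weights:     estimated likelihood ratios w(x_i) for the calibration points
    test_weight: w(x) for the point being predicted
    The test point's own mass is placed at +infinity, so the radius is
    infinite when the calibration points carry too little weight.
    """
    total = sum(weights) + test_weight
    target = (1 - alpha) * total  # compare unnormalized mass to avoid rounding drift
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight
        if cumulative >= target:
            return score
    return math.inf
```

The interval for a new prediction ŷ is then [ŷ - r, ŷ + r], where r is the returned radius.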
Explore a demo patient through the full pipeline, or get in touch to discuss a validation partnership.