Prediction Traceability

Every number has a name

DNAI predictions are not opaque scores. They are auditable chains of biologically named computations — from raw gene expression through named pathways and physics-constrained parameters to time-resolved trajectories with calibrated uncertainty.

328 named latent dimensions
6 physics parameters with units
50 MSigDB Hallmark pathways
5 inspectable pipeline stages

The trust problem in oncology AI

Most AI platforms produce a single score — response probability: 0.72 — with no way to inspect what drove that number, which biological assumptions it encodes, or where it might be wrong.

When clinicians and researchers cannot trace a prediction back to its source, adoption fails. We have seen this repeatedly: technically capable systems that never leave the pilot stage because no one trusts them enough to act.

Silent degradation: missing data degrades predictions with no warning
Impossible outputs: unconstrained models predict negative growth or >100% drug effect
No audit trail: impossible to reproduce or verify predictions after the fact
Post-hoc explanations: interpretability bolted on after the model was trained
The Pipeline

Five inspectable stages

A DNAI prediction flows through five stages. At each stage, the computation is decomposable — you can pause, inspect intermediate values, and verify they make biological sense.

Stage 1

Input Data

Named genes, named variants

Up to 6 data modalities per patient. Every gene is HGNC-standardized. The gene list is version-locked with SHA-256 verification — no silent changes between model versions.

RNA: 2,579 genes
DNA Mutations: 500 cancer genes
Copy Number: 1,886 segments
Methylation: 1,000 CpG probes
Histology (WSI): 1,536d embedding
Imaging: 128d RadEncoder
SHA-256 checksum verified at load — exact gene list reproducible across versions
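The version lock above can be sketched in a few lines. The canonicalization (sorted, newline-joined) and the helper names are illustrative assumptions; the page does not specify the platform's exact hashing scheme.

```python
import hashlib

def gene_list_checksum(genes):
    """SHA-256 over a sorted, newline-joined gene list.
    Canonical ordering is an assumption for this sketch."""
    canonical = "\n".join(sorted(genes)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

# Checksum shipped alongside a model version (toy gene list for illustration).
EXPECTED = gene_list_checksum(["TP53", "BRCA1", "MYC"])

def verify_gene_list(genes, expected=EXPECTED):
    """Fail loudly at load time if the gene list silently changed."""
    actual = gene_list_checksum(genes)
    if actual != expected:
        raise ValueError(f"Gene list drift: {actual[:12]}... != {expected[:12]}...")
    return True
```

Because the list is sorted before hashing, the checksum is order-independent but changes the moment a single gene is added, dropped, or renamed.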
Stage 2

Structured Latent Space

328 named dimensions

The VAE does not produce an opaque embedding. Its 328 dimensions are structurally partitioned into biologically named groups, each of which is inspectable.

z_prolif (1d, <1%)
Proliferation rate (correlates with Ki67, r=0.96)

z_pathway (200d, 61%)
50 MSigDB Hallmark pathways × 4 dims — named biological processes

z_ctx (31d, 9%)
Biological context for driver gene identification

z_residual (16d, 5%)
Residual biology not assigned to named pathways

z_meth (48d, 15%)
Methylation patterns — epigenetic regulatory state

z_cnv (32d, 10%)
Chromosomal spatial structure

Disentanglement guarantee: Proliferation (z_prolif) is verified independent from context (z_ctx) — linear probe R² < 0.001. The platform cannot confuse "growing fast" with "has specific driver mutations."
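The linear-probe test behind this guarantee can be reproduced in miniature: fit a least-squares probe predicting one latent group from another and report R². `linear_probe_r2` is a hypothetical helper written for this sketch, not platform code.

```python
import numpy as np

def linear_probe_r2(z_source, z_target):
    """R-squared of a linear (least-squares, with intercept) probe
    predicting z_target from z_source. Near 0 => no linear leakage."""
    X = np.column_stack([z_source, np.ones(len(z_source))])
    coef, *_ = np.linalg.lstsq(X, z_target, rcond=None)
    pred = X @ coef
    ss_res = np.sum((z_target - pred) ** 2)
    ss_tot = np.sum((z_target - z_target.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-ins for z_prolif (1d) and z_ctx (31d), independent by construction.
rng = np.random.default_rng(0)
z_prolif = rng.normal(size=(1000, 1))
z_ctx = rng.normal(size=(1000, 31))
r2 = linear_probe_r2(z_prolif, z_ctx)  # close to 0 for independent draws
```

A probe R² near zero between two groups is what "disentangled" means operationally here: no linear readout of one group recovers the other.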
Stage 3

Domain Separation

What's tumor, what's artifact

For preclinical PDX data (Path B), the DSN decomposes the signal into two explicit branches. For direct human predictions (Path A), this stage is bypassed.

Shared (kept)

Species-invariant tumor biology — 201 dimensions. Cancer subtype accuracy >90%.

Domain classifier accuracy ~58% (near chance, i.e. species-blind)

Private (discarded)

Species-specific stroma artifacts — 64–128 dims. Mouse contamination removed.

Domain classifier accuracy ~100% (species signal captured here)
Stage 4

Physics Parameters

Six numbers with physiological units

The Hypernet transforms the latent into six named ODE parameters. Each has a physical unit, a valid range enforced by architecture, and a plain-English meaning.

ρ ∈ [0, 0.3], day⁻¹
Growth rate: how fast the tumor doubles

β ∈ [0, 1], dimensionless
Drug sensitivity: cell-kill efficiency

K > 0, relative units
Carrying capacity: maximum tumor burden

ω ≥ 0, day⁻¹
Immune clearance: immune elimination rate

N₀, fractions summing to 1
Initial populations: clonal composition

σ > 0, dimensionless
Noise scale: biological stochasticity
Physics constraint guarantee: Bounds are enforced by activation functions (sigmoid, softplus, softmax) — not post-hoc clipping. The model architecturally cannot produce impossible values.
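The activation-based bounding can be illustrated directly. The specific wiring of raw hypernetwork outputs to parameters below is an assumption for this sketch; only the activations (sigmoid, softplus, softmax) and the ranges come from the cards above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    return np.log1p(np.exp(x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bound_parameters(raw):
    """Map 7 unconstrained head outputs to the 6 named ODE parameters.
    Bounds hold for ANY real-valued input -- no clipping needed."""
    return {
        "rho":   0.3 * sigmoid(raw[0]),  # growth rate in [0, 0.3] day^-1
        "beta":  sigmoid(raw[1]),        # drug sensitivity in [0, 1]
        "K":     softplus(raw[2]),       # carrying capacity > 0
        "omega": softplus(raw[3]),       # immune clearance >= 0
        "N0":    softmax(raw[4:6]),      # clonal fractions, sum = 1 (2 clones here)
        "sigma": softplus(raw[6]),       # noise scale > 0
    }
```

Even an adversarially extreme raw vector maps into the valid ranges, which is the sense in which impossible values are architecturally excluded rather than clipped after the fact.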
Stage 5

Trajectory + Uncertainty

Time-resolved predictions with confidence

The ODE parameters produce time-resolved trajectories — not a single score, but a curve showing how the tumor evolves over time, with calibrated uncertainty.

[Chart: tumor burden B(t) over time, sensitive vs. resistant clones, with resistance emergence marked]

Trajectory outputs

Total tumor burden B(t) over time
Per-clone population dynamics f_k(t)
Resistance emergence timing
Dose-response optimization surface
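A minimal trajectory sketch, under assumptions: the exact ODE is not given on this page, so the per-clone logistic form with drug-kill and immune-clearance terms below is a plausible illustration using the named parameters (ρ, β, ω, K, N₀) and the two-clone values from the trace example at the end. The resistant clone's β=0.05 is invented for the demo.

```python
import numpy as np

def simulate(rho, beta, omega, K, n0, drug, days=200.0, dt=0.1):
    """Forward-Euler integration of an assumed per-clone ODE:
        dB_k/dt = rho_k*B_k*(1 - B_total/K) - beta_k*drug(t)*B_k - omega*B_k
    Returns time grid and per-clone burden matrix."""
    t = np.arange(0.0, days, dt)
    B = np.zeros((len(t), len(n0)))
    B[0] = n0
    for i in range(1, len(t)):
        prev = B[i - 1]
        growth = rho * prev * (1.0 - prev.sum() / K)
        kill = beta * drug(t[i - 1]) * prev
        clear = omega * prev
        B[i] = np.maximum(prev + dt * (growth - kill - clear), 0.0)  # burden stays non-negative
    return t, B

# Sensitive clone (beta=0.72) vs. hypothetical resistant subclone (beta=0.05),
# N0=[0.7, 0.3], rho=0.11, omega=0.04, under constant dosing.
t, B = simulate(rho=np.array([0.11, 0.11]), beta=np.array([0.72, 0.05]),
                omega=0.04, K=5.0, n0=np.array([0.7, 0.3]), drug=lambda t: 1.0)
# The sensitive clone collapses under treatment while the resistant clone
# slowly takes over -- qualitatively "initial response, then resistance".
```

This is what "a curve, not a score" buys: resistance timing falls out of the clone dynamics rather than being a separate classifier output.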

Uncertainty layers

MC Dropout: 20 stochastic passes → mean ± CI
Input completeness: missing modalities → wider intervals
Rare cancer flag: few training samples → higher uncertainty
The "I don't know" guarantee: When uncertainty exceeds a threshold, the platform reports "Data Insufficient" rather than presenting a deceptively precise number. The uncertainty-gated subset achieves C-index 0.74 on external CPTAC data — a 4-point lift over unfiltered, demonstrating the model knows what it doesn't know.
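MC Dropout plus the "I don't know" gate fits in a few lines. The 20-pass count comes from this page; the `model(x, rng)` interface, the percentile-based interval, and the 0.3 width threshold are illustrative assumptions.

```python
import numpy as np

def mc_dropout_predict(model, x, passes=20, rng=None):
    """Run a stochastic forward pass `passes` times (dropout left on);
    return the mean and an empirical 95% interval."""
    rng = rng if rng is not None else np.random.default_rng()
    samples = np.array([model(x, rng) for _ in range(passes)])
    lo, hi = np.percentile(samples, [2.5, 97.5])
    return float(samples.mean()), (float(lo), float(hi))

def gated_report(mean, ci, max_width=0.3):
    """'I don't know' gate: a too-wide interval is reported as
    Data Insufficient instead of a deceptively precise number."""
    if ci[1] - ci[0] > max_width:
        return "Data Insufficient"
    return f"risk={mean:.2f} CI=[{ci[0]:.2f}, {ci[1]:.2f}]"
```

The gate is the mechanism behind the uncertainty-filtered C-index lift: predictions the model is unsure about are withheld rather than averaged into the headline number.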
Built-In Safety

The platform checks its own work

Before any prediction reaches a clinician, six automatic checks run in the background — providing context, catching problems, and explaining exactly what the model does and does not know.

"Patients Like Yours"

Every prediction comes with context from similar cases

The platform finds the most biologically similar patients in its training data — not just the same cancer type, but the same pattern of active biological pathways. This grounds every prediction in real patient outcomes.

What a clinician sees: "This prediction is based on 47 similar breast cancers with the same pathway profile. The most similar patients had a median survival of 14.2 months."

Turns "the model says" into "patients like yours show"
[Chart: biological similarity of your patient vs. similar cohort (n=47) vs. all patients]
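A retrieval sketch for "patients like yours": rank training patients by similarity in the named latent space and summarize their outcomes. Cosine similarity and the helper name are assumptions; the platform's actual metric is not specified on this page.

```python
import numpy as np

def patients_like_yours(z_patient, z_bank, outcomes, k=47):
    """Return indices of the k most cosine-similar training patients
    in latent space, plus their median outcome (e.g. survival months)."""
    bank = z_bank / np.linalg.norm(z_bank, axis=1, keepdims=True)
    q = z_patient / np.linalg.norm(z_patient)
    sims = bank @ q
    idx = np.argsort(-sims)[:k]
    return idx, float(np.median(outcomes[idx]))
```

Because the latent dimensions are pathway-named, "similar" here means "same pattern of active biology", not merely "same cancer type", which is what grounds the n=47 cohort statement a clinician sees.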

When the model says "I don't know"

It tells you exactly why — and what would help

Most AI systems either give a confident answer or a vague warning. DNAI gives specific, actionable reasons for uncertainty — so clinicians know exactly what is missing and how to improve the prediction.

RARE_TYPE (Rare cancer type): few training examples for this cancer — predictions are less certain
MISSING_WSI (Missing histology): no tissue slide provided — confidence intervals are wider
OOD_LATENT (Unusual biology): this patient's profile is unlike any in the training data
LOW_DENSITY (Few similar patients): not enough biologically similar cases to make a confident call
Example: "Confidence: Moderate. MISSING_WSI — no histology slide provided, confidence interval widened 40%. Action: provide tissue slide to improve prediction."

What drove this prediction?

For every prediction, the platform shows which biological pathways had the most influence on each output — traced through the actual computation, not added as an afterthought.

DNA Repair → Drug sensitivity
Cell Migration (EMT) → Growth rate
P53 Pathway → Drug sensitivity
MYC Targets → Growth rate
Blood Vessel Formation → Immune clearance

Example: DNA Repair drives 85% of the drug sensitivity prediction — consistent with BRCA1 loss making cells vulnerable to DNA-damaging drugs.

Have we seen this before?

Before making any prediction, the platform checks whether this patient's biology falls within its experience. If a patient is unlike anything it has been trained on, it flags this upfront — before the prediction even runs.

[Chart: training coverage map, with patients within experience vs. novel cases flagged]

Patients whose biology falls far outside the training population are automatically flagged with wider uncertainty bounds — the platform will not overstate its confidence on unfamiliar cases.

Tumor Board Report

A one-page downloadable summary of the complete prediction — designed for printing and discussion at tumor board meetings. Everything a clinician needs to evaluate and discuss the recommendation.

Patient summary and available data
Top biological pathways driving the prediction
Physics parameters with confidence ranges
Tumor trajectory chart with resistance timing
Uncertainty flags and recommended actions
Audit hash for regulatory traceability

Stability Check

The platform automatically tests whether small, natural variations in the input data would change the prediction. If the answer flips because of measurement noise, something is wrong — and the platform catches it.

Small expression changes (±5%) → growth rate stable (pass)
Remove 1 of 50 pathways → drug sensitivity stable (pass)
Use 80% of tissue slide → risk score stable (pass)

If a prediction is sensitive to small input changes, it is flagged as unstable — the platform will not present a fragile result as confident.
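The perturbation test can be sketched generically. The ±5% jitter matches the check listed above; the trial count, the relative tolerance, and the `predict(x)` interface are assumptions for this sketch.

```python
import numpy as np

def stability_check(predict, x, trials=20, jitter=0.05, tol=0.1, rng=None):
    """Apply multiplicative noise of up to +/-`jitter` to the input and
    flag the prediction as unstable if it ever moves by more than
    `tol` relative to the unperturbed baseline."""
    rng = rng if rng is not None else np.random.default_rng(0)
    base = predict(x)
    for _ in range(trials):
        noisy = x * (1.0 + rng.uniform(-jitter, jitter, size=x.shape))
        if abs(predict(noisy) - base) > tol * abs(base):
            return "unstable"
    return "pass"
```

A smooth predictor (e.g. an average over many genes) passes; a predictor perched on a hard threshold flips under tiny noise and is caught, which is exactly the fragile case the platform refuses to present as confident.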

End-to-end trace example

A BRCA1-mutant breast cancer patient treated with carboplatin — traced through every stage.

Stage | Value | What you see
1. Input | RNA (2,579), CNV, BRCA1 p.E1735fs, meth, WSI | All inputs named, versioned, checksummed
2. Latent | z_prolif=1.8, DNA_Repair↓, EMT↑ | High proliferation, impaired DNA repair (BRCA1 loss), mesenchymal shift
3. Separation | (Path A — bypassed for human data) | Direct from VAE — no domain separation needed
4. Parameters | ρ=0.11, β=0.72, ω=0.04, N₀=[0.7, 0.3] | Moderate growth, good initial response, 2 clonal populations
5. Trajectory | Clone 1 declines → Day 90. Clone 2 dominant → Day 140 | Predicts initial response then resistance at ~Day 140
Uncertainty | Risk CI: [0.35, 0.52], timing CI: [Day 120, Day 165] | Moderate confidence — recommend monitoring at Day 100

At every stage, a clinician can ask "why?" and get a biologically grounded answer — not "because the neural network said so."

Glass box vs. black box

Aspect | Typical Oncology AI | DNAI
Output | Single score (0.72) | Time-resolved trajectory with clonal dynamics
Intermediate values | Opaque 512d embedding | 328 named dimensions (pathways, proliferation, methylation)
Parameters | None visible | 6 named ODE parameters with physical units
Why this prediction? | Post-hoc feature importance | Biologically named pathway activations at source
Missing data | Silent degradation | Explicit reporting + uncertainty adjustment
Impossible predictions | Can output negative growth | Architecturally impossible — physics in activation functions
Uncertainty | Rarely provided | MC Dropout CI on every output + "Data Insufficient" flag
Audit trail | Minimal | SHA-256 hashed I/O, versioned models, 21 CFR Part 11

Three traceability guarantees

Structural

Every latent dimension maps to a named biological concept. There are no hidden layers that learn uninterpretable features. The architecture forces biological structure.

Physics

Every ODE parameter has a physical unit and a valid range enforced by architecture. The model cannot hallucinate biologically impossible dynamics.

Uncertainty

Every prediction has a confidence interval. Missing data, rare cancers, and model disagreement increase reported uncertainty — transparently.

Built for regulatory review

The audit infrastructure is designed to support FDA Class II SaMD requirements and 21 CFR Part 11 electronic records compliance.

Input Hashing

SHA-256 hash of every input tensor ensures exact reproducibility

Model Versioning

Every checkpoint tracked (VAE v5.10, Hypernet v3.2, DSN v1.0)

Deterministic Mode

Reproducible inference — sampling replaced with mean outputs for audit

Request Logging

Timestamp, model version, input hash, output hash, duration for every call
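One request-log entry can be sketched as follows. The field names and record shape are illustrative assumptions, not the platform's actual schema; only the logged quantities (timestamp, model versions, input/output hashes, duration) come from this page.

```python
import hashlib
import time

def tensor_hash(payload: bytes) -> str:
    """SHA-256 hex digest of a serialized input or output tensor."""
    return hashlib.sha256(payload).hexdigest()

def audit_record(model_versions, input_bytes, output_bytes, started):
    """Assemble one append-only request-log entry for a prediction call."""
    return {
        "timestamp": started,                         # when the call began
        "models": model_versions,                     # e.g. {"vae": "v5.10", ...}
        "input_sha256": tensor_hash(input_bytes),     # exact-reproducibility anchor
        "output_sha256": tensor_hash(output_bytes),   # what was actually returned
        "duration_s": round(time.time() - started, 3),
    }
```

Hashing both sides of the call is what makes after-the-fact verification possible: given the logged model versions and input hash, deterministic mode should reproduce the logged output hash bit-for-bit.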

See the trace-through live

Request a demo to see how a real patient prediction decomposes at every stage — from raw omics to trajectory.