Prediction Traceability

Every number has a name

DNAI predictions are not opaque scores. They are auditable chains of biologically named computations — from raw gene expression through named pathways and physics-constrained parameters to time-resolved trajectories with calibrated uncertainty.

328 named latent dimensions
6 physics parameters with units
50 MSigDB Hallmark pathways
5 inspectable pipeline stages

The trust problem in oncology AI

Most AI platforms produce a single score — response probability: 0.72 — with no way to inspect what drove that number, which biological assumptions it encodes, or where it might be wrong.

When clinicians and researchers cannot trace a prediction back to its source, adoption fails. We have seen this repeatedly: technically capable systems that never leave the pilot stage because no one trusts them enough to act.

Silent degradation: missing data degrades predictions with no warning
Impossible outputs: unconstrained models predict negative growth or >100% drug effect
No audit trail: impossible to reproduce or verify predictions after the fact
Post-hoc explanations: interpretability bolted on after the model was trained
The Pipeline

Five inspectable stages

A DNAI prediction flows through five stages. At each stage, the computation is decomposable — you can pause, inspect intermediate values, and verify they make biological sense.

Stage 1

Input Data

Named genes, named variants

Up to 6 data modalities per patient. Every gene is HGNC-standardized. The gene list is version-locked with SHA-256 verification — no silent changes between model versions.

RNA: 2,579 genes
DNA Mutations: 500 cancer genes
Copy Number: 1,886 segments
Methylation: 1,000 CpG probes
Histology (WSI): 1,536d embedding
Imaging: 128d RadEncoder
SHA-256 checksum verified at load — exact gene list reproducible across versions
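The version lock above can be sketched in a few lines. The canonicalization (sorted, newline-joined) and the helper names are illustrative assumptions; the page does not specify the platform's exact hashing scheme.

```python
import hashlib

def gene_list_checksum(genes):
    """SHA-256 over a sorted, newline-joined gene list.
    Canonical ordering is an assumption for this sketch."""
    canonical = "\n".join(sorted(genes)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

# Checksum shipped alongside a model version (toy gene list for illustration).
EXPECTED = gene_list_checksum(["TP53", "BRCA1", "MYC"])

def verify_gene_list(genes, expected=EXPECTED):
    """Fail loudly at load time if the gene list silently changed."""
    actual = gene_list_checksum(genes)
    if actual != expected:
        raise ValueError(f"Gene list drift: {actual[:12]}... != {expected[:12]}...")
    return True
```

Because the list is sorted before hashing, the checksum is order-independent but changes the moment a single gene is added, dropped, or renamed.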
Stage 2

Structured Latent Space

328 named dimensions

The VAE does not produce an opaque embedding. Its 328 dimensions are structurally partitioned into biologically named groups, each of which is inspectable.

z_prolif (1d, <1%)
Proliferation rate (correlates with Ki67, r=0.96)

z_pathway (200d, 61%)
50 MSigDB Hallmark pathways × 4 dims — named biological processes

z_ctx (31d, 9%)
Biological context for driver gene identification

z_residual (16d, 5%)
Residual biology not assigned to named pathways

z_meth (48d, 15%)
Methylation patterns — epigenetic regulatory state

z_cnv (32d, 10%)
Chromosomal spatial structure

Disentanglement guarantee: Proliferation (z_prolif) is verified independent from context (z_ctx) — linear probe R² < 0.001. The platform cannot confuse "growing fast" with "has specific driver mutations."
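The linear-probe test behind this guarantee can be reproduced in miniature: fit a least-squares probe predicting one latent group from another and report R². `linear_probe_r2` is a hypothetical helper written for this sketch, not platform code.

```python
import numpy as np

def linear_probe_r2(z_source, z_target):
    """R-squared of a linear (least-squares, with intercept) probe
    predicting z_target from z_source. Near 0 => no linear leakage."""
    X = np.column_stack([z_source, np.ones(len(z_source))])
    coef, *_ = np.linalg.lstsq(X, z_target, rcond=None)
    pred = X @ coef
    ss_res = np.sum((z_target - pred) ** 2)
    ss_tot = np.sum((z_target - z_target.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-ins for z_prolif (1d) and z_ctx (31d), independent by construction.
rng = np.random.default_rng(0)
z_prolif = rng.normal(size=(1000, 1))
z_ctx = rng.normal(size=(1000, 31))
r2 = linear_probe_r2(z_prolif, z_ctx)  # close to 0 for independent draws
```

A probe R² near zero between two groups is what "disentangled" means operationally here: no linear readout of one group recovers the other.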
Stage 3

Domain Separation

What's tumor, what's artifact

For preclinical PDX data (Path B), the DSN decomposes the signal into two explicit branches. For direct human predictions (Path A), this stage is bypassed.

Shared (kept)

Species-invariant tumor biology — 201 dimensions. Cancer subtype accuracy >90%.

Domain classifier accuracy ~58% (near chance, i.e. species-blind)

Private (discarded)

Species-specific stroma artifacts — 64–128 dims. Mouse contamination removed.

Domain classifier accuracy ~100% (species signal captured here)
Stage 4

Physics Parameters

Six numbers with physiological units

The Hypernet transforms the latent into six named ODE parameters. Each has a physical unit, a valid range enforced by architecture, and a plain-English meaning.

ρ ∈ [0, 0.3], day⁻¹
Growth rate: how fast the tumor doubles

β ∈ [0, 1], dimensionless
Drug sensitivity: cell-kill efficiency

K > 0, relative units
Carrying capacity: maximum tumor burden

ω ≥ 0, day⁻¹
Immune clearance: immune elimination rate

N₀, fractions summing to 1
Initial populations: clonal composition

σ > 0, dimensionless
Noise scale: biological stochasticity
Physics constraint guarantee: Bounds are enforced by activation functions (sigmoid, softplus, softmax) — not post-hoc clipping. The model architecturally cannot produce impossible values.
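The activation-based bounding can be illustrated directly. The specific wiring of raw hypernetwork outputs to parameters below is an assumption for this sketch; only the activations (sigmoid, softplus, softmax) and the ranges come from the cards above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    return np.log1p(np.exp(x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bound_parameters(raw):
    """Map 7 unconstrained head outputs to the 6 named ODE parameters.
    Bounds hold for ANY real-valued input -- no clipping needed."""
    return {
        "rho":   0.3 * sigmoid(raw[0]),  # growth rate in [0, 0.3] day^-1
        "beta":  sigmoid(raw[1]),        # drug sensitivity in [0, 1]
        "K":     softplus(raw[2]),       # carrying capacity > 0
        "omega": softplus(raw[3]),       # immune clearance >= 0
        "N0":    softmax(raw[4:6]),      # clonal fractions, sum = 1 (2 clones here)
        "sigma": softplus(raw[6]),       # noise scale > 0
    }
```

Even an adversarially extreme raw vector maps into the valid ranges, which is the sense in which impossible values are architecturally excluded rather than clipped after the fact.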
Stage 5

Trajectory + Uncertainty

Time-resolved predictions with confidence

The ODE parameters produce time-resolved trajectories — not a single score, but a curve showing how the tumor evolves over time, with calibrated uncertainty.

[Chart: tumor burden B(t) over time, sensitive vs. resistant clones, with resistance emergence marked]

Trajectory outputs

Total tumor burden B(t) over time
Per-clone population dynamics f_k(t)
Resistance emergence timing
Dose-response optimization surface
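A minimal trajectory sketch, under assumptions: the exact ODE is not given on this page, so the per-clone logistic form with drug-kill and immune-clearance terms below is a plausible illustration using the named parameters (ρ, β, ω, K, N₀) and the two-clone values from the trace example at the end. The resistant clone's β=0.05 is invented for the demo.

```python
import numpy as np

def simulate(rho, beta, omega, K, n0, drug, days=200.0, dt=0.1):
    """Forward-Euler integration of an assumed per-clone ODE:
        dB_k/dt = rho_k*B_k*(1 - B_total/K) - beta_k*drug(t)*B_k - omega*B_k
    Returns time grid and per-clone burden matrix."""
    t = np.arange(0.0, days, dt)
    B = np.zeros((len(t), len(n0)))
    B[0] = n0
    for i in range(1, len(t)):
        prev = B[i - 1]
        growth = rho * prev * (1.0 - prev.sum() / K)
        kill = beta * drug(t[i - 1]) * prev
        clear = omega * prev
        B[i] = np.maximum(prev + dt * (growth - kill - clear), 0.0)  # burden stays non-negative
    return t, B

# Sensitive clone (beta=0.72) vs. hypothetical resistant subclone (beta=0.05),
# N0=[0.7, 0.3], rho=0.11, omega=0.04, under constant dosing.
t, B = simulate(rho=np.array([0.11, 0.11]), beta=np.array([0.72, 0.05]),
                omega=0.04, K=5.0, n0=np.array([0.7, 0.3]), drug=lambda t: 1.0)
# The sensitive clone collapses under treatment while the resistant clone
# slowly takes over -- qualitatively "initial response, then resistance".
```

This is what "a curve, not a score" buys: resistance timing falls out of the clone dynamics rather than being a separate classifier output.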

Uncertainty layers

MC Dropout: 20 stochastic passes → mean ± CI
Input completeness: missing modalities → wider intervals
Rare cancer flag: few training samples → higher uncertainty
The "I don't know" guarantee: When uncertainty exceeds a threshold, the platform reports "Data Insufficient" rather than presenting a deceptively precise number. The uncertainty-gated subset achieves C-index 0.74 on external CPTAC data — a 4-point lift over unfiltered, demonstrating the model knows what it doesn't know.
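MC Dropout plus the "I don't know" gate fits in a few lines. The 20-pass count comes from this page; the `model(x, rng)` interface, the percentile-based interval, and the 0.3 width threshold are illustrative assumptions.

```python
import numpy as np

def mc_dropout_predict(model, x, passes=20, rng=None):
    """Run a stochastic forward pass `passes` times (dropout left on);
    return the mean and an empirical 95% interval."""
    rng = rng if rng is not None else np.random.default_rng()
    samples = np.array([model(x, rng) for _ in range(passes)])
    lo, hi = np.percentile(samples, [2.5, 97.5])
    return float(samples.mean()), (float(lo), float(hi))

def gated_report(mean, ci, max_width=0.3):
    """'I don't know' gate: a too-wide interval is reported as
    Data Insufficient instead of a deceptively precise number."""
    if ci[1] - ci[0] > max_width:
        return "Data Insufficient"
    return f"risk={mean:.2f} CI=[{ci[0]:.2f}, {ci[1]:.2f}]"
```

The gate is the mechanism behind the uncertainty-filtered C-index lift: predictions the model is unsure about are withheld rather than averaged into the headline number.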
Built-In Safety

The platform checks its own work

Before any prediction reaches a clinician, six automatic checks run in the background — providing context, catching problems, and explaining exactly what the model does and does not know.

"Patients Like Yours"

Every prediction comes with context from similar cases

The platform finds the most biologically similar patients in its training data — not just the same cancer type, but the same pattern of active biological pathways. This grounds every prediction in real patient outcomes.

What a clinician sees: "This prediction is based on 47 similar breast cancers with the same pathway profile. The most similar patients had a median survival of 14.2 months."

Turns "the model says" into "patients like yours show"
[Chart: biological similarity of your patient vs. similar cohort (n=47) vs. all patients]
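A retrieval sketch for "patients like yours": rank training patients by similarity in the named latent space and summarize their outcomes. Cosine similarity and the helper name are assumptions; the platform's actual metric is not specified on this page.

```python
import numpy as np

def patients_like_yours(z_patient, z_bank, outcomes, k=47):
    """Return indices of the k most cosine-similar training patients
    in latent space, plus their median outcome (e.g. survival months)."""
    bank = z_bank / np.linalg.norm(z_bank, axis=1, keepdims=True)
    q = z_patient / np.linalg.norm(z_patient)
    sims = bank @ q
    idx = np.argsort(-sims)[:k]
    return idx, float(np.median(outcomes[idx]))
```

Because the latent dimensions are pathway-named, "similar" here means "same pattern of active biology", not merely "same cancer type", which is what grounds the n=47 cohort statement a clinician sees.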

When the model says "I don't know"

It tells you exactly why — and what would help

Most AI systems either give a confident answer or a vague warning. DNAI gives specific, actionable reasons for uncertainty — so clinicians know exactly what is missing and how to improve the prediction.

RARE_TYPE (Rare cancer type): few training examples for this cancer — predictions are less certain
MISSING_WSI (Missing histology): no tissue slide provided — confidence intervals are wider
OOD_LATENT (Unusual biology): this patient's profile is unlike any in the training data
LOW_DENSITY (Few similar patients): not enough biologically similar cases to make a confident call
Example: "Confidence: Moderate. MISSING_WSI — no histology slide provided, confidence interval widened 40%. Action: provide tissue slide to improve prediction."

What drove this prediction?

For every prediction, the platform shows which biological pathways had the most influence on each output — traced through the actual computation, not added as an afterthought.

DNA Repair → Drug sensitivity
Cell Migration (EMT) → Growth rate
P53 Pathway → Drug sensitivity
MYC Targets → Growth rate
Blood Vessel Formation → Immune clearance

Example: DNA Repair drives 85% of the drug sensitivity prediction — consistent with BRCA1 loss making cells vulnerable to DNA-damaging drugs.

Have we seen this before?

Before making any prediction, the platform checks whether this patient's biology falls within its experience. If a patient is unlike anything it has been trained on, it flags this upfront — before the prediction even runs.

[Chart: training coverage map, with patients within experience vs. novel cases flagged]

Patients whose biology falls far outside the training population are automatically flagged with wider uncertainty bounds — the platform will not overstate its confidence on unfamiliar cases.

Tumor Board Report

A one-page downloadable summary of the complete prediction — designed for printing and discussion at tumor board meetings. Everything a clinician needs to evaluate and discuss the recommendation.

Patient summary and available data
Top biological pathways driving the prediction
Physics parameters with confidence ranges
Tumor trajectory chart with resistance timing
Uncertainty flags and recommended actions
Audit hash for regulatory traceability

Stability Check

The platform automatically tests whether small, natural variations in the input data would change the prediction. If the answer flips because of measurement noise, something is wrong — and the platform catches it.

Small expression changes (±5%) → growth rate stable (pass)
Remove 1 of 50 pathways → drug sensitivity stable (pass)
Use 80% of tissue slide → risk score stable (pass)

If a prediction is sensitive to small input changes, it is flagged as unstable — the platform will not present a fragile result as confident.
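The perturbation test can be sketched generically. The ±5% jitter matches the check listed above; the trial count, the relative tolerance, and the `predict(x)` interface are assumptions for this sketch.

```python
import numpy as np

def stability_check(predict, x, trials=20, jitter=0.05, tol=0.1, rng=None):
    """Apply multiplicative noise of up to +/-`jitter` to the input and
    flag the prediction as unstable if it ever moves by more than
    `tol` relative to the unperturbed baseline."""
    rng = rng if rng is not None else np.random.default_rng(0)
    base = predict(x)
    for _ in range(trials):
        noisy = x * (1.0 + rng.uniform(-jitter, jitter, size=x.shape))
        if abs(predict(noisy) - base) > tol * abs(base):
            return "unstable"
    return "pass"
```

A smooth predictor (e.g. an average over many genes) passes; a predictor perched on a hard threshold flips under tiny noise and is caught, which is exactly the fragile case the platform refuses to present as confident.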

End-to-end trace example

A BRCA1-mutant breast cancer patient treated with carboplatin — traced through every stage.

Stage | Value | What you see
1. Input | RNA (2,579), CNV, BRCA1 p.E1735fs, meth, WSI | All inputs named, versioned, checksummed
2. Latent | z_prolif=1.8, DNA_Repair↓, EMT↑ | High proliferation, impaired DNA repair (BRCA1 loss), mesenchymal shift
3. Separation | (Path A — bypassed for human data) | Direct from VAE — no domain separation needed
4. Parameters | ρ=0.11, β=0.72, ω=0.04, N₀=[0.7, 0.3] | Moderate growth, good initial response, 2 clonal populations
5. Trajectory | Clone 1 declines → Day 90. Clone 2 dominant → Day 140 | Predicts initial response then resistance at ~Day 140
Uncertainty | Risk CI: [0.35, 0.52], timing CI: [Day 120, Day 165] | Moderate confidence — recommend monitoring at Day 100

At every stage, a clinician can ask "why?" and get a biologically grounded answer — not "because the neural network said so."

Glass box vs. black box

Aspect | Typical Oncology AI | DNAI
Output | Single score (0.72) | Time-resolved trajectory with clonal dynamics
Intermediate values | Opaque 512d embedding | 328 named dimensions (pathways, proliferation, methylation)
Parameters | None visible | 6 named ODE parameters with physical units
Why this prediction? | Post-hoc feature importance | Biologically named pathway activations at source
Missing data | Silent degradation | Explicit reporting + uncertainty adjustment
Impossible predictions | Can output negative growth | Architecturally impossible — physics in activation functions
Uncertainty | Rarely provided | MC Dropout CI on every output + "Data Insufficient" flag
Audit trail | Minimal | SHA-256 hashed I/O, versioned models, 21 CFR Part 11

Three traceability guarantees

Structural

Every latent dimension maps to a named biological concept. There are no hidden layers that learn uninterpretable features. The architecture forces biological structure.

Physics

Every ODE parameter has a physical unit and a valid range enforced by architecture. The model cannot hallucinate biologically impossible dynamics.

Uncertainty

Every prediction has a confidence interval. Missing data, rare cancers, and model disagreement increase reported uncertainty — transparently.

Built for regulatory review

The audit infrastructure is designed to support FDA Class II SaMD requirements and 21 CFR Part 11 electronic records compliance.

Input Hashing

SHA-256 hash of every input tensor ensures exact reproducibility

Model Versioning

Every checkpoint tracked (VAE v5.10, Hypernet v3.2, DSN v1.0)

Deterministic Mode

Reproducible inference — sampling replaced with mean outputs for audit

Request Logging

Timestamp, model version, input hash, output hash, duration for every call
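One request-log entry can be sketched as follows. The field names and record shape are illustrative assumptions, not the platform's actual schema; only the logged quantities (timestamp, model versions, input/output hashes, duration) come from this page.

```python
import hashlib
import time

def tensor_hash(payload: bytes) -> str:
    """SHA-256 hex digest of a serialized input or output tensor."""
    return hashlib.sha256(payload).hexdigest()

def audit_record(model_versions, input_bytes, output_bytes, started):
    """Assemble one append-only request-log entry for a prediction call."""
    return {
        "timestamp": started,                         # when the call began
        "models": model_versions,                     # e.g. {"vae": "v5.10", ...}
        "input_sha256": tensor_hash(input_bytes),     # exact-reproducibility anchor
        "output_sha256": tensor_hash(output_bytes),   # what was actually returned
        "duration_s": round(time.time() - started, 3),
    }
```

Hashing both sides of the call is what makes after-the-fact verification possible: given the logged model versions and input hash, deterministic mode should reproduce the logged output hash bit-for-bit.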

See the trace-through live

Request a demo to see how a real patient prediction decomposes at every stage — from raw omics to trajectory.