Translates mouse model data into human-relevant predictions
Near-chance domain discrimination (target: ~0.50)
Shared encoder preserves biological utility
Correlation between imputed and real methylation latents
Correlation between imputed and real CNV latents
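The four metrics above can be sketched in a few lines. This is a minimal illustration, not the evaluation code behind this model: the function names are hypothetical, and it assumes that "domain discrimination" is the accuracy of a binary PDX-vs-TCGA classifier and that the latent correlations are Pearson r averaged over latent dimensions.

```python
import math


def domain_accuracy(probs, labels, threshold=0.5):
    """Fraction of samples whose predicted domain matches the true domain.

    probs  : predicted probability that a sample is from domain 1
    labels : true domain labels (0 or 1)
    A value near 0.50 on balanced data means the discriminator is guessing.
    """
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_latent_correlation(imputed, real):
    """Average per-dimension Pearson r between imputed and real latents.

    Both arguments are lists of latent vectors (one vector per sample).
    Averaging over dimensions is an assumption of this sketch.
    """
    dims = len(real[0])
    rs = [
        pearson([v[d] for v in imputed], [v[d] for v in real])
        for d in range(dims)
    ]
    return sum(rs) / dims


if __name__ == "__main__":
    # Toy values for illustration only, not results from the model.
    print(domain_accuracy([0.9, 0.1, 0.6, 0.4], [1, 0, 0, 1]))
    print(mean_latent_correlation([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]],
                                  [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]))
```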
When we use data from preclinical mouse models (patient-derived xenografts, PDX), this model strips out the mouse-specific biological signal and keeps only the tumor biology that transfers to humans. It learns to separate what is universal about the tumor from what is an artifact of the host species, so drug responses observed in mice can inform predictions for patients. It also fills in missing data types (such as methylation) that aren't typically measured in mouse experiments.
z_rna            201   RNA-derived portion of VAE latent (z_prolif + z_pathway)
z_shared         201   Domain-invariant tumor biology representation
z_meth_imputed   48    Imputed methylation latent from ConditionalPrior
z_cnv_imputed    32    Imputed CNV latent from ConditionalPrior
Residual shared encoder with learnable scale α
Domain confusion via gradient reversal
ConditionalPrior always samples (never collapses to the mean)
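One way to read the three components above in equations is given below. Only the gradient reversal definition (identity in the forward pass, gradient multiplied by -λ in the backward pass) is the standard formulation; the residual form, the networks f, D, μ, σ, and the conditioning input c of the ConditionalPrior are assumptions of this sketch, since the source does not specify them.

```latex
\begin{aligned}
% Residual shared encoder: identity path plus a scaled learned correction
z_{\text{shared}} &= z_{\text{rna}} + \alpha \, f_\theta(z_{\text{rna}}) \\
% Gradient reversal layer: identity forward, negated and scaled gradient backward
R_\lambda(z) &= z, \qquad \frac{\partial R_\lambda}{\partial z} = -\lambda I \\
% Domain discriminator sees the shared latent through the reversal layer
\hat{d} &= D_\phi\!\left(R_\lambda(z_{\text{shared}})\right) \\
% ConditionalPrior: always draw a sample instead of returning the mean
z_{\text{imputed}} &= \mu_\psi(c) + \sigma_\psi(c) \odot \varepsilon,
\qquad \varepsilon \sim \mathcal{N}(0, I)
\end{aligned}
```

Under this reading, the discriminator minimizes its domain classification loss while the encoder receives the reversed gradient and is pushed toward features on which the domains are indistinguishable, which is what a discrimination accuracy near 0.50 would indicate.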
1e-3
2e-3
0→2.0 over 30 epochs
64 (balanced domain sampling)
10 epochs (discriminator only)
200 epochs
Trained on 128 PDX RNA-seq samples (prostate, GSE184427) and 9,415 TCGA samples. Phase 1: discriminator warmup (10 epochs). Phase 2: main training (200 epochs) with GRL ramp, separate optimizers, and cosine annealing. All claims validated: domain confusion, biological utility, imputation quality.
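The schedule described here can be sketched with the standard library alone. This is an illustration under stated assumptions, not the training code: the helper names are hypothetical, the GRL ramp is assumed to be linear, cosine annealing is assumed to decay to zero over the 200 main epochs, and "balanced domain sampling" is assumed to mean an equal number of PDX and TCGA samples per batch of 64. The base learning rate is left as a parameter because the source does not say which optimizer uses which rate.

```python
import math
import random

WARMUP_EPOCHS = 10      # Phase 1: discriminator only
MAIN_EPOCHS = 200       # Phase 2: main training
GRL_MAX = 2.0           # final gradient reversal strength
GRL_RAMP_EPOCHS = 30    # epochs over which the strength rises from 0
BATCH_SIZE = 64


def grl_lambda(epoch):
    """GRL strength at a main-phase epoch (linear ramp is an assumption)."""
    return GRL_MAX * min(1.0, epoch / GRL_RAMP_EPOCHS)


def cosine_lr(epoch, base_lr, total_epochs=MAIN_EPOCHS, min_lr=0.0):
    """Cosine annealing from base_lr at epoch 0 to min_lr at total_epochs."""
    cos = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return min_lr + (base_lr - min_lr) * cos


def balanced_batch(pdx_ids, tcga_ids, rng, batch_size=BATCH_SIZE):
    """Draw half the batch from each domain, regardless of domain size."""
    half = batch_size // 2
    return rng.sample(pdx_ids, half) + rng.sample(tcga_ids, half)


if __name__ == "__main__":
    rng = random.Random(0)
    pdx = list(range(128))               # 128 PDX sample indices
    tcga = list(range(1000, 10415))      # 9,415 TCGA sample indices
    batch = balanced_batch(pdx, tcga, rng)
    print(len(batch))
    for epoch in (0, 15, 30, 100, 200):
        print(epoch, grl_lambda(epoch), cosine_lr(epoch, base_lr=1e-3))
```

Sampling equal halves per batch keeps the 128 PDX samples from being swamped by the much larger TCGA set, so the discriminator cannot score well simply by predicting the majority domain.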