CROSS-REFERENCES
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/967,576, filed January 25, 2026, entitled “SYSTEM AND METHOD FOR PHYSICS-CONSTRAINED SIM-TO-REAL TRANSFER LEARNING IN COMPUTATIONAL ONCOLOGY.” This application is related to commonly owned disclosures directed to complementary aspects of the same platform architecture, including (i) runtime enforcement of physical and stability constraints in hybrid neural-ODE solvers, and (ii) uncertainty-calibrated missing modality imputation and separated-state dynamics for multi-modal digital twins. Where there is any inconsistency, the present disclosure controls.
I. FIELD OF THE INVENTION
The present invention relates generally to artificial intelligence, machine learning, and computational biology. More specifically, the invention relates to systems and methods for stabilizing adversarial training pipelines in small-sample regimes (d > n) by utilizing fixed biological ontology structures to modulate backward gradient flow via custom automatic differentiation primitives, thereby preventing variance collapse and effective dimensionality reduction during cross-species transfer learning.
II. BACKGROUND OF THE INVENTION
1. The PDX Paradox
The development of predictive models in oncology often relies on “model systems” -- surrogates for human biology such as Patient-Derived Xenografts (PDX), cell lines, or organoids. While these systems preserve certain genomic characteristics of the target human biology, they introduce systematic “domain shifts.” In many oncology datasets, the feature dimensionality (d) exceeds the number of available samples (n). For example, a typical PDX cohort may contain a few hundred samples (e.g., n = 573), while the latent representation of a foundation model may have d = 201 dimensions or more. Standard statistical learning theory requires the sample count to be large relative to the number of estimated parameters for reliable estimation.
Existing solutions that employ “learned attention” or “feature weighting” introduce O(d) or O(d²) additional learnable parameters (e.g., more than 40,000 for d = 201). In the PDX regime, this results in immediate overfitting, where the attention mechanism memorizes batch effects rather than learning biological structure.
2. "Fighting Biology" and Training Collapse
Domain Adversarial Neural Networks (DANN) commonly utilize a Domain Separation Network (DSN) architecture with a Gradient Reversal Layer (GRL). The GRL forces a “Shared Encoder” to learn a representation that is indistinguishable between the source domain (e.g., mouse) and the target domain (e.g., human) by reversing the gradients from a domain discriminator. The conventional approach treats all feature dimensions as statistically equivalent candidates for alignment, operating under the assumption that any difference between domains is “noise” or “bias” that must be eliminated.
Because tumor microenvironment (TME) features -- immune signals, stroma, vasculature -- naturally differ between species, a standard discriminator easily identifies the domain based on these features. The resulting adversarial gradients force the encoder to erase this biological signal to satisfy the domain confusion objective.
3. Concrete Computational Pathologies
This erasure results in three concrete computational pathologies:
- Variance Collapse: The variance of biologically critical modalities (e.g., imputed methylation) drops to near-zero as the model outputs constant values to satisfy the discriminator.
- Dimensionality Collapse: The effective rank of the latent space plummets (e.g., from 201 to 5), reducing a rich biological representation to a few dominant dimensions.
- Physical Implausibility: Downstream physics-based models (e.g., Neural ODEs) trained on these collapsed representations generate physically impossible parameters (e.g., tumor doubling times of several years).
There exists a need for a training stability mechanism that prevents these collapse modes without introducing learnable degrees of freedom.
III. SUMMARY OF THE INVENTION
The present invention provides a system and method for Semantically-Selective Adversarial Domain Adaptation that functions as a closed-loop training stability mechanism. The invention introduces a “Gradient Modulation Layer” implemented as a custom automatic differentiation (autograd) primitive that decouples the forward and backward passes of the neural network and enforces a specific ordering of operations using a fixed, non-trainable semantic mask.
- Gradient Modulation Layer
- A custom autograd function that acts as an identity in the forward pass (discriminator sees full features) and performs ordered gradient modulation in the backward pass: element-wise masking by a fixed semantic weight mask before adversarial sign reversal. The mask contains zero learnable parameters, satisfying the constraints of the PDX Paradox (d > n).
- Fixed Semantic Weight Mask (W)
- A 1D tensor of dimension d stored as a non-trainable registered buffer. Values are derived from a biological ontology: weight 1.0 for tumor-intrinsic pathways (cell cycle, DNA repair) and weight 0.3 for microenvironment-dependent pathways (immune response, angiogenesis). The buffer persists in the model state dictionary but is explicitly excluded from optimizer updates.
- Stability Monitor
- A closed-loop controller that computes three collapse metrics during training: Effective Rank (via SVD), Imputation Variance, and Per-Pathway Domain Accuracy. Triggers automated remediation actions (halt, reduce λ, freeze discriminator, increase TME weights) if thresholds are breached.
Distinction from Other Selective Alignment Approaches:
Some selective alignment systems choose which samples or latent subspaces to align based on empirical risk signals or phenotype-linked proxies. By contrast, the embodiments described here define the selection and blocking mechanism structurally by a semantic ontology and implement it via an explicit gradient-routing primitive. The semantic mask is a fixed or constrained operator derived from curated pathway structure, and the training system uses a custom autograd operation to enforce that only specific semantically grouped coordinates can receive alignment gradients.
IV. BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating the system architecture, highlighting the custom autograd primitive that decouples forward feature flow from backward gradient modulation using a registered buffer.
FIG. 2 is a schematic diagram of the ontology mapping process, showing the conversion of biological conservation tiers into a fixed, non-trainable semantic weight mask of dimension d.
FIG. 3 is a flow chart of the Stability Monitor control loop, detailing the computation of Effective Rank, Imputation Variance, and Per-Pathway Domain Accuracy metrics and the triggering of remediation actions.
V. DETAILED DESCRIPTION OF THE INVENTION
Glossary
[0001] As used herein, the following terms have the following meanings:
- Foundation Model
- A large-scale machine learning model (e.g., VAE) trained on broad data (e.g., Pan-Cancer TCGA) to generate latent representations applicable to multiple downstream tasks.
- Latent Representation (z)
- A compressed vector space representation of input data, typically having dimension d (e.g., d = 201).
- Registered Buffer
- A tensor stored within a neural network module that persists in the model's state dictionary and moves with the model across computing devices (CPU/GPU), but is explicitly excluded from the optimizer's parameter groups (i.e., has no gradient history and is not updated by gradient descent).
- Semantic Mask
- A tensor of weights derived from an ontology (e.g., biological pathways) used to modulate feature or gradient importance.
- Structural Gradient Exclusion
- The mechanism of preventing gradient flow through specific network paths based on fixed structural definitions rather than learned weights.
- Admissible Parameter Envelope
- A defined range of parameter values (e.g., growth rates, drug kill rates) considered biologically or physically valid.
- Reliability Gate (Abstention)
- A mechanism that prevents the system from issuing a prediction if data quality or model confidence falls below a threshold.
Architecture Consistency and Latent Dimensionality
[0002] In various embodiments, the dimensionality of the latent representation may differ based on the specific foundation model version or biological application. References to specific dimensions (e.g., 201) are non-limiting examples provided for clarity. For instance, internal version identifiers (e.g., ‘v5.10’) identify one specific implementation where d = 201, or a subset thereof (e.g., for the RNA partition), but do not limit the claims to those specific values. The principles of ontology-guided masking apply to any d-dimensional structured latent space.
System Architecture Overview
[0003] Referring to FIG. 1, the system comprises a pipeline for processing biological data from heterogeneous domains (Source: PDX/Mouse; Target: TCGA/Human):
- Foundation Model (VAE)
- A hierarchical Variational Autoencoder configured to compress raw multi-omics data into a structured latent space (e.g., d = 201).
- Pathway-Aware Domain Separation Network (DSN)
- A domain adaptation module comprising a “Shared Encoder” that transforms the VAE latent space into a domain-invariant shared representation (z_shared).
- Gradient Modulation Layer
- A specialized computational unit implemented as a custom autograd function that scales adversarial gradients using a fixed semantic mask stored in a registered buffer.
- Stability Monitor
- A control loop that monitors variance and effective rank metrics during training to detect and prevent collapse.
The Gradient Modulation Layer (Custom Autograd Implementation)
[0004] The core inventive step is the specific implementation of the Gradient Modulation Layer, which enforces a strict ordering of operations unavailable in standard hooks.
Hardware/Software Interaction:
The layer utilizes a Fixed Semantic Weight Mask (W) stored as a registered buffer (e.g., self.register_buffer(‘mask’, W)). This specific implementation ensures:
- Persistence: The mask is saved/loaded with the model weights.
- Immutability: The mask is excluded from model.parameters() and the optimizer. It contains zero learnable degrees of freedom.
- Structure: The mask is a 1D tensor of length d. In a preferred embodiment where d = 201, indices 1-200 are partitioned into 50 contiguous blocks of 4 dimensions, each mapped to a specific pathway identifier. Index 0 corresponds to proliferation.
Mathematical Operations and Implementation Constraint:
The layer is defined as a function y = g(x) with input x (the shared features) and output y (the input to the discriminator). Let L be the total loss function and λ be the adversarial adaptation weight.
Forward Pass (Identity):
The forward pass is strictly identity. The discriminator receives the exact feature values produced by the encoder. No multiplicative masking is applied to the features themselves.
Backward Pass (Ordered Modulation):
The gradient of the loss with respect to the input x is computed via the following ordered steps executed within the custom backward routine:
- Intercept: Receive the incoming gradient g_in from the discriminator.
- Modulate: Compute the element-wise product with the fixed mask: g_masked = W ⊙ g_in.
- Reverse: Apply the adversarial reversal and scaling: g_out = −λ · g_masked.
- Return: Pass g_out to the shared encoder.
Significance of Implementation:
While the masking and scaling steps are algebraically commutative, implementing them within the backward() static method creates a fused operation. This prevents the “raw” reversed gradient (the incoming gradient scaled by −λ without masking) from ever being materialized in the computational graph. Consequently, no external gradient hooks or optimizer steps can observe or operate on the unmasked adversarial signal, ensuring that TME-specific signals are dampened before they can drive encoder updates. This distinguishes the invention from optimizer-level gradient scaling (which operates on the final accumulated gradient) and from loss function reweighting (which scales the scalar loss before backpropagation).
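The ordered backward routine described above can be sketched as a custom PyTorch autograd function. This is a minimal illustration under stated assumptions, not the production implementation; the class names (GradientModulation, GradientModulationLayer) and the toy mask values are hypothetical.

```python
import torch

class GradientModulation(torch.autograd.Function):
    """Identity forward; fused mask-then-reverse modulation backward."""

    @staticmethod
    def forward(ctx, x, mask, lam):
        ctx.save_for_backward(mask)
        ctx.lam = lam
        return x.view_as(x)  # strict identity: discriminator sees raw features

    @staticmethod
    def backward(ctx, grad_output):
        (mask,) = ctx.saved_tensors
        # Ordered modulation: element-wise mask FIRST, then adversarial
        # reversal. The unmasked reversed gradient is never materialized.
        return -ctx.lam * (mask * grad_output), None, None

class GradientModulationLayer(torch.nn.Module):
    def __init__(self, mask: torch.Tensor, lam: float = 1.0):
        super().__init__()
        # Registered buffer: persists in state_dict, moves with .to(device),
        # but is excluded from model.parameters() and optimizer updates.
        self.register_buffer("mask", mask)
        self.lam = lam

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return GradientModulation.apply(x, self.mask, self.lam)
```

With a toy mask of four 1.0 entries followed by four 0.3 entries and lam = 0.5, backpropagating through the layer yields encoder gradients damped by 0.3 on the TME coordinates, while layer.parameters() remains empty and the mask survives in the state dictionary.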
Stability Monitor and Control Loop
[0005] To address the “Fighting Biology” failure mode, the system implements a closed-loop controller (FIG. 3) that monitors training in real-time.
Collapse Metrics and Thresholds:
1. Effective Rank (R_eff)
Computed on a batch of shared-representation vectors of size B (e.g., B = 128). Singular Value Decomposition (SVD) is performed on the B × d batch matrix, and R_eff is the count of singular values exceeding 1% of the maximum singular value.
Threshold: Collapse is flagged if R_eff < 100 (e.g., for d = 201 and B = 128).
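The effective-rank computation can be sketched in a few lines of NumPy. The function name and the absence of batch centering are illustrative assumptions; the disclosure specifies only an SVD of the batch matrix and a 1% relative singular-value threshold.

```python
import numpy as np

def effective_rank(Z: np.ndarray, rel_tol: float = 0.01) -> int:
    """Count singular values of the (B x d) batch matrix Z that exceed
    rel_tol times the maximum singular value."""
    s = np.linalg.svd(Z, compute_uv=False)
    if s.size == 0 or s[0] == 0.0:
        return 0
    return int(np.sum(s > rel_tol * s[0]))
```

A healthy Gaussian batch of shape (128, d) yields an effective rank of d, while a rank-1 (collapsed) batch yields 1.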
2. Imputation Variance (V_meth)
Computed on the imputed methylation block (e.g., 48 dimensions), as the mean variance per feature across the batch.
Threshold: Collapse is flagged if V_meth < 0.01. (Normal variance is approximately 1.0.)
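The imputation-variance metric reduces to a mean of per-feature batch variances; a minimal sketch (the block slicing convention and use of population variance are assumptions):

```python
import numpy as np

def imputation_variance(z_meth: np.ndarray) -> float:
    """Mean per-feature variance across the batch for the imputed
    methylation block (shape: batch_size x n_meth_dims, e.g. B x 48)."""
    return float(z_meth.var(axis=0).mean())
```

A collapsed block that outputs a constant value scores near zero (flagged, below 0.01), while a healthy unit-variance latent block scores near 1.0.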
3. Per-Pathway Domain Accuracy (A_p)
A diagnostic procedure run periodically. For each pathway block (e.g., 4 dimensions), a regularized logistic regression classifier is trained via 5-fold cross-validation to predict the domain.
Threshold: For TME-designated pathways (mask weight 0.3), collapse is flagged if the mean A_p falls to approximately 0.5 (indicating the signal is indistinguishable from random chance, which is undesirable for TME features that should differ between species).
Remediation Actions:
If any threshold is breached, the Stability Monitor triggers a remediation action selected from a closed set:
- Halt: Terminate training to prevent saving a corrupted model.
- Reduce λ: Decrease the global adversarial weight λ (e.g., by a fixed multiplicative factor).
- Freeze Discriminator: Temporarily set requires_grad=False for discriminator layers to allow the encoder to recover variance.
- Increase TME Weights: Switch to a fallback mask profile with higher weights for TME pathways (e.g., raising the Tier 2 weight above 0.3). This involves replacing the values in the registered buffer with values from a precomputed fallback mask.
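The threshold checks and the closed action set can be sketched as a simple dispatcher. The precedence ordering (halt on severe rank collapse first, then freeze, then reduce λ, then reweight) is an illustrative assumption; the disclosure specifies only the closed action set and the thresholds themselves.

```python
def select_remediation(eff_rank: int, meth_var: float, tme_acc: float,
                       min_rank: int = 100, min_var: float = 0.01,
                       chance_band: float = 0.55) -> str:
    """Map collapse metrics to one action from the closed remediation set.
    Precedence is illustrative; thresholds follow the disclosure
    (rank < 100, variance < 0.01, TME accuracy near chance)."""
    if eff_rank < min_rank // 2:
        return "halt"                    # severe collapse: do not save model
    if eff_rank < min_rank:
        return "freeze_discriminator"    # let the encoder recover variance
    if meth_var < min_var:
        return "reduce_lambda"           # weaken the adversarial signal
    if tme_acc <= chance_band:
        return "increase_tme_weights"    # swap in fallback mask profile
    return "none"
```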
Ontology Mapping and Mask Construction
[0006] Referring to FIG. 2, the Fixed Semantic Weight Mask W is constructed prior to training via a deterministic mapping process:
- Pathway Block Identification: The d-dimensional latent space is partitioned into contiguous blocks (e.g., 50 blocks of 4 dimensions).
- Tier Retrieval: For each block, the system retrieves a conservation tier from a stored ontology table (e.g., derived from MSigDB Hallmark gene sets).
- Buffer Population: The scalar weights are replicated across the dimensions of their respective blocks to form the mask tensor W.
| Conservation Tier | Example Pathways | Mask Weight |
|---|---|---|
| Tier 1 (Intrinsic) | Cell Cycle, DNA Repair, MYC Targets, E2F Targets | 1.0 |
| Tier 2 (TME-Dependent) | Inflammatory Response, Angiogenesis, IL6/JAK/STAT3, Complement | 0.3 |
Ontology-derived conservation tier assignments and corresponding mask weights.
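The deterministic mapping above can be sketched as a small builder that replicates tier weights across pathway blocks. The tier assignment list below is a toy placeholder; a real implementation would derive it from the stored ontology table (e.g., MSigDB Hallmark sets).

```python
TIER_WEIGHTS = {1: 1.0, 2: 0.3}  # Tier 1: intrinsic; Tier 2: TME-dependent

def build_semantic_mask(block_tiers, block_size=4):
    """Build the fixed mask: index 0 is the proliferation coordinate
    (weight 1.0); each pathway block contributes block_size copies of
    its tier weight."""
    mask = [1.0]  # proliferation
    for tier in block_tiers:
        mask.extend([TIER_WEIGHTS[tier]] * block_size)
    return mask

# Toy tier table: 25 intrinsic blocks followed by 25 TME-dependent blocks,
# yielding the 201-dimensional mask of the preferred embodiment.
mask = build_semantic_mask([1] * 25 + [2] * 25)
```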
Experimental Validation
[0007] In one non-limiting evaluation, ontology-guided masking and selective gradient routing improved transfer robustness under domain shift. Reported numerical values, dataset counts, and metric deltas reflect a specific experimental configuration and may vary with cohort composition, modality availability, and hyperparameters.
Full Comparative Results:
| Metric | Standard DANN | With Invention | Factor |
|---|---|---|---|
| Effective latent rank | 5/201 (2.5%) | 201/201 (100%) | 40× recovery |
| Meth. imputation variance | 0.0002 | 0.955 | 4,775× recovery |
| Tumor doubling time | 939 days | 4.49 days | Corrected |
| Downstream Strat. C-index | 0.654 | 0.669 | +1.5pp |
| Additional learnable params | >40,000 (attention) | 0 (fixed mask) | Eliminated |
| Training stability (573-PDX) | Collapsed (5 dims) | Stable (201 dims) | Collapse prevented |
Comparative performance: Standard DANN vs. Ontology-Guided Autogradient Modulation.
The “Standard” model suffered from dimensionality collapse (effective rank 5 of 201), effectively destroying the biological utility of the representation. The “Invention” maintained full rank (201 of 201) and restored methylation variance, enabling the downstream Neural ODE to predict physically plausible tumor growth rates.
VI. CLAIMS
What is claimed is:
Claim 1. (Method)
A computer-implemented method for stabilizing domain adaptation training in a neural network to prevent variance collapse, the method comprising:
- instantiating a neural network comprising a shared encoder, a domain discriminator, and a gradient modulation layer;
- loading a fixed semantic weight mask into a non-trainable registered buffer within the gradient modulation layer, wherein said fixed semantic weight mask is a 1D tensor of dimension d corresponding to a d-dimensional latent representation, wherein the latent representation is partitioned into contiguous pathway blocks, and wherein said mask is excluded from optimizer parameter updates;
- assigning scalar weights to the fixed semantic weight mask prior to training based on a stored ontology mapping table that classifies each pathway block into a conservation tier, wherein said scalar weights are not learned from training data;
- executing a training loop comprising:
- performing a forward pass wherein the gradient modulation layer transmits a shared feature vector from the shared encoder to the domain discriminator unchanged;
- computing a domain classification loss via the domain discriminator; and
- performing a backward pass wherein the gradient modulation layer is implemented as a custom automatic differentiation function comprising:
- a forward method that returns the shared feature vector unchanged; and
- a backward method that receives a gradient tensor from the domain discriminator, computes an element-wise product of the gradient tensor with the fixed semantic weight mask retrieved from the non-trainable registered buffer, multiplies the result by a negative scalar adaptation parameter, and returns the resulting modulated gradient to an automatic differentiation engine for propagation to the shared encoder;
wherein the element-wise product and sign reversal are performed within the backward method of the custom automatic differentiation function and not via gradient hooks, optimizer-level gradient scaling, or loss function reweighting.
Claim 2.
The method of claim 1, wherein d equals 201, and wherein the latent representation is partitioned into 50 pathway blocks of 4 dimensions each and a proliferation coordinate.
Claim 3.
The method of claim 1, wherein the training loop operates on a source domain dataset having a sample count (n) and the shared feature vector has a dimensionality (d), wherein n < d, and wherein the fixed semantic weight mask comprises zero learnable parameters, thereby preventing the gradient modulation layer from overfitting to batch effects in the source domain dataset.
Claim 4.
The method of claim 1, wherein the stored ontology mapping table assigns a weight of 1.0 to pathway blocks identified as tumor-intrinsic and a weight of 0.3 or less to pathway blocks identified as tumor microenvironment-dependent.
Claim 5.
The method of claim 1, wherein the non-trainable registered buffer is configured to persist within a state dictionary of the neural network and to be transferred to a graphics processing unit (GPU) alongside trainable parameters of the neural network.
Claim 6.
The method of claim 1, wherein the custom automatic differentiation function is implemented as a subclass of an autograd function that overrides a default chain rule execution to decouple the forward pass from the backward pass.
Claim 7.
The method of claim 1, wherein the conservation tier is determined based on cross-species biological conservation between a human target domain and a murine source domain.
Claim 8. (Control Method)
A method for controlling a domain adaptation training pipeline to prevent effective dimensionality collapse, comprising:
- training a shared encoder to map biological data to a latent space partitioned into pathway blocks;
- utilizing a gradient modulation layer that scales backward gradients by a fixed, non-learnable mask stored in a registered buffer;
- during training, periodically executing a stability monitor configured to compute three collapse indicators:
- an effective rank metric (R_eff) of the latent space, computed by performing a singular value decomposition (SVD) on a batch of latent vectors of size B and counting singular values that exceed 1% of a maximum singular value;
- an imputation variance metric (V_meth) for a subset of the latent space corresponding to an imputed methylation block; and
- a per-pathway domain accuracy metric (A_p) computed by training a logistic regression classifier via cross-validation on individual pathway blocks;
- comparing the collapse indicators to predetermined thresholds; and
- triggering an automated remediation action if any of said thresholds are breached, wherein the remediation action is selected from the group consisting of: reducing a gradient reversal scaling factor λ, replacing the fixed mask with a precomputed fallback mask having increased weights, freezing weights of a domain discriminator, and halting training.
Claim 9.
The method of claim 8, wherein the batch size B is greater than or equal to 128, and wherein the threshold for the effective rank metric is 100 dimensions.
Claim 10.
The method of claim 8, wherein the per-pathway domain accuracy metric is computed using 5-fold cross-validation on pathway blocks of 4 dimensions each.
Claim 11. (System)
A system for stable cross-species transfer learning, comprising:
- a hardware processor; and
- a memory storing a neural network model and a non-trainable registered buffer containing a fixed semantic weight mask;
wherein the fixed semantic weight mask is a 1D tensor of dimension d, partitioned into contiguous pathway blocks, with values determined prior to training based on a biological conservation ontology;
wherein the neural network model comprises a gradient modulation layer configured to execute instructions that cause the processor to:
- in a forward pass, transmit input features to a discriminator without modification; and
- in a backward pass, execute a custom automatic differentiation sequence that (i) computes a Hadamard product of an incoming gradient and the fixed semantic weight mask stored in the non-trainable registered buffer, and (ii) subsequently multiplies said product by a negative scalar (−λ);
wherein the computation of the Hadamard product occurs within the custom automatic differentiation sequence and not via a gradient hook; and
wherein the system further comprises a stability monitor configured to compute an effective rank of the input features via singular value decomposition on a batch of features and halt execution if the effective rank drops below a threshold.
Claim 12.
The system of claim 11, wherein d equals 201, the batch of features has a size of at least 128, and the threshold is 100 dimensions.
Claim 13.
The system of claim 11, wherein the fixed semantic weight mask contains zero learnable parameters and is excluded from optimizer updates.
Claim 14.
The system of claim 11, wherein the stability monitor is further configured to compute a mean variance of a subset of the input features corresponding to imputed methylation data and trigger a remediation action if the mean variance drops below 0.01.
Claim 15.
The system of claim 11, wherein the input features comprise a proliferation dimension and a plurality of pathway blocks, and wherein the fixed semantic weight mask assigns a weight of 1.0 to the proliferation dimension and weights less than or equal to 0.3 to pathway blocks associated with immune response.
Claim 16.
The system of claim 11, wherein the neural network model further comprises a hypernetwork configured to receive the input features and generate parameters for a differential equation solver, and wherein the prevention of variance collapse ensures said parameters remain within biologically plausible bounds.
Claim 17.
The system of claim 16, wherein the biologically plausible bounds comprise a tumor doubling time of greater than 24 hours.
Claim 18. (Computer-Readable Medium)
A non-transitory computer-readable medium storing instructions that, when executed by a computing device, cause the computing device to perform operations for stabilizing a machine learning training pipeline, the operations comprising:
- initializing a gradient modulation layer having a custom automatic differentiation function;
- registering a fixed semantic weight mask as a persistent buffer within the layer, said mask being excluded from optimization and having a dimension of d;
- processing a batch of biological data through an encoder to produce latent vectors;
- passing the latent vectors through the gradient modulation layer to a discriminator via a forward pass that applies an identity function;
- calculating a gradient of a discriminator loss;
- backpropagating the gradient through the gradient modulation layer via a backward pass that scales the gradient by the fixed semantic weight mask and subsequently reverses its sign by multiplying by a negative adaptation parameter;
- monitoring an effective rank of the latent vectors defined as a count of singular values exceeding 1% of a maximum singular value; and
- terminating training if the effective rank falls below a threshold.
Claim 19.
The non-transitory computer-readable medium of claim 18, wherein d equals 201 and the threshold for the effective rank is 100 dimensions.
Claim 20.
The non-transitory computer-readable medium of claim 18, wherein the fixed semantic weight mask comprises zero trainable parameters, thereby preventing the encoder from overfitting to batch effects in the biological data when a sample count is less than a feature dimension.
VII. ABSTRACT OF THE DISCLOSURE
A system and method for preventing training collapse in domain adaptation for computational biology are disclosed. The system addresses specific failure modes -- variance collapse, dimensionality collapse, and physical implausibility -- that occur when adversarial training is applied to high-dimensional, small-sample biological data (the “PDX Paradox,” where d > n). The method employs a gradient modulation layer implemented as a custom automatic differentiation primitive that decouples the forward and backward passes. In the backward pass, gradients are scaled by a fixed semantic weight mask stored in a non-trainable registered buffer before being reversed. A stability monitor actively tracks effective latent rank (via SVD), imputation variance, and per-pathway domain accuracy, triggering remediation actions if metrics fall below critical thresholds (e.g., rank < 100, variance < 0.01). Experimental results demonstrate the restoration of methylation variance (4,775× recovery) and effective latent rank (40× recovery), enabling the generation of physically plausible parameters for downstream Neural ODE models.
[End of Application]