has been in production for two months. Accuracy is 92.9%.
Then transaction patterns shift quietly.
By the time your dashboard turns red, accuracy has collapsed to 44.6%.
Retraining takes six hours and needs labeled data you won't have until next week.
What do you do in those six hours?
TL;DR
Problem: Model drifts, retraining unavailable
Solution: Self-healing adapter layer
Key idea: Update a small component, not the full model
System behavior:
- Backbone stays frozen
- Adapter updates in real time
- Updates run asynchronously (no downtime)
- Symbolic rules provide weak supervision
- Rollback ensures safety
Result: +27.8 percentage points of accuracy recovered, with an explicit recall tradeoff explained inside.
This article is about the ReflexiveLayer: a small architectural component that sits inside the network and adapts to shifted distributions while the backbone stays frozen. The adapter updates in a background thread so inference never stops. Combined with a symbolic rule engine for weak supervision and a model registry for rollback, it recovered 27.8 percentage points of accuracy in this experiment without touching the backbone weights once.
The results are honest: the recovery is real but comes with a recall tradeoff that matters in fraud detection. Both are explained in full.
Full code, all 7 versions, production stack, monitoring export, all plots: https://github.com/Emmimal/self-healing-neural-networks/
Why standard approaches fall short here
When a model starts degrading, the usual playbook is one of three things: retrain on fresh labeled data, use an ensemble that includes a recently trained model, or roll back to a previous checkpoint.
All standard approaches assume you have something you may not:
- Labeled data
- Time to retrain
- A checkpoint that works on the new distribution
Rollback is especially misleading.
Rolling back to clean weights on a shifted distribution doesn't fix the problem; it repeats it.
What I wanted was something that could operate in the gap: no new labeled data, no downtime, no rollback to a distribution that no longer exists. That constraint shaped the architecture.
While this experiment focuses on fraud detection, the same constraint appears in any production system where retraining is delayed: recommendation engines, risk scoring, anomaly detection, or real-time personalization.
The architecture: one frozen backbone, one trainable adapter
The key design choice is where to put the trainable capacity. Rather than making the whole network adaptable, I isolate adaptation to a single component, the ReflexiveLayer, sandwiched between the frozen backbone and the frozen output head.
Here's the architecture at a glance:
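import torch
import torch.nn as nn

class ReflexiveLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.adapter = nn.Sequential(
            nn.Linear(dim, dim), nn.Tanh(),
            nn.Linear(dim, dim)
        )
        # Learnable gain on the correction; starts small so the layer is near-identity
        self.scale = nn.Parameter(torch.tensor(0.1))

    def forward(self, x):
        # Residual: the backbone signal x always passes through unchanged
        return x + self.scale * self.adapter(x)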
The residual connection (x + self.scale * self.adapter(x)) does important work here. The scale parameter starts at 0.1, so the adapter starts as a small perturbation and the backbone signal passes through almost unmodified. As healing accumulates, scale can grow, but the original backbone output is always present in the signal. The adapter can only add a correction; it cannot overwrite what the backbone learned.
The adapter can't overwrite the model; it can only correct it.
The full model inserts the ReflexiveLayer between the backbone and output head:
class SelfHealingMLP(nn.Module):
    def __init__(self, input_dim=10, hidden_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU()
        )
        self.reflexive = ReflexiveLayer(hidden_dim)
        self.output_head = nn.Sequential(
            nn.Linear(hidden_dim, 1), nn.Sigmoid()
        )

    def forward(self, x):
        # backbone -> reflexive adapter -> output head (fraud probability)
        return self.output_head(self.reflexive(self.backbone(x)))

    def freeze_for_healing(self):
        # During a heal, only the ReflexiveLayer keeps requires_grad=True
        for p in self.backbone.parameters():
            p.requires_grad = False
        for p in self.output_head.parameters():
            p.requires_grad = False

    def unfreeze_all(self):
        for p in self.parameters():
            p.requires_grad = True
During a heal event, freeze_for_healing() is called first, so only the ReflexiveLayer receives gradient updates. After healing, unfreeze_all() restores the full parameter graph in case a full retrain is eventually run.
One thing worth noting about the parameter counts: the model has 13,250 parameters in total, and the ReflexiveLayer holds 8,321 of them (two 64×64 linear layers with their biases, 2 × (4,096 + 64) = 8,320, plus the scalar scale). That's 62.8% of the total. The backbone, which maps 10 input features up through 64 hidden units across two layers, holds only 4,864 (704 + 4,160). So the adapter is not "small" in parameter count. It's architecturally focused: its job is limited to transforming the backbone's hidden representations, and the residual connection plus the frozen backbone ensure it cannot destroy what was learned during training.
The reason this split matters: catastrophic forgetting (the tendency of neural networks to lose previously learned behavior when updated on new data) is limited because the backbone is always frozen during healing. The gradient flow during heal steps only touches the adapter, so the foundational representations cannot degrade no matter how many heal events occur.
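A quick way to check both the parameter split and the freeze is to instantiate the model and count what still requires gradients. This self-contained sketch re-declares the two classes from above (assuming forward chains backbone, ReflexiveLayer and output head in that order) and takes one SGD step to confirm the frozen backbone does not move.

```python
import torch
import torch.nn as nn

class ReflexiveLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.adapter = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.scale = nn.Parameter(torch.tensor(0.1))

    def forward(self, x):
        return x + self.scale * self.adapter(x)

class SelfHealingMLP(nn.Module):
    def __init__(self, input_dim=10, hidden_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.reflexive = ReflexiveLayer(hidden_dim)
        self.output_head = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, x):
        # Assumed wiring: backbone -> adapter -> output head
        return self.output_head(self.reflexive(self.backbone(x)))

    def freeze_for_healing(self):
        for p in self.backbone.parameters():
            p.requires_grad = False
        for p in self.output_head.parameters():
            p.requires_grad = False

def count(params):
    return sum(p.numel() for p in params)

torch.manual_seed(0)
model = SelfHealingMLP()
print(count(model.parameters()))            # 13250
print(count(model.reflexive.parameters()))  # 8321
print(count(model.backbone.parameters()))   # 4864

model.freeze_for_healing()
trainable = [p for p in model.parameters() if p.requires_grad]
print(count(trainable))                     # 8321: only the adapter trains

# One gradient step: the frozen backbone weights must stay bit-for-bit identical.
before = [p.detach().clone() for p in model.backbone.parameters()]
opt = torch.optim.SGD(trainable, lr=0.1)
loss = model(torch.randn(32, 10)).mean()
loss.backward()
opt.step()
unchanged = all(torch.equal(b, p) for b, p in zip(before, model.backbone.parameters()))
print(unchanged)                            # True
```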
Two signals that decide when to heal
Healing triggered too frequently wastes compute. Healing triggered too late lets degradation accumulate. The system uses two independent signals.
Signal one: FIDI (Feature-based Input Distribution Inspection)
FIDI monitors the rolling mean of feature V14, the feature the network independently identified as its strongest fraud signal in Neuro-Symbolic AI Experiment. It computes a z-score against calibration statistics from training:
FIDI | μ=-0.363 σ=1.323 threshold=1.0
V14 clean | mean=-0.377 pct<-1.5 = 18.8%
V14 drift | mean=-2.261 pct<-1.5 = 77.4%
When the z-score exceeds 1.0, the incoming data no longer looks like the training distribution. In this experiment the z-score crosses the threshold at batch 3 and stays elevated. The drifted V14 distribution has a mean about 1.9 below the calibration mean (roughly 1.4 calibration standard deviations, since σ = 1.323), and this drift is applied as a constant shift for all 25 batches. The system correctly detects it and never returns to HEALTHY.
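A minimal sketch of the FIDI check, assuming the z-score is the distance between the rolling V14 mean and the calibration mean, measured in calibration standard deviations. The class name, window size and exact formula are assumptions; the calibration values are the ones printed in the log above.

```python
from collections import deque

import numpy as np

class FIDIMonitor:
    """Hypothetical rolling-mean drift check on a single feature (e.g. V14)."""

    def __init__(self, mu, sigma, threshold=1.0, window=500):
        self.mu = mu                          # calibration mean from training
        self.sigma = sigma                    # calibration std from training
        self.threshold = threshold
        self.values = deque(maxlen=window)    # keeps only the most recent values

    def update(self, batch_values):
        """Add a batch of feature values; return (z, drift_flag)."""
        self.values.extend(np.asarray(batch_values, dtype=float).ravel())
        rolling_mean = float(np.mean(self.values))
        z = abs(rolling_mean - self.mu) / self.sigma
        return z, z > self.threshold

# Calibration values from the log; window=100 so a single batch fills it.
fidi = FIDIMonitor(mu=-0.363, sigma=1.323, threshold=1.0, window=100)
z_clean, drift_clean = fidi.update(np.full(100, -0.377))  # batch at the clean mean
z_drift, drift_drift = fidi.update(np.full(100, -2.261))  # batch at the drifted mean
print(round(z_clean, 3), drift_clean)  # 0.011 False
print(round(z_drift, 3), drift_drift)  # 1.435 True
```

With a real stream the window would mix old and new values, which is one reason a rolling check can take a few batches to cross the threshold.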
Signal two: symbolic conflicts
The SymbolicRuleEngine encodes one domain rule: if V14 < -1.5, the transaction is likely fraud. A conflict occurs when the neural network assigns a low fraud probability (below 0.30) to a transaction the rule flags. When 5 or more conflicts appear in a batch, a heal is triggered even without a significant z-score.
The two signals complement each other. FIDI is sensitive to overall distribution shift in V14's mean. Conflict counting is sensitive to model-rule disagreement on specific samples and can catch localized degradation that a distribution-level z-score might miss. The dataset has 15.0% fraud (150 fraud transactions in the 1,000-sample test set).
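The conflict signal is simple enough to sketch directly. These functions are a hypothetical stand-in for SymbolicRuleEngine, using the rule and thresholds stated above (V14 < -1.5, network probability below 0.30, at least 5 conflicts); the V14 column index is an assumption.

```python
import numpy as np

V14_COL = 0            # hypothetical column index of V14 in the feature matrix
RULE_THRESHOLD = -1.5  # rule: V14 below this => likely fraud
LOW_PROB = 0.30        # network "disagrees" below this fraud probability
CONFLICT_MIN = 5       # conflicts per batch needed to trigger a heal

def symbolic_predictions(X):
    """Rule output as 0/1 floats: 1.0 where V14 < -1.5."""
    return (X[:, V14_COL] < RULE_THRESHOLD).astype(float)

def count_conflicts(X, fraud_prob):
    """Rows the rule flags as fraud but the network scores below 0.30."""
    rule_flags = symbolic_predictions(X) == 1.0
    return int(np.sum(rule_flags & (fraud_prob < LOW_PROB)))

def should_heal_on_conflicts(X, fraud_prob):
    return count_conflicts(X, fraud_prob) >= CONFLICT_MIN

X = np.zeros((10, 3))
X[:6, V14_COL] = -2.0                                   # six rows trip the rule
probs = np.array([0.1, 0.2, 0.05, 0.25, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1])
print(count_conflicts(X, probs))           # 5 (row 5 scores 0.9, agreeing with the rule)
print(should_heal_on_conflicts(X, probs))  # True
```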

Async healing: weight updates that don't interrupt inference
The most production-critical design decision here is that healing never takes inference offline. A background thread processes heal requests from a queue. An RLock (reentrant lock) protects the shared model state.
import queue
import threading

import torch

class AsyncHealingEngine:
    def __init__(self, model):
        self.model = model
        self._lock = threading.RLock()
        self._queue = queue.Queue()
        # _heal_worker (not shown) pulls jobs from the queue and runs heal steps under the lock
        self._worker = threading.Thread(
            target=self._heal_worker, daemon=True
        )
        self._worker.start()

    def predict(self, X):
        with self._lock:  # brief lock, just a forward pass
            self.model.eval()
            with torch.no_grad():
                return self.model(X)

    def request_heal(self, X, y, symbolic, batch_idx, fraud_frac=0.0):
        self._queue.put({  # non-blocking, returns immediately
            "X": X.clone(), "y": y.clone(),
            "symbolic": symbolic,
            "batch_idx": batch_idx,
            "fraud_frac": fraud_frac,
        })
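request_heal() returns immediately, so the code that submits heal jobs never waits. The heal worker picks up the job, acquires the lock, runs the gradient steps, and releases it. Because predict() takes the same lock, a prediction that arrives mid-heal waits for those few gradient steps to finish; the pause is bounded by one short heal, not a retrain. The daemon=True flag lets the process exit without waiting for the background thread, so no orphaned worker keeps it alive.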
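The snippet above omits _heal_worker. One way to write it, as a sketch rather than the repository's implementation: block on the queue, take the same RLock that predict() uses, run a heal function, and mark the job done. The heal_fn callback, the heals_done counter and the join() helper are hypothetical additions that keep the example self-contained and testable.

```python
import queue
import threading

import torch
import torch.nn as nn

class AsyncHealingEngine:
    def __init__(self, model, heal_fn):
        self.model = model
        self.heal_fn = heal_fn              # hypothetical: callable(model, job) -> None
        self._lock = threading.RLock()
        self._queue = queue.Queue()
        self.heals_done = 0
        self._worker = threading.Thread(target=self._heal_worker, daemon=True)
        self._worker.start()

    def predict(self, X):
        with self._lock:                    # waits only while a heal holds the lock
            self.model.eval()
            with torch.no_grad():
                return self.model(X)

    def request_heal(self, X, y, **meta):
        # Clone so later mutation by the caller cannot affect the queued job.
        self._queue.put({"X": X.clone(), "y": y.clone(), **meta})

    def _heal_worker(self):
        while True:
            job = self._queue.get()         # blocks the worker thread, not the caller
            try:
                with self._lock:
                    self.model.train()
                    self.heal_fn(self.model, job)
                    self.heals_done += 1
            finally:
                self._queue.task_done()

    def join(self):
        self._queue.join()                  # wait until every queued heal has run

def toy_heal(model, job):
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(5):
        opt.zero_grad()
        pred = model(job["X"]).squeeze(-1)
        nn.functional.binary_cross_entropy(pred, job["y"]).backward()
        opt.step()

engine = AsyncHealingEngine(nn.Sequential(nn.Linear(4, 1), nn.Sigmoid()), toy_heal)
for i in range(3):
    engine.request_heal(torch.randn(16, 4), torch.randint(0, 2, (16,)).float(), batch_idx=i)
print(engine.predict(torch.randn(2, 4)).shape)  # torch.Size([2, 1])
engine.join()
print(engine.heals_done)                        # 3
```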
What happens during a heal
The heal combines three loss components into one objective:
total_loss = 0.70 * real_loss + 0.24 * consistency_loss + 0.03 * entropy
(The coefficients come from alpha=0.70 and lambda_lag=0.80, so the consistency time period is (1 - 0.70) * 0.80 = 0.24.)
Real data loss (ground truth)
Real data loss is weighted binary cross-entropy against the incoming batch labels. The fraud weight scales with the observed fraud fraction among conflicted samples:
fraud_frac = 0% -> pos_weight = 1.0 (no adjustment)
fraud_frac = 10% -> pos_weight = 2.0
fraud_frac = 20% -> pos_weight = 3.0
fraud_frac >= 30% -> pos_weight = 4.0 (cap)
The condition fraud_frac >= 0.10 acts as a gate: below it, the model adapts symmetrically. On batches where the conflicted transactions turn out to be mostly legitimate, aggressive fraud weighting would push the adapter in the wrong direction. This gating prevents that.
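The four listed values fit a linear ramp of one extra unit of weight per 10 points of fraud fraction, capped at 4.0, with the 10% gate below which the weight stays at 1.0. The function below is a sketch under that reading; the exact formula in the repository may differ.

```python
def fraud_pos_weight(fraud_frac, gate=0.10, cap=4.0):
    """Positive-class weight for the real-data BCE term (assumed formula)."""
    if fraud_frac < gate:
        return 1.0                        # below the gate: symmetric adaptation
    return min(1.0 + 10.0 * fraud_frac, cap)

for f in (0.0, 0.05, 0.10, 0.20, 0.30, 0.50):
    print(f, fraud_pos_weight(f))
```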
Consistency loss (symbolic guidance)
Consistency loss is binary cross-entropy against the symbolic rule engine's predictions. Even without ground-truth labels, the symbolic rule provides a stable weak supervision signal that keeps the adapter aligned with domain knowledge rather than overfitting to whatever pattern happens to dominate the current batch. This is the neuro-symbolic anchor described in Hybrid Neuro-Symbolic Fraud Detection and Neuro-Symbolic AI Experiment.
Entropy minimization (confidence recovery)
Entropy minimization (weight 0.03) pushes predictions toward more confident values. Under drift, models often become uncertain across many transactions rather than confidently wrong about specific ones. Call it decision-boundary paralysis. Minimizing entropy counteracts this without dominating the other loss terms.
Only 5 gradient steps are taken per heal. A 100-sample batch is not enough data to safely take large gradient steps. Five steps nudge the adapter toward the new distribution without committing to any single batch's signal.
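Putting the three terms together, here is a sketch of one heal cycle. Adam as the optimizer, mean reduction for every term, and the probability clamp are assumptions, not details given above; the loss weights, step count and learning rate are the ones quoted in the text. Because it only optimizes parameters that still require gradients, calling it after freeze_for_healing() would update the adapter alone. The toy model at the end stands in for the real network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

ALPHA, LAMBDA_LAG, ENTROPY_W = 0.70, 0.80, 0.03   # weights quoted in the text
EPS = 1e-6

def heal_loss(p, y, symbolic, pos_weight=1.0):
    """0.70 * weighted BCE + 0.24 * consistency BCE + 0.03 * mean binary entropy."""
    p = p.clamp(EPS, 1 - EPS)
    # Real-data loss: fraud rows (y == 1) get pos_weight, legitimate rows get 1.
    weights = 1.0 + (pos_weight - 1.0) * y
    real = F.binary_cross_entropy(p, y, weight=weights)
    # Consistency loss: agree with the symbolic rule's 0/1 predictions.
    consistency = F.binary_cross_entropy(p, symbolic)
    # Entropy of each Bernoulli prediction; minimizing it pushes p toward 0 or 1.
    entropy = -(p * p.log() + (1 - p) * (1 - p).log()).mean()
    return ALPHA * real + (1 - ALPHA) * LAMBDA_LAG * consistency + ENTROPY_W * entropy

def heal(model, X, y, symbolic, pos_weight=1.0, steps=5, lr=0.003):
    """A few small Adam steps on whatever parameters still require grad."""
    trainable = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(trainable, lr=lr)
    model.train()
    for _ in range(steps):
        opt.zero_grad()
        loss = heal_loss(model(X).squeeze(-1), y, symbolic, pos_weight)
        loss.backward()
        opt.step()
    return loss.item()

# Toy check: a frozen "backbone" layer followed by a trainable layer.
torch.manual_seed(0)
frozen, adapter = nn.Linear(10, 8), nn.Linear(8, 1)
for p in frozen.parameters():
    p.requires_grad = False
model = nn.Sequential(frozen, nn.ReLU(), adapter, nn.Sigmoid())
X = torch.randn(100, 10)
y = (torch.rand(100) < 0.15).float()
symbolic = (X[:, 0] < -1.5).float()               # hypothetical V14 column
w_before = frozen.weight.clone()
final_loss = heal(model, X, y, symbolic, pos_weight=2.0)
print(torch.equal(w_before, frozen.weight), final_loss > 0)  # True True
```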
The shadow model: an honest counterfactual
Any online adaptation system needs an answer to a basic question: is the adaptation actually helping? To measure this, a frozen copy of the baseline model (the "shadow model") runs in parallel on every batch and never adapts. The lift metric is simply:
acc_lift = healed_accuracy - shadow_accuracy
In this experiment, lift is positive on every one of the 25 batches, ranging from +0.050 to +0.360. The shadow model provides the honest baseline: what you would get if you did nothing.
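A minimal sketch of the shadow comparison, assuming a 0.5 decision threshold (the text does not state one). copy.deepcopy gives the shadow its own parameter tensors, so later heal steps on the live model cannot leak into it.

```python
import copy

import torch
import torch.nn as nn

def make_shadow(model):
    """Frozen deep copy of the baseline model that never adapts."""
    shadow = copy.deepcopy(model)
    shadow.eval()
    for p in shadow.parameters():
        p.requires_grad_(False)
    return shadow

@torch.no_grad()
def batch_accuracy(model, X, y, threshold=0.5):
    preds = (model(X).squeeze(-1) >= threshold).float()
    return (preds == y).float().mean().item()

def acc_lift(healed, shadow, X, y):
    return batch_accuracy(healed, X, y) - batch_accuracy(shadow, X, y)

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
shadow = make_shadow(model)
X, y = torch.randn(50, 4), torch.randint(0, 2, (50,)).float()
lift_before = acc_lift(model, shadow, X, y)
print(lift_before)                          # 0.0 before any healing

with torch.no_grad():                       # simulate a heal changing the live model
    model[0].bias.add_(5.0)
print(torch.equal(shadow[0].bias, model[0].bias))  # False: the shadow did not move
```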

Understanding the full results honestly
The final evaluation runs on the full 1,000-sample drifted test set after all 25 streaming batches:
Stage                       Acc     Prec    Recall  F1
------------------------------------------------------------------
Clean Baseline              92.9%   0.784   0.727   0.754
Under Drift, No Healing     44.6%   0.194   0.853   0.316
Shadow, Frozen              44.6%   0.194   0.853   0.316
Production Self-Healed      72.4%   0.224   0.340   0.270
The accuracy recovery is genuine. The healed model reaches 72.4% on data where the baseline collapses, a 27.8 percentage point improvement over both frozen alternatives.
As the run logs show, the healed model catches fewer total frauds (recall 0.340) but curbs the "false positive explosion" that occurs when a drifted model loses its decision boundary: 177 false positives instead of 532.
But the recall numbers need explanation, because a naive reading of this table would be misleading.
What "recall 0.853 at 44.6% accuracy" actually means
The confusion matrices under drift:
No-Healing: TP=128 TN=318 FP=532 FN=22
Healed:     TP=51  TN=673 FP=177 FN=99
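Every metric in the table can be reproduced from these counts, which is a useful sanity check when reading any drift report:

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f1

no_heal = metrics(tp=128, tn=318, fp=532, fn=22)
healed = metrics(tp=51, tn=673, fp=177, fn=99)
print([round(m, 3) for m in no_heal])  # [0.446, 0.194, 0.853, 0.316]
print([round(m, 3) for m in healed])   # [0.724, 0.224, 0.34, 0.27]
```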
The no-healing model catches 128 of the 150 fraud cases (recall 0.853). But it also generates 532 false positives, flagging 532 of the 850 legitimate transactions as fraud. Accuracy is 44.6% because more than half the predictions (554 of 1,000) are wrong. In a payment fraud system, 532 false positives in a 1,000-transaction batch means the model has effectively lost its decision boundary: it is flagging everything suspicious. Operations teams drowning in false alarms are often the first sign that a production model has drifted badly.
The healed model catches 51 of the 150 fraud cases (recall 0.340) while producing only 177 false positives. It misses more fraud, but its predictions are right far more often (72.4% vs 44.6% accuracy), and each fraud flag is slightly more likely to be real (precision 0.224 vs 0.194).
F1 doesn't capture this tradeoff
F1 treats false positives and false negatives symmetrically. The no-healing model's F1 is 0.316 and the healed model's F1 is 0.270. By F1 alone, the no-healing model looks better. But F1 doesn't account for the cost structure of the problem. In most payment fraud systems, the cost of a false positive (a blocked legitimate transaction) is not zero, and the ratio of cost between false positives and false negatives determines which model behavior is preferable.
If missing a fraud transaction costs $5,000 on average and a false positive costs $15 in customer support and churn risk, the no-healing model's behavior is worth its 532 false positives: its total error cost on this test set is about $118,000, against about $498,000 for the healed model. The healed model trades 77 extra missed frauds for 355 fewer false positives, so it only comes out ahead when a false positive costs more than about 77/355 ≈ 0.22 of a missed fraud, roughly $1,085 at $5,000 per miss. A review queue with hard capacity may push the effective cost of a false positive that high, but $200 of operational overhead per false positive would not be enough on these counts.
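The break-even point follows from the confusion-matrix counts alone. This sketch prices each error type; the $5,000, $15 and $200 figures are the illustrative ones from this section and the $2,000 case is added only for contrast, none of them measured costs.

```python
def total_cost(fp, fn, cost_fp, cost_fn):
    """Total error cost on the test set: false alarms plus missed fraud."""
    return fp * cost_fp + fn * cost_fn

NO_HEAL = {"fp": 532, "fn": 22}
HEALED = {"fp": 177, "fn": 99}

for cost_fp in (15, 200, 2000):
    a = total_cost(**NO_HEAL, cost_fp=cost_fp, cost_fn=5000)
    b = total_cost(**HEALED, cost_fp=cost_fp, cost_fn=5000)
    print(cost_fp, a, b, "healed wins" if b < a else "no-healing wins")
# 15 117980 497655 no-healing wins
# 200 216400 530400 no-healing wins
# 2000 1174000 849000 healed wins

# Break-even: healed trades (99 - 22) extra misses for (532 - 177) fewer false alarms.
break_even_ratio = (HEALED["fn"] - NO_HEAL["fn"]) / (NO_HEAL["fp"] - HEALED["fp"])
print(round(break_even_ratio, 3))  # 0.217
```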
The point is: this is a deployment decision, not a model quality decision. The tradeoff exists because the adapter learns that V14's shifted range is no longer a reliable fraud signal in isolation. That is the correct adaptation for the distribution change applied. Whether it serves your specific deployment context depends on your cost structure.


Model registry and rollback: the safety net
Every heal event creates two snapshots: one before the heal and one after. Post-heal snapshots are tagged and form the pool of rollback candidates. The health monitor tracks a rolling window of F1 scores and compares them to a baseline established at the first successful heal.
If rolling F1 drops more than 8 percentage points below that baseline, the rollback engine restores the highest-F1 post-heal snapshot. It targets post-heal snapshots specifically, not the original clean weights.
This distinction matters. In Neuro-Symbolic Fraud Detection: Catching Concept Drift Before F1 Drops, the drift monitoring approach demonstrated that rolling back to pre-drift weights on a drifted distribution reproduces the same failure. The best available state is whichever post-heal snapshot performed best on the drifted data, not the clean-data baseline.
v21 | batch=10 | acc=0.710 | f1=0.408 | post-heal [BEST]
In this experiment, no rollback was triggered across 25 batches. The rollback_f1_drop threshold is set conservatively at 0.08, and rolling F1 never fell that far below the baseline. That is a good outcome but not a test of the rollback path. To exercise it deliberately, set rollback_f1_drop = 0.03 and drift_strength = 3.5. The adapter should start receiving conflicting update signals from noisy late batches, F1 should dip below the tightened threshold, and the engine should restore v21. Running this before any production deployment is worthwhile.
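A sketch of the registry and rollback rule, assuming in-memory snapshots stored as deep-copied state dicts and a rolling F1 computed as a plain mean over the last health_window batches. The class and method names are hypothetical; the 0.08 drop, the baseline set at the first post-heal snapshot, and the "best post-heal, never clean weights" rule come from the text. The F1 values in the usage lines are made up for illustration.

```python
import copy
from collections import deque

class ModelRegistry:
    """Hypothetical registry: pre/post-heal snapshots plus an F1-drop rollback rule."""

    def __init__(self, rollback_f1_drop=0.08, health_window=5):
        self.snapshots = []                        # (version, tag, f1, state_dict)
        self.rollback_f1_drop = rollback_f1_drop
        self.recent_f1 = deque(maxlen=health_window)
        self.baseline_f1 = None

    def snapshot(self, state_dict, tag, f1):
        version = f"v{len(self.snapshots) + 1}"
        self.snapshots.append((version, tag, f1, copy.deepcopy(state_dict)))
        if tag == "post-heal" and self.baseline_f1 is None:
            self.baseline_f1 = f1                  # baseline set at the first heal
        return version

    def record_f1(self, f1):
        self.recent_f1.append(f1)

    def should_roll_back(self):
        if self.baseline_f1 is None or not self.recent_f1:
            return False
        rolling = sum(self.recent_f1) / len(self.recent_f1)
        return rolling < self.baseline_f1 - self.rollback_f1_drop

    def best_post_heal(self):
        # Roll back to the best post-heal state, never the clean pre-drift weights.
        post = [s for s in self.snapshots if s[1] == "post-heal"]
        return max(post, key=lambda s: s[2]) if post else None

reg = ModelRegistry()
reg.snapshot({"w": 0.0}, "pre-heal", 0.30)
reg.snapshot({"w": 1.0}, "post-heal", 0.35)
reg.snapshot({"w": 2.0}, "post-heal", 0.41)
for f1 in (0.30, 0.25, 0.22):
    reg.record_f1(f1)
print(reg.should_roll_back())        # True: rolling F1 ~0.257 < 0.35 - 0.08
version, tag, f1, state = reg.best_post_heal()
print(version, f1, state)            # v3 0.41 {'w': 2.0}
```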


System state over time
The model moves through four states during a production run:
HEALTHY: no drift signal, no symbolic conflicts above threshold. No healing occurs.
DRIFTING: the FIDI z-score is elevated or the conflict count reaches the minimum. Healing is triggered every batch.
HEALING: the transient state during an active heal event. Inference keeps serving from the current weights; a request that arrives while the heal holds the lock waits until it is released.
ROLLED_BACK: healing degraded performance beyond the configured threshold and the registry restored a prior snapshot.
In this experiment, the system is HEALTHY for batches 1 and 2, then enters DRIFTING at batch 3 and stays there for the rest of the run. Because the synthetic drift is applied as a permanent constant shift (the V14 mean moves down by about 1.9 and stays there), the z-score never returns below the threshold. In a real deployment with gradual or intermittent drift, you would expect more oscillation between states.
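One way to express these states per batch, as a sketch: the precedence order (rollback first, then an active heal, then the drift signals) is an assumption, and the thresholds are the defaults quoted in this article.

```python
from enum import Enum

class SystemState(Enum):
    HEALTHY = "HEALTHY"
    DRIFTING = "DRIFTING"
    HEALING = "HEALING"
    ROLLED_BACK = "ROLLED_BACK"

def classify_state(z, conflicts, heal_in_progress=False, rolled_back=False,
                   z_threshold=1.0, conflict_min=5):
    """Map one batch's signals to a state (hypothetical precedence order)."""
    if rolled_back:
        return SystemState.ROLLED_BACK
    if heal_in_progress:
        return SystemState.HEALING
    if z > z_threshold or conflicts >= conflict_min:
        return SystemState.DRIFTING
    return SystemState.HEALTHY

print(classify_state(z=0.2, conflicts=1))  # SystemState.HEALTHY
print(classify_state(z=1.4, conflicts=0))  # SystemState.DRIFTING
print(classify_state(z=0.2, conflicts=6))  # SystemState.DRIFTING
```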

Production monitoring export
After every run, the system exports three files to monitoring_export/:
metrics.csv: one row per batch, with accuracy, F1, precision, recall, z-score, conflict count, accuracy lift vs. shadow, and system state. Grafana can read it through a CSV data source plugin, and it loads directly into pandas for ad-hoc analysis.
events.json: one entry per non-trivial action (heal triggers, rollbacks), structured for ELK or any other log aggregation system.
threshold_config.json: the current rollback thresholds in a standalone file:
{
"rollback_f1_drop": 0.08,
"rollback_acc_drop": 0.10,
"health_window": 5,
"note": "Edit values and restart to tune risk tolerance"
}
Separating thresholds into their own file means the operations team can adjust risk tolerance without touching model code. Model owners control architecture and training parameters; operations controls alerting and rollback thresholds. These are different decisions made by different people on different timescales.
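A sketch of the export and of the operator-side read, using only the standard library. The column names, event fields and the load_thresholds fallback behavior are assumptions; the threshold keys and defaults are the ones in the file above.

```python
import csv
import json
import tempfile
from pathlib import Path

DEFAULT_THRESHOLDS = {"rollback_f1_drop": 0.08, "rollback_acc_drop": 0.10, "health_window": 5}

def export_monitoring(out_dir, batch_rows, events, thresholds=DEFAULT_THRESHOLDS):
    """Write metrics.csv, events.json and threshold_config.json (sketch)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "metrics.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(batch_rows[0].keys()))
        writer.writeheader()
        writer.writerows(batch_rows)
    (out / "events.json").write_text(json.dumps(events, indent=2))
    config = dict(thresholds, note="Edit values and restart to tune risk tolerance")
    (out / "threshold_config.json").write_text(json.dumps(config, indent=2))

def load_thresholds(path):
    """Read operator-owned thresholds; missing keys fall back to the defaults."""
    data = json.loads(Path(path).read_text())
    return {k: type(v)(data.get(k, v)) for k, v in DEFAULT_THRESHOLDS.items()}

with tempfile.TemporaryDirectory() as d:
    export_monitoring(d, [{"batch": 1, "acc": 0.71, "f1": 0.41, "state": "DRIFTING"}],
                      [{"batch": 3, "event": "heal_triggered"}])
    cfg_path = Path(d) / "threshold_config.json"
    cfg = json.loads(cfg_path.read_text())
    cfg["rollback_f1_drop"] = 0.03            # operations tightens the threshold
    cfg_path.write_text(json.dumps(cfg))
    thresholds = load_thresholds(cfg_path)
print(thresholds)  # {'rollback_f1_drop': 0.03, 'rollback_acc_drop': 0.1, 'health_window': 5}
```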

What this approach doesn't solve
It requires at least one symbolic rule. The consistency loss keeps the adapter from overfitting to noisy batches. Without some form of domain anchor (a rule, a soft label, a teacher model), the heal degrades to fitting the adapter on small samples with only the real data loss, which produces unstable updates. If you can't express even one domain rule, this approach needs a different weak supervision source.
Recovery is bounded by the frozen backbone. The backbone learned its representations from clean data. If drift is severe enough that these representations contain no useful signal, the adapter cannot compensate. In this experiment the backbone's representations remain partially useful because V14 is still the most informative feature, just shifted in mean. A drift that introduces an entirely new fraud mechanism the backbone never observed would exhaust what the adapter can fix. This technique buys time on gradual distributional shift. It doesn't replace retraining.
The recall tradeoff is real and deployment-specific. The healed model reduces false positives considerably but misses more fraud. This is a consequence of the adapter learning that V14's new range is no longer a clean fraud signal. Whether that tradeoff is acceptable depends on your cost structure.
The rollback system was not stress-tested in this run. Zero rollbacks in 25 batches means rolling F1 never dropped past the configured threshold. That is not a test of the rollback path. Exercise it explicitly before relying on it in production.
How this fits the series
Hybrid Neuro-Symbolic Fraud Detection embedded analyst-written rules directly into the training loss. The gain over a pure neural baseline was real but smaller than the framing suggested. The symbolic component helps most when training data is noisy or label-sparse.
Neural Network Learned Its Own Fraud Rules reversed the direction: let the gradient discover rules rather than having them provided. The network independently identified V14 as its strongest fraud signal without being told to look for it. That convergence between gradient findings and domain expert knowledge is what makes V14 monitoring meaningful.
Neuro-Symbolic Fraud Detection: Catching Concept Drift Before F1 Drops used learned rule activations as a drift canary, monitoring rule agreement rates to detect distribution shift before model metrics visibly declined. That article left the response question open.
This article is the response. FIDI and symbolic conflict detection trigger healing (developed in Neuro-Symbolic Fraud Detection: Catching Concept Drift Before F1 Drops). The symbolic rule provides the consistency signal during healing (the loss architecture from Hybrid Neuro-Symbolic Fraud Detection and Neural Network Learned Its Own Fraud Rules). The reflexive adapter provides the trainable capacity to absorb the shift.
V14 connects all four articles. It appeared in the hybrid loss in Hybrid Neuro-Symbolic Fraud Detection. The gradient found it without guidance in Neural Network Learned Its Own Fraud Rules. Its distribution change was the drift canary in Neuro-Symbolic Fraud Detection: Catching Concept Drift Before F1 Drops. Here its shift is the drift being recovered from. In real fraud datasets, a small number of features often carry most of the discriminative signal, and those features tend to be the ones that change most meaningfully when fraud patterns evolve.
Running it yourself
The full implementation is a single Python file that uses only a fully synthetic, generic dataset generated on the fly inside the script. No external or real-world datasets are loaded. The generator creates a 10-feature tabular problem with a 15% fraud ratio and applies a controlled mean shift to one sensitive feature (called "V14" for continuity across the series) to simulate concept drift.
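For experimenting before opening the repository, here is a stand-in generator with the same outline: 10 features, 15% fraud, and a mean shift on one column. Everything else (Gaussian features, the size of the fraud-side shift, placing "V14" in column 0, the 1.9 drift amount) is an assumption, not the script's actual generator.

```python
import numpy as np

def make_synthetic(n=1000, n_features=10, fraud_ratio=0.15, v14_col=0,
                   fraud_shift=-2.0, drift=0.0, seed=0):
    """Hypothetical generator: Gaussian features, fraud rows pushed down on 'V14',
    and an optional constant mean shift on that same column to simulate drift."""
    rng = np.random.default_rng(seed)
    n_fraud = int(round(n * fraud_ratio))
    y = np.zeros(n, dtype=np.float32)
    y[:n_fraud] = 1.0
    rng.shuffle(y)                              # random fraud positions
    X = rng.normal(0.0, 1.0, size=(n, n_features)).astype(np.float32)
    X[y == 1, v14_col] += fraud_shift           # V14 carries the fraud signal
    X[:, v14_col] -= drift                      # concept drift: move every row's V14 down
    return X, y

X_clean, y = make_synthetic()
X_drift, _ = make_synthetic(drift=1.9)          # same seed, so only V14 differs
print(X_clean.shape, int(y.sum()))              # (1000, 10) 150
```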
All code is available at: https://github.com/Emmimal/self-healing-neural-networks/
# 1. Make sure you're in the correct directory
cd production
# 2. Install the required packages (only these three are needed)
pip install torch numpy matplotlib
# 3. Run the script
python self_healing_production_final.py
Expected runtime is under two minutes on CPU. The run generates 8 plots and the three monitoring export files in monitoring_export/.
Key Parameters
| Parameter | Default | Controls |
|---|---|---|
| drift_strength | 2.2 | Strength of the simulated drift |
| heal_steps | 5 | Gradient steps per healing cycle |
| heal_lr | 0.003 | Learning rate for the ReflexiveLayer only |
| fidi_threshold | 1.0 | Z-score threshold for drift detection |
| rollback_f1_drop | 0.08 | F1 drop that triggers rollback |
| conflict_min | 5 | Minimum symbolic conflicts to trigger healing |
To see the rollback system trigger, set rollback_f1_drop = 0.03 and drift_strength = 3.5. The adapter should receive conflicting update signals from noisy late batches, F1 should dip below the tightened threshold, and the rollback engine should restore the best post-heal snapshot (batch 10, F1=0.408). Running this deliberately is the right way to verify the safety net before trusting it.
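If you wrap these knobs in your own harness, a frozen dataclass keeps each run's settings explicit and turns the rollback stress test into a one-line variant. The class name is hypothetical; the field names and defaults come from the table.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class HealingConfig:
    drift_strength: float = 2.2     # strength of the simulated drift
    heal_steps: int = 5             # gradient steps per healing cycle
    heal_lr: float = 0.003          # learning rate for the ReflexiveLayer only
    fidi_threshold: float = 1.0     # z-score threshold for drift detection
    rollback_f1_drop: float = 0.08  # F1 drop that triggers rollback
    conflict_min: int = 5           # minimum symbolic conflicts to trigger healing

default_cfg = HealingConfig()
# The rollback stress test as a copy with two fields changed; the default is untouched.
stress_cfg = replace(default_cfg, rollback_f1_drop=0.03, drift_strength=3.5)
print(stress_cfg.rollback_f1_drop, stress_cfg.heal_steps)  # 0.03 5
```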
Key takeaway: You don't need to retrain the whole model to survive drift; you need a controlled place for adaptation.
Summary
A frozen-backbone architecture with a trainable ReflexiveLayer adapter recovered 27.8 percentage points of accuracy under distribution shift, without a full retrain, without waiting for a new labeled training set, and without taking inference offline. The recovery comes from three combined mechanisms: the adapter absorbs the distribution shift, the symbolic rule consistency loss keeps the adapter anchored during healing, and the conditional fraud weighting scales the loss to the fraud rate observed in incoming batches.
The tradeoffs are real. Recall drops from 0.853 to 0.340 because the adapter correctly learns that V14's shifted range is no longer a clean fraud signal. Whether that tradeoff is acceptable depends on the cost structure of the deployment. For a system where false positives are expensive relative to missed fraud (on these counts, more than about 0.22 of the cost of a missed fraud) and review capacity is limited, the healed model's behavior is preferable. For a system where missing fraud is catastrophic, the numbers need careful evaluation before deploying this approach.
The rollback and registry infrastructure, the monitoring export, and the tunable thresholds are not cosmetic. In a production system affecting real transactions, you need visibility into model behavior, the ability to revert if healing degrades performance, and a clean separation between model tuning and operational threshold tuning. The architecture here tries to provide that infrastructure alongside the core adaptation mechanism.
What the system cannot do: recover from drift that makes the backbone's representations obsolete, operate without any domain rule for weak supervision, or replace a full retrain when fraud patterns change fundamentally. It buys time on gradual distributional shift. For most production fraud systems, gradual shift is the common case.
The question is no longer whether models can adapt in real time. It's whether we're guiding that adaptation in the right direction.
Disclosure
This article is based on independent experiments using a fully synthetic dataset generated entirely in code. No real transaction data, no external datasets, no proprietary information, and no confidential data were used at any point.
The synthetic data generator creates a simple 10-feature tabular problem with a 15% fraud ratio and applies a controlled mean shift to one feature to simulate concept drift. While the design draws loose inspiration from general statistical patterns commonly observed in public fraud detection benchmarks, no actual data from the ULB Credit Card Fraud dataset (Dal Pozzolo et al., 2015), or any other real dataset, was loaded, copied, or used.
All results are fully reproducible using the single Python file provided in the repository. The views and conclusions expressed here are my own and do not represent any employer or organization.
GitHub: https://github.com/Emmimal/self-healing-neural-networks/
References
[1] Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., Hassabis, D., Clopath, C., Kumaran, D., and Hadsell, R. (2017). Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13), 3521-3526. https://doi.org/10.1073/pnas.1611835114
[2] Python Software Foundation. (2024). threading: Thread-based parallelism. Python 3 Documentation. https://docs.python.org/3/library/threading.html
[3] Powers, D. M. W. (2011). Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. Journal of Machine Learning Technologies, 2(1), 37-63. https://arxiv.org/abs/2010.16061
[4] Gama, J., Zliobaite, I., Bifet, A., Pechenizkiy, M., and Bouchachia, A. (2014). A survey on concept drift adaptation. ACM Computing Surveys, 46(4), Article 44. https://doi.org/10.1145/2523813
[5] Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., and Zhang, G. (2018). Learning under concept drift: A review. IEEE Transactions on Knowledge and Data Engineering, 31(12), 2346-2363. https://doi.org/10.1109/TKDE.2018.2876857
[6] Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., de Laroussilhe, Q., Gesmundo, A., Attariyan, M., and Gelly, S. (2019). Parameter-efficient transfer learning for NLP. Proceedings of the 36th International Conference on Machine Learning (ICML). https://arxiv.org/abs/1902.00751
[7] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems (NeurIPS). https://arxiv.org/abs/1912.01703

