
    Hybrid Neuro-Symbolic Fraud Detection: Guiding Neural Networks with Domain Rules

    By Editor Times Featured, March 10, 2026


    Summary

    Fraud datasets are extremely imbalanced, with positive rates below 0.2%. Standard neural networks trained with weighted binary cross-entropy often achieve high ROC-AUC but struggle to identify suspicious transactions under threshold-sensitive metrics. I propose a Hybrid Neuro-Symbolic (HNS) approach that incorporates domain knowledge directly into the training objective as a differentiable rule loss, encouraging the model to assign high fraud probability to transactions with unusually large amounts and atypical PCA signatures. On the Kaggle Credit Card Fraud dataset, the hybrid achieves a ROC-AUC of 0.970 ± 0.005 across 5 random seeds, compared to 0.967 ± 0.003 for the pure neural baseline under symmetric evaluation. A key practical finding: on imbalanced data, threshold selection strategy affects F1 as much as model architecture, so both models must be evaluated with the same approach for any comparison to be meaningful. Code and reproducibility materials are available on GitHub [5].

    The Problem: When ROC-AUC Lies

    I had a fraud dataset with a 0.17% positive rate. I trained a weighted BCE network, got a ROC-AUC of 0.96, and someone said “good”. Then I pulled up the score distributions and threshold-dependent metrics. The model had quietly learned that predicting “not fraud” on anything ambiguous was the path of least resistance, and nothing in the loss function disagreed with that decision.

    What bothered me wasn’t the math. It was that the model had no idea what fraud looks like. A junior analyst on day one could tell you: large transactions are suspicious, transactions with unusual PCA signatures are suspicious, and when both happen together, you should definitely be paying attention. That knowledge just… never makes it into the training loop.

    So I ran an experiment. What if I encoded that analyst intuition as a soft constraint directly in the loss function, something the network has to satisfy while also fitting the labels? The result was a Hybrid Neuro-Symbolic (HNS) setup. This article walks through the full experiment: the model, the rule loss, the lambda sweep, and, critically, what a proper multi-seed variance analysis with symmetric threshold evaluation actually shows.

    The Setup

    I used the Kaggle Credit Card Fraud dataset [2]: 284,807 transactions, 492 of which are fraud (0.172%). The V1–V28 features are PCA components from an anonymized original feature space. Amount and Time are raw. The severe imbalance is the whole point; this is where standard approaches start to struggle [1].

    The split was 70/15/15 train/val/test, stratified. I trained four things and compared them head-to-head:

    • Isolation Forest: contamination=0.001, fit on the full training set
    • One-Class SVM: nu=0.001, fit only on the non-fraud training samples
    • Pure Neural: three-layer MLP with BCE + class weighting, no domain knowledge
    • Hybrid Neuro-Symbolic: the same MLP, with a differentiable rule penalty added to the loss

    Isolation Forest and One-Class SVM serve as a gut-check. If a supervised network with 199k training samples can’t clear the bar set by an unsupervised method, that’s worth knowing before you write up results. A tuned gradient boosting model would likely outperform both neural approaches; this comparison is intended to isolate the effect of the rule loss, not to benchmark against all possible methods. Full code for all four is on GitHub [5].
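
    The stratified 70/15/15 split can be sketched in plain Python. This is an illustration rather than the repository's code: stratified_split is a hypothetical helper, and the labels are synthetic, built only to match the dataset's class counts (492 fraud out of 284,807).

```python
import random

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists that preserve the class ratio."""
    rng = random.Random(seed)
    splits = ([], [], [])
    for cls in sorted(set(labels)):
        # Shuffle and cut each class separately so every split keeps its share.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        splits[0].extend(idx[:n_train])
        splits[1].extend(idx[n_train:n_train + n_val])
        splits[2].extend(idx[n_train + n_val:])
    return splits

# Synthetic labels with the dataset's class counts: 284,315 legitimate, 492 fraud.
labels = [0] * 284_315 + [1] * 492
train_idx, val_idx, test_idx = stratified_split(labels)

for name, idx in (("train", train_idx), ("val", val_idx), ("test", test_idx)):
    n_fraud = sum(labels[i] for i in idx)
    print(f"{name}: {len(idx)} rows, {n_fraud} fraud ({100 * n_fraud / len(idx):.3f}%)")
```

    Cutting each class separately is what keeps the fraud rate close to 0.17% in all three partitions; a plain random split could leave the validation or test set with noticeably fewer fraud cases.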

    The Model

    Nothing exotic. A three-layer MLP with batch normalization after each hidden layer. The batch norm matters more than you might expect: under heavy class imbalance, activations can drift badly without it [3].

    import torch.nn as nn

    class MLP(nn.Module):
        """Three-layer MLP that outputs one raw logit per transaction."""

        def __init__(self, input_dim):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(input_dim, 128),
                nn.ReLU(),
                nn.BatchNorm1d(128),
                nn.Linear(128, 64),
                nn.ReLU(),
                nn.BatchNorm1d(64),
                nn.Linear(64, 1),  # logit; the sigmoid is applied inside the loss
            )

        def forward(self, x):
            return self.net(x)
    

    For the loss, I use BCEWithLogitsLoss with pos_weight, computed as the ratio of non-fraud to fraud counts in the training set. On this dataset that is about 577 [4]. A single fraud sample in a batch is therefore weighted 577 times as heavily as a non-fraud one.

    pos_weight = count(y=0) / count(y=1) ≈ 577

    That weight provides a directional signal when labeled fraud does appear. But the model still has no concept of what “suspicious” looks like in feature space; it only knows that fraud examples, when they do show up, should be heavily weighted. That is different from knowing where to look on batches that happen to contain no labeled fraud at all.
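
    The class-weight ratio is simple arithmetic. The sketch below uses the full-dataset counts as a stand-in for the training split (an assumption; because the split is stratified the two ratios are nearly the same), and compute_pos_weight is a hypothetical helper name.

```python
def compute_pos_weight(labels):
    """Ratio of negative to positive examples, used for BCE class weighting."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive examples; pos_weight is undefined")
    return n_neg / n_pos

# Illustrative labels with the full-dataset counts: 284,315 legitimate, 492 fraud.
labels = [0] * 284_315 + [1] * 492
pos_weight = compute_pos_weight(labels)
print(f"pos_weight = {pos_weight:.1f}")

# With PyTorch installed, the value is passed to the loss as a tensor:
#   criterion = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor([pos_weight]))
```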

    The Rule Loss

    Here is the core idea. Fraud analysts know two things empirically: unusually high transaction amounts are suspicious, and transactions that sit far from normal behavior in PCA space are suspicious. I want the model to assign high fraud probabilities to transactions that match both signals, even when a batch contains no labeled fraud examples.

    The trick is making the rule differentiable. An if/else threshold (flag any transaction where amount > 1000) is a hard step function. Its gradient is zero everywhere except at the threshold itself, where it is undefined. That means backpropagation has nothing to work with; the rule produces no useful gradient signal and the optimizer ignores it. Instead, I use a steep sigmoid centered on the batch mean. It approximates the same threshold behavior but stays smooth and differentiable everywhere: the gradient is small far from the boundary and peaks near it, which is exactly where you want the optimizer paying attention. The result is a smooth suspicion score between 0 and 1:

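    import torch

    def rule_loss(x, probs):
        """Penalize low fraud probability on transactions the rule finds suspicious."""
        # x[:, -1]   = Amount  (last column in creditcard.csv after dropping Class)
        # x[:, 1:29] = V1–V28  (PCA components, columns 1–28)
        amount   = x[:, -1]
        pca_norm = torch.norm(x[:, 1:29], dim=1)

        # Soft "above the batch mean" indicators, averaged into a score in (0, 1)
        suspicious = (
            torch.sigmoid(5 * (amount   - amount.mean())) +
            torch.sigmoid(5 * (pca_norm - pca_norm.mean()))
        ) / 2.0

        # One-sided hinge: only fraud probabilities below 0.6 are penalized
        penalty = suspicious * torch.relu(0.6 - probs.squeeze(-1))
        return penalty.mean()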
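
    To make the gradient argument concrete, the sketch below compares the analytic derivative of the steep sigmoid with that of a hard step. It is a standalone illustration in plain Python; the function names are hypothetical, and center stands in for the batch mean.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold_grad(x, center, steepness=5.0):
    """Derivative of sigmoid(steepness * (x - center)) with respect to x."""
    s = sigmoid(steepness * (x - center))
    return steepness * s * (1.0 - s)

def hard_threshold_grad(x, center):
    """A step function is flat on both sides of the threshold."""
    if x == center:
        raise ValueError("gradient is undefined exactly at the threshold")
    return 0.0

center = 0.0  # stands in for the batch mean
for offset in (-2.0, -0.5, 0.0, 0.5, 2.0):
    soft = soft_threshold_grad(center + offset, center)
    hard = "undefined" if offset == 0.0 else hard_threshold_grad(center + offset, center)
    print(f"offset {offset:+.1f}: sigmoid grad = {soft:.4f}, step grad = {hard}")
```

    The sigmoid's derivative peaks at steepness / 4 (1.25 for a steepness of 5) right at the boundary and decays quickly on either side, while the step contributes nothing anywhere it is defined.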
    

    A note on why PCA norm specifically: the V1–V28 features are the result of a PCA transform applied to the original anonymized transaction data. A transaction that sits far from the origin in this compressed space has unusual variance across multiple original features simultaneously; it is an outlier in the latent representation. The Euclidean norm of the PCA vector captures that distance in a single scalar. This is not a Kaggle-specific trick. On any dataset where PCA components represent normal behavioral variance, the norm of those components is a reasonable proxy for atypicality. If your features are not PCA-transformed, you would replace this with a domain-appropriate signal: Mahalanobis distance, an isolation score, or a feature-specific z-score.

    The relu(0.6 - probs) term is the constraint: it fires only when the model’s predicted fraud probability is below 0.6 for a suspicious transaction. If the model is already confident (prob > 0.6), the penalty is zero. This is intentional: I am not penalizing the model for being too aggressive on suspicious transactions, only for being too conservative. The asymmetry means the rule can never fight against a correct high-confidence prediction.
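
    A few hand-picked values show how the one-sided penalty behaves for a single transaction. The numbers are invented for illustration, and rule_penalty is a hypothetical scalar version of the per-row term inside rule_loss.

```python
def rule_penalty(suspicion, prob, floor=0.6):
    """Per-transaction penalty: suspicion * ReLU(floor - prob)."""
    return suspicion * max(0.0, floor - prob)

examples = [
    (0.95, 0.10),  # highly suspicious, model underconfident: large penalty
    (0.95, 0.70),  # highly suspicious, model already above 0.6: no penalty
    (0.05, 0.10),  # barely suspicious: the low score scales the penalty down
]
for suspicion, prob in examples:
    print(f"suspicion={suspicion:.2f} prob={prob:.2f} "
          f"-> penalty={rule_penalty(suspicion, prob):.3f}")
```

    Note that the third case is small but not zero: the suspicion score is a soft weight, so even ordinary-looking transactions receive a slight push toward higher probabilities.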

    Formally, the combined objective is:

    L_total = L_BCE + λ · L_rule

    L_rule = E[ σ_susp(x) · ReLU(0.6 − p) ]

    σ_susp(x) = ½ · [ σ(5·(amount − ā)) + σ(5·(‖V₁₋₂₈‖ − mean‖V‖)) ]

    The λ hyperparameter controls how hard the rule pushes. At λ=0 you get the pure neural baseline. The full training loop:

    for xb, yb in train_loader:
        xb, yb = xb.to(DEVICE), yb.to(DEVICE)

        logits = model(xb)
        bce    = criterion(logits.squeeze(-1), yb)  # weighted BCE on raw logits
        probs  = torch.sigmoid(logits)
        rl     = rule_loss(xb, probs)               # label-free rule penalty
        loss   = bce + lambda_rule * rl

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
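
    The rule term never looks at the labels, so it produces a gradient on any batch. The sketch below checks this on synthetic data, assuming PyTorch is installed: the random batch and the single nn.Linear layer standing in for the MLP are illustrative assumptions, and rule_loss is restated so the block runs on its own.

```python
import torch
import torch.nn as nn

def rule_loss(x, probs):
    amount = x[:, -1]
    pca_norm = torch.norm(x[:, 1:29], dim=1)
    suspicious = (
        torch.sigmoid(5 * (amount - amount.mean())) +
        torch.sigmoid(5 * (pca_norm - pca_norm.mean()))
    ) / 2.0
    return (suspicious * torch.relu(0.6 - probs.squeeze(-1))).mean()

torch.manual_seed(0)
xb = torch.randn(256, 30)   # synthetic batch laid out as Time, V1..V28, Amount
model = nn.Linear(30, 1)    # stand-in for the MLP; no labels are needed here

probs = torch.sigmoid(model(xb))
rl = rule_loss(xb, probs)
rl.backward()               # backpropagate the rule term alone

rule_grad_norm = model.weight.grad.norm().item()
print(f"rule loss = {rl.item():.4f}, weight-gradient norm = {rule_grad_norm:.4f}")
```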
    

    Tuning Lambda

    Five values tested: 0.0, 0.1, 0.5, 1.0, 2.0. Each model was trained to its best validation PR-AUC with early stopping at patience=7, seed=42:

    Lambda 0.0  →  Val PR-AUC: 0.7580
    Lambda 0.1  →  Val PR-AUC: 0.7595
    Lambda 0.5  →  Val PR-AUC: 0.7620   ← best
    Lambda 1.0  →  Val PR-AUC: 0.7452
    Lambda 2.0  →  Val PR-AUC: 0.7504

    Best Lambda: 0.5


    λ=0.5 wins narrowly on validation PR-AUC. The gap between λ=0.0, 0.1, and 0.5 is small, within the range of seed variance, as the multi-seed analysis below shows. The meaningful drop at λ=1.0 and 2.0 suggests that aggressive rule weighting can override the BCE signal rather than complement it. On new data, treat λ=0 as the default and verify that any improvement holds across seeds before trusting it.
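
    The selection step and the noise comparison can be written out directly from the sweep log. The yardstick used here, the pure baseline's PR-AUC standard deviation of 0.026 from Table 2, is measured on the test set rather than the validation set, so treat the comparison as a rough sanity check and not a formal test.

```python
# Validation PR-AUC per lambda, copied from the sweep log above (seed=42).
val_pr_auc = {0.0: 0.7580, 0.1: 0.7595, 0.5: 0.7620, 1.0: 0.7452, 2.0: 0.7504}

best_lambda = max(val_pr_auc, key=val_pr_auc.get)
gain_over_baseline = val_pr_auc[best_lambda] - val_pr_auc[0.0]

# Rough noise yardstick: the pure baseline's PR-AUC std across five seeds (Table 2).
seed_std = 0.026
within_noise = abs(gain_over_baseline) < seed_std

print(f"best lambda = {best_lambda}, gain over lambda=0: {gain_over_baseline:+.4f}")
print(f"gain is smaller than one seed std ({seed_std}): {within_noise}")
```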

    One thing to be careful about with threshold selection: I computed the optimal F1 threshold on the validation set and applied it to the test set, for both models symmetrically. On a dataset with a 0.17% positive rate, the optimal decision boundary is nowhere near 0.5. Applying different thresholding strategies to different models means measuring the threshold gap, not the model gap. Both must use the same approach:

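    import numpy as np
    from sklearn.metrics import precision_recall_curve

    def find_best_threshold(y_true, probs):
        """Return the threshold that maximizes F1 on the given (validation) data."""
        precision, recall, thresholds = precision_recall_curve(y_true, probs)
        # precision/recall carry one extra trailing point that has no threshold
        f1_scores = 2 * (precision * recall) / (precision + recall + 1e-8)
        return thresholds[np.argmax(f1_scores[:-1])]

    # Applied symmetrically to BOTH models, on the validation set only
    hybrid_thresh = find_best_threshold(y_val, hybrid_val_probs)
    pure_thresh   = find_best_threshold(y_val, pure_val_probs)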
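
    The second half of the procedure, carrying the validation-tuned threshold over to the test set, can be sketched without any library. The scores below are toy values invented for illustration, and both helper names are hypothetical.

```python
def f1_at_threshold(y_true, probs, threshold):
    """F1 score when every score >= threshold is predicted as fraud."""
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, probs) if p < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(y_val, val_probs):
    """Pick the observed score with the highest validation F1."""
    candidates = sorted(set(val_probs))
    return max(candidates, key=lambda t: f1_at_threshold(y_val, val_probs, t))

# Toy scores, invented for illustration only.
y_val, val_probs = [0, 0, 0, 1, 0, 1], [0.01, 0.02, 0.30, 0.35, 0.05, 0.90]
y_test, test_probs = [0, 1, 0, 0, 1, 0], [0.02, 0.40, 0.10, 0.33, 0.80, 0.01]

thresh = best_threshold(y_val, val_probs)        # chosen on validation only
test_f1 = f1_at_threshold(y_test, test_probs, thresh)
default_f1 = f1_at_threshold(y_test, test_probs, 0.5)
print(f"val-tuned threshold = {thresh:.2f}, "
      f"test F1 = {test_f1:.3f}, test F1 at 0.5 = {default_f1:.3f}")
```

    On these toy scores the default 0.5 cutoff misses a fraud case that the tuned threshold catches, which is the kind of gap that gets mistaken for a model difference when the two models are thresholded differently.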
    

    Results

    Model               F1      PR-AUC   ROC-AUC   Recall@1%FPR
    Isolation Forest    0.121   0.172    0.941     0.581
    One-Class SVM       0.029   0.391    0.930     0.797
    Pure Neural (λ=0)   0.776   0.806    0.969     0.878
    Hybrid (λ=0.5)      0.767   0.745    0.970     0.878
    Table 1: Test-set results, seed=42, both supervised models using val-tuned thresholds. The pure neural baseline is a single retrained run; seed variance is quantified in Table 2 below.

    On this seed, the hybrid and pure baseline are competitive on F1 (0.767 vs 0.776) and identical on Recall@1%FPR. The hybrid’s PR-AUC is lower on this particular seed (0.745 vs 0.806). The cleanest signal is ROC-AUC: 0.970 for the hybrid vs 0.969 for the pure baseline. ROC-AUC is threshold-independent, measuring ranking quality across all possible cutoffs. That edge is where the rule loss shows up most consistently.
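
    Recall@1%FPR is not defined explicitly above. The usual reading, assumed here, is the recall obtained at the strictest cutoff that still keeps the false-positive rate at or below 1%. The sketch below implements that reading in plain Python on made-up scores; recall_at_fpr is a hypothetical helper, not the repository's function.

```python
import math

def recall_at_fpr(y_true, probs, max_fpr=0.01):
    """Recall when at most max_fpr of the negatives are allowed to be flagged."""
    neg = sorted((p for y, p in zip(y_true, probs) if y == 0), reverse=True)
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    if not neg or not pos:
        raise ValueError("need at least one negative and one positive example")
    allowed_fp = math.floor(max_fpr * len(neg) + 1e-9)  # negatives we may flag
    # Flag only scores strictly above the (allowed_fp + 1)-th highest negative.
    cutoff = neg[allowed_fp] if allowed_fp < len(neg) else float("-inf")
    return sum(1 for p in pos if p > cutoff) / len(pos)

# Made-up scores: 200 negatives spread over [0, 0.2) and four positives.
neg_scores = [i / 1000 for i in range(200)]
pos_scores = [0.90, 0.50, 0.1985, 0.05]
y_true = [0] * len(neg_scores) + [1] * len(pos_scores)
probs = neg_scores + pos_scores

recall = recall_at_fpr(y_true, probs, max_fpr=0.01)
print(f"Recall@1%FPR = {recall:.2f}")
```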

    Precision-Recall Curve

    Figure 1: Precision-Recall curve for the Hybrid model (seed=42). PR-AUC = 0.745. Image by Author.

    Strong early precision is what you want in a fraud system. The curve holds reasonably before dropping, meaning the model’s top-ranked transactions are genuinely fraud-heavy, not just a lucky threshold. In production you would tune the threshold to your actual cost ratio: the cost of a missed fraud versus the cost of a false alarm. The val-optimized F1 threshold used here is a reasonable middle ground for reporting, not the only valid choice.
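
    Tuning to a cost ratio can be sketched as a search over candidate thresholds that minimizes total cost on validation data. The cost figures and scores below are hypothetical placeholders, as are the function names; real values would come from the business context.

```python
def total_cost(y_true, probs, threshold, cost_fn, cost_fp):
    """Cost of missed fraud plus cost of false alarms at a given threshold."""
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= threshold)
    return fn * cost_fn + fp * cost_fp

def cheapest_threshold(y_true, probs, cost_fn, cost_fp):
    """Search observed scores (plus 'flag nothing') for the lowest total cost."""
    candidates = sorted(set(probs)) + [float("inf")]
    return min(candidates,
               key=lambda t: total_cost(y_true, probs, t, cost_fn, cost_fp))

# Toy validation scores, invented for illustration only.
y_val = [0, 0, 0, 0, 1, 0, 1, 1]
val_probs = [0.01, 0.03, 0.20, 0.45, 0.30, 0.05, 0.60, 0.95]

# Missed fraud is expensive: the threshold drops to catch every fraud case.
strict = cheapest_threshold(y_val, val_probs, cost_fn=100.0, cost_fp=1.0)
# False alarms are expensive: the threshold rises and one fraud is let through.
lenient = cheapest_threshold(y_val, val_probs, cost_fn=1.0, cost_fp=5.0)
print(f"threshold when misses dominate: {strict:.2f}")
print(f"threshold when false alarms dominate: {lenient:.2f}")
```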

    Confusion Matrix

    Figure 2: Confusion matrix for the Hybrid model at the validation-tuned threshold (seed=42). Image by Author.

    Score Distributions

    Figure 3: Predicted probability distributions for the Hybrid model (seed=42). Non-fraud (blue) clusters near 0; fraud (orange) is pushed higher by the rule penalty. Image by Author.

    This histogram is what I look at first after training any classifier on imbalanced data. The non-fraud distribution should spike near zero; the fraud distribution should spread toward 1. The overlap region in the middle is where the model is genuinely uncertain, and that is where your threshold lives.

    Variance Analysis: 5 Random Seeds

    A single-seed result on a dataset this imbalanced is not enough to trust. I ran both models across seeds [42, 0, 7, 123, 2024], applying val-optimized thresholds symmetrically to both in every run:

    Seed   42 | Hybrid F1: 0.767  PR-AUC: 0.745 | Pure F1: 0.776  PR-AUC: 0.806
    Seed    0 | Hybrid F1: 0.733  PR-AUC: 0.636 | Pure F1: 0.788  PR-AUC: 0.743
    Seed    7 | Hybrid F1: 0.809  PR-AUC: 0.817 | Pure F1: 0.767  PR-AUC: 0.755
    Seed  123 | Hybrid F1: 0.797  PR-AUC: 0.756 | Pure F1: 0.757  PR-AUC: 0.731
    Seed 2024 | Hybrid F1: 0.764  PR-AUC: 0.745 | Pure F1: 0.826  PR-AUC: 0.763
    
    Model            F1 (mean ± std)   PR-AUC (mean ± std)   ROC-AUC (mean ± std)
    Pure Neural      0.783 ± 0.024     0.760 ± 0.026         0.967 ± 0.003
    Hybrid (λ=0.5)   0.774 ± 0.027     0.740 ± 0.058         0.970 ± 0.005
    Table 2: Multi-seed variance across 5 seeds. Hybrid and pure baseline are statistically indistinguishable on F1 and PR-AUC. Hybrid shows a consistent ROC-AUC advantage across all 5 seeds.

    Figure 4: F1 and PR-AUC mean ± std across 5 seeds for the pure neural and hybrid models. Differences on both metrics are within noise range. Image by Author.

    Three observations from the variance data. First, the hybrid wins on F1 in 2 of 5 seeds and the pure baseline wins in 3 of 5; neither dominates on threshold-dependent metrics. Second, the hybrid’s PR-AUC variance is notably higher (±0.058 vs ±0.026), meaning the rule loss makes some initializations better and some worse. It is a sensitivity, not a guaranteed improvement. Third, the one result that holds without exception: ROC-AUC is higher for the hybrid across all 5 seeds. That is the cleanest signal from this experiment.
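
    The F1 and PR-AUC columns of Table 2 can be recomputed from the per-seed log with the standard library. The population standard deviation (dividing by n, as statistics.pstdev does) reproduces the table to three decimals; the sample standard deviation would come out slightly larger.

```python
from statistics import mean, pstdev

# Per-seed test metrics copied from the run log above (seeds 42, 0, 7, 123, 2024).
results = {
    "Hybrid F1":     [0.767, 0.733, 0.809, 0.797, 0.764],
    "Hybrid PR-AUC": [0.745, 0.636, 0.817, 0.756, 0.745],
    "Pure F1":       [0.776, 0.788, 0.767, 0.757, 0.826],
    "Pure PR-AUC":   [0.806, 0.743, 0.755, 0.731, 0.763],
}

summary = {name: (mean(vals), pstdev(vals)) for name, vals in results.items()}
for name, (m, s) in summary.items():
    print(f"{name:14s} {m:.3f} ± {s:.3f}")

# Count the seeds on which the hybrid beats the pure baseline on F1.
hybrid_f1_wins = sum(h > p for h, p in zip(results["Hybrid F1"], results["Pure F1"]))
print(f"Hybrid wins on F1 in {hybrid_f1_wins} of {len(results['Pure F1'])} seeds")
```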

    Why Does the Rule Loss Help ROC-AUC?

    ROC-AUC is threshold-independent: it measures how well the model ranks fraud above non-fraud across all possible cutoffs. A consistent improvement across 5 seeds is a real signal. Here is what I think is happening.

    With 0.172% fraud prevalence, most 2048-sample batches contain only 3–4 labeled fraud examples. The BCE loss therefore gets its fraud-relevant gradient from only a handful of examples per batch. The rule loss fires on every suspicious transaction regardless of label, so it generates gradient signal even where the labels tell the optimizer almost nothing about fraud. This gives the model consistent direction throughout training, not just from the rare examples where labeled fraud happens to appear.
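
    The 3–4 figure follows from the prevalence and the batch size. The Poisson step below is an approximation that assumes batches are drawn at random, which is an assumption made for this sketch and not something stated about the training loader.

```python
import math

fraud_rate = 492 / 284_807   # dataset prevalence quoted earlier (0.172%)
batch_size = 2048

expected_fraud = batch_size * fraud_rate
# Poisson approximation to the chance that a random batch holds no fraud at all.
p_no_fraud = math.exp(-expected_fraud)

print(f"expected fraud examples per batch: {expected_fraud:.2f}")
print(f"approximate share of batches with zero fraud: {p_no_fraud:.1%}")
```

    Each batch therefore carries a few labeled fraud rows against roughly two thousand legitimate ones, while the rule term scores every row in the batch.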

    The penalty is also feature-selective. By pointing the model specifically toward amount and PCA norm, the rule reduces the chance that the model latches onto irrelevant correlations among the individual PCA dimensions. It functions as soft regularization over the feature space, not just the output space.

    The one-sided relu matters too. I am not penalizing the model for being too aggressive on suspicious transactions, only for being too conservative. The rule cannot fight against a correct high-confidence prediction; it can only push up underconfident ones. That asymmetry is deliberate.

    The lesson is not that rules replace learning. It is that rules can guide it, especially when labeled examples are scarce and you already know something about what you are looking for.

    On Threshold Evaluation in Imbalanced Classification

    One finding from this experiment is worth its own section because it applies to any imbalanced classification problem, not just fraud.

    On a dataset with a 0.17% positive rate, the optimal F1 threshold is nowhere near 0.5. A model can rank fraud almost perfectly and still score poorly on F1 at a default threshold, simply because the decision boundary needs to be calibrated to the class imbalance. This means that if two models are evaluated with different thresholding strategies (one at a fixed cutoff, the other with a val-optimized cutoff), you are not comparing models. You are measuring the threshold gap.

    The practical checklist for clean comparison on imbalanced data:

    • Both models evaluated with the same thresholding strategy
    • Threshold selected on validation data, never on test data
    • PR-AUC and ROC-AUC reported alongside F1 (both are threshold-independent)
    • Variance across multiple seeds to separate real differences from lucky initialization

    Things to Watch Out For

    Batch-relative statistics. The rule computes “high amount” and “high PCA norm” relative to the batch mean, not a fixed population statistic. During training with large batches (2048) and stratified sampling, batch means are stable enough. In online inference scoring individual transactions, freeze these statistics to training-set values. Otherwise the “suspicious” boundary shifts with every call.

    PR-AUC variance increases with the rule loss. Hybrid PR-AUC ranges from 0.636 to 0.817 across seeds versus 0.731 to 0.806 for the pure baseline. A rule that helps on some initializations and hurts on others requires multi-seed validation before drawing conclusions. Single-seed results are not enough.

    High λ degrades performance. λ=1.0 and 2.0 show a meaningful drop in validation PR-AUC. Aggressive rule weighting can override the BCE signal rather than complement it. Start at λ=0.5 and verify on your own data before going higher.
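
    For the first caveat, freezing the statistics could look like the sketch below: the means are computed once from training data and reused for every later call. FrozenRuleScorer and its arguments are hypothetical, the toy "training set" is invented, and plain Python stands in for the tensor code.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def l2_norm(vec):
    return math.sqrt(sum(v * v for v in vec))

class FrozenRuleScorer:
    """Suspicion score using fixed training-set means instead of batch means."""

    def __init__(self, train_amounts, train_pca_vectors, steepness=5.0):
        self.amount_mean = sum(train_amounts) / len(train_amounts)
        norms = [l2_norm(vec) for vec in train_pca_vectors]
        self.pca_norm_mean = sum(norms) / len(norms)
        self.steepness = steepness

    def score(self, amount, pca_vector):
        """Score one transaction; the result does not depend on other calls."""
        return 0.5 * (
            sigmoid(self.steepness * (amount - self.amount_mean)) +
            sigmoid(self.steepness * (l2_norm(pca_vector) - self.pca_norm_mean))
        )

# Tiny invented "training set": scaled amounts and 3-dimensional PCA vectors.
scorer = FrozenRuleScorer(
    train_amounts=[-0.5, 0.0, 0.5],
    train_pca_vectors=[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
)
typical = scorer.score(0.0, [1.0, 0.0, 0.0])   # sits exactly at both means
unusual = scorer.score(3.0, [3.0, 4.0, 0.0])   # large amount, norm of 5
print(f"typical transaction: {typical:.3f}, unusual transaction: {unusual:.3f}")
```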

    A natural extension would make the rule weights learnable rather than fixed at 0.5/0.5:

    # Learnable combination weights (declared in the module's __init__)
    self.rule_w = nn.Parameter(torch.tensor([0.5, 0.5]))

    # Inside the rule loss: softmax keeps the weights positive and summing to 1
    w = torch.softmax(self.rule_w, dim=0)
    suspicious = (
        w[0] * torch.sigmoid(5 * (amount   - amount.mean())) +
        w[1] * torch.sigmoid(5 * (pca_norm - pca_norm.mean()))
    )


    This lets the model decide whether amount or PCA norm is more predictive for the specific data, rather than hard-coding equal weights. This variant has not been run yet; it is the next thing on the list.
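
    One way the fragment could be wired into a module is sketched below, assuming PyTorch is installed. Like the variant itself, this is untested on the real data: LearnableRule is a hypothetical class name, and the batch and probabilities are synthetic.

```python
import torch
import torch.nn as nn

class LearnableRule(nn.Module):
    """Rule penalty whose two signals are mixed by learnable softmax weights."""

    def __init__(self):
        super().__init__()
        # Equal raw values give equal weights (0.5 / 0.5) after the softmax.
        self.rule_w = nn.Parameter(torch.tensor([0.5, 0.5]))

    def forward(self, x, probs):
        amount = x[:, -1]
        pca_norm = torch.norm(x[:, 1:29], dim=1)
        w = torch.softmax(self.rule_w, dim=0)
        suspicious = (
            w[0] * torch.sigmoid(5 * (amount - amount.mean())) +
            w[1] * torch.sigmoid(5 * (pca_norm - pca_norm.mean()))
        )
        return (suspicious * torch.relu(0.6 - probs.squeeze(-1))).mean()

torch.manual_seed(0)
rule = LearnableRule()
x = torch.randn(64, 30)             # synthetic batch: Time, V1..V28, Amount
probs = torch.full((64, 1), 0.2)    # pretend the model is underconfident

loss = rule(x, probs)
loss.backward()                     # gradients now reach rule_w as well
weights = torch.softmax(rule.rule_w, dim=0).detach()
print(f"initial weights = {weights.tolist()}, loss = {loss.item():.4f}")
```

    For the weights to move during training, rule.parameters() would need to be handed to the optimizer together with the model's parameters.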

    Closing Thoughts

    The rule loss does something real: the ROC-AUC improvement is consistent and threshold-independent across all 5 seeds. The improvement on F1 and PR-AUC is within noise range and depends on initialization. The honest summary: domain rules injected into the loss function can improve a model’s underlying score distributions on rare-event data, but the magnitude depends heavily on how you measure it and how stable the improvement is across seeds.

    If you work in fraud detection, anomaly detection, or any domain where labeled positives are rare and domain knowledge is rich, this pattern is worth experimenting with. The implementation is simple: a handful of lines on top of a standard training loop. The more important discipline is measurement: use symmetric threshold evaluation, report threshold-independent metrics, and always run multiple seeds before trusting a result.

    The repo has the full training loop, lambda sweep, variance analysis, and eval code. Download the CSV from Kaggle, drop it in the same directory, and run app.py. The numbers above should reproduce; if they do not on your machine, open an issue and I will take a look.

    References

    [1] A. Dal Pozzolo, O. Caelen, R. A. Johnson and G. Bontempi, Calibrating Probability with Undersampling for Unbalanced Classification (2015), IEEE SSCI. https://dalpozz.github.io/static/pdf/SSCI_calib_final_noCC.pdf

    [2] ULB Machine Learning Group, Credit Card Fraud Detection Dataset (Kaggle). https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud (Open Database license)

    [3] S. Ioffe and C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015), arXiv:1502.03167. https://arxiv.org/abs/1502.03167

    [4] PyTorch Documentation, BCEWithLogitsLoss. https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html

    [5] Experiment code and reproducibility materials. https://github.com/Emmimal/neuro-symbolic-fraud-pytorch/

    Disclosure

    This article is based on independent experiments using publicly available data (Kaggle Credit Card Fraud dataset) and open-source tools (PyTorch). No proprietary datasets, company resources, or confidential information were used. The results and code are fully reproducible as described, and the GitHub repository contains the complete implementation. The views and conclusions expressed here are my own and do not represent any employer or organization.


