
    The Machine Learning “Advent Calendar” Day 9: LOF in Excel

    By Editor Times Featured, December 9, 2025


    Yesterday, we worked with Isolation Forest, which is an anomaly detection method.

    Today, we look at another algorithm that has the same goal. But unlike Isolation Forest, it does not build trees.

    It is called LOF, or Local Outlier Factor.

    People often summarize LOF with one sentence: Does this point live in a region with a lower density than its neighbors?

    This sentence is actually tricky to understand. I struggled with it for a long time.

    However, there is one part that is immediately easy to understand,
    and we will see that it becomes the key point:
    there is a notion of neighbors.

    And as soon as we talk about neighbors,
    we naturally go back to distance-based models.

    We will explain this algorithm in 3 steps.

    To keep things very simple, we will use this dataset, again:

    1, 2, 3, 9

    Do you remember that I have the copyright on this dataset? We did Isolation Forest with it, and we will do LOF with it again. And we can also compare the two results.

    LOF in Excel with 3 steps – all images by author

    All the Excel files are available through this Kofi link. Your support means a lot to me. The price will increase during the month, so early supporters get the best value.

    All Excel/Google sheet files for ML and DL

    Step 1 – k Neighbors and k-distance

    LOF begins with something very simple:

    Look at the distances between points.
    Then find the k nearest neighbors of each point.

    Let us take k = 2, just to keep things minimal.

    Nearest neighbors for each point

    • Point 1 → neighbors: 2 and 3
    • Point 2 → neighbors: 1 and 3
    • Point 3 → neighbors: 2 and 1
    • Point 9 → neighbors: 3 and 2

    Already, we see a clear structure emerging:

    • 1, 2, and 3 form a tight cluster
    • 9 lives alone, far from the others

    The k-distance: a local radius

    The k-distance is simply the largest distance among the k nearest neighbors.

    And this is actually the key point.

    Because this single number tells you something very concrete:
    the local radius around the point.

    If k-distance is small, the point is in a dense area.
    If k-distance is large, the point is in a sparse area.

    With just this one measure, you already have a first signal of “isolation”.

    Here, we use the idea of “k nearest neighbors”, which of course reminds us of k-NN (the classifier or regressor).
    The context here is different, but the calculation is exactly the same.

    And if you think of k-means, do not mix them up:
    the “k” in k-means has nothing to do with the “k” here.

    The k-distance calculation

    For point 1, the two nearest neighbors are 2 and 3 (distances 1 and 2), so k-distance(1) = 2.

    For point 2, neighbors are 1 and 3 (both at distance 1), so k-distance(2) = 1.

    For point 3, the two nearest neighbors are 1 and 2 (distances 2 and 1), so k-distance(3) = 2.

    For point 9, neighbors are 3 and 2 (distances 6 and 7), so k-distance(9) = 7. This is huge compared to all the others.

    In Excel, we can build a pairwise distance matrix to get the k-distance for each point.
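
    For readers who prefer code to spreadsheets, the same step can be sketched in plain Python. This is only an illustration, not the author's Excel workbook: the helper names k_nearest and k_distance are made up for this example, the data is assumed to be one-dimensional with distinct values, and ties are broken in favor of the smaller value.

```python
# Toy dataset from the article, with k = 2 as in the text.
points = [1, 2, 3, 9]
k = 2

def k_nearest(p, data, k):
    """Return the k nearest other points to p as (distance, point) pairs, closest first.

    Assumes distinct 1-D values; ties are broken by the smaller point value.
    """
    others = [(abs(p - q), q) for q in data if q != p]
    return sorted(others)[:k]

def k_distance(p, data, k):
    """Largest distance among the k nearest neighbors of p."""
    return k_nearest(p, data, k)[-1][0]

neighbors = {p: [q for _, q in k_nearest(p, points, k)] for p in points}
k_dist = {p: k_distance(p, points, k) for p in points}

for p in points:
    print(f"point {p}: neighbors {neighbors[p]}, k-distance {k_dist[p]}")
```

    The printed k-distances are 2, 1, 2 and 7, the same values as in the walkthrough above.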

    LOF in Excel – image by author

    Step 2 – Reachability Distances

    For this step, I will just define the calculations here, and apply the formulas in Excel. Because, to be honest, I never succeeded in finding a truly intuitive way to explain the results.

    So, what is “reachability distance”?

    For a point p and a neighbor o, we define this reachability distance as:

    reach-dist(p, o) = max(k-dist(o), distance(p, o))

    Why take the maximum?

    The purpose of reachability distance is to stabilize density comparison.

    If the neighbor o lives in a very dense region (small k-dist), then we do not want to allow an unrealistically small distance. In other words, whenever p sits inside o's own local radius, the actual distance is replaced by k-dist(o).

    In particular, for point 2:

    • Distance to 1 = 1, but k-distance(1) = 2 → reach-dist(2, 1) = 2
    • Distance to 3 = 1, but k-distance(3) = 2 → reach-dist(2, 3) = 2

    Both neighbors force the reachability distance upward.
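
    The definition above translates almost word for word into code. The sketch below is an assumption-laden illustration (hypothetical helper names, distinct 1-D values, k = 2) that computes reach-dist(p, o) for each point p and each of its k nearest neighbors o.

```python
points = [1, 2, 3, 9]
k = 2

def k_nearest(p):
    """The k closest other points to p (ties broken by the smaller value)."""
    return [q for _, q in sorted((abs(p - q), q) for q in points if q != p)[:k]]

def k_dist(p):
    """Distance from p to the farthest of its k nearest neighbors."""
    return max(abs(p - q) for q in k_nearest(p))

def reach_dist(p, o):
    """reach-dist(p, o) = max(k-dist(o), distance(p, o))."""
    return max(k_dist(o), abs(p - o))

# One row per point p, one entry per neighbor o of p.
reach = {p: {o: reach_dist(p, o) for o in k_nearest(p)} for p in points}

for p in points:
    print(p, reach[p])
```

    For point 2, both entries come out as 2, which matches the two bullets above.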

    In Excel, we will keep a matrix format to display the reachability distances: one point compared to all the others.

    LOF in Excel – image by author

    Average reachability distance

    For each point, we can now compute the average value, which tells us: on average, how far do I need to travel to reach my local neighborhood?

    And now, do you notice something: point 2 has a larger average reachability distance than 1 and 3.

    This is not that intuitive to me!
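
    A small script makes the surprising value for point 2 easier to inspect. It is a sketch with hypothetical helper names, assuming distinct 1-D values and k = 2. One possible reading of the result: points 1 and 3 each have a k-distance of 2, because each must reach across point 2 to find its second neighbor, and those larger radii are what point 2 inherits through the max.

```python
points = [1, 2, 3, 9]
k = 2

def k_nearest(p):
    """The k closest other points to p (ties broken by the smaller value)."""
    return [q for _, q in sorted((abs(p - q), q) for q in points if q != p)[:k]]

def k_dist(p):
    """Distance from p to the farthest of its k nearest neighbors."""
    return max(abs(p - q) for q in k_nearest(p))

def avg_reach(p):
    """Mean of reach-dist(p, o) over the k nearest neighbors o of p."""
    return sum(max(k_dist(o), abs(p - o)) for o in k_nearest(p)) / k

avg = {p: avg_reach(p) for p in points}
print(avg)  # {1: 1.5, 2: 2.0, 3: 1.5, 9: 6.5}
```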

    Step 3 – LRD and the LOF Score

    The final step is kind of a “normalization” to find an anomaly score.

    First, we define the LRD, Local Reachability Density, which is simply the inverse of the average reachability distance.

    And the final LOF score is calculated as:

    LOF(p) = average of LRD(o) over the k nearest neighbors o of p, divided by LRD(p)

    So, LOF compares the density of a point to the density of its neighbors.

    Interpretation:

    • If LRD(p) ≈ LRD(neighbors), then LOF ≈ 1
    • If LRD(p) is much smaller, then LOF >> 1. So p is in a sparse region
    • If LRD(p) is much larger, then LOF < 1. So p is in a very dense pocket.
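
    Putting the three steps together, the whole computation fits in one short function. This is a minimal sketch rather than the Excel implementation: lof_scores is a hypothetical name, the data is assumed to be one-dimensional with distinct values, and LOF is taken as the average LRD of the neighbors divided by the LRD of the point.

```python
def lof_scores(data, k):
    """Return {point: LOF score} for 1-D data with distinct values."""
    def neighbors(p):
        # k closest other points, ties broken by the smaller value
        return [q for _, q in sorted((abs(p - q), q) for q in data if q != p)[:k]]

    # Step 1: k-distance of every point
    k_dist = {p: max(abs(p - q) for q in neighbors(p)) for p in data}

    # Step 2 and first half of step 3: LRD = 1 / average reachability distance
    def lrd(p):
        reach = [max(k_dist[o], abs(p - o)) for o in neighbors(p)]
        return len(reach) / sum(reach)

    lrd_of = {p: lrd(p) for p in data}

    # Step 3: average LRD of the neighbors divided by the point's own LRD
    return {
        p: sum(lrd_of[o] for o in neighbors(p)) / (k * lrd_of[p])
        for p in data
    }

scores = lof_scores([1, 2, 3, 9], k=2)
for p, s in scores.items():
    print(f"LOF({p}) = {s:.3f}")
```

    On the dataset 1, 2, 3, 9 this gives roughly 0.875 for points 1 and 3, 1.333 for point 2, and 3.792 for point 9, so 9 stands out clearly and 2 scores slightly higher than its two neighbors.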

    I also did a version with more developments, and shorter formulas.

    Understanding What “Anomaly” Means in Unsupervised Models

    In unsupervised learning, there is no ground truth. And this is exactly where things can become tricky.

    We do not have labels.
    We do not have the “correct answer”.
    We only have the structure of the data.

    Take this tiny sample:

    1, 2, 3, 7, 8, 12
    (I also have the copyright on it.)

    If you look at it intuitively, which one looks like an anomaly?

    Personally, I would say 12.

    Now let us look at the results. LOF says the outlier is 7.

    (And you may notice that with k-distance, we would say that it is 12.)
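
    The disagreement can be reproduced with a few lines of Python. The sketch below assumes k = 2 for this second dataset as well (the text does not state k here), uses a hypothetical helper name, and assumes distinct 1-D values. It reports which point has the largest k-distance and which has the largest LOF.

```python
def k_distance_and_lof(data, k):
    """Return ({point: k-distance}, {point: LOF}) for 1-D data with distinct values."""
    def neighbors(p):
        # k closest other points, ties broken by the smaller value
        return [q for _, q in sorted((abs(p - q), q) for q in data if q != p)[:k]]

    k_dist = {p: max(abs(p - q) for q in neighbors(p)) for p in data}

    # LRD = 1 / average reachability distance to the k nearest neighbors
    lrd = {
        p: k / sum(max(k_dist[o], abs(p - o)) for o in neighbors(p))
        for p in data
    }

    # LOF = average LRD of the neighbors divided by the point's own LRD
    lof = {p: sum(lrd[o] for o in neighbors(p)) / (k * lrd[p]) for p in data}
    return k_dist, lof

data = [1, 2, 3, 7, 8, 12]
k_dist, lof = k_distance_and_lof(data, k=2)

print("largest k-distance:", max(data, key=k_dist.get))  # 12
print("largest LOF:", max(data, key=lof.get))            # 7
```

    Under these assumptions, point 7 gets a LOF of about 1.78 because one of its two neighbors is 3, which sits in the dense cluster, while 8 and 12 only have sparse neighbors and score about 1.06.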

    LOF in Excel – image by author

    Now, we can compare Isolation Forest and LOF side by side.

    On the left, with the dataset 1, 2, 3, 9, both methods agree:
    9 is the clear outlier.
    Isolation Forest gives it the lowest score,
    and LOF gives it the highest LOF value.

    If we look closer, for Isolation Forest: 1, 2 and 3 have no differences in score. And LOF gives a higher score for 2. This is what we already noticed.

    With the dataset 1, 2, 3, 7, 8, 12, the story changes.

    • Isolation Forest points to 12 as the most isolated point.
      This matches the intuition: 12 is far from everyone.
    • LOF, however, highlights 7 instead.

    LOF in Excel – image by author

    So who is right?

    It is difficult to say.

    In practice, we first need to agree with business teams on what “anomaly” actually means in the context of our data.

    Because in unsupervised learning, there is no single truth.

    There is only the definition of “anomaly” that each algorithm uses.

    This is why it is extremely important to understand
    how the algorithm works, and what kind of anomalies it is designed to detect.

    Only then can you decide whether LOF, or k-distance, or Isolation Forest is the right choice for your specific situation.

    And this is the whole message of unsupervised learning:

    Different algorithms look at the data differently.
    There is no “true” outlier.
    Only the definition of what an outlier means for each model.

    This is why understanding how the algorithm works
    is more important than the final score it produces.

    Conclusion

    LOF and Isolation Forest both detect anomalies, but they look at the data through completely different lenses.

    • k-distance captures how far a point must travel to find its neighbors.
    • LOF compares local densities.
    • Isolation Forest isolates points using random splits.

    And even on very simple datasets, these methods can disagree.
    One algorithm may flag a point as an outlier, while another highlights a completely different one.

    And this is the key message:

    In unsupervised learning, there is no “true” outlier.
    Each algorithm defines anomalies according to its own logic.

    This is why understanding how a method works is more important than the number it produces.
    Only then can you choose the right algorithm for the right situation, and interpret the results with confidence.


