
    The Machine Learning “Advent Calendar” Day 22: Embeddings in Excel

    By Editor Times Featured · December 23, 2025 · 9 min read


    In this part of the series, we will talk about deep learning.

    When people talk about deep learning, we immediately think of those images of deep neural network architectures, with many layers, neurons, and parameters.

    In practice, the real shift introduced by deep learning is elsewhere.

    It is about learning data representations.

    In this article, we focus on text embeddings, explain their role in the machine learning landscape, and show how they can be understood and explored in Excel.

    1. Classic machine learning vs. deep learning

    In this part, we discuss why embeddings are introduced.

    1.1 Where does deep learning fit?

    To understand embeddings, we first need to clarify the place of deep learning.

    We will use the term classic machine learning to describe methods that do not rely on deep architectures.

    All the previous articles deal with classic machine learning, which can be described in two complementary ways.

    Learning paradigms

    • Supervised learning
    • Unsupervised learning

    Model families

    • Distance-based models
    • Tree-based models
    • Weight-based models

    Throughout this series, we have already studied the learning algorithms behind these models. In particular, we have seen that gradient descent applies to all weight-based models, from linear regression to neural networks.

    Deep learning is often reduced to neural networks with many layers.

    But this explanation is incomplete.

    From an optimization standpoint, deep learning does not introduce a new learning rule.

    So what does it introduce?

    1.2 Deep learning as data representation learning

    Deep learning is about how features are created.

    Instead of manually designing features, deep learning learns representations automatically, often through several successive transformations.

    This also raises an important conceptual question:

    Where is the boundary between feature engineering and model learning?

    Some examples make this clearer:

    • Polynomial regression is still a linear model, but the features are polynomial
    • Kernel methods project data into a high-dimensional feature space
    • Density-based methods implicitly transform the data before learning
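
To make the first example concrete, here is a minimal sketch of polynomial regression written as hand-made features followed by an ordinary linear fit. The helper names (`polynomial_features`, `fit_linear`) and the noise-free toy data are assumptions chosen for illustration, not something taken from the series.

```python
import numpy as np

def polynomial_features(x, degree):
    """Stack [1, x, x^2, ..., x^degree] as columns: the hand-made features."""
    return np.vander(x, N=degree + 1, increasing=True)

def fit_linear(features, y):
    """Ordinary least squares: the model stays linear in its weights."""
    weights, *_ = np.linalg.lstsq(features, y, rcond=None)
    return weights

# Toy data generated from y = 1 + 2x + 3x^2 (no noise, illustration only).
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x + 3.0 * x**2

weights = fit_linear(polynomial_features(x, degree=2), y)
print(np.round(weights, 3))
```

The learning step is plain linear regression; everything non-linear lives in the feature construction, which is the part deep learning replaces with learned transformations.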

    Deep learning continues this idea, but at scale.

    From this perspective, deep learning belongs to:

    • the feature engineering philosophy, for representation
    • the weight-based model family, for learning

    1.3 Images and convolutional neural networks

    Images are represented as pixels.

    From a technical standpoint, image data is already numerical and structured: a grid of numbers. However, the information contained in these pixels is not structured in a way that classical models can easily exploit.

    Pixels do not explicitly encode edges, shapes, textures, or objects.

    Convolutional Neural Networks (CNNs) are designed to create information from pixels. They apply filters to detect local patterns, then progressively combine them into higher-level representations.
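
As a rough illustration of what "applying a filter" means, the sketch below slides a small hand-written vertical-edge filter over a toy image. The image, the filter values, and the function name `convolve2d_valid` are assumptions made for this example; in a real CNN the filter weights are learned, and the operation shown (no kernel flipping) is the cross-correlation that deep learning libraries usually call convolution.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            output[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return output

# Toy 5x5 image: dark on the left (0), bright on the right (1).
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=float)

# Hand-written vertical-edge filter; a CNN would learn such weights.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = convolve2d_valid(image, kernel)
print(feature_map)
```

The output is zero where the patch is uniform and non-zero where the patch overlaps the dark-to-bright transition, so the feature map encodes "there is an edge here", which the raw pixels do not state explicitly.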

    I have published an article showing how CNNs can be implemented in Excel to make this process explicit.

    CNN in Excel – all images by author

    For images, the challenge is not to make the data numerical, but to extract meaningful representations from already numerical data.

    1.4 Text data: a different problem

    Text presents a fundamentally different challenge.

    Unlike images, text is not numerical by nature.

    Before modeling context or order, the first problem is more basic:

    How do we represent words numerically?

    Creating a numerical representation for text is the first step.

    In deep learning for text, this step is handled by embeddings.

    Embeddings transform discrete symbols (words) into vectors that models can work with. Once embeddings exist, we can then model context, order, and relationships between words.

    In this article, we focus on this first and essential step:
    how embeddings create numerical representations for text, and how this process can be explored in Excel.

    2. Two ways to learn text embeddings

    In this article, we will use the IMDB movie reviews dataset to illustrate both approaches. The dataset is distributed under the Apache License 2.0.

    There are two main ways to learn embeddings for text, and we will do both with this dataset:

    • supervised: we will create embeddings to predict the sentiment
    • unsupervised or self-supervised: we will use the word2vec algorithm

    In both cases, the goal is the same:
    to transform words into numerical vectors that can be used by machine learning models.

    Before comparing these two approaches, we first need to clarify what embeddings are and how they relate to classic machine learning.

    IMDB dataset image by author – Apache License 2.0

    2.1 Embeddings and classic machine learning

    In classic machine learning, categorical data is usually handled with:

    • label encoding, which assigns fixed integers but introduces artificial order
    • one-hot encoding, which removes order but produces high-dimensional sparse vectors

    How they can be used depends on the nature of the models.

    Distance-based models cannot effectively use one-hot encoding, because all categories end up being equally distant from each other. Label encoding could work only if we can attribute meaningful numerical values to the categories, which is generally not the case in classic models.
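
The equal-distance claim is easy to check numerically. The sketch below uses four hypothetical category names (an assumption for illustration) and compares pairwise Euclidean distances under one-hot encoding with the gaps that label encoding imposes.

```python
import numpy as np
from itertools import combinations

categories = ["drama", "comedy", "horror", "thriller"]  # hypothetical categories

# One-hot encoding: row i of the identity matrix represents category i.
one_hot = np.eye(len(categories))

# Euclidean distance between every pair of distinct categories.
distances = {
    (categories[i], categories[j]): float(np.linalg.norm(one_hot[i] - one_hot[j]))
    for i, j in combinations(range(len(categories)), 2)
}
for pair, dist in distances.items():
    print(pair, round(dist, 4))

# Label encoding: the integer codes impose an order and unequal gaps.
label_codes = {name: code for code, name in enumerate(categories)}
label_gap = abs(label_codes["drama"] - label_codes["thriller"])
print(label_gap)
```

Every one-hot pair sits at the same distance (the square root of 2), so a nearest-neighbor model gets no information about which categories resemble each other, while the label codes suggest differences that come only from the arbitrary ordering.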

    Weight-based models can use one-hot encoding, because the model learns a weight for each category. In contrast, with label encoding, the numerical values are fixed and cannot be adjusted to represent meaningful relationships.

    Tree-based models treat all variables as categorical splits rather than numerical magnitudes, which makes label encoding acceptable in practice. However, most implementations, including scikit-learn, still require numerical inputs. As a result, categories must be converted to numbers, either through label encoding or one-hot encoding. If the numerical values carried semantic meaning, this would again be beneficial.

    Overall, this highlights a limitation of classic approaches:
    category values are fixed and not learned.

    Embeddings extend this idea by learning the representation itself.
    Each word is associated with a trainable vector, turning the representation of categories into a learning problem rather than a preprocessing step.
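
One way to picture a trainable vector per word is as a table with one row per word. The sketch below, with a hypothetical four-word vocabulary and an arbitrary embedding size of 3, shows that looking up a row gives the same result as multiplying a one-hot vector by the table, which is why embeddings fit naturally into the weight-based family.

```python
import numpy as np

rng = np.random.default_rng(0)

vocabulary = ["good", "bad", "movie", "plot"]  # hypothetical vocabulary
word_to_index = {word: i for i, word in enumerate(vocabulary)}
embedding_dim = 3                               # arbitrary size for the sketch

# The embedding table: one row per word, randomly initialized before training.
embedding_table = rng.normal(size=(len(vocabulary), embedding_dim))

def embed(word):
    """Look up the vector of a word by its row index."""
    return embedding_table[word_to_index[word]]

# The lookup is equivalent to multiplying a one-hot vector by the table,
# so the rows play the role of the weights a weight-based model learns.
one_hot = np.eye(len(vocabulary))[word_to_index["movie"]]
print(np.allclose(one_hot @ embedding_table, embed("movie")))
```

During training, the rows of this table are updated by gradient descent like any other weights; the values shown here are only random starting points.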

    2.2 Supervised embeddings

    In supervised learning, embeddings are learned as part of a prediction task.

    For example, the IMDB dataset has sentiment labels, so we can use a very simple architecture: each word is mapped to a one-dimensional embedding.

    This is possible because the objective is binary sentiment classification.
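
A minimal sketch of such an architecture is given below, under several assumptions: the four reviews are made up (they are not the IMDB data), the one-dimensional embeddings are averaged per review before a logistic layer, and training uses plain stochastic gradient descent with an arbitrary learning rate. It is meant to show the mechanics, not to reproduce the training setup behind the Excel file.

```python
import numpy as np

# Tiny made-up corpus (NOT the IMDB data): (review, label), 1 = positive.
reviews = [
    ("great movie great acting", 1),
    ("wonderful story", 1),
    ("worst movie", 0),
    ("waste of time worst acting", 0),
]

vocab = sorted({word for text, _ in reviews for word in text.split()})
index = {word: i for i, word in enumerate(vocab)}

rng = np.random.default_rng(0)
embedding = rng.normal(scale=0.1, size=len(vocab))  # one scalar per word
weight, bias = 1.0, 0.0                              # logistic layer
learning_rate = 0.5                                  # arbitrary choice

def predict(text):
    """Mean of the word embeddings, passed through a logistic function."""
    ids = [index[word] for word in text.split()]
    return 1.0 / (1.0 + np.exp(-(weight * embedding[ids].mean() + bias)))

for _ in range(500):
    for text, label in reviews:
        ids = [index[word] for word in text.split()]
        mean = embedding[ids].mean()
        error = predict(text) - label  # d(log loss) / d(logit)
        # Gradient step on the embeddings; add.at accumulates repeated words.
        np.add.at(embedding, ids, -learning_rate * error * weight / len(ids))
        weight -= learning_rate * error * mean
        bias -= learning_rate * error

# Words sorted from most negative to most positive embedding value.
for word in sorted(vocab, key=lambda w: embedding[index[w]]):
    print(f"{word:>10s} {embedding[index[word]]:+.2f}")
```

Because the embedding is multiplied by the logistic layer's weight, only the product of the two has a fixed meaning; with a different initialization the embedding signs could come out flipped.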

    Once training is complete, we can export the embeddings and explore them in Excel.

    When plotting the embeddings on the x-axis and word frequency on the y-axis, a clear pattern appears:

    • positive values are associated with words such as wonderful or great,
    • negative values are associated with words such as worst or waste

    Depending on the initialization, the sign can be inverted, since the logistic regression layer also has parameters that influence the final prediction.

    Finally, in Excel, we reconstruct the full pipeline that corresponds to the architecture we defined earlier.

    Input column
    The input text (a review) is cut into words, and each row corresponds to one word.

    Embedding lookup
    Using a lookup function, the embedding value associated with each word is retrieved from the embedding table learned during training.

    Global average
    The global average embedding is computed by averaging the embeddings of all words seen so far. This corresponds to a very simple sentence representation: the mean of word vectors.

    Probability prediction
    The averaged embedding is then passed through a logistic function to produce a sentiment probability.
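
The four steps above can be sketched in a few lines of Python, one output row per word as in the spreadsheet. The embedding values, the logistic parameters (`weight`, `bias`), and the choice to treat unknown words as neutral are all assumptions for illustration; in the workbook they would come from the trained model.

```python
import math

# Hypothetical 1-D embedding table and logistic parameters; in practice these
# would be the values exported after training, not these numbers.
embedding_table = {"wonderful": 1.2, "love": 0.9, "worst": -1.5, "waste": -1.1}
weight, bias = 2.0, 0.0
unknown_value = 0.0  # assumption: words missing from the table count as neutral

def running_sentiment(review):
    """Return one row per word: (word, embedding, running mean, probability)."""
    rows, total = [], 0.0
    for count, word in enumerate(review.lower().split(), start=1):
        value = embedding_table.get(word, unknown_value)  # embedding lookup
        total += value
        mean = total / count                               # average so far
        prob = 1.0 / (1.0 + math.exp(-(weight * mean + bias)))
        rows.append((word, value, mean, prob))
    return rows

for row in running_sentiment("the worst waste of time"):
    print(row)
```

Each tuple corresponds to one spreadsheet row: the word, its looked-up value, the running mean, and the probability computed from that mean.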

    What we observe

    • Words with strongly positive embeddings (for example wonderful, love, enjoyable) push the average upward.
    • Words with strongly negative embeddings (for example worst, horrible, waste) pull the average downward.
    • Neutral or weakly weighted words have little influence.

    As more words are added, the global average embedding stabilizes, and the sentiment prediction becomes more confident.

    2.3 Word2Vec: embeddings from co-occurrence

    In Word2Vec, similarity does not mean that two words have the same meaning.
    It means that they appear in similar contexts.

    Word2Vec learns word embeddings by looking at which words tend to co-occur within a fixed window in the text. Two words are considered similar if they often appear around the same neighboring words, even if their meanings are opposite.

    As shown in the Excel sheet below, we compute the cosine similarity for the word good and retrieve the most similar words.
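
For readers who prefer code to spreadsheet formulas, the same computation can be sketched as follows. The three-dimensional vectors are made up for illustration (real Word2Vec vectors are learned and have many more dimensions), and the function names are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(word, vectors, top_k=3):
    """Rank every other word by cosine similarity to `word`."""
    scores = [
        (other, cosine_similarity(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_k]

# Made-up 3-D vectors for illustration only.
vectors = {
    "good": np.array([0.9, 0.8, 0.1]),
    "bad": np.array([0.8, 0.9, 0.2]),
    "great": np.array([0.9, 0.7, 0.0]),
    "camera": np.array([0.1, 0.0, 0.9]),
}
print(most_similar("good", vectors))
```

Cosine similarity depends only on the direction of the vectors, not their length, which is why it is a common choice for comparing embeddings.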

    From the model’s perspective, the surrounding words are almost identical. The only thing that changes is the adjective itself.

    As a result, Word2Vec learns that “good” and “bad” play a similar role in language, even though their meanings are opposite.

    So, Word2Vec captures distributional similarity, not semantic polarity.
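
The shared-context effect can be checked by hand. The sketch below lists the (center, context) pairs that a fixed window produces for two made-up sentences; the sentences, the window size of 2, and the helper name `context_pairs` are assumptions for illustration, and the actual Word2Vec training objective is not reproduced here.

```python
def context_pairs(tokens, window=2):
    """List (center, context) pairs for words at most `window` positions apart."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

good_pairs = context_pairs("the movie was good".split(), window=2)
bad_pairs = context_pairs("the movie was bad".split(), window=2)

# Context words observed around each adjective.
good_contexts = sorted(ctx for center, ctx in good_pairs if center == "good")
bad_contexts = sorted(ctx for center, ctx in bad_pairs if center == "bad")
print(good_contexts)
print(bad_contexts)
```

Both adjectives end up with the same list of context words, so a model trained only on such pairs has no signal to push their vectors apart.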

    A useful way to think about it is:

    Words are close if they are used in the same places.

    2.4 How embeddings are used

    In modern systems such as RAG (Retrieval-Augmented Generation), embeddings are often used to retrieve documents or passages for question answering.

    However, this approach has limitations.

    Most commonly used embeddings are trained in a self-supervised way, based on co-occurrence or contextual prediction objectives. As a result, they capture general language similarity, not task-specific meaning.

    This means that:

    • embeddings may retrieve text that is linguistically similar but not relevant
    • semantic proximity does not guarantee answer correctness

    Other embedding strategies can be used, including task-adapted or supervised embeddings, but they often remain self-supervised at their core.

    Understanding how embeddings are created, what they encode, and what they do not encode is therefore essential before using them in downstream systems such as RAG.

    Conclusion

    Embeddings are learned numerical representations of words that make similarity measurable.

    Whether learned through supervision or through co-occurrence, embeddings map words to vectors based on how they are used in data. By exporting them to Excel, we can inspect these representations directly, compute similarities, and understand what they capture and what they do not.

    This makes embeddings less mysterious and clarifies their role as a foundation for more complex systems such as retrieval or RAG.


