
    GliNER2: Extracting Structured Information from Text

    By Editor Times Featured | January 6, 2026


    Before LLMs took over, we had SpaCy, the de facto NLP library for both beginners and advanced users. It made it easy to dip your toes into NLP, even if you weren’t a deep learning expert. However, with the rise of ChatGPT and other LLMs, it seems to have been pushed aside.

    While LLMs like Claude or Gemini can do all sorts of NLP tasks automagically, you don’t always want to bring a rocket launcher to a fist fight. GliNER is spearheading the return of smaller, focused models for classic NLP techniques like entity and relationship extraction. It’s lightweight enough to run on a CPU, yet powerful enough to have built a thriving community around it.

    Released earlier this year, GliNER2 is a big leap forward. Where the original GliNER focused on entity recognition (spawning numerous spin-offs like GLiREL for relations and GLiClass for classification), GliNER2 unifies named entity recognition, text classification, relation extraction, and structured data extraction into a single framework.

    The core shift in GliNER2 is its schema-driven approach, which lets you define extraction requirements declaratively and execute multiple tasks in a single inference call. Despite these expanded capabilities, the model remains CPU-efficient, making it a great solution for transforming messy, unstructured text into clean data without the overhead of a large language model.
    As a knowledge graph enthusiast at Neo4j, I’ve been particularly drawn to the newly added structured data extraction via the extract_json method. While entity and relation extraction are valuable on their own, the ability to define a schema and pull structured JSON directly from text is what really excites me. It’s a natural fit for knowledge graph ingestion, where structured, consistent output is essential.

    Constructing knowledge graphs with GliNER2. Image by author.

    In this blog post, we’ll evaluate GliNER2’s capabilities, specifically the model fastino/gliner2-large-v1, with a focus on how well it can help us build clean, structured knowledge graphs.

    The code is available on GitHub.

    Dataset selection

    We’re not running formal benchmarks here, just a quick vibe check to see what GliNER2 can do. Here’s our test text, pulled from the Ada Lovelace Wikipedia page:

    Augusta Ada King, Countess of Lovelace (10 December 1815–27 November 1852), known as Ada Lovelace, was an English mathematician and writer chiefly known for her work on Charles Babbage’s proposed mechanical general-purpose computer, the analytical engine. She was the first to recognise that the machine had applications beyond pure calculation. Lovelace is often considered the first computer programmer. Lovelace was the only legitimate child of poet Lord Byron and reformer Anne Isabella Milbanke. All her half-siblings, Lord Byron’s other children, were born out of wedlock to other women. Lord Byron separated from his wife a month after Ada was born and left England forever. He died in Greece during the Greek War of Independence, when she was eight. Lady Byron was anxious about her daughter’s upbringing and promoted Lovelace’s interest in mathematics and logic, to prevent her developing her father’s perceived insanity. Despite this, Lovelace remained interested in her father, naming one son Byron and the other, for her father’s middle name, Gordon. Lovelace was buried next to her father at her request. Although often ill in childhood, Lovelace pursued her studies assiduously. She married William King in 1835. King was a Baron, and was created Viscount Ockham and 1st Earl of Lovelace in 1838. The title Lovelace was chosen because Ada was descended from the extinct Barons Lovelace. The title given to her husband thus made Ada the Countess of Lovelace.

    At 322 tokens, it’s a solid chunk of text to work with. Let’s dive in.

    Entity extraction

    Let’s start with entity extraction. At its core, entity extraction is the process of automatically identifying and categorizing key entities within text, such as people, locations, organizations, or technical concepts. The original GliNER already handled this well, but GliNER2 takes it further by letting you add descriptions to entity types, giving you finer control over what gets extracted.

    # extractor is a GLiNER2 model (fastino/gliner2-large-v1) loaded beforehand
    entities = extractor.extract_entities(
        text,
        {
            "Person": "Names of people, including nobility titles.",
            "Location": "Countries, cities, or geographic places.",
            "Invention": "Machines, devices, or technological creations.",
            "Event": "Historical events, wars, or conflicts."
        }
    )

    The results are the following:

    Entity extraction results. Image by author.

    Providing custom descriptions for each entity type helps resolve ambiguity and improves extraction accuracy. This is especially useful for broad categories like Event, where, on its own, the model might not know whether to include wars, ceremonies, or personal milestones. Adding "historical events, wars, or conflicts" clarifies the intended scope.

    Relation extraction

    Relation extraction identifies relationships between pairs of entities in text. For example, in the sentence "Steve Jobs founded Apple", a relation extraction model would identify the relationship Founded between the entities Steve Jobs and Apple.

    With GLiNER2, you define only the relation types you want to extract; you can’t constrain which entity types are allowed as the head or tail of each relation. This simplifies the interface but may require post-processing to filter unwanted pairings.
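    A minimal sketch of such post-processing: the function name and the allowed-type map are my own assumptions, and the data shapes (entities as a label-to-names dict, relations as lists of [head, tail] pairs) mirror the combined-extraction output used in the Cypher query later in this post.

```python
# Hypothetical post-processing: GLiNER2 can't constrain which entity
# types appear as the head or tail of a relation, so we filter the
# extracted pairs against the entity lists afterwards.

def filter_relations(relations, entities, allowed):
    """Keep only pairs whose head/tail entity types match `allowed`.

    allowed maps a relation type to a (head_label, tail_label) tuple.
    """
    by_label = {label: set(names) for label, names in entities.items()}
    filtered = {}
    for rel_type, pairs in relations.items():
        head_label, tail_label = allowed.get(rel_type, (None, None))
        if head_label is None:
            filtered[rel_type] = list(pairs)  # no constraint defined
            continue
        filtered[rel_type] = [
            [h, t] for h, t in pairs
            if h in by_label.get(head_label, set())
            and t in by_label.get(tail_label, set())
        ]
    return filtered

entities = {"Person": ["Ada Lovelace", "Charles Babbage"],
            "Invention": ["analytical engine"]}
relations = {"invented": [["Charles Babbage", "analytical engine"],
                          ["analytical engine", "Ada Lovelace"]]}
allowed = {"invented": ("Person", "Invention")}
print(filter_relations(relations, entities, allowed))
# → {'invented': [['Charles Babbage', 'analytical engine']]}
```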

    Here, I added a simple experiment by including both the alias and the same_as relationship definitions.

    relations = extractor.extract_relations(
        text,
        {
            "parent_of": "A person is the parent of another person",
            "married_to": "A person is married to another person",
            "worked_on": "A person contributed to or worked on an invention",
            "invented": "A person created or proposed an invention",
            "alias": "Entity is an alias, nickname, title, or alternate reference for another entity",
            "same_as": "Entity is an alias, nickname, title, or alternate reference for another entity"
        }
    )

    The results are the following:

    Relation extraction results. Image by author.

    The extraction correctly identified key relationships: Lord Byron and Anne Isabella Milbanke as Ada’s parents, her marriage to William King, Babbage as inventor of the analytical engine, and Ada’s work on it. Notably, the model detected Augusta Ada King as an alias of Ada Lovelace, but same_as wasn’t captured despite having an identical description. The choice doesn’t seem random, as the model always populates the alias relationship but never same_as. This highlights how sensitive relation extraction is to label naming, not just descriptions.

    Conveniently, GLiNER2 allows combining multiple extraction types in a single call, so you can get entity types alongside relation types in one pass. However, the operations are independent: entity extraction doesn’t filter or constrain which entities appear in relation extraction, and vice versa. Think of it as running both extractions in parallel rather than as a pipeline.

    schema = (extractor.create_schema()
        .entities({
            "Person": "Names of people, including nobility titles.",
            "Location": "Countries, cities, or geographic places.",
            "Invention": "Machines, devices, or technological creations.",
            "Event": "Historical events, wars, or conflicts."
        })
        .relations({
            "parent_of": "A person is the parent of another person",
            "married_to": "A person is married to another person",
            "worked_on": "A person contributed to or worked on an invention",
            "invented": "A person created or proposed an invention",
            "alias": "Entity is an alias, nickname, title, or alternate reference for another entity"
        })
    )
    
    results = extractor.extract(text, schema)

    The results are the following:

    Combined entity and relation extraction results. Image by author.

    The combined extraction now gives us entity types, which are distinguished by color. However, several nodes appear isolated (Greece, England, Greek War of Independence), since not every extracted entity participates in a detected relationship.
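    Those isolated nodes are easy to spot programmatically. A small helper sketch (the function name is my own; the shapes again follow the combined-extraction output assumed throughout this post):

```python
# Find entities that were extracted but never appear in any relation,
# i.e. the isolated nodes in the graph visualisation.

def isolated_entities(entities, relations):
    all_names = {name for names in entities.values() for name in names}
    connected = {name
                 for pairs in relations.values()
                 for pair in pairs
                 for name in pair}
    return sorted(all_names - connected)

entities = {"Person": ["Ada Lovelace", "Lord Byron"],
            "Location": ["Greece", "England"],
            "Event": ["Greek War of Independence"]}
relations = {"parent_of": [["Lord Byron", "Ada Lovelace"]]}
print(isolated_entities(entities, relations))
# → ['England', 'Greece', 'Greek War of Independence']
```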

    Structured JSON extraction

    Perhaps the most powerful feature is structured data extraction via extract_json. This mimics the structured output functionality of LLMs like ChatGPT or Gemini but runs entirely on CPU. Unlike entity and relation extraction, this lets you define arbitrary fields and pull them into structured records. The syntax follows a field_name::type::description pattern, where type is str or list.

    results = extractor.extract_json(
        text,
        {
            "person": [
                "name::str",
                "gender::str::male or female",
                "alias::str::brief summary of included information about the person",
                "description::str",
                "birth_date::str",
                "death_date::str",
                "parent_of::str",
                "married_to::str"
            ]
        }
    )

    Here we’re experimenting with some overlap: alias, parent_of, and married_to could also be modeled as relations. It’s worth exploring which approach works better for your use case. One interesting addition is the description field, which pushes the boundaries a bit: it’s closer to summary generation than pure extraction.

    The results are the following:

    {
      "person": [
        {
          "name": "Augusta Ada King",
          "gender": null,
          "alias": "Ada Lovelace",
          "description": "English mathematician and writer",
          "birth_date": "10 December 1815",
          "death_date": "27 November 1852",
          "parent_of": "Ada Lovelace",
          "married_to": "William King"
        },
        {
          "name": "Charles Babbage",
          "gender": null,
          "alias": null,
          "description": null,
          "birth_date": null,
          "death_date": null,
          "parent_of": "Ada Lovelace",
          "married_to": null
        },
        {
          "name": "Lord Byron",
          "gender": null,
          "alias": null,
          "description": "reformer",
          "birth_date": null,
          "death_date": null,
          "parent_of": "Ada Lovelace",
          "married_to": null
        },
        {
          "name": "Anne Isabella Milbanke",
          "gender": null,
          "alias": null,
          "description": "reformer",
          "birth_date": null,
          "death_date": null,
          "parent_of": "Ada Lovelace",
          "married_to": null
        },
        {
          "name": "William King",
          "gender": null,
          "alias": null,
          "description": null,
          "birth_date": null,
          "death_date": null,
          "parent_of": "Ada Lovelace",
          "married_to": null
        }
      ]
    }

    The results reveal some limitations. All gender fields are null: even though Ada is explicitly referred to as a daughter, the model doesn’t infer that she’s female. The description field captures only surface-level phrases ("English mathematician and writer", "reformer") rather than producing meaningful summaries, which isn’t helpful for workflows like Microsoft’s GraphRAG that rely on richer entity descriptions. There are also clear errors: Charles Babbage and William King are incorrectly marked as parent_of Ada, and Lord Byron is labeled a reformer (that’s Anne Isabella). These parent_of errors didn’t come up during relation extraction, so perhaps that’s the better method here. Overall, the results suggest the model excels at extraction but struggles with reasoning or inference, likely a tradeoff of its compact size.
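    Since the parent_of errors show up only in extract_json, one way to catch them is to cross-check the JSON records against the extract_relations output and flag disagreements. A sketch under my own naming, with the shapes matching the examples in this post:

```python
# Cross-validate parent_of claims from extract_json against the
# [head, tail] pairs returned by extract_relations.

def check_parent_of(persons, relations):
    """Split the JSON parent_of claims into (confirmed, suspect) lists."""
    confirmed_pairs = {tuple(pair) for pair in relations.get("parent_of", [])}
    confirmed, suspect = [], []
    for p in persons:
        if not p.get("parent_of"):
            continue  # no claim to check
        claim = (p["name"], p["parent_of"])
        (confirmed if claim in confirmed_pairs else suspect).append(claim)
    return confirmed, suspect

persons = [
    {"name": "Lord Byron", "parent_of": "Ada Lovelace"},
    {"name": "Charles Babbage", "parent_of": "Ada Lovelace"},  # JSON error
]
relations = {"parent_of": [["Lord Byron", "Ada Lovelace"],
                           ["Anne Isabella Milbanke", "Ada Lovelace"]]}
confirmed, suspect = check_parent_of(persons, relations)
print(suspect)
# → [('Charles Babbage', 'Ada Lovelace')]
```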

    Additionally, all attributes are optional, which makes sense and simplifies things. However, you have to be careful, as sometimes the name attribute can be null, making the record invalid. Finally, we could use something like Pydantic to validate the results, cast values to appropriate types like floats or dates, and handle invalid records.
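    A sketch of what that Pydantic validation could look like, assuming Pydantic v2: name is required (so a null name fails validation), everything else is optional, and date strings like "10 December 1815" are parsed into real dates. The model class and date format are my own assumptions, not part of GLiNER2.

```python
from datetime import date, datetime
from typing import Optional

from pydantic import BaseModel, ValidationError, field_validator

class Person(BaseModel):
    name: str                       # required: a null name makes the record invalid
    gender: Optional[str] = None
    alias: Optional[str] = None
    description: Optional[str] = None
    birth_date: Optional[date] = None

    @field_validator("birth_date", mode="before")
    @classmethod
    def parse_date(cls, v):
        # GLiNER2 returns dates as plain strings like "10 December 1815"
        if isinstance(v, str):
            return datetime.strptime(v, "%d %B %Y").date()
        return v

records = [
    {"name": "Augusta Ada King", "birth_date": "10 December 1815"},
    {"name": None},  # invalid: name is required
]
people = []
for r in records:
    try:
        people.append(Person(**r))
    except ValidationError:
        pass  # drop invalid records
print([p.name for p in people])
# → ['Augusta Ada King']
```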

    Constructing knowledge graphs

    Since GLiNER2 allows multiple extraction types in a single pass, we can combine all of the above methods to construct a knowledge graph. Rather than running separate pipelines for entity, relation, and structured data extraction, a single schema definition handles all three. This makes it easy to go from raw text to a rich, interconnected representation.

    schema = (extractor.create_schema()
        .entities({
            "Person": "Names of people, including nobility titles.",
            "Location": "Countries, cities, or geographic places.",
            "Invention": "Machines, devices, or technological creations.",
            "Event": "Historical events, wars, or conflicts."
        })
        .relations({
            "parent_of": "A person is the parent of another person",
            "married_to": "A person is married to another person",
            "worked_on": "A person contributed to or worked on an invention",
            "invented": "A person created or proposed an invention",
        })
        .structure("person")
            .field("name", dtype="str")
            .field("alias", dtype="str")
            .field("description", dtype="str")
            .field("birth_date", dtype="str")
    )
    
    results = extractor.extract(text, schema)

    How you map these outputs to your graph (nodes, relationships, properties) depends on your data model. In this example, we use the following data model:

    Knowledge graph construction result. Image by author.

    You may notice that we include the original text chunk in the graph as well, which allows us to retrieve and reference the source material when querying the graph, enabling more accurate and traceable results. The import Cypher looks like the following:

    import_cypher_query = """
    // Create Chunk node from text
    CREATE (c:Chunk {text: $text})
    
    // Create Person nodes with properties
    WITH c
    CALL (c) {
      UNWIND $data.person AS p
      WITH p
      WHERE p.name IS NOT NULL
      MERGE (n:__Entity__ {name: p.name})
      SET n.description = p.description,
          n.birth_date = p.birth_date
      MERGE (c)-[:MENTIONS]->(n)
      WITH p, n WHERE p.alias IS NOT NULL
      MERGE (m:__Entity__ {name: p.alias})
      MERGE (n)-[:ALIAS_OF]->(m)
    }
    
    // Create entity nodes dynamically with __Entity__ base label + dynamic label
    CALL (c) {
      UNWIND keys($data.entities) AS label
      UNWIND $data.entities[label] AS entityName
      MERGE (n:__Entity__ {name: entityName})
      SET n:$(label)
      MERGE (c)-[:MENTIONS]->(n)
    }
    
    // Create relationships dynamically
    CALL (c) {
      UNWIND keys($data.relation_extraction) AS relType
      UNWIND $data.relation_extraction[relType] AS rel
      MATCH (a:__Entity__ {name: rel[0]})
      MATCH (b:__Entity__ {name: rel[1]})
      MERGE (a)-[:$(toUpper(relType))]->(b)
    }
    RETURN distinct 'import completed' AS result
    """

    The Cypher query takes the GliNER2 output and stores it in Neo4j. We could also include embeddings for the text chunks, entities, and so on.
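    To glue the two sides together, the combined extraction results need to be reshaped into the $text and $data parameters the query expects. A sketch, assuming the combined output exposes person, entities, and relation_extraction keys (the shapes implied by the query above); the driver call at the end is illustrative, with placeholder connection details:

```python
# Reshape GLiNER2 combined-extraction results into the parameters
# expected by the Cypher import query.

def build_params(text, results):
    return {
        "text": text,
        "data": {
            "person": results.get("person", []),
            "entities": results.get("entities", {}),
            "relation_extraction": results.get("relation_extraction", {}),
        },
    }

params = build_params(
    "Ada Lovelace worked on the analytical engine.",
    {
        "person": [{"name": "Ada Lovelace", "alias": None,
                    "description": None, "birth_date": None}],
        "entities": {"Person": ["Ada Lovelace"],
                     "Invention": ["analytical engine"]},
        "relation_extraction": {
            "worked_on": [["Ada Lovelace", "analytical engine"]]},
    },
)

# With the official Neo4j Python driver, the import would then run as:
# from neo4j import GraphDatabase
# with GraphDatabase.driver("bolt://localhost:7687",
#                           auth=("neo4j", "password")) as driver:
#     driver.execute_query(import_cypher_query, **params)
```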

    Summary

    GliNER2 is a step in the right direction for structured data extraction. With the rise of LLMs, it’s easy to reach for ChatGPT or Claude whenever you need to pull information from text, but that’s often overkill. Running a multi-billion-parameter model to extract a few entities and relationships feels wasteful when smaller, specialized tools can do the job on a CPU.

    GliNER2 unifies named entity recognition, relation extraction, and structured JSON output into a single framework. It’s well-suited for tasks like knowledge graph construction, where you need consistent, schema-driven extraction rather than open-ended generation.
    That said, the model has its limitations. It works best for direct extraction rather than inference or reasoning, and results can be inconsistent. But the progress from the original GliNER to GliNER2 is encouraging, and hopefully we’ll see continued development in this space. For many use cases, a focused extraction model beats an LLM that’s doing far more than you need.

    The code is available on GitHub.


