    JSON Parsing for Large Payloads: Balancing Speed, Memory, and Scalability

By Editor Times Featured · December 2, 2025 · 12 Mins Read


    Introduction

The marketing campaign you set up for Black Friday was a huge success, and customers start pouring into your website. Your Mixpanel setup, which would usually see around 1,000 customer events an hour, suddenly receives millions of events within an hour. Your data pipeline is now tasked with parsing massive amounts of JSON data and storing it in your database. You find that your standard JSON parsing library is unable to scale to the sudden data growth, and your near real-time analytics reports fall behind. That is when you realize the importance of an efficient JSON parsing library. In addition to handling large payloads, a JSON parsing library should be able to serialize and deserialize deeply nested JSON payloads.

In this article, we explore Python parsing libraries for large payloads. We specifically look at the capabilities of ujson, orjson, and ijson. We then benchmark the standard JSON library (stdlib json), ujson, and orjson for serialization and deserialization performance. Since we use the terms serialization and deserialization throughout the article, here's a refresher: serialization converts your Python objects to a JSON string, while deserialization rebuilds your Python data structures from the JSON string.
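A quick round trip with the stdlib makes the two directions concrete (a minimal sketch; the `profile` dict is just an illustration):

```python
import json

profile = {"user": "ada", "active": True}

# Serialization: Python object -> JSON string
json_str = json.dumps(profile)
print(json_str)

# Deserialization: JSON string -> Python object
restored = json.loads(json_str)
print(restored == profile)
```

Note how the Python `True` becomes JSON `true` on the way out and is restored on the way back.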

As we progress through the article, you will find a decision flow diagram to help you pick a parser based on your workflow and specific parsing needs. In addition, we explore NDJSON and the libraries for parsing NDJSON payloads. Let's get started.

    Stdlib JSON

Stdlib JSON supports serialization of all basic Python data types, including dicts, lists, and tuples. When json.load() (or json.loads()) is called, it loads the entire JSON into memory at once. This is fine for smaller payloads, but for larger payloads it can cause significant performance issues such as out-of-memory errors and choking of downstream workflows.

    import json

    with open("large_payload.json", "r") as f:
        json_data = json.load(f)   # loads the entire file into memory, all tokens at once

    ijson

For payloads on the order of hundreds of MBs, it is advisable to use ijson. ijson, short for "iterative JSON", reads data one token at a time without the memory overhead. The code below shows the ijson equivalent of the snippet above.

    # The ijson library reads data one token at a time
    import ijson

    with open("json_data.json", "r") as f:
        for record in ijson.items(f, "items.item"):  # fetch one dict at a time from the array
            process(record)  # process() is a placeholder for your own handler

As you can see, ijson fetches one element at a time from the JSON and loads it into a Python dict, which is then fed to the calling function, in this case process(record). The overall working of ijson is illustrated below.

A high-level illustration of ijson (Image by the Author)

    ujson

ujson under the hood (Image by the Author)

ujson has been a widely used library in many applications involving large JSON payloads, as it was designed to be a faster alternative to the stdlib JSON in Python. Parsing is fast because the underlying code of ujson is written in C, with Python bindings connecting it to the Python interface. The areas that needed improvement in the standard JSON library were optimized in ujson for speed and performance. However, ujson is no longer recommended for newer projects: the maintainers themselves state on PyPI that the library has been placed in maintenance-only mode. Below is a high-level illustration of ujson's processes.

    import ujson

    taxonomy_data = '{"id":1, "genus":"Thylacinus", "species":"cynocephalus", "extinct": true}'
    data_dict = ujson.loads(taxonomy_data)  # Deserialize

    with open("taxonomy_data.json", "w") as fh:  # Serialize
        ujson.dump(data_dict, fh)

    with open("taxonomy_data.json", "r") as fh:  # Deserialize
        data = ujson.load(fh)
        print(data)

We move on to the next library, orjson.

    orjson

Since orjson is written in Rust, it is optimized not only for speed but also has memory-safe mechanisms that prevent the buffer overflows developers can face with C-based JSON libraries like ujson. Moreover, orjson supports serialization of several additional datatypes beyond the standard Python datatypes, including dataclass and datetime objects. Another key difference between orjson and the other libraries is that orjson's dumps() function returns a bytes object, whereas the others return a string. Returning bytes is one of the main reasons for orjson's high throughput.

    import orjson

    book_payload = '{"id":1,"title":"The Great Gatsby","author":"F. Scott Fitzgerald","publisher":"Charles Scribner\'s Sons"}'
    data_dict = orjson.loads(book_payload)  # Deserialize
    print(data_dict)

    with open("book_data.json", "wb") as f:  # Serialize
        f.write(orjson.dumps(data_dict))  # Returns a bytes object

    with open("book_data.json", "rb") as f:  # Deserialize
        book_data = orjson.loads(f.read())
        print(book_data)

Now that we've explored some JSON parsing libraries, let's test their serialization capabilities.

Testing Serialization Capabilities of JSON, ujson, and orjson

We create a sample dataclass object with an integer, a string, and a datetime field.

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class User:
        id: int
        name: str
        created: datetime

    u = User(id=1, name="Thomas", created=datetime.now())

We then pass it to each of the libraries to see what happens. We begin with the stdlib JSON.

    import json

    try:
        print("json:", json.dumps(u))
    except TypeError as e:
        print("json error:", e)

As expected, we get a TypeError. (The standard JSON library does not support serialization of dataclass or datetime objects out of the box.)
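For completeness, the stdlib can still handle such objects if you supply a custom `default` callable to json.dumps(). Below is a minimal sketch under that assumption; the `encode_extra` helper and the fixed timestamp are my own illustration, not from the article:

```python
import json
from dataclasses import dataclass, asdict, is_dataclass
from datetime import datetime

@dataclass
class User:
    id: int
    name: str
    created: datetime

def encode_extra(obj):
    # Called by json.dumps only for objects it cannot serialize natively
    if is_dataclass(obj):
        return asdict(obj)  # dataclass -> plain dict (datetime fields pass through)
    if isinstance(obj, datetime):
        return obj.isoformat()  # datetime -> ISO 8601 string
    raise TypeError(f"Not serializable: {type(obj)!r}")

u = User(id=1, name="Thomas", created=datetime(2025, 1, 1, 12, 0))
print(json.dumps(u, default=encode_extra))
```

This closes the gap for stdlib json at the cost of an extra function call per unsupported object, which orjson avoids by supporting these types natively.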

Next, we test the same with the ujson library.

    import ujson

    try:
        print("ujson:", ujson.dumps(u))
    except TypeError as e:
        print("ujson error:", e)

As we see above, ujson is also unable to serialize the dataclass object and the datetime datatype. Finally, we use the orjson library for serialization.

    import orjson

    try:
        print("orjson:", orjson.dumps(u))
    except TypeError as e:
        print("orjson error:", e)

We see that orjson is able to serialize both the dataclass and the datetime datatypes.

Working with NDJSON (A Special Mention)

We've seen the libraries for JSON parsing, but what about NDJSON? NDJSON (Newline Delimited JSON), as you may know, is a format in which each line is a JSON object. In other words, the delimiter is not a comma but a newline character. For instance, this is what NDJSON looks like.

    {"id": "A13434", "name": "Ella"}
    {"id": "A13455", "name": "Charmont"}
    {"id": "B32434", "name": "Areida"}

NDJSON is commonly used for logs and streaming data, and hence NDJSON payloads are excellent candidates for parsing with the ijson library. For small to moderate NDJSON payloads, the stdlib JSON is recommended. Apart from ijson and stdlib JSON, there is also a dedicated ndjson library. Below are code snippets showing each approach.

NDJSON using stdlib JSON & ijson

As NDJSON is not delimited by commas, it does not qualify for a bulk load: stdlib json's parser expects a single valid JSON element (such as a list of dicts), but is instead given multiple JSON elements in the payload file. Therefore, the file has to be parsed iteratively, line by line, with each record sent to the caller function for further processing.

    import json

    ndjson_payload = """{"id": "A13434", "name": "Ella"}
    {"id": "A13455", "name": "Charmont"}
    {"id": "B32434", "name": "Areida"}"""

    # Writing the NDJSON file
    with open("json_lib.ndjson", "w", encoding="utf-8") as fh:
        for line in ndjson_payload.splitlines():  # Split the string into JSON objects
            fh.write(line.strip() + "\n")  # Write each JSON object on its own line

    # Reading the NDJSON file using json.loads
    with open("json_lib.ndjson", "r", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():                 # Skip blank lines
                item = json.loads(line)      # Deserialize
                print(item)  # or send it to the caller function

With ijson, the parsing is done as shown below. With standard JSON, we have only one root element: a dictionary for a single JSON object, or an array for a list of dicts. But with NDJSON, each line is its own root element. The prefix argument "" in ijson.items() tells the parser to look at each root element, and multiple_values=True tells it that there are multiple JSON root elements in the file, to be fetched one line (one JSON object) at a time.

    import ijson

    ndjson_payload = """{"id": "A13434", "name": "Ella"}
    {"id": "A13455", "name": "Charmont"}
    {"id": "B32434", "name": "Areida"}"""

    # Writing the payload to a file to be processed by ijson
    with open("ijson_lib.ndjson", "w", encoding="utf-8") as fh:
        fh.write(ndjson_payload)

    with open("ijson_lib.ndjson", "r", encoding="utf-8") as fh:
        for item in ijson.items(fh, "", multiple_values=True):
            print(item)

Finally, we have the dedicated ndjson library. It essentially converts the NDJSON format to standard JSON.

    import ndjson

    ndjson_payload = """{"id": "A13434", "name": "Ella"}
    {"id": "A13455", "name": "Charmont"}
    {"id": "B32434", "name": "Areida"}"""

    # Writing the payload to a file to be processed by ndjson
    with open("ndjson_lib.ndjson", "w", encoding="utf-8") as fh:
        fh.write(ndjson_payload)

    with open("ndjson_lib.ndjson", "r", encoding="utf-8") as fh:
        ndjson_data = ndjson.load(fh)   # returns a list of dicts

As you have seen, NDJSON files can usually be parsed using stdlib json and ijson. For very large payloads, ijson is the best choice as it is memory-efficient. But if you are looking to generate NDJSON payloads from other Python objects, the ndjson library is the right choice, because ndjson.dumps() converts Python objects to the NDJSON format without your having to iterate over the data structures.
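If you would rather not add a dependency for generation, the same conversion is a short stdlib-only sketch (the `records` list is my own illustration):

```python
import json

records = [
    {"id": "A13434", "name": "Ella"},
    {"id": "A13455", "name": "Charmont"},
]

# Serialize each record to one line; joining with "\n" yields NDJSON
ndjson_text = "\n".join(json.dumps(r) for r in records)
print(ndjson_text)
```

This trades the convenience of ndjson.dumps() for one fewer dependency; for large outputs you would write each line to a file as it is produced instead of joining in memory.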

Now that we've covered NDJSON, let's pivot back to benchmarking stdlib json, ujson, and orjson.

Why ijson Is Not Considered for Benchmarking

Being a streaming parser, ijson is very different from the bulk parsers we looked at; benchmarking it alongside them would be comparing apples to oranges. The numbers would give the false impression that ijson is the slowest, when in fact it serves a different purpose altogether: ijson is optimized for memory efficiency and therefore has lower throughput than bulk parsers.

Generating a Synthetic JSON Payload for Benchmarking

We generate a large synthetic JSON payload of 1 million records using the mimesis library. This data will be used to benchmark the libraries. The code below can be used to recreate the payload if you wish to replicate the benchmark. The generated file is between 100 MB and 150 MB in size, which I believe is large enough for these tests.

    from mimesis import Person, Address
    import json

    person_name = Person("en")
    complete_address = Address("en")

    # Streaming records to a file
    with open("large_payload.json", "w") as fh:
        fh.write("[")  # Open the JSON array
        for i in range(1_000_000):
            payload = {
                "id": person_name.identifier(),
                "name": person_name.full_name(),
                "email": person_name.email(),
                "address": {
                    "street": complete_address.street_name(),
                    "city": complete_address.city(),
                    "postal_code": complete_address.postal_code()
                }
            }
            json.dump(payload, fh)
            if i < 999_999:  # No trailing comma after the last entry
                fh.write(",")
        fh.write("]")   # Close the JSON array

Below is a sample of the generated data. As you can see, the address fields are nested to ensure the JSON is not just large in size but also representative of real-world hierarchical JSON.

    [
      {
        "id": "8177",
        "name": "Willia Hays",
        "email": "[email protected]",
        "address": {
          "street": "Emerald Cove",
          "city": "Crown Point",
          "postal_code": "58293"
        }
      },
      {
        "id": "5931",
        "name": "Quinn Greer",
        "email": "[email protected]",
        "address": {
          "street": "Ohlone",
          "city": "Bridgeport",
          "postal_code": "92982"
        }
      }
    ]

    Let’s begin with benchmarking.

Benchmarking Prerequisites

We use the read() function to load the JSON file as a string. We then use the loads() function of each library (json, ujson, and orjson) to deserialize the JSON string into a Python object. First, we create the payload_str object from the raw JSON text.

    with open("large_payload.json", "r") as fh:
        payload_str = fh.read()   # raw JSON text

We then create a benchmarking function with two arguments: the function under test (here, the loads() function) and the payload built from the file above.

    import time

    def benchmark_load(func, payload):
        start = time.perf_counter()
        for _ in range(3):
            func(payload)
        end = time.perf_counter()
        return end - start

We use the above function to test both serialization and deserialization speeds.

Benchmarking Deserialization Speed

We import the three libraries being tested and run benchmark_load() against the loads() function of each.

    import json, ujson, orjson

    results = {
        "json.loads": benchmark_load(json.loads, payload_str),
        "ujson.loads": benchmark_load(ujson.loads, payload_str),
        "orjson.loads": benchmark_load(orjson.loads, payload_str),
    }

    for lib, t in results.items():
        print(f"{lib}: {t:.4f} seconds")

As we can see, orjson takes the least time for deserialization.

Benchmarking Serialization Speed

Next, we test the serialization speed of these libraries.

    import json
    import ujson
    import orjson

    # dumps() serializes Python objects, so deserialize the raw text once first
    payload_obj = json.loads(payload_str)

    results = {
        "json.dumps": benchmark_load(json.dumps, payload_obj),
        "ujson.dumps": benchmark_load(ujson.dumps, payload_obj),
        "orjson.dumps": benchmark_load(orjson.dumps, payload_obj),
    }

    for lib, t in results.items():
        print(f"{lib}: {t:.4f} seconds")

Comparing run times, we see that orjson takes the least time to serialize Python objects to JSON.

Choosing the Best JSON Library for Your Workflow

A guide to choosing the optimal JSON library (Image by the Author)

    Clipboard & Workflow Hacks for JSON

Suppose you'd like to view your JSON in a text editor such as Notepad++, or share a snippet from a large payload on Slack with a teammate. You'll quickly run into clipboard or text editor/IDE crashes. In such situations, you could use Pyperclip or Tkinter: Pyperclip works well for payloads under roughly 50 MB, while Tkinter handles medium-sized payloads. For large payloads, write the JSON to a file and view the data there.

    Conclusion

JSON can seem simple, but the larger the payload and the deeper the nesting, the faster it turns into a performance bottleneck. This article aimed to highlight how each Python parsing library addresses that challenge. When selecting a JSON parsing library, speed and throughput are not always the main criteria: the workflow determines whether throughput, memory efficiency, or long-term scalability matters most. In short, JSON parsing is not a one-size-fits-all affair.


