
    How to Import Pre-Annotated Data into Label Studio and Run the Full Stack with Docker

    By Editor Times Featured, September 2, 2025


    Preparing a dataset for an object detection training workflow can take a long time and is often frustrating. Label Studio, an open-source data annotation tool, can help by providing a straightforward way to annotate datasets. It supports a wide variety of annotation templates, including computer vision, natural language processing, and audio or speech processing. Here, however, we'll focus specifically on the object detection workflow.

    But what if you want to take advantage of pre-annotated open-source datasets, such as the Pascal VOC dataset? In this article, I'll show you how to import these tasks into Label Studio's format while setting up the entire stack, including a PostgreSQL database, MinIO object storage, an Nginx reverse proxy, and the Label Studio backend. MinIO is an S3-compatible object storage service: you might use cloud-native storage in production, but you can also run it locally for development and testing.

    In this tutorial, we'll go through the following steps:

    1. Convert Pascal VOC annotations – transform bounding boxes from XML into Label Studio tasks in JSON format.
    2. Run the full stack – start Label Studio with PostgreSQL, MinIO, Nginx, and the backend using Docker Compose.
    3. Set up a Label Studio project – configure a new project inside the Label Studio interface.
    4. Upload images and tasks to MinIO – store your dataset in an S3-compatible bucket.
    5. Connect MinIO to Label Studio – add the cloud storage bucket to your project so Label Studio can fetch images and annotations directly.

    Prerequisites

    To follow this tutorial, make sure you have:

    From VOC to Label Studio: Preparing Annotations

    The Pascal VOC dataset has a folder structure where the train and test datasets are already split. The Annotations folder contains the annotation files for each image. In total, the training set includes 17,125 images, each with a corresponding annotation file.

    .
    └── VOC2012
        ├── Annotations  # 17125 annotations
        ├── ImageSets 
        │   ├── Action
        │   ├── Layout
        │   ├── Main
        │   └── Segmentation
        ├── JPEGImages  # 17125 images
        ├── SegmentationClass
        └── SegmentationObject

    The XML snippet below, taken from one of the annotations, defines a bounding box around an object labeled "person". The box is specified using four pixel coordinates: xmin, ymin, xmax, and ymax.

    XML snippet from the Pascal VOC dataset (Image by Author)

    The illustration below shows the inner rectangle as the annotated bounding box, defined by the top-left corner (xmin, ymin) and the bottom-right corner (xmax, ymax), within the outer rectangle representing the image.

    Pascal VOC bounding box coordinates in pixel format (Image by Author)

    Label Studio expects each bounding box to be defined by its width, height, and top-left corner, expressed as percentages of the image size. Below is a working example of the converted JSON format for the annotation shown above.

    {
      "data": {
        "image": "s3:////2007_000027.jpg"
      },
      "annotations": [
        {
          "result": [
            {
              "from_name": "label",
              "to_name": "image",
              "type": "rectanglelabels",
              "value": {
                "x": 35.802,
                "y": 20.20,
                "width": 36.01,
                "height": 50.0,
                "rectanglelabels": ["person"]
              }
            }
          ]
        }
      ]
    }

    As the JSON shows, you also need to specify the location of the image file, for example a path in MinIO, or in an S3 bucket if you're using cloud storage.
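The conversion described above can be sketched in a few lines of Python. This is a minimal sketch rather than the author's actual script: the function name, the sample XML, and the `s3://example-bucket/images/` prefix are all assumptions made for illustration. The pixel values in the sample were chosen so that the output reproduces the percentages in the JSON above (up to rounding).

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down VOC annotation used only for illustration.
SAMPLE_XML = """
<annotation>
  <filename>2007_000027.jpg</filename>
  <size><width>486</width><height>500</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>174</xmin><ymin>101</ymin><xmax>349</xmax><ymax>351</ymax></bndbox>
  </object>
</annotation>
"""


def voc_to_label_studio_task(xml_text: str, image_uri: str) -> dict:
    """Convert one Pascal VOC annotation into a Label Studio task dict."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    img_w = float(size.findtext("width"))
    img_h = float(size.findtext("height"))

    results = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        xmin = float(box.findtext("xmin"))
        ymin = float(box.findtext("ymin"))
        xmax = float(box.findtext("xmax"))
        ymax = float(box.findtext("ymax"))
        results.append({
            # from_name/to_name must match the names used in the labeling config.
            "from_name": "label",
            "to_name": "image",
            "type": "rectanglelabels",
            "value": {
                # Pixel coordinates become percentages of the image size.
                "x": round(100.0 * xmin / img_w, 3),
                "y": round(100.0 * ymin / img_h, 3),
                "width": round(100.0 * (xmax - xmin) / img_w, 3),
                "height": round(100.0 * (ymax - ymin) / img_h, 3),
                "rectanglelabels": [obj.findtext("name")],
            },
        })
    return {"data": {"image": image_uri}, "annotations": [{"result": results}]}


if __name__ == "__main__":
    # The bucket name and prefix here are placeholders, not values from the article.
    task = voc_to_label_studio_task(
        SAMPLE_XML, "s3://example-bucket/images/2007_000027.jpg"
    )
    print(json.dumps(task, indent=2))
```

Writing one JSON file per image keeps each task independent, which suits the bucket layout used later where Label Studio reads task files from a prefix.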

    While preprocessing the data, I merged the entire dataset, even though it was already divided into training and validation. This simulates a real-world scenario where you usually begin with a single dataset and split it into training and validation sets yourself before training.
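For the later split, something like the following would do. It is only a sketch under assumptions: the function name, the 20% validation fraction, and the fixed seed are illustrative choices, not part of the original workflow.

```python
import random


def split_ids(image_ids, val_fraction=0.2, seed=42):
    """Split image IDs into (train, val) lists in a reproducible way."""
    ids = sorted(image_ids)           # sort first so the result does not depend on input order
    random.Random(seed).shuffle(ids)  # seeded shuffle gives the same split on every run
    n_val = int(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]


if __name__ == "__main__":
    train_ids, val_ids = split_ids([f"img_{i:03d}" for i in range(10)])
    print(len(train_ids), len(val_ids))
```

Fixing the seed means the same images land in the validation set each time, which makes training runs comparable.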

    Running the Full Stack with Docker Compose

    I merged the docker-compose.yml and docker-compose.minio.yml files into a single simplified configuration so the entire stack can run on the same network. Both files were taken from the official Label Studio GitHub repository.

    
    
    services:
      nginx:
        # Acts as a reverse proxy for Label Studio frontend/backend
        image: heartexlabs/label-studio:latest
        restart: unless-stopped
        ports:
          - "8080:8085"
          - "8081:8086"
        depends_on:
          - app
        environment:
          - LABEL_STUDIO_HOST=${LABEL_STUDIO_HOST:-}
        volumes:
          - ./mydata:/label-studio/data:rw # Stores Label Studio projects, configs, and uploaded files
        command: nginx

      app:
        stdin_open: true
        tty: true
        image: heartexlabs/label-studio:latest
        restart: unless-stopped
        expose:
          - "8000"
        depends_on:
          - db
        environment:
          - DJANGO_DB=default
          - POSTGRE_NAME=postgres
          - POSTGRE_USER=postgres
          - POSTGRE_PASSWORD=
          - POSTGRE_PORT=5432
          - POSTGRE_HOST=db
          - LABEL_STUDIO_HOST=${LABEL_STUDIO_HOST:-}
          - JSON_LOG=1
        volumes:
          - ./mydata:/label-studio/data:rw  # Stores Label Studio projects, configs, and uploaded files
        command: label-studio-uwsgi

      db:
        image: pgautoupgrade/pgautoupgrade:13-alpine
        hostname: db
        restart: unless-stopped
        environment:
          - POSTGRES_HOST_AUTH_METHOD=trust
          - POSTGRES_USER=postgres
        volumes:
          - ${POSTGRES_DATA_DIR:-./postgres-data}:/var/lib/postgresql/data  # Persistent storage for PostgreSQL database

      minio:
        image: "minio/minio:${MINIO_VERSION:-RELEASE.2025-04-22T22-12-26Z}"
        command: server /data --console-address ":9009"
        restart: unless-stopped
        ports:
          - "9000:9000"
          - "9009:9009"
        volumes:
          - minio-data:/data   # Stores uploaded dataset objects (like images or JSON tasks)
        # configure env vars in .env file or your system's environment
        environment:
          - MINIO_ROOT_USER=${MINIO_ROOT_USER:-minio_admin_do_not_use_in_production}
          - MINIO_ROOT_PASSWORD=${MINIO_ROOT_PASSWORD:-minio_admin_do_not_use_in_production}
          - MINIO_PROMETHEUS_URL=${MINIO_PROMETHEUS_URL:-http://prometheus:9090}
          - MINIO_PROMETHEUS_AUTH_TYPE=${MINIO_PROMETHEUS_AUTH_TYPE:-public}

    volumes:
      minio-data: # Named volume for MinIO object storage

    This simplified Docker Compose file defines four core services with their volume mappings:

    App – runs the Label Studio backend itself.

    • Shares the mydata directory with Nginx, which stores projects, configurations, and uploaded files.
    • Uses a bind mount: ./mydata:/label-studio/data:rw → maps a folder from your host into the container.

    Nginx – acts as a reverse proxy for the Label Studio frontend and backend.

    • Shares the mydata directory with the App service.

    PostgreSQL (db) – manages metadata and project information.

    • Stores persistent database files.
    • Uses a bind mount: ${POSTGRES_DATA_DIR:-./postgres-data}:/var/lib/postgresql/data.

    MinIO – an S3-compatible object storage service.

    • Stores dataset objects such as images or JSON annotation tasks.
    • Uses a named volume: minio-data:/data.

    When you mount host folders such as ./mydata and ./postgres-data, you need to assign ownership on the host to the same user that runs inside the container. Label Studio does not run as root; it uses a non-root user with UID 1001. If the host directories are owned by a different user, the container won't have write access and you'll run into permission denied errors.

    After creating these folders in your project directory, you can adjust their ownership with:

    mkdir mydata 
    mkdir postgres-data
    sudo chown -R 1001:1001 ./mydata ./postgres-data

    Now that the directories are prepared, we can bring up the stack using Docker Compose. Simply run:

    docker compose up -d

    It may take a few minutes to pull all the required images from Docker Hub and set up Label Studio. Once the setup is complete, open http://localhost:8080 in your browser to access the Label Studio interface. You need to create a new account, after which you can log in with your credentials. You can enable a legacy API token by going to Organization → API Token Settings. This token lets you communicate with the Label Studio API, which is especially useful for automation tasks.
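As a rough illustration of what the token is for, the sketch below builds an authenticated request to the projects endpoint using only the standard library. It assumes the legacy token is sent as an `Authorization: Token ...` header and that Label Studio is reachable at the localhost URL used above; the function names and the token placeholder are hypothetical.

```python
import json
import urllib.request


def build_api_request(base_url: str, token: str, path: str = "/api/projects"):
    """Build (but do not send) an authenticated request to the Label Studio API."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, headers={"Authorization": f"Token {token}"})


def list_projects(base_url: str, token: str):
    """Send the request and decode the JSON response; needs a running instance."""
    with urllib.request.urlopen(build_api_request(base_url, token), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Replace the placeholder with the token copied from the Label Studio UI.
    print(list_projects("http://localhost:8080", "<your-legacy-token>"))
```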

    Set up a Label Studio project

    Now we can create our first data annotation project in Label Studio, specifically for an object detection workflow. Before you start annotating images, you need to define the classes to choose from. In the Pascal VOC dataset, there are 20 types of pre-annotated objects.

    XML-style labeling setup (Image by Author)
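Since the screenshot is not reproduced here, the following sketch generates a labeling configuration of the kind Label Studio uses for bounding boxes. It is an assumption about what the setup looks like, not a copy of the author's configuration: the control name `label`, the object name `image`, and the `$image` variable are chosen to match the `from_name`, `to_name`, and `data.image` fields in the task JSON shown earlier.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# The 20 object classes of the Pascal VOC challenge.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def build_label_config(classes) -> str:
    """Return an XML labeling config with one rectangle label per class."""
    labels = "\n".join(f"    <Label value={quoteattr(c)}/>" for c in classes)
    return (
        "<View>\n"
        '  <Image name="image" value="$image"/>\n'
        '  <RectangleLabels name="label" toName="image">\n'
        f"{labels}\n"
        "  </RectangleLabels>\n"
        "</View>"
    )


if __name__ == "__main__":
    config = build_label_config(VOC_CLASSES)
    ET.fromstring(config)  # sanity check: the config is well-formed XML
    print(config)
```

The printed XML can be pasted into the project's labeling interface settings in code view.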

    Upload images and tasks to MinIO

    You can open the MinIO user interface in your browser at localhost:9000, and then log in using the credentials you specified under the relevant service in the docker-compose.yml file.

    I created a bucket with folders, one of which is used for storing images and another for JSON tasks formatted according to the instructions above.

    Screenshot of an example bucket in MinIO (Image by Author)

    We set up an S3-like service locally that allows us to simulate S3 cloud storage without incurring any costs. If you want to transfer files to an S3 bucket on AWS, it's better to do this directly over the internet, considering the data transfer costs. The good news is that you can also interact with your MinIO bucket using the AWS CLI. To do this, you need to add a profile in ~/.aws/config and provide the corresponding credentials in ~/.aws/credentials under the same profile name.
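One detail that is easy to miss is that the two files name the section differently: ~/.aws/config uses `[profile NAME]` while ~/.aws/credentials uses plain `[NAME]`. The sketch below renders both snippets as text so you can review them before copying them into place. The profile name `minio`, the `us-east-1` region, and the placeholder keys are assumptions; for a local MinIO, the access key and secret would typically be the root user and password from the Compose file.

```python
import configparser
import io


def render_aws_profile(profile, access_key, secret_key, region="us-east-1"):
    """Return (config_text, credentials_text) for a named AWS CLI profile."""
    # interpolation=None so that '%' characters in secrets are kept literally.
    config = configparser.ConfigParser(interpolation=None)
    config[f"profile {profile}"] = {"region": region}

    credentials = configparser.ConfigParser(interpolation=None)
    credentials[profile] = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }

    def dump(parser):
        buffer = io.StringIO()
        parser.write(buffer)
        return buffer.getvalue()

    return dump(config), dump(credentials)


if __name__ == "__main__":
    config_text, credentials_text = render_aws_profile(
        "minio", "<access-key>", "<secret-key>"
    )
    print("# ~/.aws/config\n" + config_text)
    print("# ~/.aws/credentials\n" + credentials_text)
```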

    Once the profile is in place, you can sync a local folder to the bucket with the following script:

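    #!/bin/bash
    set -e

    # Fill in these values before running
    PROFILE=          # AWS CLI profile name configured for MinIO
    MINIO_ENDPOINT=   # e.g. http://localhost:9000
    BUCKET_NAME=
    SOURCE_DIR=       # local folder to upload
    DEST_DIR=         # destination prefix inside the bucket

    # Sync the local folder to the bucket through the S3-compatible endpoint
    aws s3 sync \
          --endpoint-url "$MINIO_ENDPOINT" \
          --no-verify-ssl \
          --profile "$PROFILE" \
          "$SOURCE_DIR" "s3://$BUCKET_NAME/$DEST_DIR"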

    Connect MinIO to Label Studio

    After all the data, including the images and annotations, has been uploaded, we can move on to adding cloud storage to the project we created in the previous step.

    From your project settings, go to Cloud Storage and add the required parameters, such as the endpoint (which points to the service name in the Docker stack along with the port number, e.g., minio:9000), the bucket name, and the relevant prefix where the annotation files are stored. Each path inside the JSON files will then point to the corresponding image.

    Screenshot of the Cloud Storage settings (Image by Author)
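The same settings can also be expressed as a request payload if you prefer to script the setup. The sketch below only builds the payload and does not send it. The field names reflect my understanding of Label Studio's S3 import storage endpoint (`POST /api/storages/s3`) and should be checked against the API reference for your installed version; the function name, title, bucket, prefix, and keys are hypothetical.

```python
import json


def build_s3_storage_payload(project_id, bucket, prefix, endpoint,
                             access_key, secret_key):
    """Build a JSON-serializable payload describing an S3-compatible import storage."""
    return {
        "project": project_id,
        "title": "MinIO tasks",              # hypothetical display name
        "bucket": bucket,
        "prefix": prefix,                    # folder that holds the JSON task files
        "s3_endpoint": endpoint,             # Compose service name plus port
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        # False: bucket objects are JSON task files, not raw images.
        "use_blob_urls": False,
    }


if __name__ == "__main__":
    payload = build_s3_storage_payload(
        project_id=1,
        bucket="example-bucket",
        prefix="tasks",
        endpoint="http://minio:9000",
        access_key="<access-key>",
        secret_key="<secret-key>",
    )
    print(json.dumps(payload, indent=2))
```

Because Label Studio reaches MinIO from inside the Docker network, the endpoint uses the service name `minio` rather than `localhost`.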

    After verifying that the connection is working, you can sync your project with the cloud storage. You may need to run the sync several times since the dataset contains 22,263 images. It may appear to fail at first, but when you restart the sync, it continues to make progress. Eventually, all the Pascal VOC data will be successfully imported into Label Studio.

    Screenshot of the task list (Image by Author)

    You can see the imported tasks with their thumbnail images in the task list. When you click on a task, the image will appear with its pre-annotations.

    Screenshot of an image with bounding boxes (Image by Author)

    Conclusions

    In this tutorial, we demonstrated how to import the Pascal VOC dataset into Label Studio by converting XML annotations into Label Studio's JSON format, running a full stack with Docker Compose, and connecting MinIO as S3-compatible storage. This setup lets you work with large-scale, pre-annotated datasets in a reproducible and cost-effective way, all on your local machine. Testing your project settings and file formats locally first will ensure a smoother transition when moving to cloud environments.

    I hope this tutorial helps you kickstart your data annotation project with pre-annotated data that you can easily expand or validate. Once your dataset is ready for training, you can export all the tasks in popular formats such as COCO or YOLO.


