
    Dreaming in Cubes | Towards Data Science

    By Editor Times Featured | April 19, 2026 | 10 Mins Read


    Minecraft is a game that’s dear to me (and to many others) because it has, in a way, watched me grow from an elementary school student all the way to a (soon-to-be!) college graduate. An undeniable part of the game’s charm is its infinite replayability, derived from its world generation. In current editions of the game, Minecraft uses a variety of noise functions in conjunction to procedurally generate [1] its worlds in the form of chunks, that is, columns of $16 \times 16 \times 384$ blocks, in a way that tends to (more or less) form ‘natural’ looking terrain, providing much of the immersion for the game.

    My goal with this project was to see if I could move beyond hard-coded noise and instead teach a model to ‘dream’ in voxels. By leveraging recent advancements in Vector Quantized Variational Autoencoders (VQ-VAE) and Transformers, I built a pipeline to generate 3D world slices that capture the structural essence of the game’s landscapes. As a concrete output, I wanted the ability to generate $4$ chunks (arranged in a $2 \times 2$ grid) that looked like Minecraft’s terrain.

    As a side note, this isn’t an entirely novel idea; in particular, ChunkGAN [2] provides an alternative approach to the same goal.

    The Problem of 3D Generative Modeling

    In a video [3] from January 2026, Computerphile featured Lewis Stuart, who highlighted the main issues with 3D generation. I would encourage readers to give it a watch; however, to summarize the key points, 3D generation is hard because good 3D datasets are hard to find or simply don’t exist, and adding a degree of freedom makes things much harder (consider the classic three-body problem [4]). It should be noted that the video explicitly addresses diffusion models (which require labelled data), though many of the concerns carry over to the general idea of 3D generation. Another concern is simply scale: a $512 \times 512$ image ($2^{18}$ pixels) would almost certainly be considered low-resolution by modern standards, but a 3D model at the same fidelity would require $2^{27}$ voxels. More points directly imply higher compute requirements and can quickly make such tasks infeasible.
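The scale argument is easy to verify with a few lines of arithmetic. The snippet below is only an illustration; the helper name `cells` is mine and does not come from the project.

```python
import math

def cells(side: int, dims: int) -> int:
    """Number of pixels (dims=2) or voxels (dims=3) in a grid with equal sides."""
    return side ** dims

pixels = cells(512, 2)   # 512 x 512 image
voxels = cells(512, 3)   # 512 x 512 x 512 volume

print(pixels, math.log2(pixels))  # 262144 18.0
print(voxels, math.log2(voxels))  # 134217728 27.0
print(voxels // pixels)           # 512
```

Going from 2D to 3D at the same side length multiplies the number of elements by the side length itself, here a factor of 512.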

    To overcome the 3D data scarcity mentioned by Stuart, I turned to Minecraft, which, in my opinion, is the best source of voxel data available for terrain generation. By using a script to teleport through a pre-generated world, I forced the game engine to load and render thousands of unique chunks. Using a separate extraction script, I pulled these chunks directly from the game’s region files. This gave me a dataset with high semantic consistency; unlike a collection of random 3D objects, these chunks represent a continuous, flowing landscape where the ‘logic’ of the terrain (how a river bed dips or how a mountain peaks) is preserved across chunk boundaries.

    To bridge the gap between the complexity of 3D voxels and the limitations of modern hardware, I couldn’t simply feed raw chunks into a model and hope for the best. I needed a way to condense the ‘noise’ of millions of blocks into a meaningful, compressed language. This led me to the heart of the project: a two-stage generative pipeline that first learns to ‘tokenize’ 3D space, and then learns to ‘speak’ it.

    Data Preprocessing

    A key yet non-obvious observation is that a significant portion of Minecraft’s chunks is filled with ‘air’ blocks. It’s a non-trivial observation mostly because air isn’t technically a block: you can’t place it or remove it as you can with every other block in the game; rather, it’s the absence of a block at that point. In modern Minecraft, much of the vertical span is air and as such, instead of considering the full $384$ height levels, I restricted it to $y \in [0, 128]$. Those more familiar with Minecraft’s world generation would know that blocks have negative $y$-values, all the way down to $-64$, and at this point, I must apologize because when I implemented this architecture, this had entirely slipped my mind. The model I present in this article would work just as well if you considered a larger vertical span, but because of my unfortunate oversight, the results that I present are from a restricted span of blocks.

    On the note of restricting blocks, chunks contain a lot of blocks that don’t show up very often and don’t contribute to the general shape of the terrain but are important for maintaining immersion for the player. At least for this project, I chose to restrict the vocabulary to the 30 most frequent blocks across the chunks.
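The vocabulary pruning described above could be sketched as follows. This is a minimal illustration, not the project's code: the function names are hypothetical, and mapping out-of-vocabulary blocks to air is my assumption, since the article does not say what happens to blocks outside the top 30.

```python
from collections import Counter

def build_vocab(blocks, top_k=30):
    """Map the top_k most frequent block names to integer ids (0 = most frequent)."""
    counts = Counter(blocks)
    return {name: idx for idx, (name, _) in enumerate(counts.most_common(top_k))}

def encode(blocks, vocab, fallback="air"):
    """Replace each block with its id; out-of-vocabulary blocks become `fallback`.

    Mapping rare blocks to air is an assumption made for this sketch.
    """
    return [vocab.get(name, vocab[fallback]) for name in blocks]

# Toy flattened chunk: mostly air and stone, with one rare block.
toy = ["air"] * 6 + ["stone"] * 3 + ["grass"] * 2 + ["diamond_ore"]
vocab = build_vocab(toy, top_k=3)
ids = encode(toy, vocab)
print(vocab)  # {'air': 0, 'stone': 1, 'grass': 2}
print(ids)    # [0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 0]
```

With `top_k=30` and the real dataset, the resulting ids would form the 30 classes that the reconstruction loss is computed over.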

    Pruning the vocabulary, so to speak, is useful but only half the battle. As stated before, because Minecraft worlds are primarily composed of ‘air’ and ‘stone,’ the dataset suffers from some pretty extreme class imbalance. To prevent the model from taking the ‘path of least resistance,’ that is, simply predicting empty space to achieve low loss, I implemented a Weighted Cross-Entropy loss. By scaling the loss based on the inverse log-frequency of each block, I forced the VQ-VAE to prioritize the structural ‘minorities’ like grass, water, and snow.

    $$\text{weight}[\text{block}] = \frac{1}{\log(1.1 + \text{probability}[\text{block}])}$$

    In plain terms: the rarer a block type is in the dataset, the more heavily the model is penalized for failing to predict it, pushing the network to treat a patch of snow or a river bed as just as important as the vast expanses of stone and air that dominate most chunks.
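To make the weighting concrete, here is a small sketch that computes the weights from block counts. It assumes the logarithm is the natural log and that the term inside it is each block's share of the dataset; the toy counts and the function name `class_weights` are made up for illustration.

```python
import math
from collections import Counter

def class_weights(blocks):
    """weight[b] = 1 / log(1.1 + p[b]), where p[b] is the block's share of the data.

    Natural log is assumed; the article does not state the base.
    """
    counts = Counter(blocks)
    total = sum(counts.values())
    return {b: 1.0 / math.log(1.1 + n / total) for b, n in counts.items()}

# Made-up distribution: 70% air, 25% stone, 4% grass, 1% snow.
toy = ["air"] * 70 + ["stone"] * 25 + ["grass"] * 4 + ["snow"] * 1
weights = class_weights(toy)
for block, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{block:>6}: {w:.2f}")
```

On this toy distribution air ends up with a weight of roughly 1.7 and snow with roughly 9.6, so a missed snow block costs several times more than a missed air block. The `1.1` offset keeps the logarithm positive even for blocks with a share near zero. In a framework such as PyTorch, weights like these would typically be passed as the per-class `weight` argument of the cross-entropy loss.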

    Architecture Overview

    This Mermaid sequence diagram [6] provides a bird’s-eye view of the architecture.

    The Raw Voxel Problem and Tokenizing 3D Space

    A naive approach to building such an architecture would involve reading and building chunks block by block. There are a myriad of reasons why this would be less than ideal, but the most important problem is that it can become computationally infeasible very quickly without really providing semantic structure. Imagine assembling a LEGO set with thousands of $1 \times 1$ bricks. While possible, it would be way too slow and it wouldn’t really have any structural integrity: pieces that are adjacent horizontally wouldn’t be connected and you’d essentially be building a set of disjoint towers. The way LEGO addresses this is by having larger bricks, like the iconic $2 \times 4$ brick, that take over space that would otherwise require multiple $1 \times 1$ pieces. As such, you fill up space faster and there’s more structural integrity.

    For the system, codewords are the $2 \times 4$ LEGO bricks. Using a VQ-VAE (Vector Quantized Variational Autoencoder), the goal is to build a codebook, that is, a set of structural signatures that the model can use to reconstruct full chunks. Think of structures like a flat section of grass or a blob of diorite. In my implementation, I allowed a codebook with $512$ unique codes.

    To implement this, I used 3D convolutions. While 2D convolutions are the bread and butter of image processing, 3D convolutions allow the model to learn kernels that slide across the X, Y, and Z axes simultaneously. This is vital for Minecraft, where the relationship between a block and the one below it (gravity/support) is just as important as its relationship to the one beside it.
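As a rough illustration of what a kernel sliding along all three axes does, the sketch below applies a naive 3D convolution (cross-correlation, as in most deep learning libraries) to a toy world. The kernel is hand-written to respond to vertical structure; in the actual model the kernels are learned, and the array layout `(x, y, z)` is an assumption of this example.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D cross-correlation of one volume with one kernel."""
    kx, ky, kz = kernel.shape
    ox, oy, oz = (v - k + 1 for v, k in zip(volume.shape, kernel.shape))
    out = np.zeros((ox, oy, oz))
    for x in range(ox):
        for y in range(oy):
            for z in range(oz):
                out[x, y, z] = np.sum(volume[x:x + kx, y:y + ky, z:z + kz] * kernel)
    return out

# Toy 4x4x4 world indexed as (x, y, z): solid (1) below y=2, air (0) above.
world = np.zeros((4, 4, 4))
world[:, :2, :] = 1.0

# Kernel spanning two blocks along Y: +1 for the lower block, -1 for the one
# above it. It responds with 1 exactly where solid ground meets air.
kernel = np.zeros((1, 2, 1))
kernel[0, 0, 0] = 1.0
kernel[0, 1, 0] = -1.0

surface = conv3d_valid(world, kernel)
print(surface.shape)     # (4, 3, 4)
print(surface[0, :, 0])  # [0. 1. 0.]
```

A 2D convolution applied layer by layer could not express this kind of "what sits on top of what" relationship, which is the point the paragraph above makes.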

    Further Details

    The most critical component of this stage is the `VectorQuantizer`. This layer sits at the ‘bottleneck’ of the network, forcing continuous neural signals to snap to a fixed ‘vocabulary’ of 512 learned 3D shapes.
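The "snapping" step can be sketched as a nearest-neighbour lookup. This is a generic vector quantization example in NumPy, not the article's `VectorQuantizer`; the function name `quantize` and the embedding dimension of 8 are assumptions chosen for illustration.

```python
import numpy as np

def quantize(z, codebook):
    """Snap each row of z (N, D) to its nearest codebook row (K, D).

    Returns the chosen indices (N,) and the quantized vectors (N, D).
    """
    # Squared Euclidean distance between every latent and every codeword.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 8))   # 512 codes as in the article; dim 8 is arbitrary
z = codebook[[3, 41, 500]] + 0.01      # latents sitting very close to known codes

indices, z_q = quantize(z, codebook)
print(indices.tolist())  # [3, 41, 500]
print(z_q.shape)         # (3, 8)
```

The integer `indices` are what the second stage consumes as tokens, while the quantized vectors are what the decoder sees.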

    One of my biggest hurdles in VQ-VAE training was ‘dead’ embeddings, that is, codewords that the encoder never chooses, which effectively waste the model’s capacity. To solve this, I added a way to ‘reset’ dead codewords. If a codeword’s usage drops too low, the model forcefully re-initializes it by ‘stealing’ a vector from the current input batch.
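The original listing for this step is not reproduced here, so the following is only a sketch of one way such a reset could work. How usage is tracked, the threshold value, and the grace period after a reset are all assumptions of this example.

```python
import numpy as np

def reset_dead_codes(codebook, usage, batch, threshold, rng):
    """Overwrite rarely used codewords with randomly chosen vectors from the batch.

    codebook: (K, D) codewords, modified in place.
    usage:    (K,) running usage counts, modified in place.
    batch:    (N, D) encoder outputs from the current step.
    Returns the indices of the codewords that were reset.
    """
    dead = np.flatnonzero(usage < threshold)
    if dead.size:
        picks = rng.integers(0, len(batch), size=dead.size)
        codebook[dead] = batch[picks]
        usage[dead] = threshold  # grace period so they are not reset again immediately
    return dead

rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 2))
usage = np.array([10.0, 0.0, 7.0, 0.5])
batch = rng.normal(size=(16, 2))

dead = reset_dead_codes(codebook, usage, batch, threshold=1.0, rng=rng)
print(dead.tolist())  # [1, 3]
```

Because the replacement vectors come from real encoder outputs, the revived codewords land in a region of latent space the encoder actually uses, which gives them a chance of being selected on later steps.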

    Brick by Brick

    A diverse collection of blocks is great, but they don’t mean much unless they’re put together well. Therefore, to put these codewords to good use, I used a GPT. To make this work, I flattened the latent grid produced by the VQ-VAE into a sequence of tokens; essentially, the 3D world gets flattened into a 1D language. Then, the GPT sees 8 chunks’ worth of tokens to learn the spatial grammar, so to speak, of Minecraft and achieve the aforementioned semantic consistency.
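Flattening a latent grid into a token sequence, and undoing it before decoding, can be sketched like this. The 4×4×4 grid shape and the row-major ordering are assumptions for illustration; the article does not state the latent shape or the traversal order.

```python
import numpy as np

# Hypothetical latent grid of codebook indices for one chunk, shape (X, Y, Z).
grid = np.arange(4 * 4 * 4).reshape(4, 4, 4)

tokens = grid.reshape(-1)               # 3D grid -> 1D token sequence (row-major)
restored = tokens.reshape(grid.shape)   # inverse, applied before decoding

print(tokens.shape)                     # (64,)
print(np.array_equal(restored, grid))   # True

# Position of latent cell (x, y, z) in the sequence under row-major order.
x, y, z = 1, 2, 3
print(tokens[(x * 4 + y) * 4 + z] == grid[x, y, z])  # True
```

Whatever order is chosen, it has to be the same at training and generation time, since the GPT learns spatial relationships purely from token positions.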

    To achieve this, I used Causal Self-Attention.
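For readers unfamiliar with the mechanism, here is a minimal single-head causal self-attention in NumPy. It is a textbook sketch rather than the project's layer: the weight matrices are random, and the sizes are arbitrary.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over a sequence x of shape (T, D)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (T, T)
    # Mask out the future: position t may only attend to positions <= t.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
T, D = 5, 8
x = rng.normal(size=(T, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) for _ in range(3))

out, weights = causal_self_attention(x, w_q, w_k, w_v)
print(out.shape)                                  # (5, 8)
print(np.allclose(np.triu(weights, k=1), 0.0))    # True
```

The mask is what makes the model usable for generation: each token's output depends only on the tokens before it, so the sequence can be produced one token at a time.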

    Finally, during inference, the generation loop uses top-k sampling along with a temperature parameter to control how erratic or creative the output is.
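A generation loop of this kind might look like the sketch below. The `toy_model` function is a hypothetical stand-in for the trained GPT, and the values of `k` and `temperature` are placeholders rather than the settings used in the project.

```python
import numpy as np

def sample_top_k(logits, k, temperature, rng):
    """Sample one token id from the k highest logits after temperature scaling."""
    scaled = np.asarray(logits, dtype=float) / temperature
    top = np.argsort(scaled)[-k:]                  # ids of the k largest logits
    probs = np.exp(scaled[top] - scaled[top].max())
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))

def generate(next_logits, length, k, temperature, rng, start=()):
    """Autoregressive loop: repeatedly sample a token and append it.

    `next_logits(tokens)` stands in for the trained GPT and is hypothetical.
    """
    tokens = list(start)
    while len(tokens) < length:
        tokens.append(sample_top_k(next_logits(tokens), k, temperature, rng))
    return tokens

rng = np.random.default_rng(0)
vocab_size = 512
table = rng.normal(size=(vocab_size, vocab_size))  # toy stand-in, not a real model

def toy_model(tokens):
    return table[tokens[-1]] if tokens else table[0]

blueprint = generate(toy_model, length=256, k=10, temperature=0.8, rng=rng)
print(len(blueprint))  # 256
```

Lower temperatures and smaller `k` make the output more conservative; higher values allow rarer codewords through at the cost of more erratic terrain.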

    By the end of this sequence, the GPT has ‘written’ a structural blueprint 256 tokens long. The next step is to pass these through the VQ-VAE decoder to manifest a $2 \times 2$ grid of recognizable Minecraft terrain.

    Results

    In this render [6], the model successfully clusters leaf blocks, mimicking the game’s tree structures.

    In this one [6], the model uses snow blocks to cap the stone and grass, reflecting the high-altitude or tundra slices found in the training data. Additionally, this render shows that the model learned to generate caves.

    In this image [6], the model places water in a depression and borders it with sand, demonstrating that it has internalized the spatial logic of a shoreline, rather than scattering water blocks arbitrarily across the surface.

    Perhaps the most impressive result is the internal structure of the chunks. Because the implementation used 3D convolutions and a weighted loss function, the model actually generates subterranean features like contiguous caves, overhangs, and cliffs.

    While the results are recognizable, they aren’t perfect clones of Minecraft. The VQ-VAE’s compression is ‘lossy,’ which sometimes results in a slight ‘blurring’ of block boundaries or the occasional floating block. However, for a model operating on a highly compressed latent space, the ability to maintain structural integrity across a $2 \times 2$ chunk grid is, I believe, a significant success.

    Reflections and Future Work

    While the model successfully ‘dreams’ in voxels, there is significant room for expansion. Future iterations could revisit the full vertical span of $y \in [-64, 320]$ to accommodate the massive jagged peaks and deep ‘cheese’ caves characteristic of modern Minecraft versions. Additionally, scaling the codebook beyond 512 entries would allow the system to tokenize more complex, niche structures like villages or desert temples. Perhaps most exciting is the potential for conditional generation, or ‘biomerizing’ the GPT, which would enable users to guide the architectural process with specific prompts such as ‘Mountain’ or ‘Ocean,’ turning a random dream into a directed creative tool.

    Thanks for reading! If you’re interested in the full implementation or want to experiment with the weights yourself, feel free to check out the repository [5].

    Citations and Links

    [1] Minecraft Wiki Editors, World generation (2026), https://minecraft.wiki/w/World_generation

    [2] x3voo, ChunkGAN (2024), https://github.com/x3voo/ChunkGAN

    [3] Lewis Stuart for Computerphile, Generating 3D Models with Diffusion – Computerphile (2026), https://www.youtube.com/watch?v=C1E500opYHA

    [4] Wikipedia Editors, Three-body problem (2026), https://en.wikipedia.org/wiki/Three-body_problem

    [5] spaceybread, glowing-robot (2026), https://github.com/spaceybread/glowing-robot/tree/master

    [6] Image by author.

    A Note on the Dataset

    All training data was generated by the author using a locally run instance of Minecraft Java Edition. Chunks were extracted from procedurally generated world files using a custom extraction script. No third-party datasets were used. As the data was generated and extracted by the author from their own game instance, no external licensing restrictions apply to its use in this research context.


