It’s not about audio and video anymore
Compression is no longer just an audio/video problem; today, it is about compressing every kind of data: genomes, point clouds, haptics, 3D scenes, neural networks, and machine features.
Every data type now has to undergo some form of compression, simply because we’re producing an absurd quantity of bits in every sector, from entertainment to medicine to autonomous vehicles.
In just over 70 years since the transistor was invented in 1947, we’ve unlocked unprecedented computing power, wireless networks, the internet, artificial intelligence, mobile devices, high-resolution displays, and spectacular advances in genetics, medicine, and space exploration.
All of this rests on one substrate: digital data.
As humans, we love data. Food and water may feed our bodies, but data, once transformed into information, feeds our minds.
When we share it, we evolve as a species.
We innovate.
And we don’t appear to be slowing down.
Back in 2020, the global data created, captured, copied and consumed in a single year was about 59 zettabytes (the equivalent of 59 trillion gigabytes), and projections put 2025 at around 175 ZB. One zettabyte is 8,000,000,000,000,000,000,000 bits. We’re good at producing data, but the problem is how to transfer it, store it, process it… and trust it.
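To make those magnitudes concrete, here is a minimal sketch of the unit arithmetic, assuming decimal (SI) units where one gigabyte is 10^9 bytes and one zettabyte is 10^21 bytes. The function names are illustrative only.

```python
# Decimal (SI) storage units, expressed in bytes (assumption: SI, not binary, prefixes).
GIGABYTE = 10**9
ZETTABYTE = 10**21
BITS_PER_BYTE = 8


def zettabytes_to_bits(zb: int) -> int:
    """Number of bits in `zb` zettabytes."""
    return zb * ZETTABYTE * BITS_PER_BYTE


def zettabytes_to_gigabytes(zb: int) -> int:
    """Number of gigabytes in `zb` zettabytes."""
    return zb * ZETTABYTE // GIGABYTE


bits_in_one_zb = zettabytes_to_bits(1)
gb_2020 = zettabytes_to_gigabytes(59)
growth_2020_to_2025 = 175 / 59  # projected 2025 volume relative to 2020

print(f"{bits_in_one_zb:,} bits in one zettabyte")
print(f"{gb_2020:,} gigabytes in 59 ZB")
print(f"projected growth 2020 -> 2025: {growth_2020_to_2025:.2f}x")
```

In other words, the projection amounts to roughly a threefold increase in five years.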
You’ve probably heard the line “data is the new oil.” It isn’t. Oil is finite. Data is not.
That’s exactly why compression is now a foundational technology for the entire digital ecosystem.
The backbone of the media world
ISO/IEC JTC 1/SC 29 is not a brand many people recognize, but its work underpins the entire digital media and entertainment industry. This subcommittee coordinates JPEG, which defines image compression standards, and the MPEG-affiliated groups, which develop technologies for compressing and transporting video, audio, and other multimedia data.
The standards coming out of SC 29 cover the full value chain: content creation, processing, and storage; broadcast distribution; streaming over IP; and consumption on everything from smartphones to large-screen TVs. What’s changing now is that their scope is widening from “media for humans” to “data for humans and machines” across images, video, 3D, AI, and beyond.
JPEG: From .jpg to AI, Trust, Plenoptic and DNA
For over 30 years, JPEG (.jpg) has been the default visual format of the web. But the committee has significantly expanded its portfolio.
JPEG AI: Latent tensors, not pixels
JPEG AI is the first learning-based image coding standard that uses AI and latent spaces instead of hand-crafted transforms.
At the core of JPEG AI, the codec transforms an image into a latent tensor that is then compressed and transmitted. The decoder reconstructs the image from this latent representation, but it can also operate directly in the compressed domain, enabling analysis pipelines and computer vision tasks without fully decoding pixels.
A single compressed representation serves both human viewing and machine analysis.
JPEG AI also integrates the concept of “on-demand” complexity: the standard defines three decoding variants with different computational cost, letting devices choose the best transformation for their hardware capabilities. This adaptive architecture therefore allows for optimised use across a wide range of devices, from low-power terminals to high-performance platforms.
JPEG Trust: Authenticity in an AI-synthetic world
The web is being flooded with synthetic images, videos, and even news, all generated in seconds by powerful generative models. Humans and algorithms cannot easily distinguish what’s real from what’s synthetic, and that directly impacts search ranking, brand trust, and user perception of authenticity.
JPEG Trust defines a framework for establishing trust in digital media by tracking the origin, authenticity, and ownership of digital images, including AI-generated content. This is essential for properly managing the dissemination and use of media that has been deliberately modified or created to manipulate public opinion (deepfakes), an objective outlined in the AI Act issued by the European Union.
It builds upon and extends the Coalition for Content Provenance and Authenticity (C2PA) engine, a solution for verifying authenticity on the web. It defines a standard way to attach metadata to media that records its origin and any subsequent modifications.
Think of it as an embedded digital signature that travels with your content.
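The idea of metadata that is bound to the content and can be checked later can be illustrated with a toy example. This is only a sketch: it is not the JPEG Trust or C2PA manifest format, every field name is hypothetical, and an HMAC with a shared demo key stands in for the certificate-based signatures a real system would use.

```python
import hashlib
import hmac
import json


def make_manifest(content: bytes, origin: str, actions: list[str], key: bytes) -> dict:
    """Build a toy provenance record bound to the content by its hash."""
    claim = {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "origin": origin,      # hypothetical field: where the media came from
        "actions": actions,    # hypothetical field: edits applied since capture
    }
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify_manifest(content: bytes, manifest: dict, key: bytes) -> bool:
    """Check that the claim is untampered and still matches the content."""
    claim = manifest["claim"]
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, manifest["signature"])
    content_ok = claim["content_sha256"] == hashlib.sha256(content).hexdigest()
    return signature_ok and content_ok


key = b"demo-signing-key"            # assumption: shared secret, for illustration only
image = b"\x89fake image bytes"      # stand-in for real image data
manifest = make_manifest(image, "camera:example-model", ["captured", "cropped"], key)

print(verify_manifest(image, manifest, key))            # untouched content
print(verify_manifest(image + b"!", manifest, key))     # content altered after signing
```

Changing either the media bytes or the recorded claim makes verification fail, which is the property that lets provenance travel with the content.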
JPEG Pleno: Light fields, point clouds, holograms
JPEG Pleno is an international framework for representing and compressing plenoptic data: light fields, point clouds, and holograms. It goes beyond 2D images to capture the direction and intensity of light in space, not just colour and brightness.
This is crucial for VR/AR, medical imaging, and cultural-heritage applications, where you need high compression, random access, and interactivity on complex volumetric content.
JPEG Pleno standardises coding tools and file formats so that next-generation capture devices, such as light-field cameras, LiDAR systems and volumetric platforms, can be integrated into interoperable workflows rather than customised, isolated pipelines.
JPEG XS: Lightweight, low-latency video
JPEG XS is different.
It targets video, but with a very different focus than MPEG. Instead of pushing compression to the limit, JPEG XS prioritizes ultra-low latency and low complexity. In many professional workflows, it emerges as a practical alternative to uncompressed video.
A light compression ratio of around 4:1 can be enough to carry a 4K/50p/4:2:2/10-bit signal within the bandwidth traditionally required for 1080p50 over 3G-SDI. That means: same cable, HD bandwidth, but 4K content. This not only saves bandwidth but also reduces energy consumption whenever the cost of compression is lower than the cost of transmission.
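The 4:1 claim is easy to check with back-of-the-envelope arithmetic. The sketch below assumes "4K" means 3840x2160 (UHD), counts only active picture samples (no blanking or ancillary data), and uses the nominal 3G-SDI link rate of 2.97 Gbit/s; the function name is illustrative.

```python
def raw_bitrate_bps(width, height, fps, bit_depth, chroma="4:2:2"):
    """Active-picture bitrate of uncompressed video, in bits per second."""
    # Average samples per pixel: one luma sample plus the subsampled chroma.
    samples_per_pixel = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}[chroma]
    return width * height * fps * bit_depth * samples_per_pixel


SDI_3G_BPS = 2.97e9  # nominal 3G-SDI link rate

uhd_raw = raw_bitrate_bps(3840, 2160, 50, 10)   # 4K/50p/4:2:2/10-bit
uhd_compressed = uhd_raw / 4                    # light 4:1 compression
hd_raw = raw_bitrate_bps(1920, 1080, 50, 10)    # 1080p50/4:2:2/10-bit

print(f"UHD uncompressed: {uhd_raw / 1e9:.2f} Gbit/s")
print(f"UHD at 4:1:       {uhd_compressed / 1e9:.2f} Gbit/s")
print(f"HD uncompressed:  {hd_raw / 1e9:.2f} Gbit/s")
print("fits in 3G-SDI:", uhd_compressed <= SDI_3G_BPS)
```

Because UHD has exactly four times the pixels of 1080p, a 4:1 ratio brings the compressed 4K payload down to the same rate as uncompressed HD.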
This is one of the reasons the Television Academy awarded a 2025 Technology & Engineering Emmy to Fraunhofer and intoPIX for their work on JPEG XS.
JPEG DNA: Storing images in molecules
One of the most fascinating explorations is JPEG DNA, where the storage medium is not magnetic or optical, but biological. DNA is nature’s original data store, with incredible longevity, orders of magnitude beyond any disk or tape.
The problem: our digital data production is exponential, while conventional storage media have limited durability and often become unreadable within a couple of decades (think floppy disks and CDs). DNA, by contrast, can preserve information for centuries or longer under the right conditions.
JPEG DNA aims to define how digital images can be encoded into DNA sequences in a way that is both efficient and robust, while respecting biochemical constraints and coping with high error rates in synthesis and sequencing.
It sounds like science fiction, but it is an attempt to imagine what a ‘future-proof’ storage solution might look like when we start thinking in terms of centuries rather than years.
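To see why "respecting biochemical constraints" makes this harder than it sounds, consider the most naive mapping: two bits per nucleotide. The sketch below is not the JPEG DNA coding scheme, only an illustration. It also measures the longest run of identical bases (a homopolymer), which is a commonly cited constraint because long runs tend to raise synthesis and sequencing error rates.

```python
BASES = "ACGT"  # 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data: bytes) -> str:
    """Map every 2 bits to one nucleotide, most significant bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(strand: str) -> bytes:
    """Inverse of bytes_to_dna; expects a length that is a multiple of 4."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def longest_homopolymer(strand: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    longest = run = 0
    previous = ""
    for base in strand:
        run = run + 1 if base == previous else 1
        previous = base
        longest = max(longest, run)
    return longest


strand = bytes_to_dna(b"JPEG")
print(strand, longest_homopolymer(strand))

# A flat image region (many identical bytes) produces a very long run.
print(longest_homopolymer(bytes_to_dna(b"\x00" * 16)))
```

The naive mapping is maximally dense but turns any flat data region into one long homopolymer, which is why a practical scheme has to trade some density for constrained, error-tolerant sequences.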
MPEG: Beyond bitrate, and toward AI-native and energy-aware codecs
Why we still need new video codecs
The Moving Picture Experts Group, aka MPEG, has in over 35 years of activity released codecs that literally built the media industry: MPEG-2, MPEG-4, AVC, HEVC, VVC, AAC, and more.
The latest video standard is called VVC (Versatile Video Coding) and was published in 2020.
Why do we need a new video codec?
Video is still the most bandwidth-hungry medium we distribute today; it floods IP networks, terrestrial and satellite links, and data centers.
Each new codec generation brought bitrate reductions that translated directly into lower delivery costs and wider reach (e.g., UHD for users who couldn’t access it under AVC/HEVC constraints).
That logic still holds, but it’s not the only driver.
Times are changing, and bitrate reduction at the same visual quality, although important, is not the only motivation behind the creation of a new video codec. Next-gen codecs will be evaluated not just on compression efficiency but also on latency, deployability, implementation cost, and use-case relevance.
The MPEG Enhanced Compression Model (ECM) project has reached version 19, demonstrating roughly 27% bitrate savings over VVC in random-access configurations. ECM is a likely foundation for the future H.267 codec, which aims for a ~40% bitrate reduction relative to VVC (H.266).
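Since both figures are expressed relative to VVC, the distance between ECM today and the H.267 target is not simply 40 minus 27. A small sketch of the arithmetic, using only the two percentages quoted above (the function name is illustrative):

```python
def remaining_reduction(achieved: float, target: float) -> float:
    """Extra bitrate reduction still needed, relative to the current codec.

    Both arguments are savings versus the same anchor (here VVC), as fractions.
    """
    return 1.0 - (1.0 - target) / (1.0 - achieved)


ecm_saving = 0.27    # ECM vs VVC, random access, as quoted above
h267_target = 0.40   # H.267 goal vs VVC, as quoted above

gap = remaining_reduction(ecm_saving, h267_target)
print(f"further saving needed on top of ECM: {gap:.1%}")
```

Going from 73% to 60% of the VVC bitrate means squeezing out a bit under 18% more on top of what ECM already delivers.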
The codec is designed for diverse applications, including mobile streaming, live broadcasting, immersive VR/AR, cloud gaming, and AI-generated content. It targets efficient real-time decoding and scalable encoder complexity, supporting resolutions up to 8Kx4K and frame rates up to 240 fps. It supports stereoscopic 3D, multi-view content, wide colour gamut, and high dynamic range.
The H.267 standard is currently projected to be finalized in 2028, with meaningful deployment likely not occurring until around 2034–2036.
Interestingly, the codec cycle has shortened: roughly 10 years between AVC and HEVC, about 7 between HEVC and VVC, and an expected ~8 between VVC and H.267. One reason is that MPEG is no longer the only “rooster in the henhouse”: AOMedia is advancing AV1 and AV2, AVS3 is growing in China, and proprietary or niche codecs are emerging in parallel.
The proposed timeline is aimed at maintaining an appropriate cycle, so that the new standard remains competitive in the marketplace in terms of providing superior compression performance and fulfilling the industry’s needs.
The Call for Proposals is expected in July 2026.
Between mid-2026 and early 2027, a series of subjective video quality assessments will be conducted, covering various categories of content (SDR, HDR, games and user-generated content). Independent laboratories are invited to participate (deadline extended to 15 April 2026), subject to strict technical, organisational and conflict-of-interest requirements; the results will contribute to the evaluation of future video coding standards.
Energy efficiency and green metadata
As codecs grow more complex, encoding becomes more energy-intensive. That’s increasingly unacceptable in a world where sustainability is a board-level priority.
ISO/IEC 23001-11 (Green Metadata) addresses energy-efficient media consumption by defining metadata that allows devices and displays to reduce power usage, for example by adapting backlight levels to content characteristics.
Rather than treating energy as an afterthought, the ecosystem is starting to treat “joules per bit” as seriously as bits per pixel.
“Energy efficiency” is becoming a formal selection criterion for codecs, alongside BD-Rate, especially in mobile and large-scale streaming contexts.
AI in video coding: Hybrid, super-resolution, end-to-end
What about AI applied to video coding?
MPEG continues its work to move beyond the limitations of the traditional 2D transform plus motion-compensation framework.
AI has become increasingly significant in the 2020s and will likely influence many aspects of our lives. However, its influence on the near-term evolution of communication technologies is still uncertain. In the context of video coding, it is important to recognize that widely deployed, mass-market video systems must be practical, robust, energy efficient, and cost effective, while still delivering state-of-the-art compression performance.
Any AI-based approach must therefore address the entire video processing chain, from pre-processing and encoding to storage, transmission, decoding, post-processing, analysis, and content repurposing. Moreover, these solutions must support high resolution, high frame rates, and high dynamic range, all while operating in real time.
This is why JVET evaluates neural tools under several operating points: VLOP (Very Low complexity Operating Point), LOP (Low complexity Operating Point), and HOP (High complexity Operating Point), explicitly balancing coding gains against the computational budget.
The JVET group is exploring Neural Network Video Coding (NNVC) along three main directions:
- Hybrid codecs with neural tools
  Neural components are added into the traditional transform + motion-compensation framework, replacing or augmenting existing tools. Examples include:
  - Deep Reference Frame (DRF inter), which enhances reference frames for motion compensation, at the cost of higher decoder complexity.
  - Convolutional Cross-Component Model (CCCM), which improves chroma prediction and denoising by learning cross-component structure, offering a favourable trade-off.
- Neural super-resolution and post-filters
  The codec remains conventional (e.g., VVC), but the pipeline is modified:
  - Input resolution is reduced before encoding, resulting in a much smaller bitstream. If the pre-processing stage downsamples the input by a factor of two in both the horizontal and vertical directions, the amount of data entering the encoder is immediately reduced by a factor of four.
  - After decoding, neural super-resolution upsamples the video back to the target resolution. The key is to recover perceived quality with super-resolution.
  - Neural post-filters (NNPF) operate after decoding to enhance quality while preserving bitstream conformance.
- End-to-end neural codecs
  Here, the entire pipeline (analysis transform, entropy model, synthesis transform) is learned as a single network. Within MPEG, neural super-resolution, learned intra coding, and DRF inter (Deep Reference Frame) are also counted as end-to-end approaches.
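The factor-of-four reduction in the super-resolution branch above follows directly from halving both dimensions. Here is a minimal sketch on a tiny synthetic frame; 2x2 averaging and nearest-neighbour upsampling are crude stand-ins chosen for brevity, not the filters or neural networks used in NNVC.

```python
def downsample_2x(frame):
    """Average each 2x2 block; `frame` is a list of rows with even dimensions."""
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
            for x in range(0, len(frame[0]), 2)
        ]
        for y in range(0, len(frame), 2)
    ]


def upsample_2x_nearest(frame):
    """Nearest-neighbour upsampling, a placeholder for neural super-resolution."""
    out = []
    for row in frame:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# Synthetic 8x4 frame of 8-bit samples.
frame = [[(x + y) % 256 for x in range(8)] for y in range(4)]
small = downsample_2x(frame)            # what the conventional encoder would see
restored = upsample_2x_nearest(small)   # what the viewer would see after decoding

samples_before = len(frame) * len(frame[0])
samples_after = len(small) * len(small[0])
print(samples_before, samples_after)
```

The encoder handles a quarter of the samples, and the quality of the final picture then depends almost entirely on how well the upsampling step recovers detail, which is where the neural network earns its complexity.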
NNVC is at version 15 of its algorithm and software specification and reports BD-rate reductions in the ~6–14% range versus VVC under Y-PSNR, with higher gains at high-complexity operating points [6% NN-Intra & VLOP filter (2 tools); 14% NN-Intra & HOP filter (2 tools)].
The downside is decoder complexity: from an order of magnitude higher than VVC at very low complexity up to two orders of magnitude higher in the most aggressive modes, which is challenging for mobile devices [think of 14x (VLOP) to 118x (HOP) that of the VTM anchor].
The 2026–2027 roadmap is shaping up to be particularly compelling. From a technical perspective, there is a strong and growing emphasis on reducing computational complexity and energy consumption. Complexity reporting has become a standardized and integral part of the development process, not an afterthought. Techniques such as weight pruning, reduced receptive fields, knowledge distillation, and fully integer-only inference are increasingly seen as baseline requirements rather than optional optimizations.
At this stage, true differentiation is expected to come from deep, kernel-level optimizations, particularly targeting SIMD architectures, where a single instruction operates on multiple data elements in parallel, and NPU backends, which are specialised processors designed to efficiently accelerate AI and machine-learning workloads.
Ultimately, success will belong to those who can deliver the best performance in real deployments, in other words, the fastest and most efficient solutions.
Reproducibility is another major theme.
There is a clear shift toward bit-exact inference and the establishment of model registries, official repositories where the committee hosts the exact neural network models used for experiments and cross-checks. These registries capture not only the model architecture and weights, but also versioning, training recipes, and associated metadata.
The goal is to ensure that every submission is fully reusable, auditable, and verifiable by others. The ecosystem is increasingly embracing a “trust, but verify” philosophy, and the tooling is evolving accordingly to support transparent validation and long-term reproducibility.
We’re also seeing early consolidation around specific NN tools. For mainstream profiles, the Neural Network Loop Filter (NNLF), in both its LOP and VLOP variants, and the Convolutional Cross-Component Model (CCCM) are looking like the early winners. Meanwhile, DRF inter seems poised to appear more often in higher-tier encoders and decoders, especially in environments where NPUs are available.
The question is no longer “Do neural tools help?” but “How much gain can we keep while meeting decoder power and latency budgets?”
The next two JVET cycles will put these ideas to the test, as the community defines a path beyond VVC. The lessons learned from NNVC are expected to play a significant role in shaping future test conditions and in setting expectations for permissible complexity, helping to establish a practical and well-grounded baseline for next-generation video coding technologies.
Video for Machines: VCM and FCM
Most people still think of video compression as something done “for humans” to watch. But today, a large share of visual data, especially from cameras, is consumed by machines: autonomous vehicles, drones, industrial robots, smart-city sensors, and surveillance systems.
Yet almost all of these systems still stream pixel-based video compressed with human-centric codecs. This wastes bandwidth, doesn’t scale well, and exposes raw visual content, including faces and sensitive scenes, to third-party servers.
MPEG-AI (ISO/IEC 23888) responds to this with a family of standards designed for machine-to-machine (M2M) communication, with two key pillars: Video coding for machines and Feature coding for machines.
Video coding for machines (VCM)
VCM reorganizes the classical video coding pipeline around machine task performance rather than human visual quality. Instead of optimizing PSNR or SSIM, VCM optimizes object detection, tracking, segmentation, and similar tasks in scenarios like smart cities and autonomous driving.
VCM represents an important step toward machine-centric video coding.
To achieve this, VCM departs from traditional signal-centric approaches in several key ways. It applies temporal resampling, dropping frames that provide no additional information for the target task. Rather than transmitting full-resolution frames, VCM adaptively downsamples spatial resolution based on task relevance. In addition, it safely reduces luma and chroma precision by discarding least-significant bits that do not influence machine inference performance.
Importantly, VCM still wraps around a standard H.26x codec (AVC/HEVC/VVC) but surrounds it with task-aware pre- and post-processing. The drawback is that it still transmits recognizable frames, which raises privacy concerns.
VCM has reached DIS (Draft International Standard) status.
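The three pre-processing steps described above (temporal resampling, spatial downsampling, bit-depth truncation) can be sketched as plain functions applied before a conventional encoder. This is a simplified illustration with fixed parameters; in VCM the choices are driven by task relevance, and none of the names below come from the standard.

```python
def temporal_resample(frames, keep_every):
    """Keep one frame out of every `keep_every`."""
    return frames[::keep_every]


def spatial_downsample(frame, factor):
    """Decimate rows and columns by `factor` (no anti-alias filter, for brevity)."""
    return [row[::factor] for row in frame[::factor]]


def truncate_bit_depth(frame, drop_bits):
    """Discard the `drop_bits` least-significant bits of every sample."""
    return [[sample >> drop_bits for sample in row] for row in frame]


def preprocess(frames, keep_every=2, factor=2, drop_bits=2):
    """Apply all three reductions; the result would feed a standard H.26x encoder."""
    kept = temporal_resample(frames, keep_every)
    return [truncate_bit_depth(spatial_downsample(f, factor), drop_bits) for f in kept]


def total_bits(frames, bit_depth):
    """Raw size of a list of frames at the given bit depth."""
    return sum(len(f) * len(f[0]) for f in frames) * bit_depth


# Eight synthetic 8x8 frames of 10-bit samples.
video = [[[(x * y + t) % 1024 for x in range(8)] for y in range(8)] for t in range(8)]
reduced = preprocess(video)

ratio = total_bits(video, 10) / total_bits(reduced, 8)
print(f"raw data entering the codec reduced {ratio:.0f}x")
```

With these illustrative settings the codec receives a tenth of the raw data before any actual compression takes place; the open question in a real system is how far each knob can be turned before detection or tracking accuracy drops.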
Feature coding for machines (FCM)
FCM addresses that privacy and bandwidth problem by compressing intermediate neural features instead of pixels.
Today, most machine-to-machine systems rely on remote inference, where edge devices send full video frames to the cloud for processing. This approach is problematic because pixel video data is bandwidth-intensive, and raw images often contain sensitive information such as faces, locations, and contextual scenes that should not necessarily be exposed to third-party servers.
At the same time, modern edge devices increasingly include NPUs capable of executing parts of a neural network locally, even if they cannot run full deep models end-to-end. This creates an opportunity to split the model: execute the early layers on the device and transmit only the compressed intermediate features, significantly reducing data transfer while preserving task performance.
The idea is collaborative intelligence.
That means running the first part of a neural network on the edge device, then transmitting only the intermediate feature tensor to the cloud, which finishes the inference.
To achieve this, FCM reduces the dimensionality of the intermediate feature tensors, eliminating unnecessary degrees of freedom. It prunes redundant feature channels that do not contribute meaningful information to the task, and it quantizes numerical precision, mapping 32-bit floating-point values to 10-bit (or similar) integer representations to significantly reduce data size and computational cost without compromising task performance.
Bandwidth savings can be large, up to about 97% in some scenarios, while the features preserve semantics but not identities. Even if intercepted, they do not directly reveal images. In practice, using HEVC as an inner codec for these features can perform nearly as well as VVC, making deployment more flexible.
FCM is now at the Working Draft (WD) stage, progressing toward Committee Draft (CD).
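The channel pruning and 32-bit to 10-bit quantization steps can be illustrated on a tiny hand-made "feature tensor". This is a sketch under simplifying assumptions: channels are ranked by mean absolute activation (one possible relevance measure, not necessarily the one FCM uses), quantization is plain uniform min/max scaling, and the inner codec stage (HEVC or VVC) is left out entirely.

```python
def prune_channels(features, keep):
    """Keep the `keep` channels with the largest mean absolute activation."""
    energy = [sum(abs(v) for v in channel) / len(channel) for channel in features]
    ranked = sorted(range(len(features)), key=lambda i: energy[i], reverse=True)
    kept = sorted(ranked[:keep])
    return kept, [features[i] for i in kept]


def quantize(features, bits=10):
    """Uniformly map floats to integer codes in [0, 2**bits - 1]."""
    flat = [v for channel in features for v in channel]
    lo, hi = min(flat), max(flat)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant tensor
    codes = [[round((v - lo) / scale) for v in channel] for channel in features]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Cloud side: map integer codes back to approximate float features."""
    return [[lo + c * scale for c in channel] for channel in codes]


# Synthetic feature tensor: 4 channels of 6 values each.
features = [
    [0.9, -1.2, 0.4, 2.0, -0.7, 1.1],
    [0.01, -0.02, 0.0, 0.01, 0.0, -0.01],   # nearly inactive channel
    [3.1, 2.7, -2.9, 0.5, 1.8, -3.3],
    [0.0, 0.02, -0.01, 0.0, 0.01, 0.0],     # nearly inactive channel
]

kept, pruned = prune_channels(features, keep=2)
codes, lo, scale = quantize(pruned, bits=10)
restored = dequantize(codes, lo, scale)

max_error = max(abs(a - b) for ca, cb in zip(pruned, restored) for a, b in zip(ca, cb))
raw_bits = sum(len(c) for c in features) * 32
sent_bits = sum(len(c) for c in codes) * 10
print(kept, raw_bits, sent_bits, max_error)
```

Even before any entropy coding, dropping the two inactive channels and moving to 10-bit codes shrinks the payload from 768 to 120 bits, with a reconstruction error bounded by half a quantization step.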
Compression of neural networks for multimedia content description and analysis
The irony of AI is that the models we use to compress and interpret data are themselves huge blobs of data. Shipping a state-of-the-art deep model over a network, or deploying it on a device with tight memory, is expensive in both bandwidth and storage.
To understand the need for compression, consider a camera that adjusts its automatic mode based on scene or object recognition performed by a trained neural network. This is a rapidly evolving area, and it is common for newer, better-trained models to become available over time.
However, developing this “intelligence” is time- and labor-intensive, so once a model is ready it is typically deployed from a central location to millions of user devices. With modern neural networks now reaching hundreds of megabytes in size, this creates a scalability problem. A scenario in which millions of devices simultaneously download the latest model with enhanced features would place a significant and potentially unsustainable load on the network.
While simpler deployments involve training a neural network once, transferring it to the device, and using it locally for inference, emerging paradigms such as federated learning require continuous, bidirectional communication among large numbers of devices and central servers. In these scenarios, efficient compression and communication mechanisms become essential.
Neural Network Coding (NNC) comes to the rescue with a standardized, efficient, modular way to shrink neural networks dramatically without hurting their accuracy.
The NNC standard is designed to achieve very high compression efficiency for deep neural networks by combining several complementary techniques. These include pre-processing methods for data reduction, such as sparsification (e.g., setting selected weights to zero to make tensors more compressible) and structural pruning, where entire neurons or filters that contribute little to performance are removed. These steps are followed by quantization and context-adaptive binary arithmetic coding, specifically DeepCABAC, to efficiently encode the remaining information.
The result: neural networks can be compressed by up to about 97% while keeping their accuracy.
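The sparsify, quantize, entropy-code chain can be sketched with the standard library alone. Several assumptions apply: the weights are random Gaussian numbers rather than a trained model, the threshold and step size are arbitrary, and `zlib` is used as a generic stand-in for DeepCABAC, so the resulting ratio says nothing about what NNC achieves on real networks or about accuracy.

```python
import random
import struct
import zlib


def sparsify(weights, threshold):
    """Zero out weights whose magnitude is below `threshold`."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize(weights, step):
    """Uniform scalar quantization to integer indices."""
    return [round(w / step) for w in weights]


def dequantize(indices, step):
    """Reconstruct approximate weights from quantization indices."""
    return [i * step for i in indices]


random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(10_000)]  # synthetic layer

raw = struct.pack(f"<{len(weights)}f", *weights)        # float32 baseline
sparse = sparsify(weights, threshold=0.05)
indices = quantize(sparse, step=0.01)
packed = struct.pack(f"<{len(indices)}b", *indices)     # one signed byte per index
coded = zlib.compress(packed, 9)                        # stand-in entropy coder

saving = 1.0 - len(coded) / len(raw)
print(f"{len(raw)} bytes -> {len(coded)} bytes ({saving:.0%} smaller)")
```

Each stage contributes: sparsification creates long runs of zeros, quantization shrinks the alphabet, and the entropy coder exploits both. A context-adaptive coder designed for weight statistics would be expected to do better than a general-purpose compressor on the same indices.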
This technology turns out to be a kind of Lego brick that can be ‘mounted’ into other MPEG standards. This modularity explains why the group is advancing work on NNC as an inner codec for FCM, as well as on the application of NNC to spherical harmonics (SH) coefficients within Gaussian Splat Coding (GSC).
Gaussian Splatting and Point Clouds
MPEG continues to explore Gaussian Splat Coding (GSC), which addresses the compression of Gaussian Splat (GS) representations used for 3D scene capture and rendering.
3D Gaussian Splatting has significantly changed how real-world scenes are captured and visualized. Unlike traditional photogrammetry, which produces mesh-based models, or neural radiance fields (NeRFs), which rely on computationally expensive ray tracing, Gaussian splatting represents a scene as millions of fuzzy ellipsoids (splats) that can be rendered efficiently. New viewpoints are generated simply by drawing these splats from the desired perspective.
The result is photorealistic rendering at real-time frame rates on consumer hardware, making Gaussian splatting particularly well suited to virtual and augmented reality, immersive video, interactive web experiences, and game development.
Gaussian splat data essentially consists of collections of points in 3D space, each associated with attributes such as position, orientation, scale, opacity, and colour coefficients. Determining the best way to compress and transport this data remains an active area of research.
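A rough data-structure sketch shows why compression matters here. The layout below is an assumption for illustration, not a GSC format: orientation is stored as a quaternion, colour as spherical-harmonics coefficients of degree 3 (16 per colour channel), and every value as an uncompressed 32-bit float.

```python
from dataclasses import dataclass, field
import struct

SH_DEGREE = 3                                   # assumption: degree-3 spherical harmonics
SH_COEFFS_PER_CHANNEL = (SH_DEGREE + 1) ** 2    # 16 coefficients per colour channel


@dataclass
class Splat:
    position: tuple[float, float, float]
    rotation: tuple[float, float, float, float]   # unit quaternion (w, x, y, z)
    scale: tuple[float, float, float]             # ellipsoid axis lengths
    opacity: float
    sh: list[float] = field(
        default_factory=lambda: [0.0] * (3 * SH_COEFFS_PER_CHANNEL)
    )

    def to_bytes(self) -> bytes:
        """Serialize every attribute as little-endian float32."""
        values = [*self.position, *self.rotation, *self.scale, self.opacity, *self.sh]
        return struct.pack(f"<{len(values)}f", *values)


splat = Splat((0.0, 1.0, 2.0), (1.0, 0.0, 0.0, 0.0), (0.1, 0.1, 0.3), 0.8)
bytes_per_splat = len(splat.to_bytes())
scene_megabytes = 1_000_000 * bytes_per_splat / 1e6

print(f"{bytes_per_splat} bytes per splat")
print(f"{scene_megabytes:.0f} MB for a one-million-splat scene")
```

Under these assumptions the colour coefficients account for 48 of the 59 values per splat, which helps explain the interest in dedicated coding of SH coefficients.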
Compressed Gaussian splat representations could potentially be carried over existing video infrastructure, leveraging familiar encoding and delivery pipelines, or alternatively be handled within geometry-based point cloud frameworks. Different industries tend to favor different approaches, largely depending on the infrastructures they already operate.
On the point cloud side, the G-PCC family of standards has continued to expand to address a wider range of use cases. This now includes E-G-PCC, which introduces enhanced temporal prediction to improve the compression of dynamic and time-evolving point clouds; GeS-PCC, targeting dense, solid objects and surface-like structures that behave more like continuous manifolds; and L3C2, a low-latency point-cloud codec designed specifically for spinning LiDAR sensors, enabling real-time processing for applications such as autonomous driving and robotics.
Audio: Immersion, personalization, and dialogue clarity
Video usually gets the spotlight, but immersive experiences fail without great audio. MPEG-H Audio is gaining momentum precisely because it moves from channel-based to object-based approaches.
Object-based audio lets users personalize their mix, raising commentary, lowering crowd noise, or switching between different perspectives offered by broadcasters. Experts are working on full six degrees-of-freedom (6DoF) audio, where users can localize sound sources in 3D space, perceive loudness changes as they move, and experience realistic reverberation and occlusion (when a physical object is interposed between a sound source and a user).
One particularly impactful feature is MPEG-H Dialog+, which separates speech from the rest of the soundtrack and allows selective enhancement of dialogue. This is a practical solution for people with hearing difficulties and for anyone who simply wants to understand speech in a mix without sacrificing music and effects quality. The result: consistently enhanced dialogue while maintaining the high sound quality of the background music.
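What object-based personalization means in practice is that the final mix is computed at the receiver from separate objects and user-chosen gains. The sketch below is a toy mono mixer, not the MPEG-H renderer; the object names, sample values, and gain settings are invented for illustration.

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def mix(objects: dict[str, list[float]], gains_db: dict[str, float]) -> list[float]:
    """Sum audio objects sample by sample after applying per-object gains."""
    length = len(next(iter(objects.values())))
    out = [0.0] * length
    for name, samples in objects.items():
        gain = db_to_gain(gains_db.get(name, 0.0))  # 0 dB when the user sets nothing
        for i, sample in enumerate(samples):
            out[i] += gain * sample
    return out


# Three hypothetical audio objects delivered separately by the broadcaster.
objects = {
    "commentary": [0.10, 0.20, -0.10, 0.05],
    "crowd": [0.30, -0.25, 0.20, -0.30],
    "music": [0.05, 0.05, 0.05, 0.05],
}

default_mix = mix(objects, {})
personal_mix = mix(objects, {"commentary": 6.0, "crowd": -12.0})
print(default_mix)
print(personal_mix)
```

With channel-based audio the balance between commentary and crowd is baked in at the studio; keeping the objects separate until playback is what makes adjustments like dialogue enhancement possible on the listener's side.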
Conclusion
A clear pattern comes into focus: compression is no longer merely a technique for reducing file sizes but has become a unifying principle that guides the design, scalability and reliability of digital systems.
We’re compressing everything: from images and audio to genomes, neural networks and high-dimensional world models. In the course of this process, the focus is shifting from human-centred representations to machine-centred ones, where semantics matter more than pixels and meaning takes precedence over raw fidelity. At the same time, trust, authenticity, energy consumption and computational complexity are moving to the foreground, embedded directly into the fabric of our media and data pipelines.
In this broader view, compression functions like an “operating system” for the global datasphere: rarely seen when it works, but foundational to everything built on top of it. It defines what we can store, what we can transmit, how efficiently we can learn and iterate, and increasingly, what we can believe.

