Future Directions in Video Coding
What happens when we move beyond traditional block-based hybrid codecs? The future of video compression lies at the intersection of machine learning, perceptual understanding, and immersive media formats.
TLDR:
Explore emerging video compression technologies including neural network-based approaches, generative AI compression, perceptual coding innovations, immersive formats, and evolving adaptive streaming standards.
Neural Network Compression
Autoencoder-based architectures learn optimal representations directly from data, potentially outperforming hand-designed transforms.
End-to-End Learned Compression
Neural networks can replace traditional transform, quantization, and entropy coding stages with learned operations optimized jointly for rate-distortion performance.
Key advantages include adaptive basis functions, joint optimization of all compression stages, and better modeling of complex statistical dependencies in natural video.
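As a sketch of how these stages can be trained jointly, here is a toy analysis/synthesis transform pair with a Lagrangian rate-distortion loss. The architecture, the uniform-noise quantization surrogate, and the L1 rate proxy are illustrative simplifications; real learned codecs use a trained entropy model for the rate term.

```python
import torch
import torch.nn as nn

class TinyAutoencoderCodec(nn.Module):
    """Illustrative end-to-end codec: analysis transform, quantization, synthesis transform."""
    def __init__(self, channels=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, 3, 5, stride=2, padding=2, output_padding=1),
        )

    def forward(self, x):
        y = self.encoder(x)
        # Additive uniform noise is a common training-time surrogate for rounding.
        y_hat = y + torch.empty_like(y).uniform_(-0.5, 0.5)
        x_hat = self.decoder(y_hat)
        return x_hat, y_hat

def rate_distortion_loss(x, x_hat, y_hat, lam=0.01):
    """Lagrangian objective: rate proxy plus lambda-weighted distortion."""
    distortion = nn.functional.mse_loss(x_hat, x)
    rate_proxy = y_hat.abs().mean()   # stand-in for the entropy-model rate term
    return rate_proxy + lam * distortion

x = torch.rand(1, 3, 64, 64)          # dummy frame batch
model = TinyAutoencoderCodec()
x_hat, y_hat = model(x)
loss = rate_distortion_loss(x, x_hat, y_hat)
loss.backward()                       # transforms and rate proxy are optimized jointly
```

Varying `lam` traces out the rate-distortion curve, which is how learned codecs target different quality points.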
Challenges
High computational complexity, lack of hardware acceleration, difficulty in standardization, and ensuring compatibility with existing infrastructure remain significant hurdles.
Generative Compression
Instead of transmitting every pixel, send minimal descriptors and let AI reconstruct the scene using generative models.
This approach shifts from predictive coding to generative modeling: transmit compact semantic descriptors (pose, lighting, expression) and let generative priors at the decoder synthesize the visual details.
Latent Space Representation
Compress to extremely low-dimensional latent spaces that capture essential perceptual information, then use powerful generative decoders (GANs, diffusion models) for reconstruction.
Early research shows potential for orders-of-magnitude bitrate reduction for talking-head videos and similar constrained content, though general-purpose applications remain challenging.
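A rough back-of-the-envelope illustration of why this is attractive for talking-head content; all figures below are assumptions for illustration, not measured results.

```python
# Hypothetical numbers: transmit only facial keypoints instead of pixels.
keypoints = 10            # assumed sparse keypoints per frame
bits_per_coordinate = 16  # quantized x and y
fps = 30

keypoint_bitrate = keypoints * 2 * bits_per_coordinate * fps   # ~9.6 kbps
conventional_bitrate = 500_000                                 # assumed conventional talking-head stream

print(f"keypoint stream: {keypoint_bitrate / 1000:.1f} kbps")
print(f"reduction vs. conventional: {conventional_bitrate / keypoint_bitrate:.0f}x")
```

The decoder would animate a previously shared reference frame from these descriptors, which is why the approach only works when a strong generative prior covers the content.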
Perceptual Coding Advances
Modern codecs increasingly incorporate sophisticated models of human visual perception to optimize bit allocation where it matters most.
Beyond basic luminance masking and contrast sensitivity, newer approaches include:
- Foveated encoding for VR/AR applications
- Attention-guided bit allocation
- Perceptual loss functions in training
- Content-adaptive quantization based on semantic understanding
Foveated Video Coding
Exploit the non-uniform resolution of human vision by allocating fewer bits to peripheral vision regions, particularly valuable for head-mounted displays.
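A minimal sketch of eccentricity-based bit allocation, assuming a known gaze point and a hypothetical linear mapping from angular eccentricity to a quantization-parameter offset; production systems use perceptually tuned curves.

```python
import numpy as np

def foveated_qp_offsets(width, height, gaze_xy, block=64,
                        pixels_per_degree=30.0, max_offset=12):
    """Per-block QP offset map: 0 at the gaze point, coarser toward the periphery."""
    xs = np.arange(block // 2, width, block)
    ys = np.arange(block // 2, height, block)
    cx, cy = np.meshgrid(xs, ys)
    # Angular eccentricity of each block centre from the gaze point, in degrees.
    eccentricity = np.hypot(cx - gaze_xy[0], cy - gaze_xy[1]) / pixels_per_degree
    # Clamp the extra QP so the periphery never becomes unusably coarse.
    return np.minimum((eccentricity / 2.0).round(), max_offset).astype(int)

offsets = foveated_qp_offsets(1920, 1080, gaze_xy=(960, 540))
print(offsets.shape, offsets.min(), offsets.max())
```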
Immersive Video Formats
New video experiences demand new compression approaches tailored to their unique characteristics.
360° Video
Equirectangular projection creates sampling inefficiencies (polar oversampling). Solutions include:
- Sphere-adaptive projections (cube maps, octahedron)
- Viewport-adaptive streaming
- Rotation-invariant encoding
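The polar oversampling these solutions target is easy to quantify: every row of an equirectangular frame has the same pixel count but covers a spherical area proportional to cos(latitude). The sketch below computes the per-row weights, the same idea sphere-aware quality metrics such as WS-PSNR rely on.

```python
import numpy as np

def equirectangular_row_weights(height):
    """cos(latitude) weight for each row of an equirectangular frame."""
    # Latitude of each row centre, from +pi/2 (top) to -pi/2 (bottom).
    latitudes = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    return np.cos(latitudes)

w = equirectangular_row_weights(1080)
print(f"equator weight: {w[540]:.3f}, near-pole weight: {w[2]:.4f}")
# Near the poles almost every pixel is redundant, which is exactly what
# sphere-adaptive projections and viewport-adaptive streaming exploit.
```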
Volumetric Video
Represent scenes as 3D point clouds or meshes with associated textures, requiring geometry and attribute compression techniques.
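As a sketch of the geometry side, the snippet below quantizes point positions to a voxel grid and removes duplicates, the first step of octree-style geometry coding. It is a simplified illustration, not the MPEG G-PCC/V-PCC pipeline.

```python
import numpy as np

def voxelize(points, voxel_size=0.01):
    """Quantize 3D points to a voxel grid and drop duplicates.

    Coarser voxels mean fewer occupied cells to encode, at the cost of
    geometric precision: the basic rate-distortion trade-off for geometry.
    """
    indices = np.floor(points / voxel_size).astype(np.int64)
    return np.unique(indices, axis=0)

points = np.random.rand(100_000, 3)            # synthetic scene inside a 1 m cube
for size in (0.001, 0.01, 0.05):
    print(f"voxel {size * 1000:>4.0f} mm -> {len(voxelize(points, size)):>6d} occupied voxels")
```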
Light Field Video
Capture and compress multi-view, multi-directional light information enabling post-capture refocusing and perspective shifts.
Storage Requirements
Immersive formats can require 10-100x more data than conventional video, making efficient compression essential for practical adoption.
ML In-Loop Filtering
Machine learning enhances traditional deblocking and deringing filters with content-adaptive, learned approaches.
Instead of fixed filters, neural networks learn optimal filtering strategies based on local content characteristics, blocking artifacts, and quantization parameters.
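A minimal sketch of a QP-conditioned convolutional in-loop filter: it takes a reconstructed luma plane plus a quantization-parameter plane and predicts a residual correction. This is a toy model for illustration, not a filter from any standard.

```python
import torch
import torch.nn as nn

class QPConditionedLoopFilter(nn.Module):
    """Toy learned in-loop filter: reconstructed luma + QP plane -> residual correction."""
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, recon, qp):
        # Broadcast the scalar QP to a plane so the filter can adapt its strength.
        qp_plane = torch.full_like(recon, qp / 63.0)   # normalized, assuming a 0-63 QP range
        correction = self.net(torch.cat([recon, qp_plane], dim=1))
        return recon + correction                      # residual learning, common in such proposals

recon = torch.rand(1, 1, 128, 128)   # reconstructed luma block in [0, 1]
filtered = QPConditionedLoopFilter()(recon, qp=40)
print(filtered.shape)
```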
Examples
AV1's film grain synthesis and VVC's Adaptive Loop Filter (ALF) are content-adaptive, signal-driven tools rather than neural ones, but they represent steps toward fully learned in-loop processing that adapts to video content in real time.
Adaptive Streaming Evolution
Delivery technologies continue to evolve to meet diverse network conditions and device capabilities.
CMAF (Common Media Application Format)
Unifies HLS and DASH fragmented MP4 formats to reduce storage and CDN costs while enabling low-latency streaming.
LL-HLS (Low Latency HLS)
Reduces glass-to-glass latency to roughly 2-3 seconds through partial segment delivery, blocking playlist reloads, and preload hints.
Per-Title Encoding
Customize the encoding ladder (bitrate/resolution pairs) for each content title based on its complexity, rather than using a one-size-fits-all ladder.
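A minimal sketch of the idea: run trial encodes across resolutions and bitrates, then keep only the (bitrate, quality) points that are not dominated by a cheaper encode. The trial-encode numbers here are made up; a real pipeline would measure VMAF or a similar metric.

```python
# Each tuple: (resolution, bitrate_kbps, quality_score) from hypothetical trial encodes.
trial_encodes = [
    ("1080p", 6000, 96), ("1080p", 4000, 93), ("1080p", 2500, 82),
    ("720p",  3000, 90), ("720p",  1800, 84),
    ("480p",  1200, 78), ("480p",   700, 70),
]

def per_title_ladder(encodes):
    """Keep only encodes whose quality beats everything cheaper (the Pareto front)."""
    ladder, best_quality = [], -1
    for res, kbps, quality in sorted(encodes, key=lambda e: e[1]):   # ascending bitrate
        if quality > best_quality:
            ladder.append((res, kbps, quality))
            best_quality = quality
    return ladder

for rung in per_title_ladder(trial_encodes):
    print(rung)
```

In this made-up example the 1080p rung at 2500 kbps is dropped because 720p at 1800 kbps scores higher; automating exactly that kind of decision is the point of per-title encoding.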
Benefits
Per-title encoding typically saves 10-25% bitrate at equivalent quality compared to fixed encoding ladders, translating to significant CDN cost reductions.
Cloud Gaming Latency Requirements
Interactive streaming imposes extremely strict latency constraints that challenge traditional video coding approaches.
Cloud gaming requires end-to-end latencies under 100ms for responsive gameplay, necessitating:
- Ultra-low latency encoding (<30ms)
- Minimal buffering
- Simplified entropy coding
- Hardware-accelerated implementations
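A rough budget check under assumed per-stage latencies; all numbers below are illustrative, not measurements of any particular service.

```python
# Illustrative per-stage latencies for a cloud-gaming pipeline, in milliseconds.
budget_ms = {
    "input capture + game logic": 16,
    "render":                     16,
    "encode":                      8,
    "network (one way)":          25,
    "decode":                      8,
    "display":                    16,
}

total = sum(budget_ms.values())
print(f"end-to-end: {total} ms ({'within' if total <= 100 else 'over'} the 100 ms target)")
for stage, ms in budget_ms.items():
    print(f"  {stage:<28} {ms:>3} ms")
```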
Trade-offs
Cloud gaming codecs often sacrifice compression efficiency for speed, using higher quantizers and simpler modes to meet latency budgets.
Sustainable Codecs
With video representing approximately 80% of global internet traffic, codec efficiency has direct environmental impact.
More efficient codecs reduce:
- Data center energy consumption
- Network transmission power
- Device decoding energy
- Overall carbon footprint of video delivery
Energy-Aware Encoding
Modern encoders can optimize for energy consumption alongside quality and bitrate, selecting coding tools that minimize decoder complexity on target devices.
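One way to express this is to extend the usual rate-distortion cost with an energy term, J = D + λ·R + μ·E, and pick the coding tool with the lowest combined cost. The tool names and numbers below are hypothetical.

```python
# Hypothetical candidates with distortion, rate, and estimated decoder energy
# for one block; units are arbitrary and for illustration only.
candidates = {
    "simple intra":       {"distortion": 9.0, "rate": 1.0, "energy": 1.0},
    "complex inter":      {"distortion": 4.0, "rate": 1.5, "energy": 4.0},
    "learned prediction": {"distortion": 3.5, "rate": 1.4, "energy": 9.0},
}

def best_tool(candidates, lam=2.0, mu=0.5):
    """Pick the tool minimizing J = D + lambda*R + mu*E (energy-aware RD optimization)."""
    cost = lambda c: c["distortion"] + lam * c["rate"] + mu * c["energy"]
    return min(candidates.items(), key=lambda kv: cost(kv[1]))

print("energy-aware choice:   ", best_tool(candidates)[0])
print("energy-agnostic choice:", best_tool(candidates, mu=0.0)[0])
```

With the energy term switched on, the heavyweight learned tool loses to a cheaper mode despite its better distortion, which is the trade-off energy-aware encoding makes explicit.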
The push for greener codecs drives innovation in both compression efficiency and computational simplicity, benefiting both the environment and end-users.