Video Codecs: From Why to Wow
Understanding how video compression works, from fundamentals to future innovations
Video codecs compress video by removing redundancy and irrelevancy, using techniques like temporal prediction, transform coding, and entropy encoding. Modern codecs balance compression efficiency, computational complexity, and licensing considerations. The landscape is evolving rapidly with royalty-free alternatives challenging established standards.
What Is Video Compression, Really?
Raw video is enormous. Uncompressed video at 30 frames per second requires approximately:
- 1080p (1920×1080): ~165 MB per second
- 4K (3840×2160): ~660 MB per second
- 8K (7680×4320): ~2.6 GB per second
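To get a feel for these numbers, here is a quick back-of-envelope calculation. It is a sketch that assumes 24-bit RGB (3 bytes per pixel); real raw rates vary with bit depth and chroma subsampling (4:2:0 video, for instance, needs half the bytes per pixel), so the exact figures differ by format:

```python
# Back-of-envelope raw video data rate, assuming 24-bit RGB (3 bytes/pixel).
def raw_rate_mb_per_s(width, height, fps=30, bytes_per_pixel=3):
    """Uncompressed data rate in MB/s (1 MB = 10^6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6

for name, (w, h) in {"1080p": (1920, 1080),
                     "4K":    (3840, 2160),
                     "8K":    (7680, 4320)}.items():
    print(f"{name}: ~{raw_rate_mb_per_s(w, h):.0f} MB/s")
```

Note how each doubling of width and height quadruples the data rate: 4K is exactly four times 1080p, and 8K sixteen times.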
Without compression, streaming video over the internet would be impossible. Even storing a few hours of video would require massive storage systems. Video codecs solve this by exploiting two fundamental properties of video:
Spatial redundancy: Within a single frame, neighboring pixels tend to have very similar values.
Temporal redundancy: Consecutive frames are often very similar, with only small changes between them.
By recognizing and eliminating these redundancies, video codecs can achieve compression ratios of 100:1 or more while maintaining visually acceptable quality.
The Video Codec Pipeline
Whether it's H.264, HEVC, AV1, or VVC, all modern video codecs follow a similar basic structure:
- Partitioning: Divide frames into blocks (typically 4×4 to 64×64 pixels)
- Prediction: Predict block content from neighboring pixels (spatial) or previous/future frames (temporal)
- Transform: Convert prediction errors to frequency domain (usually DCT or similar)
- Quantization: Reduce precision of frequency coefficients (main source of compression)
- Entropy Coding: Losslessly compress the quantized coefficients (Huffman, arithmetic coding, etc.)
- Reconstruction: Inverse quantize, inverse transform, and add the prediction back, so the encoder's prediction loop sees exactly what the decoder will see
Let's break down each stage with a simple example:
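Before examining each stage in detail, here is a toy end-to-end sketch of the pipeline. It is illustrative only: a 1-D four-sample "block", a pure-Python DCT, and a single uniform quantizer stand in for the real machinery, and all function names are ours, not from any actual codec.

```python
import math

# Toy 1-D "codec" illustrating the pipeline stages (illustrative only; real
# codecs work on 2-D blocks and add entropy coding, loop filters, etc.).

def dct(x):
    """Orthonormal DCT-II: spatial samples -> frequency coefficients."""
    N = len(x)
    return [math.sqrt((1 if k == 0 else 2) / N) *
            sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
            for k in range(N)]

def idct(X):
    """Orthonormal DCT-III, the inverse of dct above."""
    N = len(X)
    return [sum(math.sqrt((1 if k == 0 else 2) / N) * X[k] *
                math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

def encode_block(block, reference, qstep=2):
    residual = [b - r for b, r in zip(block, reference)]   # temporal prediction
    coeffs = dct(residual)                                 # transform
    return [round(c / qstep) for c in coeffs]              # quantization (lossy)

def decode_block(levels, reference, qstep=2):
    coeffs = [l * qstep for l in levels]                   # dequantize
    residual = idct(coeffs)                                # inverse transform
    return [r + p for r, p in zip(residual, reference)]    # reconstruction

prev = [100, 102, 104, 106]   # co-located block in the previous frame
cur  = [101, 103, 106, 107]   # current block: similar, so the residual is tiny
levels = encode_block(cur, prev)      # mostly zeros -> cheap to entropy-code
recon  = decode_block(levels, prev)   # close to cur, within quantization error
```

Because the current block barely differs from the reference, nearly all quantized levels are zero, and the reconstruction differs from the original by at most the quantization error.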
Key Frame Types: I, P, and B Frames
Video codecs use different types of frames to balance compression efficiency with random access and error resilience:
- I-frames (Intra-coded): Encoded independently using only spatial information. Serve as random access points.
- P-frames (Predictive): Encoded using motion compensation from previous I or P frames.
- B-frames (Bidirectional): Encoded using motion compensation from both previous and future frames (highest compression).
A typical encoding sequence might look like: I B B P B B P B B P B B I ... This structure provides excellent compression while allowing efficient seeking and error recovery.
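As a small illustration, the repeating pattern above can be generated programmatically. The function below is a toy of ours, not codec code; real encoders choose GOP structure adaptively based on content:

```python
# Frame types in display order for a repeating I B B P ... GOP (toy example).
def frame_type(n, gop_size=12, b_frames=2):
    pos = n % gop_size
    if pos == 0:
        return "I"                                    # random access point
    return "P" if pos % (b_frames + 1) == 0 else "B"  # anchor vs. bidirectional

pattern = " ".join(frame_type(n) for n in range(13))
# pattern == "I B B P B B P B B P B B I"
```

One subtlety: because B-frames reference a *future* anchor, decode order differs from display order; each P-frame must be decoded before the B-frames that precede it on screen.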
Motion Compensation: The Heart of Temporal Compression
The biggest gains in video compression come from motion compensation: recognizing that objects in video move predictably from frame to frame.
Instead of encoding every pixel, the codec estimates how blocks of pixels have moved (motion vectors) and only encodes the difference between the predicted block and the actual content.
Advanced codecs use increasingly sophisticated motion compensation:
- Block sizes: From 4×4 to 64×64 pixels, with adaptive partitioning
- Fractional pixel accuracy: Quarter-pixel or eighth-pixel precision
- Multiple reference frames: Using more than just the previous frame for prediction
- Weighted prediction: Accounting for lighting changes and fade effects
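The core of motion estimation is block matching: exhaustively search a window in the reference frame for the candidate block with the lowest cost, typically the sum of absolute differences (SAD). The sketch below works on 1-D "frames" for brevity, and the names are ours; real encoders search 2-D windows with fast, non-exhaustive strategies and fractional-pel refinement:

```python
# Exhaustive (full-search) block matching on a 1-D "frame" (toy sketch).
def sad(a, b):
    """Sum of absolute differences: the usual block-matching cost."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_motion_vector(cur_block, ref_frame, block_pos, search_range=3):
    """Offset mv minimizing SAD against ref_frame[block_pos + mv : ...]."""
    n = len(cur_block)
    best_cost, best_mv = float("inf"), 0
    for mv in range(-search_range, search_range + 1):
        start = block_pos + mv
        if 0 <= start <= len(ref_frame) - n:        # stay inside the frame
            cost = sad(cur_block, ref_frame[start:start + n])
            if cost < best_cost:
                best_cost, best_mv = cost, mv
    return best_mv

ref_frame = [10, 10, 50, 60, 70, 10, 10, 10]   # previous frame
cur_block = [50, 60, 70]                       # same object, shifted by one
mv = best_motion_vector(cur_block, ref_frame, block_pos=3)
# mv == -1: predict from ref_frame[2:5], so the residual is all zeros
```

When the match is exact, as here, only the motion vector needs to be coded; the residual costs essentially nothing.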
Transform Coding: From Spatial to Frequency Domain
After motion compensation, we're left with prediction residuals: the differences between predicted and actual pixel values. These residuals often have energy concentrated in low frequencies.
By applying a transform (typically Discrete Cosine Transform or similar), we convert spatial data to frequency data, making it easier to compress:
For an N×N block, the 2-D DCT is:

F(u,v) = C(u) C(v) Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x,y) cos[(2x+1)uπ / 2N] cos[(2y+1)vπ / 2N]

where C(u), C(v) are normalization factors, and f(x,y) is the spatial domain data.
The transform concentrates most of the signal energy into a few low-frequency coefficients, allowing us to aggressively quantize (reduce precision of) the high-frequency coefficients that contribute less to perceived quality.
Quantization: Where the Compression Happens
Quantization is the primary lossy step in video compression. We reduce the precision of transform coefficients, typically by dividing them by a quantization step size and rounding to integers.
Higher quantization = more compression = more loss.
Lower quantization = less compression = better quality.
Modern codecs use adaptive quantization, allocating more bits to complex regions (edges, textures) and fewer bits to flat regions (sky, walls).
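The trade-off is easy to see numerically. The sketch below (our own toy, using a made-up coefficient block with DCT-like energy compaction) quantizes the same coefficients at several step sizes; larger steps zero out more coefficients, which is exactly what entropy coding then exploits, at the price of larger reconstruction error:

```python
# Uniform quantization of transform coefficients: divide by a step size,
# round to an integer "level", and later multiply back to reconstruct.
def quantize(coeffs, qstep):
    return [round(c / qstep) for c in coeffs]

def dequantize(levels, qstep):
    return [l * qstep for l in levels]

# Made-up block with energy concentrated in the low frequencies, as after a DCT.
coeffs = [312.0, -45.0, 7.0, 2.0, -1.0, 0.4, 0.2, -0.1]

for qstep in (4, 16, 64):
    levels = quantize(coeffs, qstep)
    recon = dequantize(levels, qstep)
    nonzero = sum(1 for l in levels if l != 0)
    max_err = max(abs(c - r) for c, r in zip(coeffs, recon))
    print(f"qstep={qstep:2d}: {nonzero} nonzero levels, max error {max_err:.1f}")
```

The run of trailing zeros at higher step sizes is what makes the subsequent lossless stage so effective.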
Entropy Coding: The Final Lossless Step
After quantization, we apply entropy coding to losslessly compress the quantized coefficients and motion vectors:
- CAVLC/CABAC (H.264): Context-adaptive variable-length coding and context-adaptive binary arithmetic coding
- CABAC (HEVC): Improved arithmetic coding
- Multi-symbol arithmetic coding (AV1): Adaptive arithmetic coding over non-binary symbol alphabets
- Exponential-Golomb/Rice: Simpler but less efficient methods
What We'll Cover
In this course, we'll explore:
- The technical fundamentals of modern video codecs
- Limitations and weaknesses of current approaches
- How specialized hardware enables complex algorithms
- The evolving landscape of competing standards
- Licensing, royalties, and economic considerations
- Emerging technologies and future directions
Ready to Begin?
Let's start with Lesson 1: a deep dive into how video codecs actually work, using H.264 as our primary example. We'll build up from first principles to understand why these algorithms look the way they do.