Video Codecs: From Why to Wow
Understanding how video compression works, from fundamentals to future innovations
Video codecs compress video by removing redundancy and irrelevancy, using techniques like temporal prediction, transform coding, and entropy encoding. Modern codecs balance compression efficiency, computational complexity, and licensing considerations. The landscape is evolving rapidly with royalty-free alternatives challenging established standards.
What Is Video Compression, Really?
Raw video is enormous. Uncompressed video at 30 frames per second requires approximately:
- 1080p (1920×1080): ~165 MB per second
- 4K (3840×2160): ~660 MB per second
- 8K (7680×4320): ~2.6 GB per second
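To get a feel for these numbers, here is a quick back-of-envelope calculation. It is a sketch that assumes 24-bit RGB (3 bytes per pixel); real raw rates vary with bit depth and chroma subsampling (4:2:0 video, for instance, needs half the bytes per pixel), so the exact figures differ by format:

```python
# Back-of-envelope raw video data rate, assuming 24-bit RGB (3 bytes/pixel).
def raw_rate_mb_per_s(width, height, fps=30, bytes_per_pixel=3):
    """Uncompressed data rate in MB/s (1 MB = 10^6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6

for name, (w, h) in {"1080p": (1920, 1080),
                     "4K":    (3840, 2160),
                     "8K":    (7680, 4320)}.items():
    print(f"{name}: ~{raw_rate_mb_per_s(w, h):.0f} MB/s")
```

Note how each doubling of width and height quadruples the data rate: 4K is exactly four times 1080p, and 8K sixteen times.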
Without compression, streaming video over the internet would be impossible. Even storing a few hours of video would require massive storage systems. Video codecs solve this by exploiting two fundamental properties of video:
Spatial redundancy: Within a single frame, neighboring pixels tend to have very similar values.
Temporal redundancy: Consecutive frames are often very similar, with only small changes between them.
By recognizing and eliminating these redundancies, video codecs can achieve compression ratios of 100:1 or more while maintaining visually acceptable quality.
The Video Codec Pipeline
Whether it's H.264, HEVC, AV1, or VVC, all modern video codecs follow a similar basic structure:
- Partitioning: Divide frames into blocks (typically 4×4 to 64×64 pixels)
- Prediction: Predict block content from neighboring pixels (spatial) or previous/future frames (temporal)
- Transform: Convert prediction errors to frequency domain (usually DCT or similar)
- Quantization: Reduce precision of frequency coefficients (main source of compression)
- Entropy Coding: Losslessly compress the quantized coefficients (Huffman, arithmetic coding, etc.)
- Reconstruction: Inverse quantize, inverse transform, and add the prediction back, so the encoder's prediction loop sees exactly what the decoder will see
Let's break down each stage with a simple example:
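Before examining each stage in detail, here is a toy end-to-end sketch of the pipeline. It is illustrative only: a 1-D four-sample "block", a pure-Python DCT, and a single uniform quantizer stand in for the real machinery, and all function names are ours, not from any actual codec.

```python
import math

# Toy 1-D "codec" illustrating the pipeline stages (illustrative only; real
# codecs work on 2-D blocks and add entropy coding, loop filters, etc.).

def dct(x):
    """Orthonormal DCT-II: spatial samples -> frequency coefficients."""
    N = len(x)
    return [math.sqrt((1 if k == 0 else 2) / N) *
            sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
            for k in range(N)]

def idct(X):
    """Orthonormal DCT-III, the inverse of dct above."""
    N = len(X)
    return [sum(math.sqrt((1 if k == 0 else 2) / N) * X[k] *
                math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

def encode_block(block, reference, qstep=2):
    residual = [b - r for b, r in zip(block, reference)]   # temporal prediction
    coeffs = dct(residual)                                 # transform
    return [round(c / qstep) for c in coeffs]              # quantization (lossy)

def decode_block(levels, reference, qstep=2):
    coeffs = [l * qstep for l in levels]                   # dequantize
    residual = idct(coeffs)                                # inverse transform
    return [r + p for r, p in zip(residual, reference)]    # reconstruction

prev = [100, 102, 104, 106]   # co-located block in the previous frame
cur  = [101, 103, 106, 107]   # current block: similar, so the residual is tiny
levels = encode_block(cur, prev)      # mostly zeros -> cheap to entropy-code
recon  = decode_block(levels, prev)   # close to cur, within quantization error
```

Because the current block barely differs from the reference, nearly all quantized levels are zero, and the reconstruction differs from the original by at most the quantization error.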
Key Frame Types: I, P, and B Frames
Video codecs use different types of frames to balance compression efficiency with random access and error resilience:
- I-frames (Intra-coded): Encoded independently using only spatial information. Serve as random access points.
- P-frames (Predictive): Encoded using motion compensation from previous I or P frames.
- B-frames (Bidirectional): Encoded using motion compensation from both previous and future frames (highest compression).
A typical encoding sequence might look like: I B B P B B P B B P B B I ... This structure provides excellent compression while allowing efficient seeking and error recovery.
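As a small illustration, the repeating pattern above can be generated programmatically. The function below is a toy of ours, not codec code; real encoders choose GOP structure adaptively based on content:

```python
# Frame types in display order for a repeating I B B P ... GOP (toy example).
def frame_type(n, gop_size=12, b_frames=2):
    pos = n % gop_size
    if pos == 0:
        return "I"                                    # random access point
    return "P" if pos % (b_frames + 1) == 0 else "B"  # anchor vs. bidirectional

pattern = " ".join(frame_type(n) for n in range(13))
# pattern == "I B B P B B P B B P B B I"
```

One subtlety: because B-frames reference a *future* anchor, decode order differs from display order; each P-frame must be decoded before the B-frames that precede it on screen.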
Motion Compensation: The Heart of Temporal Compression
The biggest gains in video compression come from motion compensation: recognizing that objects in video move predictably from frame to frame.
Instead of encoding every pixel, the codec estimates how blocks of pixels have moved (motion vectors) and only encodes the difference between the predicted block and the actual content.
Advanced codecs use increasingly sophisticated motion compensation:
- Block sizes: From 4×4 to 64×64 pixels, with adaptive partitioning
- Fractional pixel accuracy: Quarter-pixel or eighth-pixel precision
- Multiple reference frames: Using more than just the previous frame for prediction
- Weighted prediction: Accounting for lighting changes and fade effects
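The core of motion estimation is block matching: exhaustively search a window in the reference frame for the candidate block with the lowest cost, typically the sum of absolute differences (SAD). The sketch below works on 1-D "frames" for brevity, and the names are ours; real encoders search 2-D windows with fast, non-exhaustive strategies and fractional-pel refinement:

```python
# Exhaustive (full-search) block matching on a 1-D "frame" (toy sketch).
def sad(a, b):
    """Sum of absolute differences: the usual block-matching cost."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_motion_vector(cur_block, ref_frame, block_pos, search_range=3):
    """Offset mv minimizing SAD against ref_frame[block_pos + mv : ...]."""
    n = len(cur_block)
    best_cost, best_mv = float("inf"), 0
    for mv in range(-search_range, search_range + 1):
        start = block_pos + mv
        if 0 <= start <= len(ref_frame) - n:        # stay inside the frame
            cost = sad(cur_block, ref_frame[start:start + n])
            if cost < best_cost:
                best_cost, best_mv = cost, mv
    return best_mv

ref_frame = [10, 10, 50, 60, 70, 10, 10, 10]   # previous frame
cur_block = [50, 60, 70]                       # same object, shifted by one
mv = best_motion_vector(cur_block, ref_frame, block_pos=3)
# mv == -1: predict from ref_frame[2:5], so the residual is all zeros
```

When the match is exact, as here, only the motion vector needs to be coded; the residual costs essentially nothing.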
Transform Coding: From Spatial to Frequency Domain
After motion compensation, we're left with prediction residuals: the differences between predicted and actual pixel values. These residuals often have energy concentrated in low frequencies.
By applying a transform (typically Discrete Cosine Transform or similar), we convert spatial data to frequency data, making it easier to compress:
For an N×N block, the 2-D DCT is:

F(u,v) = C(u) C(v) Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x,y) cos[(2x+1)uπ / 2N] cos[(2y+1)vπ / 2N]

where C(u), C(v) are normalization factors, and f(x,y) is the spatial domain data.
The transform concentrates most of the signal energy into a few low-frequency coefficients, allowing us to aggressively quantize (reduce precision of) the high-frequency coefficients that contribute less to perceived quality.
Quantization: Where the Compression Happens
Quantization is the primary lossy step in video compression. We reduce the precision of transform coefficients, typically by dividing them by a quantization step size and rounding to integers.
Higher quantization = more compression = more loss.
Lower quantization = less compression = better quality.
Modern codecs use adaptive quantization, allocating more bits to complex regions (edges, textures) and fewer bits to flat regions (sky, walls).
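The trade-off is easy to see numerically. The sketch below (our own toy, using a made-up coefficient block with DCT-like energy compaction) quantizes the same coefficients at several step sizes; larger steps zero out more coefficients, which is exactly what entropy coding then exploits, at the price of larger reconstruction error:

```python
# Uniform quantization of transform coefficients: divide by a step size,
# round to an integer "level", and later multiply back to reconstruct.
def quantize(coeffs, qstep):
    return [round(c / qstep) for c in coeffs]

def dequantize(levels, qstep):
    return [l * qstep for l in levels]

# Made-up block with energy concentrated in the low frequencies, as after a DCT.
coeffs = [312.0, -45.0, 7.0, 2.0, -1.0, 0.4, 0.2, -0.1]

for qstep in (4, 16, 64):
    levels = quantize(coeffs, qstep)
    recon = dequantize(levels, qstep)
    nonzero = sum(1 for l in levels if l != 0)
    max_err = max(abs(c - r) for c, r in zip(coeffs, recon))
    print(f"qstep={qstep:2d}: {nonzero} nonzero levels, max error {max_err:.1f}")
```

The run of trailing zeros at higher step sizes is what makes the subsequent lossless stage so effective.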
Entropy Coding: The Final Lossless Step
After quantization, we apply entropy coding to losslessly compress the quantized coefficients and motion vectors:
- CAVLC/CABAC (H.264): Context-adaptive variable-length coding and context-adaptive binary arithmetic coding
- CABAC (HEVC): Improved arithmetic coding
- Multi-symbol arithmetic coding (AV1): Adaptive arithmetic coding over non-binary symbol alphabets
- Exponential-Golomb/Rice: Simpler but less efficient methods
What We'll Cover
In this course, we'll explore:
- The technical fundamentals of modern video codecs
- Limitations and weaknesses of current approaches
- How specialized hardware enables complex algorithms
- The evolving landscape of competing standards
- Licensing, royalties, and economic considerations
- Emerging technologies and future directions
Ready to Begin?
Let's start with Lesson 1: a deep dive into how video codecs actually work, using H.264 as our primary example. We'll build up from first principles to understand why these algorithms look the way they do.