Course Overview

Video Codecs: From Why to Wow

Understanding how video compression works, from fundamentals to future innovations

Every time you stream a Netflix show, make a Zoom call, or watch a YouTube video, you're benefiting from one of the most impressive feats of engineering: video compression. But how can a 4K movie that would normally require terabytes of storage fit into a few gigabytes? Why does your video sometimes look blocky or blurry? And why are there so many different codecs like H.264, HEVC, and AV1? Let's dive into the fascinating world of video codecs and uncover the magic that makes modern video possible.
TL;DR

Video codecs compress video by removing redundancy and irrelevancy, using techniques like temporal prediction, transform coding, and entropy encoding. Modern codecs balance compression efficiency, computational complexity, and licensing considerations. The landscape is evolving rapidly with royalty-free alternatives challenging established standards.

What Is Video Compression, Really?

Raw video is enormous. A single minute of uncompressed 1080p video at 30 frames per second requires approximately 11 GB of storage: 1920 × 1080 pixels × 3 bytes per pixel (24-bit color) × 30 frames × 60 seconds.
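
To make that concrete, here's the arithmetic as a short Python sketch (assuming 24-bit RGB; real capture pipelines often use chroma-subsampled YUV, which lowers the constant):

```python
# Rough size of uncompressed 1080p video, assuming 24-bit RGB pixels.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 3            # 8 bits each for R, G, B
FPS = 30
SECONDS = 60

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL    # ~6.2 MB per frame
minute_bytes = frame_bytes * FPS * SECONDS        # ~11.2 GB per minute

print(f"One frame:  {frame_bytes / 1e6:.1f} MB")
print(f"One minute: {minute_bytes / 1e9:.1f} GB")
print(f"Two hours:  {minute_bytes * 120 / 1e12:.2f} TB")  # why raw video is unstreamable
```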

Without compression, streaming video over the internet would be impossible. Even storing a few hours of video would require massive storage systems. Video codecs solve this by exploiting two fundamental properties of video:

Spatial redundancy: Nearby pixels in a frame often have similar colors and brightness.
Temporal redundancy: Consecutive frames are often very similar, with only small changes between them.

By recognizing and eliminating these redundancies, video codecs can achieve compression ratios of 100:1 or more while maintaining visually acceptable quality.

The Video Codec Pipeline

Whether it's H.264, HEVC, AV1, or VVC, all modern video codecs follow a similar basic structure:

  1. Partitioning: Divide frames into blocks (typically 4×4 to 64×64 pixels)
  2. Prediction: Predict block content from neighboring pixels (spatial) or previous/future frames (temporal)
  3. Transform: Convert prediction errors to frequency domain (usually DCT or similar)
  4. Quantization: Reduce precision of frequency coefficients (main source of compression)
  5. Entropy Coding: Losslessly compress the quantized coefficients (Huffman, arithmetic coding, etc.)
  6. Reconstruction: Rebuild the frame (inverse quantization, inverse transform, add prediction) so the encoder's prediction loop stays in sync with the decoder

Let's break down each stage with a simple example:

🎥 Pipeline in Action: Quantization Effect

Each frame goes through prediction, transform, quantization, and encoding. The quantization parameter (QP) controls the trade-off: higher QP = smaller file, lower quality.
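
Here's a minimal sketch of that trade-off, using SciPy's orthonormal 2-D DCT on a made-up 8×8 block (the block contents and step sizes are illustrative; real codecs quantize prediction residuals with far more elaborate quantizer designs):

```python
import numpy as np
from scipy.fft import dctn, idctn

# Toy "residual" block: a smooth ramp plus a little noise.
rng = np.random.default_rng(0)
block = np.add.outer(np.arange(8.0), np.arange(8.0))
block += rng.normal(0, 0.5, (8, 8))

coeffs = dctn(block, norm="ortho")         # transform: spatial -> frequency

for step in (1, 8, 32):                    # stand-in for low/medium/high QP
    quantized = np.round(coeffs / step)    # quantization: the lossy step
    restored = idctn(quantized * step, norm="ortho")  # what the decoder sees
    nonzero = np.count_nonzero(quantized)  # fewer nonzeros -> smaller bitstream
    error = np.abs(block - restored).mean()
    print(f"step={step:>2}: {nonzero:>2} nonzero coeffs, mean error {error:.2f}")
```

Larger steps zero out more coefficients (less to transmit) at the cost of a larger reconstruction error, which is exactly the higher-QP trade-off described above.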

Key Frame Types: I, P, and B Frames

Video codecs use different types of frames to balance compression efficiency with random access and error resilience:

A typical encoding sequence might look like: I B B P B B P B B P B B I ... This structure provides excellent compression while allowing efficient seeking and error recovery.

📊 Frame Types in Action
I B B P B B P B B P B I  (frames 1–12)

I-Frame

Largest, self-contained. Enables random access and seeking.

P-Frame

Medium. Predicts from previous I/P frames using motion vectors.

B-Frame

Smallest. Uses both past and future frames for best compression.

A GOP (Group of Pictures) of 12 frames. B-frames give the most compression but need references in both directions for decoding.
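
Because a B-frame references a frame that is displayed after it, encoders transmit frames out of display order: each forward reference must be decoded before the B-frames that depend on it. Here's a small sketch of the reordering for the GOP above (decode_order is a made-up helper, not a real codec API):

```python
# Display order for the 12-frame GOP shown above.
display = ["I1", "B2", "B3", "P4", "B5", "B6",
           "P7", "B8", "B9", "P10", "B11", "I12"]

def decode_order(frames):
    """Move each I/P frame ahead of the B-frames that reference it."""
    out, pending_b = [], []
    for f in frames:
        if f.startswith("B"):
            pending_b.append(f)    # hold B-frames until their future reference arrives
        else:
            out.append(f)          # decode the I/P reference first...
            out.extend(pending_b)  # ...then the B-frames that were waiting on it
            pending_b = []
    return out + pending_b

print(decode_order(display))
# ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6', 'P10', 'B8', 'B9', 'I12', 'B11']
```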

Motion Compensation: The Heart of Temporal Compression

The biggest gains in video compression come from motion compensation – recognizing that objects in video move predictably from frame to frame.

Instead of encoding every pixel, the codec estimates how blocks of pixels have moved (motion vectors) and only encodes the difference between the predicted block and the actual content.

Example: If a soccer ball moves 10 pixels right and 2 pixels down between frames, instead of encoding the entire ball, we encode: "move this block by (+10, +2) and encode only the small difference."

Advanced codecs use increasingly sophisticated motion compensation: sub-pixel (half- and quarter-pixel) accuracy, variable block sizes, multiple reference frames, and even affine models that capture rotation and zoom.
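
To make the basic idea concrete, here is a minimal exhaustive block-matching search (best_motion_vector is a made-up name; real encoders use fast search patterns and sub-pixel refinement rather than brute force):

```python
import numpy as np

def best_motion_vector(ref, cur, bx, by, bsize=16, search=12):
    """Find the (dx, dy) minimizing the sum of absolute differences (SAD)
    between the current block and a shifted block in the reference frame."""
    block = cur[by:by + bsize, bx:bx + bsize].astype(int)
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            sad = np.abs(ref[y:y + bsize, x:x + bsize].astype(int) - block).sum()
            if sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best, best_sad

# Synthetic frames: a bright square that moves 10 px right, 2 px down.
ref = np.zeros((64, 64), dtype=np.uint8)
ref[20:30, 20:30] = 255
cur = np.zeros_like(ref)
cur[22:32, 30:40] = 255

mv, sad = best_motion_vector(ref, cur, bx=32, by=16)
print(mv, sad)  # (-10, -2) 0: the block came from (-10, -2) in the reference
```

The vector here points from the current block back into the reference frame, so a rightward/downward move shows up as a negative offset; sign conventions vary between codecs.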

Transform Coding: From Spatial to Frequency Domain

After motion compensation, we're left with prediction residuals – the differences between predicted and actual pixel values. These residuals often have energy concentrated in low frequencies.

By applying a transform (typically Discrete Cosine Transform or similar), we convert spatial data to frequency data, making it easier to compress:

F(u,v) = (2/N) · C(u) · C(v) · Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x,y) · cos[((2x+1)uπ)/(2N)] · cos[((2y+1)vπ)/(2N)]

where C(k) = 1/√2 for k = 0 and C(k) = 1 otherwise, and f(x,y) is the N×N block of spatial-domain data.

The transform concentrates most of the signal energy into a few low-frequency coefficients, allowing us to aggressively quantize (reduce precision of) the high-frequency coefficients that contribute less to perceived quality.
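
Here's a quick way to see that compaction, using SciPy's 2-D DCT on a synthetic smooth block (the block contents and the coefficient ordering are illustrative assumptions):

```python
import numpy as np
from scipy.fft import dctn

# A smooth 8x8 block: gentle horizontal cosine plus a vertical ramp.
block = 10 * np.cos(np.pi * np.arange(8) / 16)[None, :] + np.arange(8.0)[:, None]

energy = dctn(block, norm="ortho") ** 2

# Scan coefficients roughly low-to-high frequency (by distance u+v from the
# top-left corner, a crude stand-in for the zigzag scan) and report how much
# of the total energy the first few coefficients capture.
order = np.argsort(np.add.outer(np.arange(8), np.arange(8)), axis=None)
cumulative = np.cumsum(energy.flatten()[order]) / energy.sum()
for k in (1, 3, 6, 10):
    print(f"first {k:>2} coefficients: {cumulative[k - 1]:.1%} of the energy")
```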

Quantization: Where the Compression Happens

Quantization is the primary lossy step in video compression. We reduce the precision of transform coefficients, typically by dividing them by a quantization step size and rounding to integers.

Higher quantization = more compression = more loss.
Lower quantization = less compression = better quality.

Modern codecs use adaptive quantization, allocating more bits to complex regions (edges, textures) and fewer bits to flat regions (sky, walls).
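
The step size is usually driven by a quantization parameter (QP). In H.264, for example, the step size roughly doubles for every increase of 6 in QP; here is a sketch of that relationship (an approximation; the standard defines the exact table):

```python
# Approximate H.264 quantization step size as a function of QP (0..51):
# the step doubles every 6 QP, starting from about 0.625 at QP 0.
def qstep(qp: int) -> float:
    return 0.625 * 2 ** (qp / 6)

for qp in (0, 12, 24, 36, 48):
    print(f"QP {qp:>2}: step ~ {qstep(qp):7.2f}")
```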

After quantization, we apply entropy coding to losslessly compress the quantized coefficients and motion vectors. H.264 offers CAVLC and the stronger CABAC (context-adaptive binary arithmetic coding); HEVC and VVC use CABAC throughout, and AV1 uses a multi-symbol arithmetic coder.
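
To get a flavor of why quantized coefficients compress so well, here is a toy run-length pass over a coefficient scan, in the spirit of classic MPEG-style coding (the scan values are made up, and real entropy coders are far more sophisticated than plain run-length codes):

```python
def run_length(scan):
    """Encode a coefficient scan as (zero_run, value) pairs plus an
    end-of-block marker, like classic JPEG/MPEG entropy-coding stages."""
    pairs, run = [], 0
    for c in scan:
        if c == 0:
            run += 1             # count zeros between nonzero coefficients
        else:
            pairs.append((run, c))
            run = 0
    pairs.append(("EOB",))       # end of block: everything left is zero
    return pairs

# Typical quantized 8x8 scan: a few nonzeros up front, then all zeros.
scan = [52, -3, 0, 0, 2, 0, 0, 0, 1] + [0] * 55
print(run_length(scan))
# [(0, 52), (0, -3), (2, 2), (3, 1), ('EOB',)]
```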

What We'll Cover

In this course, we'll explore:

How the codec pipeline works end to end: prediction, transforms, quantization, and entropy coding
Why codecs trade off compression efficiency, computational complexity, and licensing
How the landscape is evolving: H.264, HEVC, VVC, and royalty-free alternatives like AV1

Ready to Begin?

Let's start with Lesson 1: a deep dive into how video codecs actually work, using H.264 as our primary example. We'll build up from first principles to understand why these algorithms look the way they do.
