Weaknesses of Existing Codecs
Blocking, ringing, banding, and the trade-offs we accept
All lossy video codecs produce characteristic artifacts as a consequence of block-based processing, quantization, and frequency-domain transforms. These include blocking at block boundaries, ringing (the Gibbs phenomenon) around sharp edges, banding in smooth gradients, detail loss from the removal of high-frequency information, and color artifacts from chroma subsampling. Understanding these trade-offs helps explain codec design decisions and the ongoing evolution of compression standards.
1. Blocking Artifacts
The most recognizable artifact in compressed video is blocking – visible boundaries between processing blocks that appear as a grid-like pattern, especially in flat or low-contrast areas.
Blocking occurs because modern codecs divide frames into blocks (typically 4×4 to 32×32 pixels) and process each block independently. When quantization is coarse (high QP values), each block gets quantized to slightly different values, creating discontinuous jumps at block boundaries.
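A toy one-dimensional sketch makes the mechanism concrete. Assuming each block keeps only a coarsely quantized DC (average) value, a smooth ramp reconstructs as flat plateaus with visible jumps exactly at the block boundaries (the 8-pixel block size and quantizer step below are illustrative, not taken from any real codec):

```python
import numpy as np

ramp = np.linspace(0, 50, 64)        # smooth gradient across one row of pixels
blocks = ramp.reshape(-1, 8)         # independent 8-pixel coding blocks
step = 10                            # coarse quantizer step (i.e. a high QP)

# Keep only each block's quantized DC (mean) value, discarding detail
dc = np.round(blocks.mean(axis=1) / step) * step
recon = np.repeat(dc, 8)             # every block decodes to one flat value

jumps = np.abs(np.diff(recon))
print(jumps[7::8])                   # discontinuities sit on block boundaries
```

Within each block the reconstruction is perfectly flat; all of the error is pushed to the block boundaries, which is exactly the grid pattern the eye picks up.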
Blocking is most noticeable in:
- Dark or shadow areas with limited color variation
- Slow-moving or static scenes
- Flat backgrounds like walls or skies
- Low bitrate encodes (especially at high resolutions)
2. Ringing Artifacts (Gibbs Phenomenon)
Ringing appears as oscillating patterns or "halos" around sharp edges in the image, particularly noticeable around high-contrast transitions like text on backgrounds or object boundaries.
This artifact stems from the Gibbs phenomenon in Fourier analysis: when we represent a discontinuous function (like a sharp edge) using a finite series of sinusoids (frequency components), we get overshoot and ringing near the discontinuity.
In video codecs, the Discrete Cosine Transform (DCT) represents image blocks as sums of cosine functions. When we quantize (especially aggressively), we lose some high-frequency components needed to accurately represent sharp edges, causing the reconstruction to overshoot and undershoot around edges.
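The overshoot is easy to reproduce in one dimension. The sketch below (an illustrative numpy experiment, not any codec's actual transform path) takes the DCT of a sharp step, discards all but the lowest-frequency coefficients to mimic aggressive quantization, and inverts; the reconstruction rings above 1 and below 0 around the edge:

```python
import numpy as np

N = 32
signal = np.where(np.arange(N) < N // 2, 0.0, 1.0)   # a sharp edge

# Orthonormal DCT-II matrix: rows are cosine basis functions
n = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
C[0] /= np.sqrt(2.0)

coeffs = C @ signal
coeffs[8:] = 0.0            # drop high frequencies, as coarse quantization does
recon = C.T @ coeffs

print(recon.max(), recon.min())   # overshoot above 1, undershoot below 0
```

The original signal never leaves the range [0, 1]; the truncated reconstruction does, and those excursions are the visible halos.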
Ringing is particularly problematic with:
- Text and graphics overlays
- Sharp object boundaries
- High-contrast edges in scenes
- Animated content with crisp lines
3. Banding / Contouring
Banding (also called contouring) appears as visible steps or bands in smooth gradients that should appear continuous, such as skies, walls, or gradual lighting transitions.
This occurs when quantization is too coarse in the DC (average) or low-frequency AC coefficients of the transform. Instead of smoothly varying pixel values, we get quantized levels that create visible bands.
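A minimal numpy sketch (with an illustrative quantizer step, not a real codec's) shows a smooth ramp collapsing into a handful of discrete bands:

```python
import numpy as np

grad = np.linspace(16.0, 64.0, 256)     # smooth luma gradient, e.g. a sky
step = 8.0                              # coarse quantization step
banded = np.round(grad / step) * step   # the only values that survive

print(len(np.unique(grad)), "input levels ->", len(np.unique(banded)), "bands")
```

Instead of 256 distinct values the viewer sees seven flat bands, and across a large smooth surface those seven steps are clearly visible.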
Banding is most visible in:
- Clear skies and smooth walls
- Gradual lighting changes
- Fog, haze, or atmospheric effects
- Slow fades and dissolves
- Any area with low spatial frequency content
4. Blurring / Detail Loss
Detail loss or blurring occurs when high-frequency image details are removed during quantization, resulting in a softer, less sharp image.
High-frequency DCT coefficients represent fine details like texture, noise, and sharp edges. When these coefficients are quantized to zero (or near-zero), we lose this detail in the reconstructed image.
Unlike blocking or ringing which create artificial structures, detail loss removes real image information. The result is a plasticky or waxy appearance where fine textures and subtle variations disappear.
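The mechanism can be sketched directly: quantize the DCT coefficients of a small textured signal with a coarse step, and every AC (detail) coefficient rounds to zero, leaving only the flat average (the signal values and step size below are illustrative):

```python
import numpy as np

N = 8
n = np.arange(N)
# Orthonormal DCT-II matrix
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
C[0] /= np.sqrt(2.0)

texture = np.array([52.0, 56.0, 51.0, 57.0, 50.0, 58.0, 49.0, 59.0])  # fine detail
coeffs = C @ texture
step = 30.0                               # coarse quantizer
quant = np.round(coeffs / step) * step    # small high-frequency coeffs become 0
recon = C.T @ quant

print(np.count_nonzero(quant), "of", N, "coefficients survive")
print(recon.round(1))                     # texture flattened to its average
```

Only the DC coefficient survives, so the reconstruction is a constant: the texture has not been distorted so much as erased, which is the "waxy" look.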
5. Color Artifacts from Chroma Subsampling
Most video codecs use chroma subsampling to reduce color resolution while preserving full luminance (brightness) resolution, based on the fact that human vision is more sensitive to brightness changes than color changes.
Common subsampling formats include:
- 4:4:4: Full color resolution (no subsampling)
- 4:2:2: Half horizontal color resolution
- 4:2:0: Half horizontal and vertical color resolution (most common)
When chroma resolution is reduced, color edges can appear blurred or shifted relative to luminance edges, causing color fringing or bleeding, especially around sharp color transitions.
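A tiny numpy sketch of 4:2:0-style subsampling (2x2 box averaging followed by nearest-neighbor upsampling; real codecs use better filters, but the effect is the same) shows the color edge smearing:

```python
import numpy as np

# One chroma (Cb) plane with a sharp vertical color edge at column 3
cb = np.zeros((4, 8))
cb[:, 3:] = 100.0

# 4:2:0-style: average each 2x2 block, then upsample back for display
sub = cb.reshape(2, 2, 4, 2).mean(axis=(1, 3))
up = sub.repeat(2, axis=0).repeat(2, axis=1)

print(up[0])   # the edge now passes through an intermediate color
```

The original plane held only two colors; after subsampling, a two-pixel band of an intermediate color appears along the edge, which is the fringing seen around sharp color transitions.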
6. Generational Loss
Generational loss occurs when video undergoes multiple encode-decode cycles, with each generation compounding the artifacts from the previous generation.
Unlike generational loss in analog systems (which degrades gradually), digital generational loss can be particularly problematic because:
- Each encode may make different quantization decisions
- Different block boundaries may align differently
- Motion compensation may propagate errors
- Certain artifacts can become amplified over generations
This is why professional workflows prefer working with lossless or visually lossless intermediates, and why multiple compression steps (like capturing → editing → web delivery) should be minimized.
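A toy sketch with two uniform quantizers standing in for two different encoders (the step sizes are arbitrary) shows both halves of the story: re-encoding with identical decisions loses nothing new, while different decisions compound the error:

```python
import numpy as np

x = np.linspace(0.0, 100.0, 50)            # "original" pixel values

def encode(v, step):
    """Toy lossy encode/decode: uniform quantization."""
    return np.round(v / step) * step

gen1 = encode(x, 6)                        # first generation
same = encode(gen1, 6)                     # same decisions: nothing new is lost
gen2 = encode(gen1, 10)                    # different decisions: error compounds

err = lambda v: float(np.abs(x - v).mean())
print(err(gen1), err(gen2))                # error grows across generations
```

This is the digital analogue of the points above: the loss is not gradual tape hiss but a step change each time the pipeline makes different quantization choices.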
7. Complexity Tradeoffs
Compression efficiency and computational complexity trade off steeply: each further gain in compression typically demands a disproportionately large increase in processing power.
This explains why:
- Newer codecs like AV1 and VVC offer better compression but require more powerful hardware for encoding
- Hardware acceleration becomes crucial for adopting newer standards
- Real-time applications (video conferencing, live streaming) often use simpler codecs despite lower efficiency
- Encoding complexity is often much higher than decoding complexity
As a rule of thumb, each generation of codec that provides ~50% bitrate savings typically requires ~10x the computational complexity for encoding.
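Taking that rule of thumb at face value, a few lines of arithmetic show how quickly encode cost outruns the bitrate savings (purely illustrative numbers, not benchmarks of real encoders):

```python
bitrate, cost = 100.0, 1.0          # baseline codec: 100% bitrate, 1x cost
for gen in range(1, 4):             # three hypothetical codec generations
    bitrate *= 0.5                  # ~50% bitrate savings per generation
    cost *= 10                      # ~10x encoding complexity per generation
    print(f"gen {gen}: {bitrate:.1f}% bitrate at {cost:.0f}x encode cost")
```

Three generations on, the bitrate has dropped to an eighth of the baseline while encoding has become a thousand times more expensive, which is why hardware support often decides when a new standard is actually deployable.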
8. Latency vs Compression Tradeoffs
Different applications have different sensitivity to latency, creating another important trade-off axis in codec design and selection.
Low-latency requirements:
- Video conferencing (Zoom, Teams, Meet)
- Live sports broadcasting
- Gaming and cloud gaming
- Real-time remote control systems
Higher latency tolerance:
- Streaming video (Netflix, YouTube, Hulu)
- Video on demand services
- File-based workflows
- Archival and storage applications
Codecs optimized for low latency often:
- Use smaller GOP structures or I-frame only encoding
- Reduce motion search ranges
- Limit reference frames
- Use simpler entropy coding modes
9. Motion Estimation Failures
Even the best motion estimation algorithms can fail under certain conditions, leading to specific artifacts that depend on how the codec handles prediction errors.
Common failure modes include:
- Fast motion: When objects move faster than the search range can accommodate
- Fades and dissolves: Global illumination changes that break pixel-value assumptions
- Scene cuts: Abrupt changes where previous frames provide no useful prediction
- Unusual motion patterns: Rotation, zooming, or complex deformations
- Small, high-contrast objects: Difficult to track reliably
When motion estimation fails, the encoder must fall back to:
- Spatial prediction (intra-coding within the frame)
- Larger quantization steps (more compression artifacts)
- Intra refresh cycles (periodically rebuilding from key frames)
- Error resilience modes (increasing overhead for robustness)
This is why scenes with rapid action, quick cuts, or complex motion often show more visible artifacts than slow, predictable motion.
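A minimal exhaustive block-matching sketch (plain SAD search; real encoders use hierarchical and predictive searches) makes the first failure mode concrete: when the true displacement exceeds the search range, no good match exists and the prediction residual explodes.

```python
import numpy as np

def best_match(ref, block, br, bc, search):
    """Find the displacement within +/-search minimizing SAD against ref."""
    B = block.shape[0]
    best, best_sad = (0, 0), float("inf")
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = br + dr, bc + dc
            if 0 <= r <= ref.shape[0] - B and 0 <= c <= ref.shape[1] - B:
                sad = float(np.abs(ref[r:r + B, c:c + B] - block).sum())
                if sad < best_sad:
                    best, best_sad = (dr, dc), sad
    return best, best_sad

# A bright 4x4 object moves 6 pixels right between reference and current frame
ref = np.zeros((16, 16)); ref[4:8, 2:6] = 255.0
cur = np.zeros((16, 16)); cur[4:8, 8:12] = 255.0
block = cur[4:8, 8:12]

mv, sad = best_match(ref, block, 4, 8, search=8)    # range covers the motion
mv2, sad2 = best_match(ref, block, 4, 8, search=4)  # range too small: fails
print(mv, sad, sad2)
```

With a sufficient search range the true motion vector is found and the residual is zero; with the range halved, the best available match is wrong and the encoder is pushed onto the fallbacks listed above.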