Digital video basics
Natural phenomena that we perceive as images are analog in nature. The camera's analog transducers transform the original analog information into an analog electrical signal, e.g. voltage. Analog composite signals, such as NTSC, PAL and SECAM, are subjected to various types of cumulative distortions and noise, which affect the quality of the reproduced picture. Separate distortions of the luminance and chrominance components as well as intermodulation between them are likely to occur. Such distortions can be reduced, but not completely eliminated, by performing all or at least a major part of production and post-production operations using component video signals.
The cumulative composite or component analog video signal impairments, and their effect on the reproduced picture, can be considerably reduced by using a digital representation of the video signal and effecting the distribution, processing and recording in the digital domain. The analog-to-digital (A/D) and digital-to-analog (D/A) conversions introduce some impairments. By a proper selection of two parameters, namely the sampling frequency and the quantizing accuracy, these impairments can be reduced to very low values. As long as the digitized signals are distributed, processed and recorded in the digital domain, these impairments are limited to those introduced by a single-pass A/D and D/A processing.
The coded signals
North American and European digital standardization efforts resulted in the ITU-R BT.601 recommendation, which established an agreement on a digital component video format that is compatible with both the 525/60 and 625/50 scanning formats and is at the root of all subsequent component digital developments.
Video signals are usually generated by an analog camera. The camera generates three gamma-corrected wideband primary signals: E'G (Green), E'B (Blue) and E'R (Red). By convention, the symbol “E” represents a voltage, and the prime sign indicates that the respective signal is gamma-corrected. We will discuss two predominant component digital formats:
- The ITU-R BT.601 component digital standard: This SDTV standard with a 4:3 aspect ratio covers a family of component digital formats, the well-known 4:4:4, 4:2:2 and 4:1:1. The pervasive 4:2:2 format uses a wideband (limited to Fb=5.75MHz) luminance signal (E'Y) and two narrowband (limited to Fb=2.75MHz) amplitude-scaled blue color-difference (E'CB) and red color-difference (E'CR) signals.
- The ITU-R BT.709 standard: The HDTV formats with a 16:9 aspect ratio specified by SMPTE 274M (1920×1080 interlaced scanning) and SMPTE 296M (1280×720 progressive scanning) are rooted in this standard. These formats are an extension of the 4:2:2 SDTV format and use a wideband (limited to Fb=30MHz) luminance signal (E'Y) and two narrow-band (limited to Fb=15MHz) amplitude-scaled blue color-difference (E'CB) and red color-difference (E'CR) signals.
The mathematical expressions defining these signals are given in Table 1. In both standards, the color-difference scaling factors were chosen to ensure that the signal amplitudes for a 100/0/100/0 color bars signal equal 0.7 V p-p.
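As a rough illustration of how the scaling factors lead to 0.7 V p-p, the sketch below encodes the eight 100/0/100/0 color bars and measures the peak-to-peak amplitude of each component. The luma coefficients are the ones published in BT.601 and BT.709; they are not listed in this article, so treat them (and the helper names `encode` and `peak_to_peak_volts`) as assumptions made for the example rather than a transcription of Table 1.

```python
# Luma coefficients (KR, KG, KB) as published in the two recommendations.
# They are not listed in this article, so treat them as an outside assumption.
LUMA_COEFFS = {
    "BT.601": (0.299, 0.587, 0.114),
    "BT.709": (0.2126, 0.7152, 0.0722),
}

FULL_SCALE_V = 0.7  # volts corresponding to a normalized amplitude of 1.0


def encode(r, g, b, standard):
    """Return normalized (E'Y, E'CB, E'CR) from gamma-corrected R, G, B in 0..1."""
    kr, kg, kb = LUMA_COEFFS[standard]
    y = kr * r + kg * g + kb * b
    # Scale B-Y and R-Y so that each spans -0.5..+0.5 on 100% color bars.
    cb = 0.5 * (b - y) / (1.0 - kb)
    cr = 0.5 * (r - y) / (1.0 - kr)
    return y, cb, cr


# The eight bars as (R, G, B): white, yellow, cyan, green, magenta, red, blue, black.
BARS = [(1, 1, 1), (1, 1, 0), (0, 1, 1), (0, 1, 0),
        (1, 0, 1), (1, 0, 0), (0, 0, 1), (0, 0, 0)]


def peak_to_peak_volts(standard):
    """Peak-to-peak amplitude in volts of Y, CB and CR over the color bars."""
    triples = [encode(r, g, b, standard) for r, g, b in BARS]
    return tuple(
        FULL_SCALE_V * (max(t[i] for t in triples) - min(t[i] for t in triples))
        for i in range(3)
    )


for name in LUMA_COEFFS:
    y_pp, cb_pp, cr_pp = peak_to_peak_volts(name)
    print(f"{name}: Y {y_pp:.3f} V, CB {cb_pp:.3f} V, CR {cr_pp:.3f} V p-p")
```

With these coefficients the scaling factors 0.5/(1 − KB) and 0.5/(1 − KR) come to roughly 0.564 and 0.713 for BT.601, and all three components span the same 0.7 V range in both standards.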
The sampling process
Table 1. Component digital signal characteristics of the ITU-R BT.601 and ITU-R BT.709 standards.
The sampling of the video signal is essentially a pulse amplitude modulation process. It consists of measuring the signal amplitude at periodic intervals. The sampling frequency (Fs) is a multiple of the horizontal scanning frequency and is higher than twice the maximum baseband frequency of the analog signal (Fb) to avoid aliasing. Aliasing is visible as spurious picture elements associated with fine details in the picture. The only way to avoid aliasing is to use an anti-aliasing filter ahead of the A/D converter. The task of this filter is to limit the bandwidth of the baseband signal to be sampled to less than Fs/2. Rec. 601 specifies the sampling frequencies of the three SDTV component analog signals as well as the characteristics of the associated anti-aliasing filters. The chosen sampling frequencies for the 4:2:2 format are 13.5MHz for E'Y and 6.75MHz for each of E'CB and E'CR. This sampling strategy results in 720 Y samples per active line and 360 samples each for CB and CR per active line.
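The two conditions on Fs described above can be checked with a few lines of arithmetic. This is a minimal sketch using the SDTV figures quoted in the text; the nominal line rates (4.5 MHz/286 for 525-line systems and 15,625 Hz for 625-line systems) and the function names are assumptions added for the example.

```python
# Figures quoted in the text for the 4:2:2 SDTV format, in Hz.
LUMA = {"fs": 13.5e6, "fb": 5.75e6}
CHROMA = {"fs": 6.75e6, "fb": 2.75e6}

# Nominal horizontal scanning frequencies in Hz.
# These are assumptions; the article does not state them.
LINE_RATE_HZ = {"525-line": 4.5e6 / 286, "625-line": 15625.0}


def satisfies_nyquist(fs, fb):
    """True when the sampling frequency exceeds twice the baseband limit."""
    return fs > 2.0 * fb


def samples_per_total_line(fs, line_rate):
    """Number of sampling periods in one full scanning line, blanking included."""
    return fs / line_rate


for label, sig in (("Y", LUMA), ("CB/CR", CHROMA)):
    print(f"{label}: Fs > 2Fb is {satisfies_nyquist(sig['fs'], sig['fb'])}")
    for system, rate in LINE_RATE_HZ.items():
        total = samples_per_total_line(sig["fs"], rate)
        print(f"  {system}: {round(total)} sampling periods per total line")
```

Under these assumptions 13.5MHz works out to a whole number of sampling periods per line in both scanning systems, which is what allows one sampling frequency to serve both.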
The SMPTE 274M and 296M standards specify the sampling frequencies of the three HDTV component analog signals as well as the characteristics of the associated anti-aliasing filters. The chosen sampling frequencies, shared by the two formats, are 74.25MHz for E'Y and 37.125MHz for each of E'CB and E'CR. The 274M format has 1920 Y samples and 960 samples each for CB and CR per active line. The 296M format has 1280 Y samples and 640 samples each for CB and CR per active line. Figure 1 details the 4:2:2 sampling structure. Note that the CB and CR samples are cosited with odd Y samples. This strategy is called orthogonal sampling because the samples occupy the same horizontal positions on every line, forming a rectangular grid.
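The 4:2:2 structure can be sketched as a simple selection of samples. The example below assumes the color-difference signals have already been band-limited, so it only shows which positions are kept; `subsample_422` and `chroma_positions` are hypothetical helper names, not part of any standard.

```python
def subsample_422(y, cb, cr):
    """Keep every Y sample, and CB/CR only at odd Y positions (1st, 3rd, ...)."""
    if not (len(y) == len(cb) == len(cr)):
        raise ValueError("expected one CB and one CR value per Y sample")
    # Position 1 in the article's 1-based counting is index 0 here.
    return list(y), list(cb[0::2]), list(cr[0::2])


def chroma_positions(active_y_samples):
    """1-based Y sample positions that carry a cosited CB/CR pair."""
    return list(range(1, active_y_samples + 1, 2))


# Active-line sample counts quoted in the text.
for name, width in (("BT.601 4:2:2", 720), ("SMPTE 274M", 1920), ("SMPTE 296M", 1280)):
    positions = chroma_positions(width)
    print(f"{name}: {width} Y samples, {len(positions)} CB and {len(positions)} CR samples")
```

Because the same positions are used on every line, the chroma samples line up vertically, which is the rectangular grid that Figure 1 depicts.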
The quantizing process
Figure 1. The 4:2:2 sampling structure. Note that the CB and CR samples are cosited with odd Y samples.
The pulse amplitude modulation results in a sequence of pulses, spaced at T=1/Fs, whose amplitude is proportional to the amplitude of the sampled analog signal at the sampling instant. The analog video signal can represent an infinite number of shades of grey, ranging from black (lowest video signal amplitude) to white (highest video signal amplitude). In the digital domain, the instantaneous sampling pulse amplitudes can be represented by only a limited number of binary values, which results in quantizing errors. The possible number of shades of grey is equal to $2^n$, where n is the number of bits per sample.
Experiments have shown that with fewer than eight bits per sample, the quantizing errors appear as “contouring.” With eight bits per sample or more, the quantizing errors appear, in general, as random noise (quantizing noise) in the picture. In practical applications, in order to avoid clipping, the signal occupies fewer than $2^n$ steps, resulting in a specified “quantizing range.” Figure 2 shows the relationship between the E'Y, E'CB and E'CR analog component signal levels corresponding to a 100/0/100/0 color bars signal and the 10-bit and 8-bit Y, CB and CR digital sample values, as specified in ITU-R BT.601, SMPTE 274M and SMPTE 296M.
Figure 2. The relationship between the E'Y, E'CB and E'CR analog component signal levels corresponding to a 100/0/100/0 color bars signal and the 10-bit and 8-bit Y, CB and CR digital sample values, as specified in ITU-R BT.601, SMPTE 274M and SMPTE 296M.
In a 10-bit system, there are 1024 digital levels ($2^{10}$) ranging from 0 to 1023 (000 to 3FF hex). Levels 000, 001, 002, 003 and 3FC, 3FD, 3FE, 3FF are reserved to indicate timing references. Note that the sync is not sampled. This leaves a “maximum quantizing range” of 1016 digital levels, ranging from 4 to 1019 to represent the signal levels. The normalized (700mV p-p) Y signal levels are assigned a range extending from 64 to 940, a total of 877 quantizing levels. This leaves a small upper headroom (940 to 1019) and lower headroom (4 to 64). The normalized (700mV p-p) CB and CR signal levels are assigned a range extending from 64 to 960, a total of 897 quantizing levels. This leaves a small upper headroom (960 to 1019) and lower headroom (4 to 64). An 8-bit system would have 220 quantizing levels for the Y component and 225 quantizing levels for the CB and CR components.
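The mapping from normalized analog levels to code values can be sketched as follows. The 10-bit ranges are the ones given above; the 8-bit ranges (16 to 235 for Y, 16 to 240 for CB/CR, with 0 and 255 reserved) are assumed here because they match the 220 and 225 level counts quoted in the text. Simple rounding and the function names are choices made for this example, not requirements taken from the standards.

```python
def code_ranges(bits):
    """Nominal code ranges for an 8- or 10-bit system, scaled from 8-bit values."""
    if bits not in (8, 10):
        raise ValueError("only 8-bit and 10-bit systems are covered here")
    k = 1 << (bits - 8)  # 1 for 8-bit, 4 for 10-bit
    return {
        "y": (16 * k, 235 * k),                # black .. white
        "c": (16 * k, 240 * k),                # negative peak .. positive peak
        "legal": (k, 254 * k + (k - 1)),       # codes not reserved for timing
    }


def _clip(code, bits):
    lo, hi = code_ranges(bits)["legal"]
    return min(max(code, lo), hi)


def quantize_y(e_y, bits=10):
    """Map normalized luminance (0.0 = black, 1.0 = white) to a code value."""
    lo, hi = code_ranges(bits)["y"]
    return _clip(round(lo + e_y * (hi - lo)), bits)


def quantize_c(e_c, bits=10):
    """Map a normalized color-difference value (-0.5 .. +0.5) to a code value."""
    lo, hi = code_ranges(bits)["c"]
    return _clip(round(lo + (e_c + 0.5) * (hi - lo)), bits)


for bits in (10, 8):
    r = code_ranges(bits)
    print(f"{bits}-bit: Y levels {r['y'][1] - r['y'][0] + 1}, "
          f"CB/CR levels {r['c'][1] - r['c'][0] + 1}, "
          f"black {quantize_y(0.0, bits)}, white {quantize_y(1.0, bits)}, "
          f"zero chroma {quantize_c(0.0, bits)}")
```

Values that overshoot the nominal range land in the headroom, and only excursions beyond the maximum quantizing range are clipped, which is the purpose of leaving those margins.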
The overall performance
The picture quality is related to the signal-to-RMS-quantizing-noise ratio (SNR). The full expression for this ratio is complicated and takes into consideration the quantizing range and the ratio Fs/2Fb. For the standards detailed above, it can be simplified to:
$$S/Q_{\mathrm{RMS}}\ (\mathrm{dB}) = 6n + 6$$
where:
$S$: Quantizing range occupied by the full p-p video signal amplitude
$Q_{\mathrm{RMS}}$: RMS quantizing noise
$n$: Number of bits per sample
A 10-bit system would thus have an SNR of 66dB, and an 8-bit system would have an SNR of 54dB.
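The simplified formula is easy to evaluate for other word lengths. The function below is a minimal sketch of that arithmetic only; it does not model the quantizing range or the Fs/2Fb term that the full expression includes, and the name `quantizing_snr_db` is made up for the example.

```python
def quantizing_snr_db(bits):
    """Simplified signal-to-RMS-quantizing-noise ratio in dB: 6n + 6."""
    if bits < 1:
        raise ValueError("bits per sample must be at least 1")
    return 6 * bits + 6


for n in (8, 10):
    print(f"{n}-bit: {quantizing_snr_db(n)} dB")
```

Each additional bit per sample adds 6dB, so the two extra bits of a 10-bit system account for its 12dB advantage over an 8-bit system.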
Michael Robin, a fellow of the SMPTE and former engineer with the Canadian Broadcasting Corp.'s engineering headquarters, is an independent broadcast consultant located in Montreal, Canada. He is co-author of Digital Television Fundamentals, published by McGraw-Hill, and recently translated into Chinese and Japanese.
Send questions and comments to: michael_robin@primediabusiness.com