Long-GOP editing
Product presentations at NAB are a great way to rest your burning feet while looking like you are absorbing significant information. At the 2009 NAB Show, in a state somewhere before sleep, I heard a presenter tie long-GOP formats to generational loss caused by adding titles to a production. Say what?
While the industry may have moved past a pronouncement made in 2004 that “…MPEG-2 ain't supposed to be edited,” the NAB 2009 statement sadly matched comments made over the last five years — often by Hollywood influencers — that reveal a profound misunderstanding about how modern editing tools work.
A few years ago, negative statements about long GOP were simply veiled attacks on MPEG-2. But now, with AVCHD and AVCCAM — both long-GOP versions of MPEG-4 — claims about GOP length and editing cover a wide range of formats and products. By addressing the myths surrounding MPEG-2, we can hopefully prevent the same myths from resurfacing with long-GOP H.264/AVC.
Performance concerns
Some myths about interframe encoding may result from explaining, in simple terms, how encoding works. A key frame was often described as though it were a photograph, with no mention that an I-frame is itself highly compressed. Using terms better suited to delta modulation, subsequent frames were described as containing only differences from the initial frame.
It's not surprising that those who conceive of computer-based editing as nothing more than the replacement of VTRs with hard disks were sure the need to keep “rewinding” the disk files to find I-frames would make jog and shuttle sluggish and, therefore, a serious hindrance to editing.
In reality, NLEs access a disk only to replenish large buffers held in RAM. Moreover, B and P frames are stored within a GOP in a series that facilitates decoding. (For a brief review of interframe and intraframe MPEG-2 and MPEG-4 encoding, see the “MPEG-2 and H.264/AVC” article in the July 2008 issue.)
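To make the "no rewinding" point concrete, here is a minimal sketch, assuming a typical closed 15-frame IBBP GOP. The frame labels and the coded_order helper are purely illustrative, not any decoder's actual API; the point is only that reference frames are stored ahead of the B-frames that depend on them.

    # Illustrative only: a closed 15-frame MPEG-2 GOP with an IBBP pattern.
    # Display order is what the viewer sees; coded order is how the frames
    # are stored in the stream, so the decoder never has to "rewind" to
    # find a reference frame.

    def coded_order(display):
        """Reorder a GOP so each B-frame follows both of its references."""
        out, pending_b = [], []
        for frame in display:
            if frame.startswith("B"):
                pending_b.append(frame)   # hold B-frames until the next reference
            else:                         # I- or P-frame: emit it, then the held Bs
                out.append(frame)
                out.extend(pending_b)
                pending_b = []
        return out + pending_b

    display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8",
               "P9", "B10", "B11", "P12", "B13", "B14"]
    print(coded_order(display))
    # ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5', 'P9', 'B7', 'B8', ...]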
A more realistic concern was that the enormous number of calculations required to obtain each image would prevent multistream real-time editing. However, as reported in my review of a RAID system (“CalDigit's HDPro” in the September 2008 issue), I measured nine streams of 1920 × 1080 XDCAM EX from a MacBook Pro. Numbers like these effectively refute the concern.
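A little back-of-the-envelope arithmetic also shows why disk throughput is not the bottleneck. The sketch below assumes XDCAM EX's nominal 35Mb/s HQ data rate; the figures are rough, not measured values.

    # Back-of-the-envelope check; the 35Mb/s figure is XDCAM EX's nominal
    # HQ rate, not a measured value.
    streams = 9
    mbps_per_stream = 35                      # megabits per second
    total_mbps = streams * mbps_per_stream    # 315 Mb/s
    total_mbytes = total_mbps / 8             # about 39 MB/s
    print(f"{total_mbps} Mb/s is about {total_mbytes:.0f} MB/s from disk")
    # A RAID that sustains several hundred MB/s has ample headroom;
    # the limiting factor is decode horsepower, not disk I/O.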
Unfortunately, this concern does remain valid for AVCHD and AVCCAM, as well as AVC-Intra. Nevertheless, the day will come when H.264/AVC performance concerns vanish as well.
Native vs. intermediate editing
There have always been warnings about long-GOP MPEG-2, such as the one I heard at NAB. Although they sound reasonable, they involve invalid assumptions. At the heart of these warnings is the belief that because long-GOP MPEG-2 is highly compressed, you'll probably want to convert that MPEG-2 stream to something else before you do any editing or compositing work.
Converting MPEG-2 (or H.264/AVC) to an intermediate codec — other than uncompressed — results in at least some quality loss because it involves a decode followed by a recompression using an intermediate codec. Moreover, conversion always increases the size of all your source files because interframe source files require the least possible storage space. But, more importantly, conversion during import in no way can improve or preserve image quality. Even with a conversion to uncompressed video, image quality only remains constant.
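The storage penalty is easy to quantify. As a rough sketch, assume nominal data rates of about 35Mb/s for long-GOP XDCAM EX HQ and roughly 220Mb/s for an I-frame intermediate such as ProRes 422 HQ at 1920 × 1080; neither figure is a measurement.

    # Rough storage comparison for one hour of 1080 material; both data
    # rates are nominal figures, not measurements.
    SECONDS_PER_HOUR = 3600

    def gb_per_hour(mbps):
        return mbps * SECONDS_PER_HOUR / 8 / 1000   # Mb/s to decimal GB/hour

    native = gb_per_hour(35)         # long-GOP XDCAM EX HQ: about 16 GB/hour
    intermediate = gb_per_hour(220)  # ProRes 422 HQ at 1920 x 1080: about 99 GB/hour
    print(f"native {native:.0f} GB/h, intermediate {intermediate:.0f} GB/h, "
          f"roughly {intermediate / native:.0f}x the storage")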
Additional warnings and recommendations invoke the evils of 4:2:0 chroma sampling and the generation loss caused by multiple re-encodes of long-GOP files to long-GOP files. First, it is important to note that although most long-GOP formats have employed 4:2:0 sampling, this is not an inherent characteristic of interframe encoding. For example, Sony's 50Mb/s XDCAM 422HD is a long-GOP format.
Second, the quality of 4:2:0 sampling is not the subject of this debate. Rather, the question is at what point in the editing process 4:2:0 video is upsampled to either 4:2:2 or 4:4:4. A 4:2:2 conversion is made so various video formats can be mixed together. To mix RGB graphics with video, a conversion to 4:4:4 is performed.
A conversion can be made within a VTR when MPEG-2 is decoded prior to being sent as uncompressed 4:2:2 video over an HD-SDI connection. Another option is to perform the conversion during import when MPEG-2 or AVCHD/AVCCAM is transcoded to an intermediate codec. And, of course, the conversion can be made on the fly as MPEG-2 or AVCHD is decoded to an uncompressed YCrCb signal. In all cases, the key to upconversion quality is the equations themselves and the degree to which rounding errors are prevented. The point at which the conversion occurs is irrelevant. (See Figure 1.)
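The arithmetic really is the same wherever it runs. The following is a minimal sketch of chroma upsampling by simple sample repetition; real decoders use interpolation filters, and the function names here are purely illustrative.

    # Minimal sketch of upsampling one chroma plane (Cb or Cr) by simple
    # sample repetition; real decoders use interpolation filters, but the
    # math is the same whether it runs in a VTR, at ingest or on the fly.

    def chroma_420_to_422(plane):
        """4:2:0 to 4:2:2 - restore full vertical chroma resolution."""
        return [row for row in plane for _ in (0, 1)]       # repeat each row

    def chroma_422_to_444(plane):
        """4:2:2 to 4:4:4 - restore full horizontal chroma resolution."""
        return [[s for s in row for _ in (0, 1)] for row in plane]

    cb = [[128, 140], [120, 132]]                # a tiny 2 x 2 chroma plane
    print(chroma_422_to_444(chroma_420_to_422(cb)))
    # Four rows of four samples: no new detail is created, only resampled.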
Early NLE operation
Early NLEs generated effects following this process: One or more sources were decompressed to 4:2:2 YCrCb. The digital data stream(s) was mathematically (dissolves) or logically (wipes) combined. The resulting 4:2:2 YCrCb data were then recompressed using the same codec used by the source files. The recompressed files were called preview, render or precompute files.
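In toy-model form, using trivial stand-in "codecs" rather than any real NLE's internals, the early render path looked roughly like this:

    # Toy model of the early render path; the "codecs" are trivial
    # stand-ins, not real encoders or any NLE's actual API.

    def decode(clip):                  # pretend decode: long GOP to 4:2:2 YCrCb
        return clip["frames"]

    def reencode(frames, codec):       # pretend re-encode with the source codec
        return {"codec": codec, "frames": frames, "generation": 2}

    def dissolve(a, b, mix=0.5):       # mathematical combine of two streams
        return [mix * x + (1 - mix) * y for x, y in zip(a, b)]

    clip_a = {"codec": "long-GOP MPEG-2", "frames": [100, 110, 120]}
    clip_b = {"codec": "long-GOP MPEG-2", "frames": [200, 190, 180]}

    render_file = reencode(dissolve(decode(clip_a), decode(clip_b)), clip_a["codec"])
    print(render_file)   # the preview/render file is a second long-GOP generation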
Using this procedure, long-GOP video was subjected to generation loss during recompression. Moreover, graphics and titles would also be subjected to long-GOP compression that would indeed cause significant graphics degradation.
In order to save rendering time, many NLEs would reuse the render files when an editor, for example, added another layer to the layers already rendered. Likewise, to save compute time, renders were used when a project was exported to another format.
If modern professional NLEs such as Avid Media Composer and Apple Final Cut Pro worked in the manner described, generation loss would indeed be a valid concern. Thankfully, they do not.
Current NLE operation
When MPEG-2 sources are used in a project, some of the processes previously described remain the same. One or more MPEG-2 sources are decoded on the fly to 4:2:2 or 4:4:4 YCrCb data. A single data stream, a stream combined with one or more graphics, or the computed result of combining several streams (plus one or more graphics) becomes a real-time video stream that is sent to a monitor window on the computer's display and/or to an external monitor. The computed result is used only as a real-time preview. (See Figure 2.)
In this cycle, long-GOP sources are decoded only once. And it makes no difference when they are decoded — during ingest or at the time of playback. Moreover, if, as recommended by some, our 4:2:0 video had been decoded and recompressed to a 4:2:2 intermediate format, that intermediate would also have to be decompressed on the fly to 4:4:4 YCrCb data. Upsampling 4:2:0 to 4:2:2 and then later upsampling from 4:2:2 to 4:4:4 opens the window to more conversion rounding errors.
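Continuing the toy model above, the modern path decodes on the fly and hands the combined, uncompressed result straight to the preview monitor; nothing is re-encoded.

    # Continuing the toy model: decode each source once, on the fly, and
    # send the combined uncompressed result straight to the monitor.

    def modern_preview(clips, mix=0.5):
        decoded = [decode(c) for c in clips]       # one decode per source
        return dissolve(*decoded, mix=mix)         # uncompressed math; nothing re-encoded

    print(modern_preview([clip_a, clip_b]))
    # Each long-GOP source was decoded exactly once; no second generation exists.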
The same logic applies to recommendations to convert 8-bit sources to an intermediate format that uses 10-bit data. Placing 8-bit samples within 10-bit words, of course, does not increase the precision of the digital data. It simply needlessly increases the size of all source files.
To increase the accuracy of preview or render calculations, the NLE itself need only be set to use a high-precision mode. Final Cut Pro, for example, can compute effects at 8 bits, 10 bits or 32-bit floating point. Media Composer works at either 8 bits or 16 bits. In all cases, 8-bit data are fed on the fly into higher-precision calculations.
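A small numeric illustration (the values are arbitrary) makes both points: padding 8-bit samples into 10-bit words adds nothing, while carrying full precision through chained operations and rounding once at the end does limit the error.

    # Illustrative numbers only. Padding an 8-bit sample into a 10-bit word
    # is a pure shift, so no information is added; doing the effect math at
    # higher precision and rounding once at the end is what limits the error.

    sample = 183                       # an 8-bit luma value
    padded = sample << 2               # 732: the same value in a 10-bit container
    print(padded >> 2 == sample)       # True - nothing was gained

    # Two chained operations, e.g. a gain applied and later undone:
    low = int(int(sample * 0.7) / 0.7)      # truncate between steps: 182
    high = round((sample * 0.7) / 0.7)      # carry full precision, round once: 183
    print(low, high)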
As the number of layers increases and effects become computationally more complex, real-time previews are no longer practical. In this case, the editor can choose to render all, or portions, of a project. While making this choice might seem to open the door to generation loss, it does not.
Final Cut Pro offers an editor the option of selecting Apple's “uncompressed quality” ProRes 422 HQ as the render codec. (See Figure 3.) Generation loss caused by re-encoding to a long-GOP codec is prevented. A render file can be played back for a smooth preview, or combined with new video and/or effects.
Transfer from and to NLEs
While video transfer to Apple's Color application can be long-GOP MPEG-2, during color correction, data are decoded to 4:2:2 video. Color correction is never performed with MPEG-2. Color corrected results are returned to Final Cut Pro using ProRes 422 HQ.
This same procedure can be used to transfer segments from Final Cut Pro (or Media Composer) to another application. Working this way saves the needless conversion of hundreds of hours of source material to an intermediate codec. Transfer back is via ProRes 422 HQ to Final Cut Pro or Avid's mastering-quality DNxHD codec to Media Composer.
Export
Long-GOP generation loss during export is prevented because Final Cut Pro always generates each export frame from source files. Render files are not used. (See Figure 4.) With an Avid Media Composer, deleting precomputes before export prevents render files from being used during export — even though they use DNxHD or DVCPRO HD codecs.
Given the fear, uncertainty and doubt that have too long surrounded long-GOP MPEG-2, is there a scenario in which generation loss can occur when editing long-GOP MPEG-2 natively? Yes. If an editor exported an HD master not to HDCAM SR, HDCAM, XDCAM 422HD, D5 or tape-based DVCPRO HD, but instead to a tape format that employs long-GOP MPEG-2, the master would indeed be second-generation long-GOP MPEG-2. However, formats such as HDV are rarely accepted as master tapes.
Steve Mullen is owner of Digital Video Consulting, which provides consulting and conducts seminars on digital video technology.