Real loudness control

Loudness has, if you will, been making more and more noise in the broadcast industry. Standards organizations have been working long and hard to publish recommendations, best practices and specifications to guide the industry and ensure consistent measurements.

Governments are introducing legislation to promote and enforce better regulation. Equipment vendors are writing articles to discuss the situation. Experts have penned books on the subject. Yet, despite all the available information, there remains a great deal of uncertainty as to how to best deal with the issue.

The loudness paradox lies at the heart of the matter. On one side, there is a desire to maintain the artistic integrity of the original program (translation: don't modify audio). On the other, there is a need to minimize viewer frustration with regard to loudness inconsistencies across different content and channels. (See Figure 1.) This often requires some level of audio processing.

The paradox is this: How can loudness be kept consistent if the audio is not modified?

Before we discuss possible solutions, it's important to define a few basic terms:

Artistic integrity: Original dynamic range + original spectral density.
Long-term loudness: Loudness of the clip averaged over a relatively long period of time (usually the duration of the entire clip).
Short-term loudness: Loudness of the clip measured over a very short period of time (measured in milliseconds or seconds).
Dialnorm: Metadata value used to indicate the long-term loudness of a clip.
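
The difference between the long-term and short-term definitions can be sketched with a toy clip. This is only an illustration under stated assumptions: `level_db` is a hypothetical stand-in that reports plain mean-square level (no frequency weighting or gating), and the window and hop sizes are arbitrary sample counts rather than values from any standard.

```python
import math

def level_db(samples):
    """Mean-square level in dB (hypothetical stand-in for a loudness meter)."""
    power = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(power)

def short_term_levels(samples, window, hop):
    """Level of each window of `window` samples, advancing by `hop` samples."""
    return [level_db(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, hop)]

# Toy clip: a quiet passage followed by a loud passage.
quiet = [0.05 * (-1) ** n for n in range(400)]
loud = [0.5 * (-1) ** n for n in range(400)]
clip = quiet + loud

long_term = level_db(clip)                               # one value for the whole clip
short_term = short_term_levels(clip, window=200, hop=200)  # one value per window
```

Here the long-term figure is a single number dominated by the loud passage, while the short-term figures track the roughly 20 dB swing between the two passages.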

Loudness control on ingest

Careful audio management on ingest will go a long way toward providing consistent loudness. Tools are available now that allow broadcasters to measure the loudness of an entire clip, and then adjust the gain of the clip to ensure that the overall long-term loudness is at the desired level. That way, regardless of what content is played, each clip will share the same long-term loudness, thereby minimizing loudness inconsistencies.

A single gain value is applied across the clip, so the dynamic range (the difference between the softest and loudest audio) is perfectly preserved. (See Figure 2.) This satisfies the first half of the artistic integrity formula: artistic integrity = dynamic range + spectral density. Because the gain is applied equally across the frequency spectrum, the spectral content of the audio is also preserved, which satisfies the second half of the formula.
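
The single-gain approach can be sketched as follows. This is a minimal illustration rather than a compliant tool: `mean_square_db` is a hypothetical stand-in for a real long-term loudness measurement (it omits frequency weighting and gating), and the target of -24 is only an assumed example value.

```python
import math

def mean_square_db(samples):
    """Hypothetical stand-in for a long-term loudness meter: mean-square level in dB."""
    power = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(power)

def normalize_clip(samples, target_db):
    """Apply ONE gain value to the whole clip so its long-term level reaches target_db."""
    gain_db = target_db - mean_square_db(samples)
    gain = 10.0 ** (gain_db / 20.0)
    return [s * gain for s in samples], gain_db

def dynamic_range_db(samples):
    """Ratio of the loudest to the softest non-zero sample magnitude, in dB."""
    mags = [abs(s) for s in samples if s != 0.0]
    return 20.0 * math.log10(max(mags) / min(mags))

# Toy clip: a quiet passage followed by a loud one.
clip = [0.01, -0.01, 0.02, -0.02, 0.5, -0.5, 0.4, -0.4]
normalized, gain_db = normalize_clip(clip, target_db=-24.0)  # assumed target
```

Because every sample is multiplied by the same factor, `dynamic_range_db(normalized)` equals `dynamic_range_db(clip)`; only the overall level moves.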

However, there are a few places where this approach is problematic:

Silence: Long periods of silence pull down the average loudness, lowering the measured result. The gain applied must therefore be increased for the long-term loudness of the clip to reach the desired level. This raises the dialog level of the clip and can lead to dialog levels that differ slightly between clips, which matters because dialog is considered a key “anchor” point in human loudness perception. Gating algorithms have been developed to keep periods of silence from contributing to the measured result. Some mild, short-term loudness control processing will help reduce any remaining differences in perceived dialog levels, while having little impact on the overall dynamic range. Trade-off decision: With a proper gating algorithm, differences in dialog levels should be very small. Any short-term correction should be very light.
Transitions: Because of the short-term characteristics of the human ear, two clips can have the same long-term loudness and still produce a perceived jump in loudness at the switch point from one clip to the next. Again, mild, short-term loudness control will help to reduce the jumps. Trade-off decision: The jumps in loudness perception will not always happen, and when they do, they are usually short-lived. Any correction applied here should also be light. That said, commercials can be the exception. Transitions to commercials are the largest source of complaints because they often bring a sudden increase in loudness. This is partly due to the severely restricted dynamic range of most commercials, which translates to a high average loudness value.
Live audio: Live material, and material received too late to be processed before going to air, cannot have its long-term loudness measured in advance. It is necessary to apply some level of short-term, real-time control to keep the loudness reasonably close to the desired level. The level of the anchor point, in this case, will depend on the person mixing the audio or the absolute gain value of the incoming audio. It is, therefore, subject to more unpredictability. Generally speaking, this scenario requires more aggressive loudness control to ensure the anchor point is close to the desired target level. Trade-off decision: In this case, the variations in perceived loudness could be substantial, and some loss of artistic integrity is preferable to audio levels that annoy the listener. More aggressive loudness control is recommended.
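
The effect of silence on the measurement, and the way gating counters it, can be sketched as below. This is a simplified illustration, not the gating defined in any standard: it uses plain mean-square block levels and a single absolute threshold, and the -70 dB gate, the block size and the function names are assumptions made for the example.

```python
import math

def block_levels_db(samples, block_size):
    """Mean-square level (dB) of each consecutive block; silent blocks give -inf."""
    levels = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        power = sum(s * s for s in block) / block_size
        levels.append(10.0 * math.log10(power) if power > 0.0 else float("-inf"))
    return levels

def average_level_db(levels, gate_db=None):
    """Average block power in dB; blocks at or below gate_db are ignored if a gate is set."""
    kept = [lvl for lvl in levels if gate_db is None or lvl > gate_db]
    powers = [0.0 if lvl == float("-inf") else 10.0 ** (lvl / 10.0) for lvl in kept]
    return 10.0 * math.log10(sum(powers) / len(powers))

# Toy clip: four blocks of "dialog" followed by four blocks of digital silence.
dialog = [0.1 * (-1) ** n for n in range(400)]
silence = [0.0] * 400
levels = block_levels_db(dialog + silence, block_size=100)

ungated = average_level_db(levels)               # silence drags the average down
gated = average_level_db(levels, gate_db=-70.0)  # assumed absolute gate
```

In this toy case the ungated result sits about 3 dB below the gated one, so a gain computed from the ungated figure would leave the dialog about 3 dB hotter than intended.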

Dynamic correction profiles

With many different correction profiles to choose from, it becomes necessary to adapt to the content. In practice, this means applying the least real-time correction that the situation at hand allows.

Automation can be of great service to a network or station. Known good content can have mild processing only. Enhanced processing can be enabled for live or unknown/unprocessed content. It is important for real-time loudness controllers to be able to respond to automation triggers.
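
A minimal sketch of how an automation trigger might select a correction profile is shown below. Everything in it is hypothetical: the status names, the `CorrectionProfile` fields and the decibel limits are invented for illustration and do not describe any particular controller or automation protocol.

```python
from dataclasses import dataclass
from enum import Enum

class ContentStatus(Enum):
    KNOWN_GOOD = "known_good"    # loudness already managed on ingest
    UNPROCESSED = "unprocessed"  # file-based, but never measured
    LIVE = "live"                # cannot be measured ahead of time

@dataclass(frozen=True)
class CorrectionProfile:
    name: str
    max_correction_db: float  # hypothetical limit on real-time gain change

MILD = CorrectionProfile("mild", max_correction_db=3.0)
ENHANCED = CorrectionProfile("enhanced", max_correction_db=12.0)

def profile_for(status: ContentStatus) -> CorrectionProfile:
    """Pick the lightest profile that suits the content flagged by automation."""
    return MILD if status is ContentStatus.KNOWN_GOOD else ENHANCED

# Example: the automation system flags the next event as live.
active = profile_for(ContentStatus.LIVE)
```

The design choice worth noting is that the mild profile is the default for known good content, and heavier processing is switched in only when automation says the content is live or unmeasured.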

A benefit of leaving some mild processing enabled for all content is that it helps when known good content is not at the correct loudness. This can happen when the ingest control process is applied incorrectly or inconsistently, or when dialnorm metadata is either incorrect or missing.

In addition to mild processing, some loudness controllers provide more advanced features such as intelligent metadata handling of missing/incorrect dialnorm, and input loudness measurement alarms to warn of very hot or very quiet content.
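
One way such metadata handling and alarms could work is sketched below. It is an assumption-laden illustration, not a description of any vendor's feature: it assumes the AC-3 convention in which dialnorm is carried as an integer from 1 to 31 representing -1 to -31 dB, and the tolerance, target and alarm window values are invented for the example.

```python
from typing import List, Optional

def effective_dialnorm(carried: Optional[int], measured_lkfs: float,
                       tolerance_lu: float = 2.0) -> int:
    """Return the dialnorm to use: the carried value if plausible, else one from measurement."""
    measured = max(1, min(31, round(-measured_lkfs)))
    if carried is None or not 1 <= carried <= 31:
        return measured                      # metadata missing or out of range
    if abs(-carried - measured_lkfs) > tolerance_lu:
        return measured                      # metadata disagrees with the audio
    return carried

def loudness_alarms(measured_lkfs: float, target_lkfs: float = -24.0,
                    window_lu: float = 6.0) -> List[str]:
    """Flag input that is far hotter or quieter than the (assumed) target."""
    alarms = []
    if measured_lkfs > target_lkfs + window_lu:
        alarms.append("hot")
    if measured_lkfs < target_lkfs - window_lu:
        alarms.append("quiet")
    return alarms

# Example: metadata claims 31, but the audio measures -18.2.
chosen = effective_dialnorm(31, -18.2)
flags = loudness_alarms(-12.0)
```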

Cascaded processing

One of the simplest approaches is a “set and forget” control profile. The controller is configured to apply mild processing, providing an excellent trade-off between artistic integrity and consistent loudness level. That said, someone upstream or downstream may also apply loudness correction. Imagine a feed with mild loudness correction being sent to a local station, which then applies its own mild correction. The effect is a cumulative reduction in the signal's dynamic range with each stage.
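
The cumulative effect can be illustrated with a deliberately simplified model. Real loudness controllers are time-varying and far more sophisticated; the static compression curve, the -30 dB threshold and the 2:1 ratio below are assumptions chosen only to make the arithmetic visible.

```python
def compress_db(level_db, threshold_db=-30.0, ratio=2.0):
    """Static compression curve: levels above the threshold are scaled down by `ratio`."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

def dynamic_range(levels_db):
    """Spread between the loudest and softest level, in dB."""
    return max(levels_db) - min(levels_db)

# Program levels (dB) as delivered by the content creator.
program = [-40.0, -30.0, -20.0, -10.0]

network = [compress_db(lvl) for lvl in program]  # first stage of correction
station = [compress_db(lvl) for lvl in network]  # second, cascaded stage

ranges = [dynamic_range(x) for x in (program, network, station)]
```

In this toy model the range shrinks from 30 dB to 20 dB after the first stage and to 15 dB after the second: two cascaded 2:1 stages with the same threshold behave like a single 4:1 stage.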

Everyone in the audio chain must understand the sources and destinations of the audio, and have systems flexible enough to make changes dynamically as required.

Tailored solutions

There are many unique applications and situations, with different characteristics and requirements for each. There is no “one size fits all,” and engineers must tailor solutions to the situation at hand.

Good ingest loudness management is one of the most important steps in ensuring consistent levels. However, processing the audio is inevitable in some scenarios. Therefore, the quality of the real-time loudness control processor is key when choosing a real-time solution.

To provide a solution that minimizes processing, real-time controllers must apply the most appropriate correction profile. The goal is a system that preserves artistic integrity of the audio wherever possible, while applying only enough processing to ensure compliance. Spending time up front to configure such a system will result in an enhanced listening experience.

Randy Conrod is product manager, digital products, Harris Broadcast Communications. Stephane Gauthier is senior account manager, strategic sales, Altera.

ITU-R BS.1770 adoption

The measurement algorithm must come close to mimicking the human perception of loudness in order to be useful. The accuracy of an algorithm is determined by how close it comes. Any difference between the two essentially leads to incorrect audio gain adjustments, and therefore unnecessary audio modifications and an incorrect overall loudness.

The ITU-R BS.1770 algorithm was developed as a means of measuring loudness by applying a frequency-weighting filter (known as K-weighting) to each channel and then measuring the mean-square energy of the filtered signal. This solution is simple and cost-effective for equipment manufacturers to implement, and results in a reasonably good approximation.

More complex techniques (such as critical-band analysis) are available with better accuracy. These techniques require more hardware/software processing power, and the expertise is limited to very few companies worldwide. Such complex technology would severely restrict wide adoption of the standard. ITU-R BS.1770 represents a good balance between accuracy and simplicity, enabling widespread adoption. With everyone using the same technique, content across providers will be more consistent.
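
The simplicity of the BS.1770 measurement can be seen in a short single-channel sketch: two cascaded second-order filters (the K-weighting) followed by a mean-square calculation. This is a hedged illustration, not a compliant meter. The coefficients are quoted from the recommendation's 48 kHz tables and should be checked against the current edition, the function names are hypothetical, and channel weighting, channel summation and gating are all omitted.

```python
import math

# K-weighting biquad coefficients for 48 kHz sampling: (b0, b1, b2, a1, a2).
SHELF = (1.53512485958697, -2.69169618940638, 1.19839281085285,
         -1.69065929318241, 0.73248077421585)
HIGHPASS = (1.0, -2.0, 1.0, -1.99004745483398, 0.99007225036621)

def biquad(samples, coeffs):
    """Direct-form I second-order IIR filter."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def loudness_lkfs(samples):
    """Ungated, single-channel loudness: K-weight, mean square, convert to LKFS."""
    weighted = biquad(biquad(samples, SHELF), HIGHPASS)
    mean_square = sum(s * s for s in weighted) / len(weighted)
    return -0.691 + 10.0 * math.log10(mean_square)

# Two seconds of a full-scale 997 Hz sine at 48 kHz.
fs = 48000
tone = [math.sin(2.0 * math.pi * 997.0 * n / fs) for n in range(2 * fs)]
reading = loudness_lkfs(tone)
```

A full-scale 997 Hz sine in a single front channel should read close to -3.01 LKFS, which is the reference point used to calibrate the -0.691 offset.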

For more information on measurement techniques, refer to “Perceptual Loudness Management for Broadcast Applications” by DTS.

Single-, multi- and critical-band techniques

Manufacturers of multiband compressors often provide content-based profiles in addition to a control for the amount of loudness correction. These profiles let the multiband compressor adapt to the audio characteristics of the content going through it (for example, news, sports, drama, jazz music or rock music). This helps to compensate for the differences between how a multiband compressor measures loudness and how the ear perceives loudness. It also helps to compensate for the changes to spectral density that multiband compressors, by definition, impose on the audio.

Loudness controllers that use critical-band analysis, however, do not need to offer such profiles. The more computationally intensive algorithm delivers a loudness measurement that is mathematically much closer to how the ear perceives loudness, as a single result for the entire audible range. This means that a loudness controller that uses critical-band loudness measurement can apply a single gain to the entire signal (known as “wideband”) and, therefore, does not change the spectral density of the signal in any way.

This is not to be confused with single-band compressors (also known as wideband) that only look at the intensity of the signal to apply gain correction. These original wideband compressors create significant, audible artifacts and are not suitable for modern broadcast loudness control.

For more information on single-band, multi-band and critical-band techniques, refer to “Perceptual Loudness Management for Broadcast Applications” by DTS.
