Computers & Networks: Media asset management

Media asset management (MAM) systems help people find what they are looking for by rapidly sifting through vast collections of material. MAM systems are particularly useful in video applications where it may be difficult or impossible to sift through thousands of hours of material manually.


Figure 1. This diagram illustrates the stratified-documentation technique that can be used in the cataloging process.

Search-and-retrieve technology has been economically applied to text for many years. But, until relatively recently, it hasn’t been practical to apply this technology to video. There were three main reasons for this. First, in the past, it was not economically feasible to store large amounts of video in a “computer-friendly” form. However, with the falling price of storage and the increasing processing power of computers, this problem has largely gone away.

Second, while it was relatively easy to develop retrieval technologies for text systems, developing technology that aids in retrieving images is another matter. But, over the last five years, dramatic improvements have been made in the automatic indexing, cataloging and retrieval of video. Finally, cost has been a significant barrier to the adoption of these technologies for video. While cost continues to be a factor, smaller and less costly systems are beginning to appear on the market. This allows users to implement MAM on a limited basis, learning for themselves where the benefit of these systems lies.

MAM components

Ingest is the process of taking material in whatever form and loading it into the system. The ingest process may involve playing back video tapes, transferring files from one server to another, or playing back audio DATs or CDs. During the ingest process, basic slate or “clapper-board” metadata are also entered. This information consists of items such as title, length, ID number and source. Some of this information must be manually entered, some of it may come from other systems, and in the near future, some of it will come from metadata carried in the file itself, in formats such as the Advanced Authoring Format (AAF) or the Material Exchange Format (MXF).
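As a rough illustration of how slate metadata from several sources might be combined into one record, here is a minimal Python sketch. The field names, the `SlateRecord` and `merge_slate` names, and the precedence rule (manually entered values win, then values from other systems, then values embedded in the file) are all hypothetical; actually reading AAF or MXF metadata is outside its scope.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SlateRecord:
    """Basic "clapper-board" metadata captured at ingest (illustrative fields)."""
    title: Optional[str] = None
    length_seconds: Optional[float] = None
    asset_id: Optional[str] = None
    source: Optional[str] = None

FIELDS = ("title", "length_seconds", "asset_id", "source")

def merge_slate(*sources: dict) -> SlateRecord:
    """Merge metadata dictionaries field by field; earlier sources take precedence.

    Hypothetical call order: manual entry, then other systems, then file-embedded metadata.
    """
    merged = {}
    for name in FIELDS:
        for src in sources:
            value = src.get(name)
            if value is not None:  # first source that supplies a value wins
                merged[name] = value
                break
    return SlateRecord(**merged)
```

For example, `merge_slate(manual, from_traffic_system, from_file)` keeps an operator's typed title even if another system supplies a different one, while still picking up the length and source that only the file provides.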

Some MAM users employ an optional triage step to filter out unwanted material before it goes further into the MAM system. For example, it is very common for a feed from the White House to start about 30 minutes before the President appears. There is no reason to keep a 30-minute recording of an empty podium, so, in the triage step, MAM users can delete it. Triage may also be used to identify material that should receive expedited handling. One of the powerful benefits of MAM is that it can make one clip available to many users at the same time. If a user knows that a particular clip will be heavily used, it might be a good idea to fast-track that content through the rest of the process.
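A triage step that deletes unwanted material and fast-tracks priority clips can be modeled as a priority queue feeding the later stages. The sketch below is hypothetical (`TriageQueue` and its two priority levels are not taken from any MAM product); it uses Python's `heapq`, with a counter so that clips of equal priority leave in arrival order.

```python
import heapq
import itertools

EXPEDITE, NORMAL = 0, 1  # lower value is handed on first

class TriageQueue:
    """Hypothetical hand-off queue between triage and the later MAM steps."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: keeps arrival order within a priority

    def submit(self, clip_id: str, expedite: bool = False, discard: bool = False) -> bool:
        """Queue a clip for further processing; returns False if triage discarded it."""
        if discard:
            return False
        priority = EXPEDITE if expedite else NORMAL
        heapq.heappush(self._heap, (priority, next(self._counter), clip_id))
        return True

    def next_clip(self):
        """Return the next clip for annotation and indexing, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

Submitting the empty-podium pre-roll with `discard=True` drops it, and a clip submitted with `expedite=True` is handed on ahead of clips queued earlier at normal priority.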

During the annotation process, people make notes about what is happening while watching the scene on their computer or a separate monitor. MAM applications typically include VTR-like control of the playback, so that the person annotating the scene can pause playback while making notes. To some extent, annotation can be performed automatically. Some vendors have developed technology that harvests information from closed captions, subtitles and on-screen text. While speech-to-text technology has struggled, recent breakthroughs might make it possible to annotate scenes using words spoken either by the annotator herself or directly from the dialog.
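Harvesting captions amounts to turning each timed caption into a time-coded annotation. Closed captions in broadcast video are usually embedded in the signal rather than stored as text files, so this Python sketch assumes they have already been extracted to SubRip (SRT) text; the `Annotation` record and its `origin` field are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class Annotation:
    start: float  # seconds from the start of the clip
    end: float
    text: str
    origin: str   # "captions" for harvested entries, "manual" for an annotator's notes

_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")

def _seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:01:02,250 to seconds."""
    h, m, s, ms = map(int, _TIME.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000

def harvest_srt(srt_text: str) -> List[Annotation]:
    """Convert SubRip cues into time-coded annotations."""
    notes = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:  # the timing line, normally right after the cue number
                start, end = line.split("-->", 1)
                text = " ".join(part.strip() for part in lines[i + 1:])
                # end.split()[0] drops any positioning data after the end time
                notes.append(Annotation(_seconds(start), _seconds(end.split()[0]), text, "captions"))
                break
    return notes
```

Harvested entries and manual notes share one record type, so a search can treat them alike while still telling them apart by `origin`.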

Automatic indexing works by detecting naturally occurring changes in video content, such as fades to black and large changes in scene content from one frame to the next. Such changes usually indicate the end of one scene and the beginning of another. Computers automatically index ingested scenes using advanced technologies such as cut detection, scene-change detection and even image recognition. Research in this area is moving forward at a fast pace, driven to some extent by the events of Sept. 11, so we can expect these tools to keep improving.
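The frame-difference idea behind cut detection can be sketched in a few lines. This Python example is a simplification under stated assumptions: frames arrive as flat lists of 8-bit grayscale values, a cut is declared when the luminance-histogram distance between consecutive frames exceeds a fixed threshold, and a frame counts as black when its mean luminance falls below a second threshold. Production systems typically use color information, adaptive thresholds and separate handling for gradual transitions such as dissolves; the function names and threshold values here are illustrative.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened grayscale pixel values, 0-255

def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized luminance histogram of one frame."""
    counts = [0] * bins
    for p in frame:
        counts[min(p * bins // 256, bins - 1)] += 1
    return [c / len(frame) for c in counts]

def detect_cuts(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Return indices of frames that start a new shot.

    The distance is half the L1 difference of the histograms:
    0 for identical distributions, 1 for completely disjoint ones.
    """
    cuts = []
    prev = None
    for i, frame in enumerate(frames):
        h = histogram(frame)
        if prev is not None:
            distance = 0.5 * sum(abs(a - b) for a, b in zip(h, prev))
            if distance > threshold:
                cuts.append(i)
        prev = h
    return cuts

def detect_black(frames: Sequence[Frame], max_mean: float = 16.0) -> List[int]:
    """Return indices of frames dark enough to count as black, e.g. at the end of a fade."""
    return [i for i, f in enumerate(frames) if sum(f) / len(f) <= max_mean]
```

Comparing histograms rather than individual pixels makes the detector tolerant of camera and subject motion within a shot, at the cost of missing cuts between two shots with very similar tonal content.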


Figure 2. Automatic cut detection and scene-change detection in asset management systems such as blue order’s media archive 2.8 are used to produce a “light-table” view, summarizing an entire movie at a glance. Users can quickly navigate through the content to find the desired scene.
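Once cut points are known, building a light-table view mostly amounts to turning them into scene ranges and choosing one thumbnail per scene. In this hypothetical Python sketch, cuts are given as the frame indices where new shots begin, and the middle frame of each scene serves as its thumbnail; real systems may choose keyframes more carefully.

```python
from typing import List, Tuple

def scenes_from_cuts(cuts: List[int], total_frames: int) -> List[Tuple[int, int]]:
    """Turn shot-start frame indices into inclusive (first, last) frame ranges."""
    if total_frames <= 0:
        return []
    starts = [0] + [c for c in sorted(set(cuts)) if 0 < c < total_frames]
    ends = [s - 1 for s in starts[1:]] + [total_frames - 1]
    return list(zip(starts, ends))

def light_table(cuts: List[int], total_frames: int) -> List[int]:
    """Choose one thumbnail frame per scene: here, simply the middle frame."""
    return [(first + last) // 2 for first, last in scenes_from_cuts(cuts, total_frames)]
```

With cuts at frames 2 and 4 in a six-frame clip, `scenes_from_cuts` returns `[(0, 1), (2, 3), (4, 5)]` and `light_table` returns `[0, 2, 4]`.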

Cataloging is the process of organizing information about a scene so that it can be retrieved later. A cataloger organizes information into subject areas and may store it in a standard data model.

Standardized vocabularies and limited thesauri are used to aid in retrieval. Why is a standardized vocabulary important? Well, let’s say you wanted to locate images of the basketball player Shaquille O’Neal in a MAM system used to catalog images from live sporting events. Let’s also assume that it was an early system that did not have a standardized vocabulary. If you typed in “Shaquille,” you might get about 20 hits (images that the system matched to your query). If you then typed in “Shak,” you might get another 10. Entering “Shack” might yield two more. Each spelling returns a different subset of the images, and no single query finds them all. The author recently suffered this very experience. It illustrates that, if a piece of video is entered without accurate metadata, the video is effectively lost. Actually, it is worse than lost, because you are paying to store something you will never be able to find.
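A controlled vocabulary avoids this by mapping every known variant to one preferred term, both when material is cataloged and when a user searches. The Python sketch below hard-codes a tiny, hypothetical thesaurus built from the spellings in the example (plus the common nickname “Shaq”); a real system would maintain such lists as managed reference data.

```python
from typing import Dict, List

# Hypothetical thesaurus: preferred term -> variants people are likely to type.
THESAURUS: Dict[str, List[str]] = {
    "Shaquille O'Neal": ["Shaquille", "Shaq", "Shak", "Shack"],
}

# Invert once for lookup: lowercase variant (or the term itself) -> preferred term.
VARIANT_TO_TERM = {
    name.lower(): term
    for term, variants in THESAURUS.items()
    for name in [term, *variants]
}

def normalize(keyword: str) -> str:
    """Map a keyword to its preferred term; unknown keywords pass through unchanged."""
    cleaned = keyword.strip()
    return VARIANT_TO_TERM.get(cleaned.lower(), cleaned)

def search(catalog: Dict[str, List[str]], query: str) -> List[str]:
    """Return IDs of catalog items whose tags normalize to the query's preferred term."""
    term = normalize(query)
    return [item_id for item_id, tags in catalog.items()
            if any(normalize(tag) == term for tag in tags)]
```

Whichever spelling the searcher types, the query resolves to the same preferred term, so “Shak” and “Shack” return the same complete set of images.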

During the cataloging process, different techniques can be used. Stratified documentation is one technique some vendors use to make metadata more accessible. In this process, temporal stratification uses time lines to point to a piece of the audio or video object. Descriptions are linked to that piece by storing them together with its time-code information, while the media object itself may be stored at a completely different location. This allows the object to be described in several layers. As Figure 1 shows, metadata about a videotaped meeting, such as the persons present, the speakers, transcripts, subtitles, copyright and image content, can be described on distinct layers, and each layer may use a completely different time line. This can be especially useful given the growing number of automatic indexing tools that can generate much of this information directly from the media object. During retrieval, combinations of these strata may be referenced to identify exactly the media segments you are interested in. The same approach applies to spatial stratification, which virtually breaks an image into regions and describes them individually without actually altering the image.
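Temporal stratification can be illustrated with interval lists: each layer holds (start, end, description) entries on its own time line, and a retrieval query intersects the matching intervals from different layers. This Python sketch is a hypothetical illustration, not any vendor's data model; times are in seconds, and the layer names (`speaker`, `transcript`) are assumptions chosen to echo the meeting example in Figure 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Stratum:
    start: float  # seconds on the media object's time line
    end: float
    description: str

# Each layer is stored apart from the media and may be segmented differently.
Layers = Dict[str, List[Stratum]]
Interval = Tuple[float, float]

def find(layers: Layers, layer: str, text: str) -> List[Interval]:
    """Intervals in one layer whose description contains text (case-insensitive)."""
    return [(s.start, s.end) for s in layers.get(layer, [])
            if text.lower() in s.description.lower()]

def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Time ranges covered by an interval from both lists."""
    out = []
    for a0, a1 in a:
        for b0, b1 in b:
            lo, hi = max(a0, b0), min(a1, b1)
            if lo < hi:  # keep only genuine overlaps
                out.append((lo, hi))
    return sorted(out)
```

For instance, intersecting the intervals where the chair is speaking with the transcript intervals that mention a budget returns just the segments in which the chair discusses the budget, even though the two layers were segmented independently.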

Storage is not a separate step in the MAM process. Material is stored in the system as soon as it begins the ingest process, and metadata is added as it is entered, all in real time. As soon as the information is entered, it is available to other clients in the system.

All the processes discussed thus far involve putting information into the MAM system. The processes discussed below involve taking information out of the system.

Using a retrieval client, users query the system to locate material. For example, a user may search for specific text, or look for an image by browsing still-image views created using cut-detection technology (see Figure 2). The specific functions available in retrieval clients vary widely from manufacturer to manufacturer. How might a user use a retrieval client? Consider the following example. A producer is asked to make a 30-second promotional piece for a movie that will air next Saturday night. The producer views the movie in its entirety, making notes of scenes he thinks will be appropriate for the promo. Using a retrieval system, he looks at various scenes that were detected using cut-detection technology. He quickly identifies the scene he is looking for, and then calls up a low-resolution copy of the video on his desktop editor. The system allows him to go directly to the part of the movie he identified in the still-image view, where he can begin preparing a rough-cut edit. The rough-cut system produces an edit decision list (EDL) that can then be used to expedite the on-line editing process.
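To make the last step concrete, the Python sketch below turns a producer's rough-cut picks into cuts-only EDL text. It assumes 30 frame/s non-drop-frame timecode and loosely follows the event-line layout of CMX 3600-style EDLs (event number, reel, track, transition, source in/out, record in/out); column widths and header lines vary between editing systems, so treat the output format as an approximation. Out-points are treated as exclusive, so each event's duration is simply out minus in. The function names are hypothetical.

```python
from typing import List, Tuple

FPS = 30  # assumption: 30 frame/s non-drop-frame timecode

def tc_to_frames(tc: str) -> int:
    """HH:MM:SS:FF -> absolute frame count."""
    h, m, s, f = (int(x) for x in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * FPS + f

def frames_to_tc(n: int) -> str:
    """Absolute frame count -> HH:MM:SS:FF."""
    f, s = n % FPS, n // FPS
    return f"{s // 3600:02d}:{s // 60 % 60:02d}:{s % 60:02d}:{f:02d}"

def make_edl(title: str, picks: List[Tuple[str, str, str]],
             record_start: str = "00:00:00:00") -> str:
    """Build cuts-only EDL text from (reel, source_in, source_out) picks, laid end to end."""
    lines = [f"TITLE: {title}", "FCM: NON-DROP FRAME", ""]
    rec = tc_to_frames(record_start)
    for event, (reel, src_in, src_out) in enumerate(picks, start=1):
        length = tc_to_frames(src_out) - tc_to_frames(src_in)
        lines.append(
            f"{event:03d}  {reel:<8} V     C        "
            f"{src_in} {src_out} {frames_to_tc(rec)} {frames_to_tc(rec + length)}"
        )
        rec += length  # next event starts where this one ends on the record side
    return "\n".join(lines)
```

A five-second pick followed by a pick of two seconds and 15 frames produces record times of 00:00:00:00 to 00:00:05:00 and 00:00:05:00 to 00:00:07:15, which is the kind of list the on-line session can then conform against the high-resolution material.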

Once the user identifies the material needed, the next step is packaging it. Some MAM systems allow users to output the material in the resolution appropriate for their use, for example, a high-resolution copy, a browse copy or still images. Some MAM systems can deliver images directly to the user in an e-commerce or Web-based model. These solutions may include billing and rights-management options.
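One way to think about packaging is as a lookup from intended use to the most suitable rendition, gated by a rights check. The Python sketch below is entirely hypothetical: the use names, rendition names and the `PREFERENCES` table are invented for illustration, and a real system would also handle transcoding, billing and detailed license terms.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Asset:
    asset_id: str
    renditions: Dict[str, str]  # rendition name -> storage location
    licensed_uses: Set[str] = field(default_factory=set)

# Hypothetical mapping from intended use to acceptable renditions, best first.
PREFERENCES: Dict[str, List[str]] = {
    "broadcast": ["high_res"],
    "review": ["browse", "high_res"],
    "web": ["still", "browse"],
}

def package(asset: Asset, use: str) -> str:
    """Return the location of the rendition to deliver for this use."""
    if use not in asset.licensed_uses:  # rights check comes before any delivery
        raise PermissionError(f"{asset.asset_id}: no rights for '{use}'")
    for name in PREFERENCES.get(use, []):
        if name in asset.renditions:
            return asset.renditions[name]
    raise LookupError(f"{asset.asset_id}: no suitable rendition for '{use}'")
```

Keeping the preference table separate from the delivery code means a facility could, for example, let review users fall back to the high-resolution copy when no browse copy exists, without touching the rights logic.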

All MAM solutions also include maintenance and user-access functionality.

There are two important things to recognize about MAM systems. The first is that in some vendor implementations, some processes can occur in parallel. Consider the following example. For a breaking news story, once the ingest process begins, operators begin indexing and annotating the material. At the same time, automatic annotation processes can begin populating the MAM system with information. Finally, editors can begin rough-cut editing material before recording at the ingest station has ended.

The second important thing to recognize is that many MAM systems use interprocess communication. This means that when an ingest client completes an entry for a particular piece of media, that entry is immediately available to other clients on the system, without waiting for any files to close and without having to restart any applications.
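The immediate-availability behavior can be pictured as publish/subscribe messaging. In this Python sketch, an in-process `MetadataBus` stands in for whatever interprocess mechanism a given vendor actually uses (the class and topic names are assumptions): the ingest client publishes each completed entry, and every client subscribed to that clip receives it at once.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[dict], None]

class MetadataBus:
    """In-process stand-in for a MAM's interprocess messaging (hypothetical, not a vendor API)."""

    def __init__(self):
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register a client callback for entries about one topic, e.g. one clip."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, entry: dict) -> None:
        """Deliver an entry to every current subscriber right away."""
        for callback in list(self._subscribers[topic]):
            callback(entry)
```

An editor's client subscribed to a breaking-news clip would see each annotation as soon as the ingest operator completes it, while recording is still under way; nothing waits for a file to close.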

Brad Gilmer is president of Gilmer & Associates, executive director of the AAF Association, and executive director of the Video Services Forum.

Send questions and comments to: brad_gilmer@primediabusiness.com
