The art and science of multi-channel playout

The adoption of centralcasting models for the distribution of multiple channels has presented broadcasters with significant challenges in terms of designing sufficiently robust and reliable playout infrastructures. Nearly all current infrastructures are based on video servers, and while many server designs can support the playout of multiple channels, a number of considerations for long-term practicality and quality of service should be examined for any new facility or facility upgrade.

Reliability

Establishing a server infrastructure with 24x7 reliability is of the utmost importance. Obviously, redundancy of items such as fans and power supplies is a must. Protection of storage is also critical to the system’s reliability, but not nearly so straightforward. Broadcasters have many options when choosing a RAID scheme for their facilities, and each solution offers its own unique benefits while also having a direct impact on the amount of storage available for media.

The type and size of files used by a facility should help to determine the appropriate solution for storage protection. Because not all files store efficiently on every RAID system, part of the art of implementing a reliable playout server is maximizing available storage, regardless of the file type. For example, most playout servers use a RAID stripe that is configured for the storage of large files. As a consequence, small files can take up substantially more storage space than their actual size would indicate. The crux of the issue is that there is a minimum block (or chunk) size that can be stored on a disk. If a small file is striped across many disks, as in a RAID3 partition, only a fraction of each chunk will be filled with real data. The rest remains unused, and the file takes up much more space than is strictly necessary.


Figure 1. A typical RAID3 workflow.

The chunk size is set by the file system for the most efficient storage of large files, and it is typically 64KB or larger. (See Figure 1.) If we assume a chunk size of 64KB and a small file of 64KB, then in the RAID system shown, the original file would be split into four data chunks of 16KB each, plus a fifth 16KB chunk of redundancy (parity) data. The system can only write in 64KB chunks to each drive, so it zero-fills each of those chunks up to 64KB and then writes one to each disk. The 64KB file now takes up 320KB of storage.
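
To make the arithmetic concrete, here is a minimal sketch (in Python, purely for illustration) of that footprint calculation, assuming the layout described above: four data drives plus one parity drive and a fixed 64KB chunk size.

```python
# Storage footprint of a small file on the RAID3-style stripe described above:
# four data drives plus one parity drive, written in fixed 64KB chunks.
import math

CHUNK_KB = 64
DATA_DRIVES = 4

def raid3_footprint_kb(file_kb: float) -> int:
    """Total disk space consumed, including parity and zero-fill padding."""
    # The file is split evenly across the data drives...
    per_drive_kb = file_kb / DATA_DRIVES
    # ...but each drive can only be written in whole chunks, so every
    # fragment is zero-filled up to a full chunk.
    chunks_per_drive = math.ceil(per_drive_kb / CHUNK_KB)
    # One parity chunk accompanies every stripe of data chunks.
    total_chunks = chunks_per_drive * (DATA_DRIVES + 1)
    return total_chunks * CHUNK_KB

print(raid3_footprint_kb(64))    # 320  -- the 64KB file from the example
print(raid3_footprint_kb(1024))  # 1280 -- 1MB of data plus 256KB of parity, no padding
```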

Thus, for small files, a RAID1 methodology may be better. RAID1 is simple mirroring: The file is written in its entirety to two separate disks. If one disk fails, the other still holds a full copy of the file. (See Figure 2.) The same 64KB file now occupies 128KB of total disk space, which is far more efficient while remaining redundantly protected. Intelligent use of the correct RAID technology is the only way to maximize storage efficiency across varying file sizes. (There is, by the way, no reason why an array cannot carry more than one RAID striping configuration; the configurations simply have to be managed correctly.) This issue is of increasing importance as servers embrace standard networking topologies, with the resulting increase in the number of smaller files stored on the server.
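
Continuing the same sketch, mirroring simply doubles the file, and a file system could use a size threshold to decide which protection scheme a file receives; the cutoff value below is entirely illustrative.

```python
# Mirroring (RAID1) simply writes a full copy of the file to two disks.
def raid1_footprint_kb(file_kb: float) -> int:
    return int(file_kb * 2)

# Illustrative policy only: below some cutoff, mirroring wastes less space
# than a parity stripe; above it, the stripe's overhead is amortized.
SMALL_FILE_CUTOFF_KB = 256  # hypothetical threshold, not a recommendation

def preferred_scheme(file_kb: float) -> str:
    return "RAID1" if file_kb < SMALL_FILE_CUTOFF_KB else "RAID3"

print(raid1_footprint_kb(64))       # 128 -- versus 320KB on the RAID3 stripe
print(preferred_scheme(64))         # RAID1
print(preferred_scheme(10 * 1024))  # RAID3
```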

Another important consideration with regard to reliability is the redundancy of the server’s operating system (OS). An OS that resides on a single drive is clear cause for concern. Drives do fail, and the loss of the OS will result in a complete server failure. Many manufacturers, therefore, go to significant lengths to offer redundancy of the main system OS.

Finally, all of this redundancy is of little value unless the user is made aware that a fault condition exists. A power supply can fail, and the system will most likely stay up, running on a back-up supply. But the system is no longer redundant, and won't be until the faulty supply is replaced. It is important, then, that an automatic scheme notify the engineering department that maintenance is required. SNMP is often used, and some servers go so far as to send an e-mail or page an engineer when the system is running in an unprotected state.
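
As a rough illustration of such a notification path, the sketch below polls a health check and e-mails engineering once the system drops into an unprotected state. The host names, addresses and the health-check function are stand-ins, not any real server's interface; a production system would more likely also raise an SNMP trap.

```python
# Minimal fault-notification loop: poll a health check and e-mail engineering
# once when redundancy is lost. All names below are placeholders.
import smtplib
import time
from email.message import EmailMessage

SMTP_HOST = "mail.example-station.tv"                  # hypothetical relay
ENGINEERING_ALIAS = "engineering@example-station.tv"   # hypothetical address

def power_supplies_redundant() -> bool:
    """Stand-in for a real hardware query (e.g. an IPMI or SNMP poll)."""
    return True  # placeholder; replace with the server's actual health check

def notify_unprotected(component: str) -> None:
    msg = EmailMessage()
    msg["Subject"] = f"Playout server running unprotected: {component}"
    msg["From"] = "playout-server@example-station.tv"
    msg["To"] = ENGINEERING_ALIAS
    msg.set_content(f"{component} redundancy lost; replace the failed unit.")
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)

already_alerted = False
while True:
    healthy = power_supplies_redundant()
    if not healthy and not already_alerted:
        notify_unprotected("power supply")
        already_alerted = True   # alert once per fault, not once per poll
    elif healthy:
        already_alerted = False
    time.sleep(60)               # poll once a minute
```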


Figure 2. A RAID1 (mirrored) configuration: the file is written in its entirety to two separate disks.

Managing server operations

Another area in which creativity is applied to the design of a playout server is in the control system. In and of itself, a playout server can be considered to be a bit bucket — video goes in, and some time later, video comes out. External devices control both the ingest and playout of media, and this requires the use of both standard and expanded control protocols.

The lowest common denominators in the world of control protocols are the ubiquitous Video Disk Control Protocol (VDCP), originally developed by Louth (now Harris), and the Sony BVW protocol. In almost all cases, servers respond to some subset of the full gamut of commands offered by these protocols, usually limited by the technical capabilities of the server.

The art of server control is in the extended protocols, delivered either over RS-422 (point-to-point) connections or over the network via Ethernet. Often referred to as a control Application Programming Interface (API), this is where a server manufacturer exposes the most intimate levels of control access to the system, and where the maximum functionality can be eked out of the server. For example, API calls may allow controllers to examine the amount of free space available on the server, set a delay parameter to allow for time-zone shifts, and set the in and out points of a loop for looping playback.
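
The sketch below suggests what such an API surface might look like. The class, method names and units are hypothetical, not any particular manufacturer's interface.

```python
# Hypothetical control-API surface for a playout server. Names and units are
# illustrative only; real vendor APIs differ.
class PlayoutServerAPI:
    def free_space_bytes(self) -> int:
        """Report remaining media storage, so automation can warn before ingest fails."""
        ...

    def set_playout_delay(self, seconds: int) -> None:
        """Delay playback of ingested material, e.g. to time-shift a feed by one time zone."""
        ...

    def set_loop(self, clip_id: str, in_frame: int, out_frame: int) -> None:
        """Mark in and out points so a clip plays back as a seamless loop."""
        ...

# Example use by an automation controller (still hypothetical):
# api = PlayoutServerAPI()
# if api.free_space_bytes() < 50 * 10**9:
#     warn_operator("less than 50GB of media storage remaining")
# api.set_playout_delay(3600)            # one-hour time-zone delay
# api.set_loop("station-ident", 0, 249)  # loop a 10-second ident at 25fps
```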

Managed growth for server infrastructures

The science behind the creation of a multi-channel playout server is widely understood. The many servers on the market today offer different numbers of channels for different applications and price points. However, to ensure that customers are able to expand their existing systems as their needs grow, manufacturers must design their systems for expandability from day one, even if such expansion is achieved simply by adding another complete server to the original. The failure to consider this likelihood will result in the need to duplicate material on both servers, creating a media-management nightmare.

Again, there are many possible solutions for creating an expandable system, and each has its own advantages and disadvantages. For example, adding channels of I/O will sooner or later require the addition of another server chassis. If that second chassis is to access the same material as the first, the two will require a shared storage system. In some cases, the user may be required to change the storage topology of the system to accommodate such sharing, and material is then transferred from the single-access file system to this new shared-file system. In other cases, the storage subsystem anticipates future expansion of the channel count and is designed from day one to accommodate sharing. While this second approach is obviously less traumatic at the time of expansion, it may (depending on topology) carry a cost burden in the initial installation, when sharing is not yet needed. Storage Area Network (SAN) and Network Attached Storage (NAS) are common approaches, the difference being whether the storage is connected to the server (client) via Fibre Channel switches or Gigabit Ethernet.


Figure 3. In a typical SAN topology, the servers connect to high-speed disk arrays via Fibre Channel switches.

In a typical SAN configuration, the servers (which, in the case of broadcast servers, are also the clients) are connected to high-speed disk arrays via Fibre Channel switches. (See Figure 3.) The use of Fibre Channel provides extremely high bandwidth (each FC loop can run at up to 2Gb/s), but each client must have a Fibre Channel disk controller, along with a software management layer that ensures only one server is writing to any individual disk at any given time. As a result, SAN solutions are generally favored only for playout applications, where their high guaranteed bandwidth is of great value.
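
The coordination job that management layer performs can be thought of as a write lock per shared disk. The sketch below is a deliberately simplified, single-process stand-in; in a real SAN file system this arbitration is a distributed service spanning all of the clients.

```python
# Simplified stand-in for the SAN's write-arbitration layer: one write lock
# per shared disk. In practice this is a distributed service, not in-process
# locks on a single machine.
import threading
from contextlib import contextmanager

disk_locks = {disk_id: threading.Lock() for disk_id in range(16)}  # 16 shared disks

@contextmanager
def exclusive_write(disk_id: int):
    """Guarantee that only one client writes to a given disk at any moment."""
    lock = disk_locks[disk_id]
    lock.acquire()
    try:
        yield
    finally:
        lock.release()

def write_chunk(disk_id: int, offset: int, data: bytes) -> None:
    with exclusive_write(disk_id):
        pass  # issue the actual Fibre Channel write here
```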

In a typical NAS configuration, the clients (most likely nonlinear editors or graphics devices) are attached to the storage via standard networking, usually Gigabit Ethernet. (See Figure 4.) The advantages here are that the clients do not need expensive Fibre Channel disk controllers; that there is no need for the software management layer, because the disks are accessed through their own local server, so only one computer (that server) is ever actually writing to them; and that the system is "infinitely expandable," meaning that more clients can be added at any time.

The disadvantage is that real-time performance cannot easily be guaranteed. All of the clients must share the available network bandwidth, which may result in pauses or dropped frames during real-time playback. Some of this can be mitigated by the design of the network, but at press time, NAS is usually reserved for applications where guaranteed real-time performance is not critical. It can easily be argued that a "best of both worlds" approach embracing both technologies is required: SAN, via token passing, for real-time playout, and NAS for external clients such as nonlinear editors, which see the material to be edited as residing on "local" drives.
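
A back-of-the-envelope check of that bandwidth sharing, using illustrative assumptions (a 1Gb/s link, a conservative usable fraction and 50Mb/s streams), shows how quickly the headroom is consumed:

```python
# Back-of-the-envelope stream count for a shared Gigabit Ethernet link.
# The usable fraction and stream bit rate are illustrative assumptions.
LINK_MBPS = 1000        # Gigabit Ethernet
USABLE_FRACTION = 0.7   # assumed headroom for protocol overhead and bursts
STREAM_MBPS = 50        # e.g. 50Mb/s MPEG-2 material

usable_mbps = LINK_MBPS * USABLE_FRACTION
print(int(usable_mbps // STREAM_MBPS))  # ~14 concurrent real-time streams before contention
```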


Figure 4. In a typical NAS topology, each client connects to multiple disk arrays via a Gigabit Ethernet network. No dedicated disk controllers or disk access control software are required on the clients.

As technology and the market change, the preferred video format and storage compression technologies may change with them. Recall the early days of playout servers, when video I/O was often baseband video. Today it is becoming more and more common in a number of applications to use DVB/ASI as the I/O format of choice. It is quite a challenge to design a system in which the I/O formats can be changed while leaving the main server infrastructure untouched. Furthermore, manufacturers have to allow for a mix of I/O types and manage the flow of material so that a specific I/O card is only presented with data it is capable of playing back.
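
That routing constraint amounts to a capability check before a clip is queued to an output. The card names and format labels below are entirely hypothetical.

```python
# Hypothetical capability table: which essence formats each I/O card can play.
# Card names and format labels are invented for illustration.
CARD_CAPABILITIES = {
    "sdi-out-1": {"mpeg2", "dv25"},
    "asi-out-1": {"mpeg2-ts"},
}

def route_clip(clip_format: str, card: str) -> None:
    """Refuse to queue material on a card that cannot decode it."""
    if clip_format not in CARD_CAPABILITIES[card]:
        raise ValueError(f"{card} cannot play {clip_format}; choose another output")
    # otherwise, hand the clip to that card's playback queue here
```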

The same situation applies to compression formats. Early servers stored video uncompressed or, at most, compressed via MJPEG. Now the industry has MPEG-2 in all its flavors, DV in all of its flavors, and other up-and-coming formats promising ever-greater storage efficiency without perceived quality loss. Constructing a single file system that is capable of storing and retrieving these various file formats at the same time is, indeed, an art.

The final piece in expanding a system is increasing its storage capacity. Adding capacity to a storage system that stripes data across a number of drives, without taking the system down or forcing a major storage rebuild, is not a trivial matter. Once again, the file system needs to be designed and implemented in a manner that allows storage to be added while the system remains on-air. In principle, the solution is deceptively simple: The file system makes sure it fills up all of the drives in the existing array(s) before placing data into the newly added array(s). In practice, however, the manufacturer needs to ensure that making the connection between the server and the new storage does not break the existing connection between the server and its current storage. Redundant cabling, hub topologies, switch topologies or a mixture of the two must, therefore, be in place for this to be feasible. (See Figure 5.) Note that this capability does not happen by itself. Part of the art of designing playout servers is to take this issue into account when considering the initial architecture of the file system.
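
The "fill the existing arrays first" rule is simple to express. The sketch below assumes arrays are simply listed in the order they were commissioned, with capacities in gigabytes chosen only for illustration.

```python
# Fill existing arrays before the newly added one. Arrays are listed in
# commissioning order; capacities are in gigabytes, chosen for illustration.
arrays = [
    {"name": "array-0", "capacity_gb": 4000, "used_gb": 3990},
    {"name": "array-1", "capacity_gb": 4000, "used_gb": 1200},
    {"name": "array-2", "capacity_gb": 8000, "used_gb": 0},  # just added, on-air
]

def choose_array(file_gb: float) -> str:
    """Place new media on the oldest array that still has room."""
    for a in arrays:  # oldest first
        if a["capacity_gb"] - a["used_gb"] >= file_gb:
            a["used_gb"] += file_gb
            return a["name"]
    raise RuntimeError("no array has room for this file")

print(choose_array(25))    # array-1: array-0 is nearly full
print(choose_array(3000))  # array-2: only the new array has that much free space
```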


Figure 5. Expanding a RAID system while on-air.

Conclusion

The playout server is the workhorse of the broadcast business. In modern facilities, it is expected to accommodate channel counts ranging from one or two channels to dozens, and in some cases even hundreds. It is expected to operate without interruption, 24x7x365. It is expected to survive component failures while remaining constantly available. It is even expected to report when it has a problem. It is expected to be designed in such a way that repairs can be made while the system continues to do its job, and to stay up even while additional components are integrated into the playout infrastructure. All these expectations rest on the simple fact that if the server goes down, the rest of the operation grinds to a halt. Taken together, the many and varied demands placed on these systems present a serious challenge to design engineers. It is a challenge that is best addressed by creatively applying the most advanced science. And that is, indeed, an art.

Paul Turner is vice president of product management for Omneon Video Networks.