Reshaping Media Workflows: How Multimodal and Generative AI Impact Video Storytelling

A senior executive at a 24/7 news operation in New York recently shared his biggest concern: meeting the ever-growing demand for content with fewer team members as the industry grapples with ongoing layoffs. Currently, it takes a producer an average of five minutes to locate a specific shot in the organization’s vast media library. A 10-minute story package can easily draw on around 100 shots, so at five minutes each, gathering the clips for a rough cut consumes roughly 500 minutes: an entire eight-hour workday. This pace is becoming unsustainable with a reduced workforce.

With rapid advancements in technology, the days of carrying this burden on legacy systems are quickly coming to an end. Multimodal and generative AI are transforming media workflows, cutting content discovery times in some cases from eight hours to a matter of minutes and significantly accelerating story creation.

Enhancing Access and Collaboration
Cloud computing makes it possible to access digitized media libraries remotely, connecting previously siloed media departments and enabling real-time cross-team collaboration. However, the biggest paradigm shift for content sourcing and discovery in recent years has been multimodal and generative AI (GenAI).

Multimodal AI is a type of machine learning designed to mimic human perception. It differs from more traditional, unimodal AI in that it ingests and processes multiple data sources, including video, still images, speech, sound and text, to achieve a more detailed and nuanced understanding of media content. The best-known example of GenAI is ChatGPT, now used regularly to answer questions and brainstorm ideas.

When used for media indexing, multimodal AI analyzes a video from all angles, recognizing faces, logos, landmarks, objects, actions and shot types, reading on-screen text and transcribing speech to generate a semantic description of each clip. This allows content producers to search media-management systems for exact clips rather than full video files, and to dive into details such as shot types, scene summaries and the most compelling sound bites identified by AI. In essence, multimodal AI produces metadata on steroids, giving content teams a real edge, especially in live reporting scenarios such as the U.S. elections, where speed is critical: capturing key moments, then clipping and editing stories together to publish first.
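
To make this concrete, here is a minimal sketch in Python of what a clip-level multimodal index and search could look like. The schema, field names and keyword matching are illustrative assumptions, not Moments Lab’s actual data model; production systems typically search on semantic embeddings rather than raw keywords.

```python
from dataclasses import dataclass, field

# Hypothetical clip-level record fusing the modalities described above.
@dataclass
class ClipMetadata:
    clip_id: str
    start_s: float                               # clip boundaries within the
    end_s: float                                 # source file, in seconds
    faces: list = field(default_factory=list)    # recognized people
    labels: list = field(default_factory=list)   # objects, landmarks, actions
    shot_type: str = ""                          # e.g., "close-up", "wide"
    ocr_text: str = ""                           # on-screen text and logos
    transcript: str = ""                         # speech-to-text
    summary: str = ""                            # AI-generated scene summary

def search_clips(index, query):
    """Naive keyword search across every modality of every clip."""
    q = query.lower()
    def haystack(c):
        return " ".join([*c.faces, *c.labels, c.shot_type,
                         c.ocr_text, c.transcript, c.summary]).lower()
    return [c for c in index if q in haystack(c)]

# A producer retrieves an exact clip, not a full video file.
index = [
    ClipMetadata("news_0142", start_s=83.0, end_s=96.5,
                 faces=["Candidate A"], labels=["podium", "crowd"],
                 shot_type="medium", ocr_text="ELECTION NIGHT",
                 transcript="we are ready to call this race",
                 summary="Candidate A addresses supporters on election night"),
]
for clip in search_clips(index, "election"):
    print(clip.clip_id, clip.start_s, clip.end_s, clip.summary)
```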

The deep-search experience enabled through multimodal AI also opens the possibility of creating niche content packages and collections around specific themes or genres to satisfy a variety of audiences and advertisers.

Slashing Production Costs
The media industry is not yet at the point of creating full-length blockbuster movies entirely with AI, but many GenAI applications are already proving to be game-changers in pre- and postproduction, and more are swiftly coming down the line.

Lionsgate recently inked a deal with Runway to create and train a new model that enables its creatives to generate cinematic video. The Hollywood studio expects to save millions of dollars by “augmenting, enhancing and supplementing” its current operations with GenAI.

Last month at the Tokyo International Film Festival, film and tech leaders highlighted AI’s potential to deliver millions of dollars in production savings by dramatically reducing the traditional costs involved in location shoots.

Production companies are under increasing pressure to create more compelling content with less. Mixed ad-revenue results have led linear TV networks and streamers to slash content budgets and pull back from commissioning new shows. Multimodal and GenAI enable deeper exploration of vast media archives, unearthing never-before-seen footage that’s ripe for repurposing: think new docuseries, behind-the-scenes features and best-of specials, establishing new sources of revenue that don’t require expensive shoots.

In Tom Hanks’ latest film, “Here,” visual effects startup Metaphysic applied GenAI to age the actor up and down between 18 and 80, work that traditionally requires hundreds of artists and months to complete.

Prompt-driven experiences will soon make building a rough cut more accessible and efficient. Content producers can simply tell the GenAI what type of story they want to create, and it will automatically scan their media collections and select the clips that align with the narrative. AI prompting can also be used to filter content efficiently for quality control and compliance: commands such as “find scenes with mature content” help editors isolate and review specific video elements that may need to be modified or removed to meet audience standards in certain geographies.
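
The sketch below illustrates that flow over the same kind of clip index as before. Simple keyword overlap stands in for the generative model that would actually interpret the producer’s prompt; every name here is hypothetical rather than any product’s real API.

```python
# Hypothetical prompt-driven selection and compliance filtering over a clip
# index. Keyword overlap is a stand-in for a generative model interpreting
# the prompt; the record layout is illustrative only.

def select_clips_for_story(index, story_prompt):
    """Pick clips whose AI summaries overlap the producer's story prompt."""
    wanted = set(story_prompt.lower().split())
    return [clip for clip in index
            if wanted & set(clip["summary"].lower().split())]

def flag_for_review(index, flagged_labels):
    """Compliance pass, e.g. 'find scenes with mature content'."""
    return [clip for clip in index if flagged_labels & set(clip["labels"])]

index = [
    {"clip_id": "doc_007", "summary": "storm damage along the coast",
     "labels": ["storm", "aerial"]},
    {"clip_id": "doc_019", "summary": "graphic aftermath of the flood",
     "labels": ["graphic", "flood"]},
]
print(select_clips_for_story(index, "coastal storm recovery story"))  # doc_007
print(flag_for_review(index, {"graphic", "explicit"}))                # doc_019
```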

Uncovering Revenue in Archived Content
AI’s ability to efficiently analyze and index large amounts of archive media is like holding the key to Aladdin’s cave. Film footage licensing can command close to $10,000 for 60 seconds, yet comprehensively indexing a media organization’s hundreds of thousands of hours of video archives would take human loggers more than a lifetime. Multimodal and GenAI revolutionize the process not only in indexing speed, but also in how the technology can help prioritize tapes for digitization, sale and reuse. Advanced AI models are showing enormous potential in their ability to accurately identify what’s on a physical tape purely by scanning its paper labels and run sheets. With this approach, tapes with the highest potential for resale and reuse can be prioritized in large archive digitization projects.
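
One way such prioritization could work is sketched below: text recovered from a tape’s label is scored against weighted keywords, and the highest-scoring tapes move to the front of the digitization queue. The keywords, weights and scoring are assumptions for illustration, not any vendor’s actual logic.

```python
# Hypothetical tape triage: score OCR'd label text for resale potential.
# The keyword weights below are illustrative assumptions only.
RESALE_KEYWORDS = {
    "election": 5, "championship": 5, "interview": 4,
    "concert": 4, "master": 3, "raw": 2,
}

def score_tape(label_text):
    """Higher score = higher resale/reuse potential under the weights above."""
    words = set(label_text.lower().split())
    return sum(weight for kw, weight in RESALE_KEYWORDS.items() if kw in words)

tapes = {
    "TAPE-0419": "1996 election night master interview raw",
    "TAPE-1202": "station id slate color bars",
}
# Digitize the most promising tapes first.
for tape_id, label in sorted(tapes.items(), key=lambda t: score_tape(t[1]),
                             reverse=True):
    print(tape_id, score_tape(label))
```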

The media and entertainment industry is going through a period of great change, and as restructuring continues, legacy content workflows and systems will become increasingly untenable. Advancements in multimodal and GenAI offer exciting opportunities for organizations to transform their processes so they can create more with less, uncover hidden content gems in their archives and establish new sources of revenue to drive future growth.

James Fraser
VP of U.S. Sales, Moments Lab

James Fraser is a skilled sales and management professional with more than 15 years of experience in the broadcast and media industry. He is accomplished in driving business development and securing key partnerships. As vice president of U.S. sales at Moments Lab (formerly Newsbridge), James leverages his deep industry knowledge and expertise in AI technologies to manage U.S. operations and achieve notable sales growth. Prior to joining Moments Lab, James refined his abilities in client acquisition, team leadership and strategic market expansion at ES Broadcast and WTS Broadcast.