
A Breakthrough in AI Video Consistency

I never thought we would direct a live-action-quality video for a leading dental brand without a single day on set. Yet here we are – our team at Now We Collide just delivered a cinematic-quality series of videos using our GenAI pipeline. This wasn’t a sci-fi experiment or hype for hype’s sake; it was a real client campaign, completed in weeks instead of months.

I’m excited to share how our Collide-AI pipeline – using a quirky tool called Nano Banana alongside OpenAI’s Sora and Google’s Veo 3 – made it possible. In this post, I’ll pull back the curtain on how we achieved shot-to-shot consistency in AI-generated video and what it means for agencies, brands and the future of content production.

 


The Character Consistency Problem

Every video producer knows the pain points: the production and creative value you want to achieve versus the cost and time it takes. Our dental client needed a series of short product videos with consistent characters, settings, scenes and props across multiple shots – think a friendly dentist in a clinic, close-ups of a patient, an explainer graphic and product pack-shots. Typically, we’d shoot this live. But due to tight timelines and logistical hurdles, live-action simply wasn’t an option.

The catch? Generative AI video has a notorious consistency problem. One scene might render our dentist with different facial features, or the clinic decor might change colour from shot to shot – obviously a brand-safety nightmare. Early text-to-video models struggled to maintain the same character or environment across cuts. We also had to ensure the output met strict brand guidelines (free of AI artefacts or off-brand elements) and complied with regulatory limits (no unproven product claims in visuals). The challenge was clear: how could we leverage AI’s speed and creativity without sacrificing continuity or quality?

[Image: Talking to dentist]

The Breakthrough

Our breakthrough came when we combined multiple AI tools in a new way – and added some old-school production discipline to the mix. We treated the AI models as “junior directors” that needed guidance. We started by building a “character bible” for our hero dentist – a reference image set locking down his appearance, wardrobe, and even lighting. Likewise, we created a “scene bible” for the clinic and props. These weren’t just moodboards; they became the ground truth images we would feed into the generative models to anchor their outputs.

We then mapped out a shot graph – essentially a storyboard of the sequence – and carefully crafted text prompts for each shot, referencing our hero character and set pieces by name. Crucially, we used seed control (fixing random seeds) and image references so the AI started from the same visual DNA each time. Picture a “consistency stack” diagram: at the base are our character and scene bibles, above them a shot-by-shot plan, all feeding into the AI generator with consistent seeds. This stack ensured that as we generated each scene, the dentist looked like himself in every frame and the clinic setting didn’t magically morph. Shot-to-shot continuity – solved.
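
To make the “consistency stack” concrete, here’s a minimal sketch (in Python) of how a shot graph with shared reference bibles and fixed seeds might be structured. The names, paths and seed values are illustrative placeholders, not our actual production schema.

```python
from dataclasses import dataclass

# Illustrative only: field names, paths and seeds are hypothetical, not our production schema.

@dataclass
class ReferenceBible:
    """Ground-truth images and descriptors for a character or set piece."""
    name: str
    reference_images: list[str]   # the locked-down key frames fed to the models
    descriptors: list[str]        # wardrobe/lighting keywords reused in every prompt

@dataclass
class Shot:
    """One node in the shot graph: a prompt anchored to the bibles plus a fixed seed."""
    shot_id: str
    prompt_template: str          # references bibles by name, e.g. "[Dentist] in [Clinic]"
    references: list[ReferenceBible]
    seed: int                     # fixed so every re-run starts from the same visual DNA

dentist = ReferenceBible(
    name="Dentist",
    reference_images=["bibles/dentist_front.png", "bibles/dentist_profile.png"],
    descriptors=["blue scrubs", "friendly smile", "warm key light"],
)
clinic = ReferenceBible(
    name="Clinic",
    reference_images=["bibles/clinic_wide.png"],
    descriptors=["modern dental clinic", "soft daylight"],
)

shot_graph = [
    Shot("01_intro",
         "Medium close-up of [Dentist] in [Clinic], smiling as he introduces the device",
         [dentist, clinic], seed=1234),
    Shot("04_packshot",
         "Product pack-shot on a clean counter in [Clinic], label facing camera",
         [clinic], seed=1234),
]
```

Every generation call reads from a structure like this, so the same reference frames, wardrobe keywords and seeds travel with every shot.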

When the first fully AI-generated draft came through with a cohesive look and narrative flow, it felt like a moon landing. We had a smiling dentist character who remained identical across an entire 30-second spot – same face, same outfit, same office – without re-shoots or VFX tweaks. Cinematic framing and camera moves were on point, too. And thanks to careful prompting, even the product packaging appeared accurately in the final pack-shot, with no misspelt labels or off-brand colours. For the first time, we saw AI video consistency reach a level suitable for a brand ad. It was a genuine “aha” moment for our creative team and the client alike.


The Toolchain

Achieving this required a carefully chosen toolchain – each AI tool played a specific role in our pipeline. Here’s what we used, and why:

What we used:

  • Nano Banana – Google DeepMind’s codename for its advanced image generation and editing model (officially Gemini 2.5 Flash Image). We used Nano Banana to design and refine our key visuals – the hero character’s look, the clinic set – and to keep them visually consistent from image to image. It excels at maintaining image integrity through edits, so our character’s likeness stayed spot-on from one frame to the next.

  • Sora – OpenAI’s text-to-video model. We leveraged Sora to animate our still images into motion. Sora can generate videos of up to a minute with impressive fidelity, and importantly, it accepts image inputs alongside text prompts. By feeding our Nano Banana-crafted key frames into Sora, we let it handle the heavy lifting of turning a static scene into moving footage – while preserving the details and continuity we’d established. Notably, Sora supports multiple shots in one video and can persist characters and visual style across them, a feature we definitely put to use.

  • Veo 3 – Google DeepMind’s latest generative video model. We tapped Veo 3 for its strengths in realistic motion and audio generation. Veo 3 produces hyper-realistic footage with believable physics and can natively generate sound effects and dialogue in sync with the picture. In our project, Veo 3 added the lifelike touches: the subtle camera shake of a handheld clinic shot, the rustle of a lab coat, the ambient hum of a dental office – all AI-generated. It gave us cinematic polish and sound design without a soundstage or Foley artist. Veo 3 also introduced new levels of control and consistency; its prompt adherence and creative-control tools kept the output on-script and on-brand.

Using these in concert, we essentially built a virtual production studio in the cloud. Nano Banana provided the visual brain (ensuring each frame had the right look), Sora the animation engine (bringing those frames to life in sequence), and Veo 3 the finishing kit (enhancing realism and sound). The result: a brand-safe, high-quality video produced at a fraction of the usual time and cost – with consistency that made it feel entirely live-action to the viewer.
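
To give a sense of what the Nano Banana step looks like in practice, here’s a minimal sketch using the google-genai Python SDK, which exposes Gemini 2.5 Flash Image. Treat the model id, prompt and file paths as assumptions – our real look-development step wraps calls like this in client review and versioning.

```python
from google import genai
from PIL import Image

client = genai.Client()  # expects GEMINI_API_KEY in the environment

# Locked reference frame from the character bible (hypothetical path).
base_frame = Image.open("bibles/dentist_front.png")

response = client.models.generate_content(
    model="gemini-2.5-flash-image",  # "Nano Banana"; exact model id may vary by release
    contents=[
        "Keep this dentist's face, blue scrubs and warm lighting exactly the same, "
        "but move him to the clinic reception desk, medium shot, looking at camera.",
        base_frame,
    ],
)

# Save any returned image parts as a new key frame for the scene bible.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("lookdev/dentist_reception.png", "wb") as f:
            f.write(part.inline_data.data)
```

The point of passing the existing key frame alongside the instruction is continuity: the model edits the reference rather than reinventing the character from scratch.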

 

[Image: Smiling]

Pipeline

How did all these pieces come together in practice? Our Collide-AI pipeline closely mirrored a traditional production workflow, with a twist of AI at its core. We kept our creative process practitioner-led – our human team remained the directors and quality controllers at every step. Here’s how a project like this flows from brief to final QC:

Pipeline at a Glance:

  1. Brief & Concept – We kicked off with the client brief and creative concepting, as usual. No AI here – just aligning on the message (e.g. “introduce new dental device”), key visuals, and tone.

  2. Script & Storyboard – We wrote a script and sketched storyboards for ~5 key shots (hero dentist intro, instrument close-up, explainer graphic, product shot, etc.), ensuring it told a coherent story. These storyboards doubled as a shot graph for AI prompting.

  3. Look Development – Using Nano Banana, we generated high-res concept images: the dentist character (right down to his friendly eyes and blue scrubs), the clinic background, the product package render. We iterated until the client said “Yes, that’s our brand!”. These images became our character & scene bible.

  4. Prompt Engineering – For each shot, we crafted detailed prompts blending text and reference images. For example, “Medium close-up of [DentistReference] in [ClinicReference], warm lighting, smiling as he explains...” and so on. We fixed random seeds to maintain continuity and used consistent descriptors across prompts (e.g. same wardrobe keywords).

  5. Generative Production – We ran the prompts through Sora and Veo 3. In some cases, Sora animated a single image into a short clip. In others, Veo 3 took text + image to generate a scene with sound. We experimented and chose the best outputs for each segment. The raw AI footage was generated in minutes per shot, then reviewed (a rough sketch of this generate-and-review loop follows the list).

  6. Feedback & Refinement – Just like a director might call for another take, we did multiple AI “takes.” If the dentist’s hair changed or a prop looked off, we adjusted prompts or fed in the last good frame to nudge the model. Nano Banana’s consistency strengths helped here – we could tweak one frame and regenerate, and the model kept the rest intact. This loop continued until each shot met our standards.

  7. Post-Production – We trimmed and sequenced the AI clips according to the edit plan. Minor touch-ups were done on a few frames (yes, sometimes you still need a bit of Photoshop for stray glitches). Veo 3’s native audio was supplemented with a background music bed and a professional voice-over for the narration. We also added captions and the brand’s animated logo in the end-board.
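
As flagged in step 5, here’s a hedged sketch of that generate-and-review loop around a single shot, using Veo 3 through the google-genai SDK. The model id, parameters and the passes_qc review hook are assumptions standing in for whichever model and approval process a team actually uses – the same loop applied when we ran Sora instead.

```python
import time
from google import genai

client = genai.Client()  # expects GEMINI_API_KEY in the environment

def render_take(prompt: str, take: int) -> str:
    """Generate one video take with Veo 3 and return the local file path."""
    operation = client.models.generate_videos(
        model="veo-3.0-generate-001",   # Veo 3; exact model id may differ by release
        prompt=prompt,
    )
    while not operation.done:            # video generation is asynchronous; poll until done
        time.sleep(10)
        operation = client.operations.get(operation)
    video = operation.response.generated_videos[0]
    client.files.download(file=video.video)
    path = f"takes/shot_01_take_{take}.mp4"  # assumes the takes/ folder exists
    video.video.save(path)
    return path

def passes_qc(clip_path: str) -> bool:
    """Stand-in for the human review step: did the face, wardrobe and props stay on-model?"""
    return input(f"Approve {clip_path}? [y/n] ").strip().lower() == "y"

prompt = (
    "Medium close-up of a friendly dentist in blue scrubs inside a modern dental clinic, "
    "warm lighting, smiling as he explains the device; handheld feel, ambient clinic sound. "
    "Same character, wardrobe and set as the reference key frames."
)

clip = render_take(prompt, take=1)
take = 1
while not passes_qc(clip) and take < 5:
    take += 1
    # Like calling for another take: restate the descriptors that drifted last time.
    prompt += " Keep the hairstyle and instrument tray identical to the previous take."
    clip = render_take(prompt, take)
```

In production the review step was our creative team comparing each take against the character and scene bibles, not a console prompt, but the shape of the loop was the same.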

In short, our AI pipeline compressed what normally might be a two-month project into a matter of weeks. We moved from ideation to final cut in under a month, and at perhaps one-third the cost of a live-action shoot. The client got their content faster, without compromising on quality or brand consistency. And our creative team? We got a taste of a new, more agile way of working – one where we could try wild ideas (want an alternate background? just prompt it) without expensive re-shoots. It’s not less work, just different work – more upfront planning and iterating with AI, but far fewer logistical and physical constraints.

 

Implications

As a CCO who’s spent decades producing content the traditional way, this project was eye-opening. The implications for agencies and brands are huge. AI video production isn’t a novelty anymore; it’s a viable option when traditional shoots are impractical or impossible. We’ve proven that with the right approach, AI can deliver scene consistency and character consistency on par with a live shoot – which has been a major barrier until now.

For agencies, this means we can be more nimble in serving client needs. Imagine responding to a client brief by generating a content prototype in 48 hours – something unthinkable with physical production. We can also tackle projects that were once budget-prohibitive. Need 100 personalised video variations for a campaign? With AI, scaling up doesn’t linearly scale the cost like it would with 100 separate shoots.

Brands, on the other hand, gain a creative superpower: the ability to produce brand-safe AI content quickly without sacrificing quality. It’s important to note that this doesn’t eliminate the need for live action or real creative work – but it adds a powerful tool to the arsenal. For instance, sensitive scenarios (like a medical demonstration that’s hard to film) or imaginative storytelling (fantastical visuals, global settings) become easier to execute. We’re already discussing with clients how AI pipelines could create content libraries for always-on social media, international market versions, or rapid-response topical ads – things that used to be constrained by production lead times.

However, agencies will need to evolve. Our producers and art directors had to become prompt designers and AI curators. We spent as much time art directing the AI as we might with a human crew. This is a new skill set, and it will be in high demand. We also learned that cross-functional collaboration (creative, tech, strategy, legal) is key – you need everyone aligned when you venture into AI-driven production.

Bottom line: AI isn’t coming for filmmakers; it’s collaborating with them. The agencies and brands that embrace this, with the right ethical guardrails (more on that next), will be able to tell stories in ways and at speeds we never could before. The future of production will be a hybrid one – and after this project, I’m convinced that’s a good thing.

 


If you’re as intrigued as we were about the potential of AI in video production, let’s talk. We’re offering hands-on demos and workshops for brands and partners to experience the Collide-AI pipeline first-hand. Come see how a brand-safe, cinematic AI video comes together, and imagine what it could do for your content strategy. Whether you have a campaign that can’t wait for a full shoot, or you’re just curious about the workflow, we’d love to collide ideas and explore the possibilities with you. Reach out to set up a demo – the future of film-making is here, and it’s time to get your teams ready for it.