Six rules for AI video creative that compound across accounts

Most teams treat AI video like a toy. Generate a clip, hope the model gets the prompt right, regenerate when it does not. The credit meter runs. The deadline slips. Half the renders go in the trash because the phone screen glitched or the second character grew a third arm.

We treat AI video like a production pipeline. The same way we treat paid acquisition. Rules at the front, receipts at the back, no surprises in the middle.

Six rules came out of our last 10-spot sprint on Seedance 2 and GPT Image 2. Each one prevents a failure mode we paid for once, so the next account never has to. The rules compound across accounts the same way the rest of the system does. Every spot we ship trains the playbook. Every account inherits what worked.

One. Never show a phone screen.

AI text rendering on phone screens fails reliably across every video model we tested. UI text, form fields, app icons, and button labels all glitch into garbled letters or misshapen elements.

The fix has a hierarchy. Best case, the phone is a magic-trigger object, not a screen-display object. Show only the back of the phone or use the phone as a prop that dispenses something. Second best, phone held in hand with the screen facing the character and the back facing the camera. Third, phone face-down on a table.

Negative-prompt the failure explicitly on every spec. No phone screen visible. No UI shapes, form fields, or buttons. No text rendered on the phone screen. The model will hallucinate a UI if you give it room. The result is a frame that cannot run in a paid ad.
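One way to make that rule automatic is to attach the negatives in code rather than from memory. This is a minimal sketch, assuming a hypothetical in-house shot spec that carries a list of negative-prompt lines. `ShotSpec` and `apply_phone_rule` are illustrative names, not part of any Seedance or GPT Image API. Any prompt that mentions a phone picks up the three negatives from rule one.

```python
import re
from dataclasses import dataclass, field

# Negatives from rule one, phrased as spec lines.
PHONE_SCREEN_NEGATIVES = [
    "no phone screen visible",
    "no UI shapes, form fields, or buttons",
    "no text rendered on the phone screen",
]

# Whole-word match: catches "phone", "phones", "smartphone", but not "microphone".
PHONE_PATTERN = re.compile(r"\b(?:smart)?phones?\b", re.IGNORECASE)


@dataclass
class ShotSpec:
    """Hypothetical in-house shot spec, not a model API object."""
    prompt: str
    negatives: list[str] = field(default_factory=list)


def apply_phone_rule(spec: ShotSpec) -> ShotSpec:
    """Add the phone-screen negatives to any spec whose prompt mentions a phone.

    Negatives already on the spec are skipped, so running the rule twice is harmless.
    """
    if PHONE_PATTERN.search(spec.prompt):
        for negative in PHONE_SCREEN_NEGATIVES:
            if negative not in spec.negatives:
                spec.negatives.append(negative)
    return spec


shot = apply_phone_rule(ShotSpec("Woman flips her phone face-down on the kitchen table"))
print(shot.negatives)
```

Keyword matching is deliberately blunt. A false positive costs one extra negative line, while a miss costs a render.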

This rule is locked because one of our recent spots took three full render passes before we gave up trying to coax a clean UI out of the model. Phone face-down worked. Phone in hand with screen away worked. Phone as a dispenser worked beautifully. Phone with a visible screen never worked once on either model.

Two. Stylize all currency. Never photoreal.

Photoreal US currency in AI renders produces malformed presidential faces, garbled serial numbers, and off-pattern Federal Reserve design. The model cannot render a clean dollar bill.

The fix is to spec stylized, non-photoreal bills in the prompt. Generic green rectangular paper-bill designs with vague bill-like markings. No presidential face detail. No readable serial numbers. No recognizable US Treasury layout. The result reads as cash without crossing the uncanny line where the viewer notices the face is wrong.

If the stylized bills still malform, fall back to stylized golden coins or abstract green confetti. Both options read clean. Both avoid the dollar-bill failure mode entirely.

The same principle applies to any branded paper or official document on screen. Government forms, contracts, credit cards with readable numbers, anything with fine-grained text on a printed surface. Stylize the asset or hide the detail.

Three. Physically separate multi-character scenes.

Multi-character AI renders are the highest-risk shots in the stack. Hand-touching interactions produce phantom limbs, faces that shift mid-render, and character bleed between figures.

The mitigation hierarchy runs from safest to riskiest. Glass-pane separation is the safest: two characters separated by a window or partition, with no hand contact and no physics risk between the figures. Off-frame voice is second safest: one character on camera, the other voice-only with an arm visible at the edge of frame. Riskiest, but manageable, are multi-character stampedes with depth layering, provided the front layer is detailed and the back layer is stylized silhouettes with no hand interactions.

The hard rule is to avoid two characters in close proximity with hand-touching. High-fives, fist bumps, shoulder pats, objects passed hand to hand. The model cannot resolve the physics cleanly. The render produces phantom hands, twisted wrists, or a third arm. Re-block the scene to avoid touch before you spend a render on it.
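The re-block check can run on the shot list before anything renders. This is a minimal sketch, assuming each shot has a count of on-camera characters and a plain-text action line. `touch_flags` and the regex list are hypothetical and cover only the interactions named in rule three, so treat them as a starting point rather than an exhaustive filter.

```python
import re

# Touch interactions named in rule three. Illustrative patterns, not an exhaustive list.
TOUCH_PATTERNS = [
    r"\bhigh[- ]?fives?\b",
    r"\bfist[- ]?bumps?\b",
    r"\bshoulder[- ]?pats?\b",
    r"\bshak(?:e|es|ing) hands\b",
    r"\bhands (?:over|(?:\w+ ){1,3}to)\b",  # e.g. "hands the envelope to"
]


def touch_flags(on_camera_characters: int, action: str) -> list[str]:
    """Return the touch patterns found in a shot's action line.

    Only shots with two or more on-camera characters are checked, since a
    single figure has no one to touch.
    """
    if on_camera_characters < 2:
        return []
    return [p for p in TOUCH_PATTERNS if re.search(p, action, re.IGNORECASE)]


print(touch_flags(2, "She hands the envelope to her business partner"))
```

A non-empty result means re-block before rendering, for example by moving one character behind glass or off frame.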

Four. Match audio register to format.

The audio register depends on the format, and the two registers below are not interchangeable.

UGC peer-energy spots need the captured-real-world signature. Proximity warmth. Audible breath on the mic. Slight vocal fry on the tails. The audio sounds like a real person held a phone up and recorded themselves. That register is the entire reason UGC works.

Narrative-comedy with a brand narrator over a character punchline needs a smooth radio voice. Polished FM-DJ quality. Light studio booth reverb. Clean recording. Minimal mouth sounds. Minimal vocal fry. Minimal audible breath. Reference vibes that work include Dollar Shave Club deadpan-smooth, Liev Schreiber narration, and George Clooney voice-for-hire.

We learned this rule the wrong way around. The first spot in a narrative-comedy format used the UGC audio profile we already had locked. The render came back amateur. The captured-real-world signature works beautifully on a phone selfie. On a narrative spot with polished visuals, the same signature reads as a bad podcast guest, not as a professional narrator.

Spec the register before you render the audio. Selfie testimonial, peer-to-peer, UGC framing? Captured-real-world. Narrative-comedy with a brand narrator? Smooth radio.
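The format-to-register decision is small enough to live in a lookup table, which keeps anyone from reusing a locked UGC profile by default. This is a minimal sketch. The format labels are hypothetical brief tags, and the trait lines are copied from rule four. Unknown formats raise an error instead of falling back to a default, which enforces "spec the register before you render."

```python
from enum import Enum


class Register(Enum):
    CAPTURED_REAL_WORLD = "captured-real-world"
    SMOOTH_RADIO = "smooth radio"


# Audio traits from rule four, written as spec lines for the voice render.
REGISTER_TRAITS = {
    Register.CAPTURED_REAL_WORLD: [
        "proximity warmth",
        "audible breath on the mic",
        "slight vocal fry on the tails",
    ],
    Register.SMOOTH_RADIO: [
        "polished FM-DJ quality",
        "light studio booth reverb",
        "clean recording",
        "minimal mouth sounds",
        "minimal vocal fry",
        "minimal audible breath",
    ],
}

# Hypothetical format tags for an in-house brief.
FORMAT_TO_REGISTER = {
    "selfie testimonial": Register.CAPTURED_REAL_WORLD,
    "peer-to-peer": Register.CAPTURED_REAL_WORLD,
    "ugc": Register.CAPTURED_REAL_WORLD,
    "narrative-comedy": Register.SMOOTH_RADIO,
}


def audio_spec(fmt: str) -> list[str]:
    """Return the audio spec lines for a format; fail loudly on unknown formats."""
    try:
        register = FORMAT_TO_REGISTER[fmt.lower()]
    except KeyError:
        raise ValueError(f"no audio register locked for format {fmt!r}") from None
    return [f"register: {register.value}", *REGISTER_TRAITS[register]]


print(audio_spec("Narrative-comedy"))
```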

Five. Use continuation, not separate first frames.

The naive workflow is to generate one first-frame image per shot, render each shot independently, cut them together in post. The result is a visible jump between every cut. Lighting shifts, character clothing changes, the background loses a prop. The viewer reads the cuts as glitches.

The continuation workflow fixes this. Render shot one with a generated first-frame image. Render shot two by uploading the full shot-one video as the start-state reference. Seedance pixel-locks to the last frame of shot one and begins shot two from that exact state. The cut becomes invisible. The narrative reads continuous.

For shots with major visual transitions, upload both the prior shot and a generated end-frame as the target. The model interpolates the motion between them.

The other benefit is cost. Only shot one needs a generated first frame. Every shot after it inherits from the previous render. On a 5-shot piece that is four image generations saved. Across a 10-spot slate of 5-shot pieces, that is 40. The workflow rule does double duty. Better continuity, lower spend per spot.
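The continuation chain and its image count can be written down as a plan before any render spend. This is a minimal sketch, assuming a hypothetical plan format. `ShotPlan`, `plan_continuation`, and the `image:`/`video:` reference labels are illustrative, not Seedance parameters. Shot one starts from a generated image. Every later shot starts from the full video of the shot before it, and shots flagged as major transitions also get a generated end frame.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShotPlan:
    """Hypothetical plan entry; reference strings are labels, not model parameters."""
    index: int
    start_reference: str             # "image:..." for shot one, "video:..." afterwards
    end_frame: Optional[str] = None  # generated target frame for a major transition


def plan_continuation(n_shots: int, transition_shots: frozenset = frozenset()) -> list[ShotPlan]:
    """Chain each shot to the full video of the shot before it.

    Only shot one starts from a generated first-frame image. Shots listed in
    transition_shots also get a generated end frame to interpolate toward.
    """
    if n_shots < 1:
        raise ValueError("a spot needs at least one shot")
    plans = [ShotPlan(1, "image:first_frame_1")]
    for i in range(2, n_shots + 1):
        end_frame = f"image:end_frame_{i}" if i in transition_shots else None
        plans.append(ShotPlan(i, f"video:shot_{i - 1}", end_frame))
    return plans


def image_generations(plans: list[ShotPlan]) -> int:
    """Count the still images the plan needs generated."""
    starts = sum(p.start_reference.startswith("image:") for p in plans)
    ends = sum(p.end_frame is not None for p in plans)
    return starts + ends


plans = plan_continuation(5)
naive = 5  # the naive workflow generates one first frame per shot
print(naive - image_generations(plans))  # 4
```

Each transition end frame adds one image back, so the saving on a given spot is the shot count minus one, minus its transition count.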

Six. The absurdist-delivery format works for finance and B2B.

Comedic creative is the harder ask for high-ticket clients. Most agencies default to talking-head explainers because comedy reads as risk. The absurdist-delivery format unlocks comedy without putting the brand in a sketch.

The delivery mechanic itself is the joke. An unexpected object or character brings the outcome. Deadpan reaction. Brand narrator bookends the visual gag. The joke carries 80 percent of the spot. The brand lands the words.

Three of our recent spots used the format and all three worked. A phone that ejected cash like a slot machine. A stampede of lenders breaking down a door. A bank that stalked a small business owner like a clingy ex outside the kitchen window. Each rendered as a single visual gag. Each carried its meaning without a single line of dialogue from the hero character.

The template structure is simple. 15 to 30 seconds. Single shot or short multi-shot. Magic-realism delivery mechanic. Deadpan character reaction, never theatrical. Brand narrator voiceover bookends. The visual gag carries the spot. The words land the brand.
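Because the template is this tight, a brief can be checked against it before it goes to render. This is a minimal sketch, assuming a hypothetical brief format. `AbsurdistSpot` and `template_issues` are illustrative names, and the checks cover only the parts of the template that are stated explicitly: duration, a delivery mechanic, a deadpan reaction, and narrator bookends.

```python
from dataclasses import dataclass


@dataclass
class AbsurdistSpot:
    """Hypothetical brief for the absurdist-delivery template; field names are illustrative."""
    duration_s: float
    delivery_mechanic: str  # the unexpected object or character that brings the outcome
    reaction: str           # "deadpan" or "theatrical"
    narrator_open: bool     # brand narrator voiceover at the top
    narrator_close: bool    # brand narrator voiceover at the end


def template_issues(spot: AbsurdistSpot) -> list[str]:
    """List every way a brief departs from the template; an empty list means it fits."""
    issues = []
    if not 15 <= spot.duration_s <= 30:
        issues.append(f"duration {spot.duration_s:g}s is outside 15 to 30 seconds")
    if not spot.delivery_mechanic.strip():
        issues.append("no delivery mechanic for the gag")
    if spot.reaction != "deadpan":
        issues.append("character reaction must be deadpan, never theatrical")
    if not (spot.narrator_open and spot.narrator_close):
        issues.append("brand narrator voiceover must bookend the spot")
    return issues


brief = AbsurdistSpot(20, "phone ejects cash like a slot machine", "deadpan", True, True)
print(template_issues(brief))  # []
```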

The format is generalizable beyond financial services. Any brand with a clear differentiator that can be rendered as an absurd delivery mechanic can use it. The trick is to find the differentiator that can be visualized literally, then render it at scale.

Why these rules compound.

Most AI video work today is one-off. Generate a clip. Ship the clip. Move on. The next clip starts from scratch because nobody wrote down what failed.

These six rules are the start of the playbook. Every new account inherits them on day one. Every new spot ships against a checklist of failure modes already mapped. The first spot in a new format still takes longer because the format is new. The fifth spot in that format ships in half the time because every prior failure is already negative-prompted out.

That is the same compounding logic we run against every other piece of the system. The buyer document gets sharper every month. The creative library gets deeper every month. The video production playbook gets stricter every month. Every account makes the next one better. Knowledge does not walk out the door.

The agencies still treating AI video like a toy will keep paying the render meter to relearn the same six lessons. The teams treating it like a production pipeline will ship the same volume at a fraction of the cost. That gap is the whole game.
