I’ve been wanting to play with AI video generation tools, so I decided to create a trailer for my PATH widget app. Here are some things I learned:
What model should I use?
I tried models from Google, Pika, and Runway. Personally, I found Google Veo to be the best. Pika had worse quality, and Runway had a very stingy free tier. My view is somewhat biased: I already had a Google Gemini subscription, and Google also offers free API credits to new users.
I used Veo inside Flow (Google's video generation platform), but you can also access it inside Google AI Studio or directly with an API key. Flow has better scaffolding, so it's easier to use, but it's also token-based and rate-limited.
Why doesn't it work?
As with all AI tools, breaking down the task generally delivers better results. I found that motion and transitions can produce very unrealistic results, so I tried to separate my video into discrete scenes. Treat this as a storyboard exercise, and generate a still image for each scene. Since each generated clip has a duration limit (often less than 10 seconds), these images also help different clips retain consistent figures and backgrounds.
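As a rough illustration, the storyboard step could be sketched in Python like this. Everything here is hypothetical and not part of any tool mentioned above: the `Scene` class, the `plan_clips` function, the file paths, and the 8-second cap (the real limit depends on the model and plan). The idea is simply that each scene gets one still image, and any scene longer than the cap is split into equal clips that all reuse that still as a reference.

```python
from dataclasses import dataclass
import math

# Assumed per-clip limit; the real cap depends on the model and plan.
MAX_CLIP_SECONDS = 8.0


@dataclass
class Scene:
    name: str
    seconds: float  # desired on-screen duration
    still: str      # path to the reference still image for this scene


def plan_clips(scenes, max_seconds=MAX_CLIP_SECONDS):
    """Split each scene into equal clips no longer than max_seconds.

    Every clip of a scene reuses that scene's still image as its reference.
    Returns a list of (scene_name, clip_index, clip_seconds, still) tuples.
    """
    clips = []
    for scene in scenes:
        count = max(1, math.ceil(scene.seconds / max_seconds))
        length = scene.seconds / count
        for i in range(count):
            clips.append((scene.name, i, round(length, 2), scene.still))
    return clips


# Hypothetical storyboard: names, durations, and paths are placeholders.
storyboard = [
    Scene("train_window", 4.0, "stills/train_window.png"),
    Scene("cherry_blossoms", 12.0, "stills/cherry_blossoms.png"),
]
for clip in plan_clips(storyboard):
    print(clip)
```

With these placeholder numbers, the 12-second scene is planned as two 6-second clips that share one still, which is what keeps the figure and background consistent between them.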


For generating still images, I used both Sora from OpenAI and Gemini's Nano Banana Pro. (It's really important to use Pro, as the results are much better!) These models are very similar, but Sora has a generous free tier, so I would try to max that out first.
Finally, a good prompt goes a long way. Make sure to be succinct but also prescriptive about what you want and what to avoid. I used ChatGPT and Claude to help me write these, and tweaked them until I got the best results. My image prompts were generally longer, since they had to generate scenes from scratch, whereas my video prompts were shorter, as they simply animated the image. Below are some examples:
Scene with girl holding camera (photo)
A cinematic, vertical phone-oriented still image. A young woman (same woman as in reference image) in her twenties sitting by a train window on a Japanese train. She has straight black hair, wearing a simple white t-shirt. She sits in side profile in the middle section of the train car — her body faces to the right of the frame, looking slightly downward or forward out the window, contemplative and still. She holds a film camera (compact SLR) resting in her lap with both hands. Her expression is peaceful, quietly content. Behind her: other train seats continuing down the car, not a back wall. Blue-gray fabric seats, beige or gray-green train interior walls. The composition shows depth — she's seated mid-car, not at the end. The train interior is muted and atmospheric — subdued lighting, overcast or late afternoon light. Slightly desaturated, film-like color palette. Through the large window beside her: a grassy spring field in soft motion blur beginning to streak past. Muted greens with scattered white wildflowers (daisies or similar small white blooms) creating gentle white speckles throughout. The field is natural and meadow-like, not rice paddies. Soft, painterly, pastoral. Diffused natural light suggesting overcast spring weather. Shot on 35mm film with organic grain, muted contrast, slightly underexposed. Shallow depth of field on her figure. Color grading: cooler tones, desaturated, with subtle warmth only from her skin and the white flowers outside. Vertical 9:16 aspect ratio. Art direction inspired by A24 and Japanese cinema — quiet, contemplative, emotionally understated. This moment transitions naturally into cherry blossom motion blur. No text, no logos.
Scene with girl holding camera (video)
Vertical 9:16 video, 3-4 seconds. Match reference image exactly. CAMERA: Locked-off, static shot. Only subtle train rocking. ACTION: Woman in side profile on Japanese train, film camera already held in both hands at lap level. She smoothly turns her upper body and head toward the window. Arms stay in same position - NO lifting motion. Just a gentle rotation to face the window. Movement is calm, natural, deliberate. Ends facing the window, camera still at lap level. SCENE: Japanese train, blue-gray seats, beige walls. Large window with spring meadow motion blur outside - muted greens with white wildflower speckles. COLOR: Muted warmth, slightly desaturated, soft diffused window light, medium film grain. 35mm film aesthetic. TECHNICAL: Shallow depth of field, natural exposure, organic grain, soft quality. Movement duration: 2-3 seconds of smooth turning, then hold position briefly.
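The video prompt above follows a loose pattern: a short header, then labeled sections (CAMERA, ACTION, SCENE, and so on). If you end up tweaking many of these, a small helper can keep the layout consistent. This is only a sketch; `build_video_prompt` is a hypothetical function, not part of Flow or any SDK, and the section texts are shortened from the example above.

```python
def build_video_prompt(header, sections):
    """Join a header line and labeled sections into one prompt string.

    sections is a list of (label, text) pairs; empty texts are skipped.
    """
    parts = [header.strip()]
    for label, text in sections:
        text = text.strip()
        if text:
            parts.append(f"{label.upper()}: {text}")
    return " ".join(parts)


prompt = build_video_prompt(
    "Vertical 9:16 video, 3-4 seconds. Match reference image exactly.",
    [
        ("camera", "Locked-off, static shot. Only subtle train rocking."),
        ("action", "She smoothly turns toward the window. NO lifting motion."),
        ("scene", ""),  # skipped because it is empty
        ("color", "Muted warmth, slightly desaturated, medium film grain."),
    ],
)
print(prompt)
```

Keeping each section as its own entry makes it easy to change one thing at a time (say, only the ACTION text) and compare results between generations.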
How do I put everything together?
Flow offers a very simple editing tool that allows you to stitch the generated clips together. However, I used OneShot on my iPad as I wanted to add some text as well as background music.
I need to find a better computer-based app soon, as lining up photos and making granular adjustments are a pain in OneShot. A lot of people talk about DaVinci Resolve and CapCut, so I want to try those soon!
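If all you need is to join clips end to end on a computer, without text or music, the ffmpeg command-line tool is another option. This is not what I used for the trailer. The sketch below only builds the list file contents and the command for ffmpeg's concat demuxer; the helper names and file names are hypothetical. It assumes all clips share the same codec, resolution, and frame rate, since `-c copy` joins them without re-encoding.

```python
import shlex


def concat_list(clip_paths):
    """Return the text of an ffmpeg concat-demuxer list file."""
    lines = []
    for path in clip_paths:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def ffmpeg_concat_command(list_path, output_path):
    """Build the ffmpeg command that joins the listed clips without re-encoding."""
    return [
        "ffmpeg", "-f", "concat", "-safe", "0",
        "-i", list_path, "-c", "copy", output_path,
    ]


# Placeholder clip names; replace with your downloaded clips.
clips = ["scene1.mp4", "scene2.mp4", "scene3.mp4"]
print(concat_list(clips), end="")
print(shlex.join(ffmpeg_concat_command("clips.txt", "trailer.mp4")))
```

To use it, save the printed list as `clips.txt` next to the clips and run the printed command. If the clips differ in format, stream copy may fail or glitch, and re-encoding would be needed instead.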