
Generates video with audio from combined multimodal references. Accepts text, images, audio, and video together as input to guide subject, motion, style, and sound in the output.
Gemini Omni Flash runs as a hosted API endpoint on fal.ai (google/gemini-omni-flash/reference-to-video) and is offered under a commercial license. No infrastructure is needed; call it per generation.
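
As a rough illustration of what a per-generation call might look like, the sketch below builds (but does not send) a single HTTP request for the endpoint named above. Only the endpoint ID comes from this description. The base URL, the `Key` authorization scheme, and every input field name (`prompt`, `image_urls`, `audio_urls`, `video_urls`) are assumptions made for illustration and should be checked against the endpoint's published schema.

```python
import json
import urllib.request

# Endpoint ID taken from the description above.
ENDPOINT_ID = "google/gemini-omni-flash/reference-to-video"
# Assumption: base URL for direct HTTP calls to fal.ai-hosted models.
BASE_URL = "https://fal.run"


def build_request(prompt, image_urls=(), audio_urls=(), video_urls=(),
                  api_key="YOUR_FAL_KEY"):
    """Build, without sending, one POST request for a single generation."""
    # Hypothetical field names; the real input schema may differ.
    payload = {"prompt": prompt}
    if image_urls:
        payload["image_urls"] = list(image_urls)
    if audio_urls:
        payload["audio_urls"] = list(audio_urls)
    if video_urls:
        payload["video_urls"] = list(video_urls)

    return urllib.request.Request(
        f"{BASE_URL}/{ENDPOINT_ID}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: key-based authorization header.
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example: combine a text prompt with one image and one audio reference.
request = build_request(
    "A fox running through snow at dusk",
    image_urls=["https://example.com/fox.png"],
    audio_urls=["https://example.com/wind.mp3"],
)
```

Passing `request` to `urllib.request.urlopen` would issue the call. It is left out here because each call corresponds to one generation.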