As expected after days of leaks and rumors online, Google has unveiled Veo 3.1, its latest AI video generation model, bringing a suite of creative and technical upgrades aimed at improving narrative control, audio integration, and realism in AI-generated video.
While the updates expand possibilities for hobbyists and content creators using Google’s online AI creation app, Flow, the release also signals a growing opportunity for enterprises, developers, and creative teams seeking scalable, customizable video tools.
The quality is higher, the physics better, the pricing the same as before, and the control and editing features more robust and varied.
My initial tests showed it to be a powerful and performant model that immediately delights with each generation. However, the look is more cinematic, polished and a little more “artificial” than by default than rivals such as OpenAI’s new Sora 2, released late last month, which may or may not be what a particular user is going after (Sora excels at handheld and “candid” style videos).
Expanded Control Over Narrative and Audio
Veo 3.1 builds on its predecessor, Veo 3 (released back in May 2025) with enhanced support for dialogue, ambient sound, and other audio effects.
Native audio generation is now available across several key features in Flow, including “Frames to Video,” “Ingredients to Video,” and “Extend,” which give users the ability to, respectively: turn still images into video; use items, characters and objects from multiple images in a single video; and generate longer clips than the initial 8 seconds, to more than 30 seconds or even 1+ plus when continuing from a prior clip’s final frame.
Before, you had to add audio manually after using these features.
This addition gives users greater command over tone, emotion, and storytelling — capabilities that have previously required post-production work.
In enterprise contexts, this level of control may reduce the need for separate audio pipelines, offering an integrated way to create training content, marketing videos, or digital experiences with synchronized sound and visuals.
Google noted in a blog post that the updates reflect user feedback calling for deeper artistic control and improved audio support. Gallegos emphasizes the importance of making edits and refinements possible directly in Flow, without reworking scenes from scratch.
Richer Inputs and Editing Capabilities
With Veo 3.1, Google introduces support for multiple input types and more granular control over generated outputs. The model accepts text prompts, images, and video clips as input, and also supports:
-
Reference images (up to three) to guide appearance and style in the final output
-
First and last frame interpolation to generate seamless scenes between fixed endpoints
-
Scene extension that continues a video’s action or motion beyond its current duration
These tools aim to give enterprise users a way to fine-tune the look and feel of their content—useful for brand consistency or adherence to creative briefs.
Additional capabilities like “Insert” (add objects to scenes) and “Remove” (delete elements or characters) are also being introduced, though not all are immediately available through the Gemini API.
Deployment Across Platforms
Veo 3.1 is accessible through several of Google’s existing AI services:
-
Flow, Google’s own interface for AI-assisted filmmaking
-
Gemini API, targeted at developers building video capabilities into applications
-
Vertex AI, where enterprise integration will soon support Veo’s “Scene Extension” and other key features
Availability through these platforms allows enterprise customers to choose the right environment—GUI-based or programmatic—based on their teams and workflows.
Pricing and Access
The Veo 3.1 model is currently in preview and available only on the paid tier of the Gemini API. The cost structure is the same as Veo 3, the preceding generation of AI video models from Google.
-
Standard model: $0.40 per second of video
-
Fast model: $0.15 per second
There is no free tier, and users are charged only if a video is successfully generated. This model is consistent with previous Veo versions and provides predictable pricing for budget-conscious enterprise teams.
Technical Specs and Output Control
Veo 3.1 outputs video at 720p or 1080p resolution, with a 24 fps frame rate.
Duration options include 4, 6, or 8 seconds from a text prompt or uploaded images, with the ability to extend videos up to 148 seconds (more than 2 and half minutes!) when using the “Extend” feature.
New functionality also includes tighter control over subjects and environments. For example, enterprises can upload a product image or visual reference, and Veo 3.1 will generate scenes that preserve its appearance and stylistic cues across the video. This could streamline creative production pipelines for retail, advertising, and virtual content production teams.
Initial Reactions
The broader creator and developer community has responded to Veo 3.1’s launch with a mix of optimism and tempered critique—particularly when comparing it to rival models like OpenAI’s Sora 2.
Matt Shumer, an AI founder of Otherside AI/Hyperwrite, and early adopter, described his initial reaction as “disappointment,” noting that Veo 3.1 is “noticeably worse than Sora 2” and also “quite a bit more expensive.”
However, he acknowledged that Google’s tooling—such as support for references and scene extension—is a bright spot in the release.
Travis Davids, a 3D digital artist and AI content creator, echoed some of that sentiment. While he noted improvements in audio quality, particularly in sound effects and dialogue, he raised concerns about limitations that remain in the system.
These include the lack of custom voice support, an inability to select generated voices directly, and the continued cap at 8-second generations—despite some public claims about longer outputs.
Davids also pointed out that character consistency across changing camera angles still requires careful prompting, whereas other models like Sora 2 handle this more automatically. He questioned the absence of 1080p resolution for users on paid tiers like Flow Pro and expressed skepticism over feature parity.
On the more positive end, @kimmonismus, an AI newsletter writer, stated that “Veo 3.1 is amazing,” though still concluded that OpenAI’s latest model remains preferable overall.
Collectively, these early impressions suggest that while Veo 3.1 delivers meaningful tooling enhancements and new creative control features, expectations have shifted as competitors raise the bar on both quality and usability.
Adoption and Scale
Since launching Flow five months ago, Google says over 275 million videos have been generated across various Veo models.
The pace of adoption suggests significant interest not only from individuals but also from developers and businesses experimenting with automated content creation.
Thomas Iljic, Director of Product Management at Google Labs, highlights that Veo 3.1’s release brings capabilities closer to how human filmmakers plan and shoot. These include scene composition, continuity across shots, and coordinated audio—all areas that enterprises increasingly look to automate or streamline.
Safety and Responsible AI Use
Videos generated with Veo 3.1 are watermarked using Google’s SynthID technology, which embeds an imperceptible identifier to signal that the content is AI-generated.
Google applies safety filters and moderation across its APIs to help minimize privacy and copyright risks. Generated content is stored temporarily and deleted after two days unless downloaded.
For developers and enterprises, these features provide reassurance around provenance and compliance—critical in regulated or brand-sensitive industries.
Where Veo 3.1 Stands Among a Crowded AI Video Model Space
Veo 3.1 is not just an iteration on prior models—it represents a deeper integration of multimodal inputs, storytelling control, and enterprise-level tooling. While creative professionals may see immediate benefits in editing workflows and fidelity, businesses exploring automation in training, advertising, or virtual experiences may find even greater value in the model’s composability and API support.
The early user feedback highlights that while Veo 3.1 offers valuable tooling, expectations around realism, voice control, and generation length are evolving rapidly. As Google expands access through Vertex AI and continues refining Veo, its competitive positioning in enterprise video generation will hinge on how quickly these user pain points are addressed.