Grok Imagine Video 1.5 — AI Image-to-Video Generator

Feed it a photo, tell it how the shot should move, and Grok Imagine Video 1.5 hands you a clip. xAI's image-to-video model, ready to run.

Grok Imagine Video 1.5 is xAI's image-to-video model, and it answers a very specific need: you already have the perfect frame — now you want it to move. Hand it a single still and one line of direction, and it animates that exact image, keeping your composition, subject, and style intact while adding motion. The model reads the picture and the prompt together, then infers how the scene should behave: how the subject acts, how light shifts across surfaces, and where the virtual camera travels. Clips run 1 to 15 seconds (6 by default) at 480p or 720p, and the prompt is your director's chair — you call the camera move and the action in plain words. No timeline, no keyframes, no editing suite. A photo, a sentence, and a finished clip a few moments later.

How it works

  1. Upload a still image

    Start with one clear, well-lit photo or render. The frame you want to bring to life becomes the first frame of the video.

  2. Write a prompt

    Describe the motion and the camera move in plain language, like "slow push-in, hair drifting in the wind, warm light shifting."

  3. Pick length and resolution

    Choose a duration from 1 to 15 seconds (6 is the default) and render at 480p for speed or 720p for the sharper cut.

  4. Generate and iterate

    Hit generate, watch the clip come back in moments, then tweak the prompt or camera direction and re-run until the shot lands.
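
For readers who script their renders, the four steps above can be sketched as a small request object. This is a minimal sketch, not an official client or schema: the class name `ClipRequest`, its field names, and the payload keys are hypothetical, and the 480p default resolution is an assumption (the page only states a default duration). The limits themselves (1 to 15 seconds, 6 by default, 480p or 720p) come from this page.

```python
from dataclasses import dataclass

# Limits taken from this page: 1 to 15 seconds (default 6), 480p or 720p.
MIN_SECONDS, MAX_SECONDS, DEFAULT_SECONDS = 1, 15, 6
RESOLUTIONS = ("480p", "720p")


@dataclass(frozen=True)
class ClipRequest:
    """Hypothetical container for one image-to-video job (not an official schema)."""
    image_path: str
    prompt: str
    duration_s: int = DEFAULT_SECONDS
    resolution: str = "480p"  # assumed default, chosen here for fast drafts

    def __post_init__(self):
        # Step 1: a source image is always required.
        if not self.image_path:
            raise ValueError("a source image is required")
        # Step 2: the prompt carries the motion and camera direction.
        if not self.prompt.strip():
            raise ValueError("the prompt must describe the motion")
        # Step 3: length and resolution must stay inside the published limits.
        if not MIN_SECONDS <= self.duration_s <= MAX_SECONDS:
            raise ValueError(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"resolution must be one of {RESOLUTIONS}")

    def to_payload(self) -> dict:
        """Placeholder keys; map them to whatever your own client expects."""
        return {
            "image": self.image_path,
            "prompt": self.prompt,
            "duration": self.duration_s,
            "resolution": self.resolution,
        }


request = ClipRequest(
    "portrait.png",
    "slow push-in, hair drifting in the wind, warm light shifting",
)
print(request.to_payload())
```

Validating before submitting means an out-of-range duration or an unsupported resolution fails locally instead of costing a render.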

Key features

Animates the exact frame you give it

Because it starts from your still instead of generating from scratch, the composition, character, and color you already approved survive — Grok adds motion without redrawing the shot.

The prompt is the camera operator

Write "slow dolly push-in" or "orbit left, tilt up" and the model interprets it as a real camera move, layering the subject's action on top so a static photo reads like a directed shot.

1 to 15 seconds, your call

Six seconds is the default, but you can drop to a one-second loop for a logo sting or stretch to fifteen for B-roll, paid ads, and longer social cuts.

Draft at 480p, finish at 720p

Run cheap, fast 480p passes to find the motion that works, then re-render the winner at 720p HD — same prompt, sharper output.

Subject-agnostic input

Headshots, packshots, AI renders, hand-drawn illustration, landscapes — any clean, well-lit image with a clear subject is fair game for the animation pass.

Built for iteration

Renders return quickly enough that you can try three different camera directions on the same still and keep only the take that lands, instead of committing to one guess.
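
The draft-then-finish habit described above (several 480p takes on one still, then one 720p re-render of the winner) can be sketched as plain data. The function names `plan_takes` and `finalize` and the dictionary keys are hypothetical, made up for illustration. Nothing here calls a real service.

```python
def plan_takes(image_path, base_action, camera_moves, duration_s=6):
    """Build one 480p draft job per camera move, all from the same still."""
    return [
        {
            "image": image_path,
            "prompt": f"{move}, {base_action}",
            "duration": duration_s,
            "resolution": "480p",  # cheap, fast pass for finding the motion
        }
        for move in camera_moves
    ]


def finalize(draft):
    """Copy the chosen draft and change only the resolution to 720p."""
    return {**draft, "resolution": "720p"}


drafts = plan_takes(
    "sneaker.png",
    "studio key light sweeping across the surface",
    ["slow dolly push-in", "orbit left, tilt up", "slow turntable rotation"],
)

# Suppose the second take is the one that lands; re-render it with the same prompt.
final = finalize(drafts[1])
print(final)
```

Because `finalize` copies the draft rather than mutating it, the 480p job stays available for comparison and the prompt is guaranteed to be identical between draft and final.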

See it in action

One still, one prompt, and the shot starts moving — camera and all.
Portraits, products, illustrations: a few of the things people have animated here.

Technical specs

  • Resolution: 480p · 720p
  • Duration: 1–15s (default 6s)
  • Input: Single image + text prompt
  • Output: Video clip (MP4)

Use cases

Portrait animation

Bring a headshot or character render to life with subtle breathing, a blink, and a slow camera push for a living-portrait effect.

Product spots

Turn a single product photo into a rotating, light-catching hero shot for a landing page or paid ad — no studio session required.

Social clips

Spin a still illustration or photo into a snappy vertical reel with motion that stops the scroll on TikTok, Reels, and Shorts.

Story sequences

Animate a series of stills into short beats you can cut together into a visual narrative or a moody short film.

Storyboard to motion

Hand a static storyboard frame the camera move you imagined and preview how the shot actually reads in motion before you shoot.

Prompt examples

Slow push-in portrait
Slow dolly push-in on her face, hair drifting gently in the breeze, soft shifting light, shallow depth of field, subtle blink.
Product hero rotation
The sneaker rotates slowly on a turntable, studio key light sweeping across the surface, faint floating dust, clean reflection below.
Cinematic landscape reveal
Camera tilts up from the valley floor to reveal the mountains, low clouds rolling through, golden-hour light, slow majestic pace.
Street scene with life
Handheld camera drifts forward down the rainy street, neon reflections shimmering on the wet pavement, people walking past in the background.
Illustration comes alive
The painted koi swim slowly across the pond, ripples spreading outward, lily pads bobbing, gentle parallax as the camera glides right.
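
The examples above share a shape: a camera move first, then the subject's action, then a few atmosphere details. A small helper can keep that order consistent across many prompts. This is only a sketch; `compose_prompt` is a hypothetical name, and the ordering is a pattern read off the examples here, not a documented requirement of the model.

```python
def compose_prompt(camera_move: str, subject_action: str, *details: str) -> str:
    """Join a camera move, a subject action, and optional details into one line."""
    parts = [camera_move, subject_action, *details]
    # Trim whitespace and trailing punctuation so the pieces join cleanly.
    cleaned = [p.strip().rstrip(".,") for p in parts]
    if not cleaned[0] or not cleaned[1]:
        raise ValueError("name both a camera move and a subject action")
    return ", ".join(p for p in cleaned if p) + "."


prompt = compose_prompt(
    "Slow dolly push-in on her face",
    "hair drifting gently in the breeze",
    "soft shifting light",
    "shallow depth of field",
    "subtle blink",
)
print(prompt)
```

Keeping the camera move and the action as separate arguments makes it easy to swap one while holding the other fixed between takes.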

Plans & pricing

Included in plans from $4.99

Every plan unlocks this model — no extra fees per model.

  • Coral: $4.99/mo
  • Garra Pro: $9.99/mo
  • Abissal Studio: $59.99/mo

Credits are shared across all models. Pick a plan and use them however you like.

Frequently asked questions

What is Grok Imagine Video 1.5?

It's xAI's image-to-video model. You give it one still image and a short text prompt, and it generates a video clip — animating your photo with the motion and camera movement you described, rather than inventing a scene from scratch.

How do I animate an image into a video?

Upload a single still, then write a line describing the motion and camera move you want — something like "slow zoom in, hair blowing in the wind." The model treats your image as the first frame, animates it, and returns a clip of 1 to 15 seconds.

Is this text-to-video or image-to-video?

Image-to-video. It always needs a source image to start from — the prompt directs the motion, but the picture defines the subject, composition, and style.

How long can the videos be?

Anywhere from 1 to 15 seconds, with 6 as the default. Keep it short for loops and social posts, or go longer for B-roll and ads.

What resolutions does it support?

480p and 720p. Draft at 480p to move fast and cheap, then render the keeper at 720p HD for a sharper, more polished result.

Does it generate sound?

No. Grok Imagine Video 1.5 produces silent video; add voiceover, music, or sound effects in your editor afterward.

Can I control the camera and the motion?

Yes, that's the whole point. Your prompt directs the camera (pans, zooms, tilts, orbits) and how the subject moves in the shot.

What kind of images work best?

Clear, well-lit stills with an obvious subject. Photos, portraits, product shots, AI art, and illustrations all animate well, and a sharp, high-resolution input gives the model the most to work with, so the motion comes back more believable.

More about Grok Imagine Video 1.5 — AI Image-to-Video Generator

Grok Imagine Video 1.5 comes from xAI, the team behind the Grok assistant, and it belongs to the image-to-video family rather than text-to-video. The distinction matters: a text-to-video model invents a scene from nothing, while Grok starts from a frame you've already locked and only adds motion to it. Practically, that means the composition, the character's face, the product's branding, and the color grade you signed off on all stay put — the model isn't reimagining the shot, it's setting it in motion. It does this by reading the still and your prompt at the same time, then predicting a short sequence of frames that obey both: the action you described, the way light and shadow should travel, and the path of the virtual camera.

That photo-first approach is what makes it useful in real production. Marketers animate a single packshot into a hero loop, creators turn a portrait into a living image, and designers preview a storyboard panel as actual motion before committing a crew to it. You steer with plain English — name the camera move (push-in, orbit, tilt) and the subject's action — instead of wrestling with keyframes on a timeline.

It has clear limits. Clips top out at 15 seconds, resolution caps at 720p, there's no native audio, and the motion is only as believable as the input is sharp; a soft or cluttered still gives the model less to work with. The sweet spot is short, directed shots from strong images: product spots, portrait animation, social reels, and storyboard tests. If you have a great frame and a clear idea of how it should move, this is one of the fastest paths from still to clip.