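Grok Imagen Video: A Deep Dive

Portraits of the Ages

Unlocking Structured Parameter Control in Grok's Video Model

Starting from an information-theory perspective, this piece explores parameter control that actually works
and combines it with Meta Prompt techniques to build a coherent video narrative.

Core Ideas

Information Bandwidth: The Essence of AI-Assisted Creation

Before exploring Grok's video model, here are two key views of mine.

View 1: Creation from an information-theory perspective

The future of creative work is not alchemy, where you say one sentence and the AI conjures ten thousand. In information terms that adds no value; it only fabricates and dilutes information.

Truly valuable creation means injecting more useful information yourself, rather than letting the AI generate low-information content out of thin air.

View 2: Bandwidth determines creative quality

Vibe Coding may look like it can produce a product skeleton, but a real product person needs more information bandwidth with their own product, not less.

The same holds for video creation: only with Meta Prompting do you stand a chance of making something good.

With these two views in mind, let's look closely at what Grok's video model can really do.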

Section 01

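The Parameter Structure of Grok's Video Model

Revealing genuine structured parameter control, as opposed to "pseudo-structured output"

The Problem with Nano Banana

Some users have shared "structured output" prompts built on Nano Banana, but from an information-theory perspective these fabricate and dilute information.

They also go against Google's own best practices: the model's core strength is its deep language understanding, and a narrative, descriptive paragraph almost always produces a better, more coherent image than a string of disconnected words.

The Real Parameter Structure

Grok's video model, by contrast, really does have structured parameters, and they can be set directly.

Setting these parameters directly gives precise control instead of relying on the model's "guesses".

Full parameter structure example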

{
  "videoPrompt": {
    "shot": {
      "motion_level": "medium",
      "camera_depth": "medium shot",
      "camera_view": "eye level",
      "camera_movement": "static shot"
    },
    "scene": {
      "location": "enchanted forest glade",
      "environment": "The setting is a lush woodland clearing with ancient trees, moss-covered rocks, and glowing fireflies. The lighting is dappled sunlight filtering through leaves, creating a magical, serene atmosphere."
    },
    "cinematography": {
      "lighting": "natural sunlight with soft, ethereal glows",
      "style": "3D CGI with a fantasy aesthetic",
      "texture": "lush, detailed foliage and smooth stone surfaces, with subtle sparkles on dew drops",
      "depth_of_field": "shallow focus on the foreground character"
    },
    "visual_details": {
      "objects": [
        "Fantasy guardian elf: A figure with flowing silver hair, light green tunic with leaf patterns, boots, and a wooden staff adorned with vines."
      ],
      "positioning": [
        "The Fantasy guardian elf is in the foreground, initially standing and then transitioning to a seated position on a mossy rock, with ancient trees and fireflies in the background."
      ],
      "text_elements": [
        "Watermark: 'Global Story' in the top right corner, remaining static throughout."
      ]
    },
    "motion": "The video begins with the Fantasy guardian elf in the foreground, standing in a forest glade with arms outstretched toward the canopy. A vine pattern on the staff starts to emit a soft green glow, gradually brightening with a steady, pulsating rhythm. She then smoothly transitions into a seated position on a mossy rock, her body relaxing as one hand rests on the ground and the other holds the staff upright, emphasizing gentle, harmonious waves from foreground center. Her fingers trace the glowing vines, while her head tilts upward slightly, eyes closing in quiet reflection. The green fluorescence on the staff intensifies, casting a subtle glow on the surrounding leaves and rocks, with light reflections shifting across the textured bark and forest floor. Her movements slow to a calm sway, staff tilting gently to accentuate the curve of the vines, while her gaze opens toward the horizon with a sense of wonder. The environment's dappled lighting interacts with her motions, shadows and highlights dancing across the background trees and fireflies. Throughout, the character's posture shifts from standing to seated, with continuous, fluid adjustments of her limbs and torso, maintaining the serene staff movements until the end, where the glow peaks and begins to fade softly. The camera remains static, capturing all actions within the medium shot frame without re-framing or occlusions, while the 'Global Story' watermark stays fixed in the top right. The dialogue, 'Awaken... the ancient bond,' is delivered in a calm, inspiring tone, syncing with the peak of the glowing effect.",
    "audio": {
      "music": "Soft, atmospheric fantasy music with a serene, rhythmic undertone, featuring subtle flutes and ambient harps.",
      "ambient": "Gentle rustle of leaves and distant bird calls.",
      "sound_effect": "A soft, pulsating hum corresponding to the glowing vines, mixed with light wind chimes.",
      "mix_level": "Music and sound effects are balanced prominently, with ambient sounds subtly in the background, enhancing the magical mood."
    },
    "dialogue": [
      {
        "characters": "Fantasy guardian elf",
        "content": "Awaken... the ancient bond",
        "accent": "Neutral English with a calm, inspiring inflection",
        "language": "English",
        "emotion": "serene wonder",
        "type": "spoken",
        "subtitles": false,
        "start_time": "00:00:01.000",
        "end_time": "00:00:04.500"
      }
    ],
    "tags": [
      "fantasy",
      "serene",
      "3D CGI",
      "nature elements",
      "glowing effects",
      "harmonious movement"
    ]
  }
}
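
In the final payloads later in this piece, the value of videoPrompt is itself a JSON-encoded string rather than a nested object. Below is a minimal Python sketch of that double encoding, using a trimmed-down subset of the fields above. The helper name build_payload is hypothetical, and the exact wire format Grok expects is an assumption inferred from those examples, not a documented API.

```python
import json

def build_payload(video_prompt: dict) -> str:
    """Wrap the prompt dict as {"videoPrompt": "<JSON string>"} (assumed format)."""
    inner = json.dumps(video_prompt, ensure_ascii=False)
    return json.dumps({"videoPrompt": inner}, ensure_ascii=False)

# A trimmed-down subset of the structure shown above.
video_prompt = {
    "shot": {
        "motion_level": "medium",
        "camera_depth": "medium shot",
        "camera_view": "eye level",
        "camera_movement": "static shot",
    },
    "scene": {"location": "enchanted forest glade"},
    "dialogue": [],
    "tags": ["fantasy", "serene"],
}

payload = build_payload(video_prompt)

# Decoding twice (outer object, then the inner string) recovers the structure.
decoded = json.loads(json.loads(payload)["videoPrompt"])
```

Editing the prompt as a plain dict and encoding it only at the last step avoids hand-escaping quotes inside the string.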

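Verification and Findings

Verification method

In the browser's developer tools (F12), open the Network tab and inspect the response from grok.com/rest/app-chat/conversations/new.

If generation succeeds, one of the last few messages in this streaming response confirms the parameters that were actually used. When the parameters are submitted directly, the input and the echoed output match with no loss.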
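
The lossless check can be sketched in Python as a comparison between what was submitted and what the stream echoes back. This is only an illustration under stated assumptions: it presumes the response has been saved as one JSON object per line and that the confirmation message carries a top-level videoPrompt field, which may not match the real response layout. The function names are hypothetical.

```python
import json

def find_echoed_prompt(stream_lines):
    """Return the last videoPrompt found in the stream as a dict, or None."""
    echoed = None
    for line in stream_lines:
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if isinstance(message, dict) and "videoPrompt" in message:
            value = message["videoPrompt"]
            # The field may be a JSON string (assumed) or already an object.
            echoed = json.loads(value) if isinstance(value, str) else value
    return echoed

def is_lossless(submitted: dict, stream_lines) -> bool:
    """True when the echoed parameters equal the submitted ones."""
    return find_echoed_prompt(stream_lines) == submitted

# Hypothetical sample data, not a captured response.
submitted = {"shot": {"camera_movement": "static shot"}, "tags": ["fantasy"]}
sample_stream = [
    '{"token": "..."}',
    "not json",
    json.dumps({"videoPrompt": json.dumps(submitted)}),
]
print(is_lossless(submitted, sample_stream))
```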

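Key findings

1
First-frame information can almost never be skipped

No matter how strongly the prompt insists, for example asking for an indoor background to be changed to a beach, the video still opens on the first frame and then switches to the beach right away, rather than starting on the beach. This property is what makes continuity possible in the manually chained first-frame/last-frame "Portraits of the Ages" work that follows.

2
The dialogue list allows precise control

The dialogue list accepts multiple entries, each controlling the text and its timing. As long as the 6-second total duration is respected, the timing is followed roughly 60% of the time.

Most Chinese input is translated into English before processing, with watermarks and dialogue lines as exceptions. Even when a line is written directly in Chinese, though, spoken Chinese is poorly supported: it takes many rerolls, and homophones are handled badly.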
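
Since timing is only honored when the entries fit inside the clip, it can help to check the dialogue list before submitting. The sketch below parses the HH:MM:SS.mmm timestamps used in the parameter example and flags entries that run past 6 seconds. The overlap check is an extra assumption of this sketch, not something the findings above confirm, and the second dialogue entry is a hypothetical placeholder.

```python
CLIP_SECONDS = 6.0  # total clip length mentioned above

def to_seconds(timestamp: str) -> float:
    """Convert 'HH:MM:SS.mmm' to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def check_dialogue(dialogue: list) -> list:
    """Return a list of problems; an empty list means the timing looks valid."""
    problems = []
    previous_end = 0.0
    for index, line in enumerate(dialogue):
        start = to_seconds(line["start_time"])
        end = to_seconds(line["end_time"])
        if start >= end:
            problems.append(f"entry {index}: start is not before end")
        if end > CLIP_SECONDS:
            problems.append(f"entry {index}: ends after {CLIP_SECONDS:.0f} s")
        if start < previous_end:  # assumption: entries should not overlap
            problems.append(f"entry {index}: overlaps the previous entry")
        previous_end = max(previous_end, end)
    return problems

first = {
    "content": "Awaken... the ancient bond",
    "start_time": "00:00:01.000",
    "end_time": "00:00:04.500",
}
second = {
    "content": "Second line (hypothetical)",
    "start_time": "00:00:04.000",
    "end_time": "00:00:07.000",
}
for problem in check_dialogue([first, second]):
    print(problem)
```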

3
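Tag mode: instructions inferred from tags

You can skip the prompt entirely and supply only tags; the model works backwards from the tags to fill in the instructions.

4
Spicy mode and parameter injection

Spicy mode has been taken offline and can no longer be found in the interface, nor does it appear in grok.com/rest/media/post/list, but the default mode can be overridden directly by injecting a parameter.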

--mode=extremly-crazy

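This is the Fun mode, which is still retained. Its inner workings are unclear, but the parameter name alone hints at what "Fun" means. Four modes have been found so far: extremly-crazy, custom, speech, and normal.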
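
A small Python sketch of the injection step, restricted to the four mode names listed above (spelling kept as observed). Where exactly the flag has to be placed is not stated here, so appending it to the end of the prompt text is purely an assumption, and with_mode is a hypothetical helper.

```python
# Mode names as observed above; "extremly-crazy" keeps its original spelling.
KNOWN_MODES = ("extremly-crazy", "custom", "speech", "normal")

def with_mode(prompt: str, mode: str) -> str:
    """Append a --mode flag to the prompt text (placement is an assumption)."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"{prompt.rstrip()} --mode={mode}"

print(with_mode("a beach at dusk", "extremly-crazy"))
```

Validating against the observed list guards against silently sending a corrected spelling such as "extremely-crazy", which is not one of the recorded values.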

Section 02

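A Panorama of the Ages: A Narrative in 15 Stages

Character-consistency design built on the first-frame capability, plus a carefully planned time-travel narrative through history

Storyboarding for AI video differs from traditional storyboarding. To preserve character consistency as much as possible, each shot must end on the character's pose rather than cutting in on a specific object, location, or body part, as conventional editing often does.

Clip 1
Clip 2
Clip 3
Clip 4
Clip 5

Core mechanisms

Travel and blend in

The protagonist enters each new scene through the previous era's "portal", and the costume changes with the era.

Era poster freeze-frame

An iconic pose that pays homage to a classic cultural symbol of the era (a painting, sculpture, or historical photograph)

Portal design

The entrance to the next era is worked subtly into the background of the composition.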
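
The chaining logic behind these mechanisms can be sketched as a small planning table: every clip lasts 6 seconds, and each clip after the first starts from the last frame of the one before it. The Python below is an illustrative sketch only; the Clip class and plan_chain function are hypothetical names, and the three entries are the first three eras from the outline that follows.

```python
from dataclasses import dataclass

CLIP_SECONDS = 6  # each clip is 6 seconds long

@dataclass
class Clip:
    index: int   # 1-based clip number
    era: str
    portal: str  # background element that leads to the next era

    @property
    def window(self) -> tuple:
        """(start, end) of this clip on the overall timeline, in seconds."""
        start = (self.index - 1) * CLIP_SECONDS
        return (start, start + CLIP_SECONDS)

def plan_chain(clips):
    """List (clip number, time window, source of the first frame) per clip."""
    plan = []
    for previous, current in zip([None] + clips[:-1], clips):
        if previous is None:
            first_frame = "initial image"
        else:
            first_frame = f"last frame of clip {previous.index}"
        plan.append((current.index, current.window, first_frame))
    return plan

clips = [
    Clip(1, "Ancient Greece", "temple portico"),
    Clip(2, "Ancient Rome", "bronze doors"),
    Clip(3, "Ancient Egypt", "tomb passage entrance"),
]
for row in plan_chain(clips):
    print(row)
```

With 15 clips of 6 seconds each, the same arithmetic puts clip 15 at 84-90 seconds, which matches the 1 min 30 s total of the three acts.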

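Act I: Foundations of Myth and Law

Clips 1-5 (0-30 s)

Initial image design (generated with Nano Banana)

An entrance in the style of The Birth of Venus

Imagine slowly "rising" or "stepping out" from a shell or ripples of water like those in The Birth of Venus. Here the "shell" or "ripples" are an abstract play of light and shadow, not a literal shell.

Costume: an ancient Greek woman's chiton (a Greek-style dress) designed around your own style, cut closer to the body, possibly with finer pleats or decoration.
Pose: elegant and full of life, gaze directed forward, with the appraising curiosity of someone newly arrived in a new world.
1
Ancient Greece · The Discus Thrower (0-6 s)
Travel

You step out of a chaotic blaze of light, draped in simple ancient Greek athletic wear (a chiton).

Era poster

You arrive at the arena of ancient Olympia. At the end of the clip, you freeze in the classic pose of Myron's sculpture The Discus Thrower, muscles taut, body forming a perfect S-curve. In the background stands the grand Temple of Zeus; its huge columned portico is the portal.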

2
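Ancient Rome · The Thinker (6-12 s)
Travel

You walk into the temple portico. As you pass through the shadows of the columns, your athletic wear turns into a Roman senator's toga.

Era poster

You are beneath the vast dome of the Roman Pantheon, sunlight pouring down through the oculus at its center. You freeze in the pose of Rodin's sculpture The Thinker, seated on stone steps with hand to chin, as if pondering the fate of the empire and philosophy. Your gaze passes through the crowd and settles on the bronze doors at the front of the Pantheon.

3
Ancient Egypt · The Pharaoh's Scepter (12-18 s)
Travel

You push open the bronze doors. In the light beyond, your toga turns into ornate Egyptian pharaonic dress.

Era poster

You stand among the towering columns of the Karnak Temple. You freeze in the classic standing pose of a pharaoh in a wall painting, an ankh (the symbol of life) in one hand and a scepter in the other, body turned sideways, eyes level and commanding. Not far away is the entrance to a tomb passage, covered in hieroglyphs.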

4
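Ancient Babylon · The Code of Hammurabi (18-24 s)
Travel

You walk into the tomb passage. In the dark corridor, your pharaonic dress turns into the robes of an ancient Babylonian priest.

Era poster

You arrive before a temple, where the famous stele of the Code of Hammurabi stands. You freeze in the pose from the relief at the top of the stele, in which King Hammurabi receives the scepter from the sun god: hand outstretched, solemn and dignified. In the background are the magnificent Hanging Gardens of Babylon (an imagined rendering); an arched aqueduct in the gardens is the portal.

5
Vikings · The Conqueror's Roar (24-30 s)
Travel

You walk toward the aqueduct and pass through it. Amid the swirling water, your priestly robes are replaced by rugged Viking chainmail and furs.

Era poster

You stand at the prow of a dragon-headed longship as it cuts through wind and waves. You freeze in a classic Viking warrior pose, axe in one hand and shield in the other, mouth open in a silent roar, full of power. In the distant storm clouds, an opening like a giant vortex tears through.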

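Act II: The Dawn of Art and Exploration

Clips 6-10 (0:30-1:00)

6
Renaissance · Vitruvian Man (30-36 s)
Travel

Your ship is pulled into the vortex. As the world spins, your Viking armor turns into a Renaissance scholar's attire.

Era poster

You find yourself in Leonardo da Vinci's studio. At the end of the clip, you freeze in the pose of Leonardo's Vitruvian Man, limbs outstretched before a huge circle and square, exploring the ideal proportions of the human body. On the back wall is a huge mural of a world map.

7
Age of Discovery · Pointing the Way (36-42 s)
Travel

You walk up to the world map and reach out to touch it. The moment you pass through the map, the scholar's attire turns into an Age of Discovery captain's coat.

Era poster

You stand in the captain's cabin of a galleon, a large globe in front of you. You freeze in a classic explorer's pose, one hand on the globe and the other pointing into the distance, full of longing for the unknown world. Behind you, the cabin's wooden door stands slightly ajar.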

8
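Japan · The Samurai's Iai (42-48 s)
Travel

You push open the wooden door. Cherry blossoms swirl outside, and your captain's coat turns into a finely crafted suit of samurai armor.

Era poster

You are at the foot of a Japanese castle keep (tenshu), cherry blossoms falling. You freeze at the instant of a samurai's sword draw (iai), body slightly crouched, hand already on the hilt, eyes as sharp as a blade. Not far away is a Japanese paper sliding door (shoji) leading to a courtyard.

9
French Court · The Ballet Dancer's Arabesque (48-54 s)
Travel

You slide open the paper door. Beyond it everything glitters with gold, and your samurai armor turns into lavish Louis XIV court dress and dancing shoes.

Era poster

You are in the Hall of Mirrors at Versailles, beneath glittering crystal chandeliers. You freeze in an elegant ballet arabesque, standing on one leg with the other raised high behind you, arms extended, as noble as a swan. At the far end of the hall is an ornate box door leading to an opera house.

10
The Phantom of the Opera (54-60 s)
Travel

You walk through the box door. The light dims beyond it, and your court dress turns into a tailcoat and a white mask.

Era poster

You stand on the roof of the Paris Opera House under bright moonlight. You freeze in the classic pose of the Phantom leading Christine from The Phantom of the Opera (either suggesting one person playing both roles, or holding a cloak), full of mystery and romance. Behind you is a skylight leading into the theater.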

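Act III: A Century of Revolution and Leaps Forward

Clips 11-15 (1:00-1:30)

11
Industrial Revolution · Chaplin Tightening Bolts (1:00-1:06)
Travel

You leap down through the skylight. As you fall, your tailcoat turns into Chaplin's worker's overalls from Modern Times.

Era poster

You are among the giant gears of a factory. You freeze in Chaplin's classic comic pose, a wrench in each hand, mechanically tightening bolts, reflecting the alienation of people in the industrial age. In the background is a huge steam locomotive; the boiler door on its front is the portal.

12
The American Frontier · The Cowboy's Showdown (1:06-1:12)
Travel

You climb into the locomotive. Amid the roar and steam, your worker's overalls turn into a sharp Western cowboy outfit.

Era poster

You jump off the train onto the main street of a typical Western town. You freeze in the classic pose before a high-noon duel, hand on holster, eyes focused, the air tense. At the end of the street is the vault door of a bank.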

13
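Einstein Sticking Out His Tongue (1:12-1:18)
Travel

You push open the bank door. In a shimmer of gold, your cowboy outfit turns into a scholar's old sweater.

Era poster

You find yourself standing in an office at Princeton University, a blackboard full of formulas behind you. You freeze in the pose of Einstein's most famous photograph, tongue stuck out, full of wit and playfulness. On the blackboard, next to a theoretical formula for a wormhole, is a concrete drawing of a wormhole.

14
The King of Rock and Roll · Elvis's Dance Moves (1:18-1:24)
Travel

You touch the wormhole drawing on the blackboard. After a moment of distortion, your scholar's sweater turns into a glittering rock-and-roll stage costume.

Era poster

You stand on a huge stage under the spotlights. You freeze in Elvis's classic hip-swivel move, a microphone in one hand and the other pointing to the sky, full of passion and energy. On the giant LED screen behind the stage, footage of the cosmos begins to play.

15
2001: A Space Odyssey (1:24-1:30)
Travel

You walk toward the LED screen and pass straight through it. In a sea of pixels, your stage costume turns into a clean, minimal futuristic spacesuit.

Era poster

You float in a pure white, minimalist room aboard a space station (a homage to 2001: A Space Odyssey). You freeze in the pose of the film's astronaut reaching out to touch the huge black Monolith, full of awe and curiosity toward the unknown universe.

Final shot

The instant your fingertip touches the Monolith, the stars of the entire universe are reflected on its smooth surface, then stretch out without end, swallowing you and the whole frame. The end.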

Section 03

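Producing and Tuning the First Five Stages

Keeping more useful information, rather than Vibe Coding-style black-box operation

More information bandwidth = better creative results

Understand the capability limits of your "subordinate", master as much detail as you can, and use Meta Prompting to control the output precisely, rather than letting the AI "guess" your intent inside a black box.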

01

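Ancient Greece

The classic moment of The Discus Thrower

Initial prompt

You arrive at the arena of ancient Olympia. In the end, you freeze in the classic pose of Myron's sculpture The Discus Thrower, muscles taut, body forming a perfect S-curve. In the background stands the grand Temple of Zeus; its huge columned portico is the portal.

Grok's first-draft prompt

(generated automatically by the system; no manual input needed)

Meta Prompt request

At the start there is no baggage to carry, so rerolling a few times is cheaper than going straight to a meta prompt.

Final prompt submitted to the system

(can be copied and used as-is)

Generated directly from the initial prompt; no further tuning needed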

02

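Ancient Rome

The philosophical contemplation of The Thinker

Initial prompt

You turn around and walk into the temple portico. As you pass through the shadows of the columns, your athletic wear turns into a Roman senator's toga.
You are beneath the vast dome of the Roman Pantheon, sunlight pouring down through the oculus at its center. You freeze in the pose of Rodin's sculpture The Thinker, seated on stone steps with hand to chin, as if pondering the fate of the empire and philosophy. Your gaze passes through the crowd and settles on the bronze doors at the front of the Pantheon.

Grok's first-draft prompt

(generated automatically by the system; no manual input needed)

Meta Prompt request

The meta prompt did not work well, so the "turn around" description was added to the initial prompt.

Final parameters submitted to the Grok system

(can be copied and used as-is)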

"videoPrompt": "{\"shot\": {\"motion_level\": \"medium\", \"camera_depth\": \"medium shot followed by a full shot\", \"camera_view\": \"eye level\", \"camera_movement\": \"smooth tracking shot following the subject from behind and to the side\"}, \"scene\": {\"location\": \"From the exterior steps of an ancient Greek-style temple to the interior of the Roman Pantheon\", \"environment\": \"The scene transitions from an outdoor setting with a classical temple, stone steps, and natural daylight under a partly cloudy sky, to the interior of a grand architectural space with a massive dome, stone columns, and dramatic lighting from a central oculus. The atmosphere combines historical grandeur with a sense of timeless contemplation.\"}, \"cinematography\": {\"lighting\": \"Natural daylight, transitioning from diffused outdoor light to dramatic, focused light streaming through the oculus, creating high contrast and shadows.\", \"style\": \"realistic digital rendering with a classical and artistic aesthetic\", \"texture\": \"Smooth stone surfaces of the temple and Pantheon, flowing fabric textures of the clothing, and the matte finish of the stone floor and steps.\", \"depth_of_field\": \"Shallow focus on the subject, with the background architecture and crowd slightly softened to emphasize the dancer's journey and transformation.\"}, \"visual_details\": {\"objects\": [\"Woman in flowing white garment: A woman with dark hair, initially in a flowing white, Grecian-style dress or athletic garment, which transforms into a Roman senator's toga. She embodies a graceful, contemplative figure, culminating in a pose reminiscent of Rodin's 'The Thinker'.\"], \"positioning\": [\"The Woman in flowing white garment begins in the foreground on the steps of a temple, centered in the frame, and moves toward the temple's entrance or doorframe. 
She then walks through the colonnade and into the Pantheon, positioning herself under the oculus in the midground, seated on stone steps with the crowd and bronze doors in the background.\"], \"text_elements\": []}, \"motion\": \"The video opens with the Woman in flowing white garment in the foreground, poised on the steps of a classical temple, executing a fluid dance move with arms extended and one leg lifted, her white garment flowing. She turns her head and body gracefully to the right, beginning to walk backward toward the temple's entrance with steady, deliberate steps, her gaze shifting toward the camera as she moves from foreground to midground. As she passes through the shadows of the stone columns, her flowing white garment seamlessly transforms into a Roman senator's toga, the fabric draping and shifting with her movement in a smooth, magical transition. The camera follows her in a smooth tracking shot from behind and to the side, revealing her entry into the grand interior of the Pantheon. Under the massive dome, sunlight streams down from the central oculus, casting dynamic shadows and highlights across her form and the stone surroundings. She slows her pace, moving toward the stone steps in the midground, and settles into a seated position, striking the pose of Rodin's 'The Thinker'—hand to chin, elbow on knee, body leaned forward in contemplation. Her gaze, now directed forward and slightly to the left, penetrates the blurred crowd and fixes on the bronze doors in the background. 
The motion concludes with her in a fixed, statuesque pose, as if pondering the weight of empire and philosophy, while the camera gently stabilizes, framing her against the architectural majesty and the interplay of light and shadow from the oculus above.\", \"audio\": {\"music\": \"A grand, orchestral score with a contemplative and majestic tone, blending classical elements with a modern cinematic quality, featuring strings and subtle choral undertones.\", \"ambient\": \"Faint echoes of footsteps on stone, distant murmurs of a crowd, and the natural reverb of a large, domed interior space.\", \"sound_effect\": \"Soft rustling of fabric during the transformation, and a subtle, ethereal whooshing sound as the character moves through the columns and under the oculus.\", \"mix_level\": \"Music is dominant, with ambient sounds and effects mixed at a lower level to create a balanced, immersive atmosphere.\"}, \"dialogue\": [], \"tags\": [\"classical architecture\", \"transformation\", \"dance\", \"contemplation\", \"Roman Pantheon\", \"historical fantasy\", \"artistic pose\", \"sculpture reference\", \"cinematic\"]}"
03

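Ancient Egypt

The majesty and sanctity of the pharaoh

Initial prompt

You turn around and push open the bronze doors. In the light beyond, your toga turns into ornate Egyptian pharaonic dress.
You stand among the towering columns of the Karnak Temple. You freeze in the classic standing pose of a pharaoh in a wall painting, an ankh (the symbol of life) in one hand and a scepter in the other, body turned sideways, eyes level and commanding. Not far away is the entrance to a tomb passage, covered in hieroglyphs.

Grok's first-draft prompt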

"videoPrompt": "{"shot": {"motion_level": "medium", "camera_depth": "medium shot followed by a full shot", "camera_view": "eye level", "camera_movement": "smooth tracking shot from behind, following the subject towards the door and then panning slightly to capture the full scene"}, "scene": {"location": "A grand, domed classical hall transitioning to the Karnak Temple complex", "environment": "The initial scene is a large, ornate hall with a domed ceiling, classical columns, and a large crowd of figures in the background. It transitions to an outdoor temple setting with massive stone pillars, a decorated tomb entrance, and warm, ethereal lighting suggesting a sacred, historical atmosphere."}, "cinematography": {"lighting": "Natural and ethereal lighting, with a shift from soft, diffused light in the hall to a more dramatic, radiant glow from the open doorway and temple environment.", "style": "high-quality digital rendering with a classical and historical aesthetic, blending elements of realism and artistic idealization", "texture": "Matte stone textures of columns and steps, flowing fabric textures of robes, and intricate carved details on temple structures and tomb entrances.", "depth_of_field": "Shallow focus on the central figure, with the background crowd and temple elements slightly softer to emphasize the subject."}, "visual_details": {"objects": ["Woman in classical attire: A woman with dark hair, initially wearing a flowing toga-like robe, which transforms into an elaborate Egyptian pharaonic costume including a headdress, jewelry, and regalia.", "Bronze door: A large, ornate bronze door at the end of the hall, glowing with light from behind.", "Karnak Temple elements: Massive stone pillars, a tomb entrance with hieroglyphic carvings, and a sense of grandeur typical of ancient Egyptian architecture."], "positioning": ["The Woman in classical attire is initially centered in the foreground, seated on steps facing the camera, then moves towards the background 
to the Bronze door.", "The Bronze door is in the midground, centered behind the woman, leading to the temple exterior.", "In the final scene, the Woman in classical attire is positioned centrally among the Karnak Temple pillars, with the tomb entrance to her right in the midground."], "text_elements": []}, "motion": "The video begins with the Woman in classical attire seated thoughtfully on stone steps in the foreground center, her right hand supporting her chin. She slowly rises and turns her body to the right in a fluid motion, her flowing toga robe draping elegantly as she moves towards the midground Bronze door. With a steady push using both hands, she opens the Bronze door, revealing a radiant glow from behind it. As she steps through the doorway, the camera following smoothly from behind, her toga robe begins to transform, the fabric and style shifting into a luxurious Egyptian pharaonic costume with intricate patterns and accessories. Emerging into the Karnak Temple setting, she stands tall among the towering stone pillars in the background, her body transitioning into a classic pharaonic pose: one hand holding an Ankh (life symbol) and the other a royal scepter, her form side-turned with a commanding gaze directed forward. Her expression becomes regal and authoritative as she fixes her eyes on the distant horizon, embodying the essence of an ancient ruler. The camera pans slightly to capture the full transformation and the surrounding temple environment, including the hieroglyph-covered tomb entrance in the midground right. 
The crowd from the initial hall fades into the background as the scene solidifies into a timeless, mural-like depiction, with the Woman in classical attire now a frozen, majestic figure in the temple's grandeur, her movements ceasing as she holds the powerful stance, the light from the doorway continuing to cast a divine glow across the scene.", "audio": {"music": "A grand, orchestral score with a mix of classical and ancient Egyptian-inspired motifs, building in intensity from contemplative to majestic.", "ambient": "Echoing footsteps on stone, faint crowd murmurs transitioning to the sound of wind through temple pillars and distant ceremonial echoes.", "sound_effect": "The creaking and resonant push of the bronze door, followed by a shimmering, transformative sound effect as the attire changes, and a final, powerful chime symbolizing the assumption of divine authority.", "mix_level": "Music is dominant, with ambient sounds and effects layered underneath to create a cohesive, immersive atmosphere."}, "dialogue": [], "tags": ["historical transformation", "ancient architecture", "pharaoh", "Karnak Temple", "classical to Egyptian transition", "regal pose", "digital art", "ceremonial"]}"

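Meta Prompt request

By this point in the run the character is basically unrecognizable, especially with the Egyptian pharaoh makeup and styling. Please review the system prompt and adjust parts of it. Avoid negative instructions, because the system's only inputs are the last-frame image and the full prompt; there is no additional context.
This is first-frame/last-frame logic: what I need is reference and a costume change, not description. If the prompt contains nothing that makes the character "morph" or change in other implausible ways, and no overly stylized makeup, the character will naturally stay consistent.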
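
One mechanical part of this revision, swapping an appearance-based label for a reference to the figure in the first frame, can be sketched in Python. This is only an illustration of the "refer, don't describe" idea: relabel is a hypothetical helper, the draft below is a shortened excerpt of the first-draft prompt above, and the actual revision in this piece was produced through the meta prompt and changed more than labels.

```python
def relabel(value, old_label: str, new_label: str):
    """Recursively replace old_label with new_label in every string of a nested structure."""
    if isinstance(value, str):
        return value.replace(old_label, new_label)
    if isinstance(value, list):
        return [relabel(item, old_label, new_label) for item in value]
    if isinstance(value, dict):
        return {key: relabel(item, old_label, new_label) for key, item in value.items()}
    return value  # numbers, booleans, and None stay as they are

# Shortened excerpt of the first-draft prompt, for illustration only.
draft = {
    "visual_details": {
        "positioning": [
            "The Woman in classical attire is initially centered in the foreground."
        ]
    },
    "motion": "The video begins with the Woman in classical attire seated on stone steps.",
    "dialogue": [],
}

revised = relabel(draft, "Woman in classical attire", "protagonist")
print(revised["motion"])
```

Because the replacement walks every field, the subject is named the same way in objects, positioning, and motion, which is the consistency the request above asks for.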

Final prompt submitted to the system

(can be copied and used as-is)

{
  "videoPrompt": "{\"shot\": {\"motion_level\": \"medium\", \"camera_depth\": \"medium shot followed by a full shot\", \"camera_view\": \"eye level\", \"camera_movement\": \"smooth tracking shot from behind, following the central figure towards the door and then panning slightly to capture the full scene\"}, \"scene\": {\"location\": \"A grand, domed classical hall transitioning to the Karnak Temple complex\", \"environment\": \"The initial scene is a large, ornate hall with a domed ceiling and classical columns. This environment transitions to an outdoor temple setting with massive stone pillars, a decorated tomb entrance, and warm, ethereal lighting suggesting a sacred, historical atmosphere.\"}, \"cinematography\": {\"lighting\": \"Natural and ethereal lighting, with a shift from soft, diffused light in the hall to a more dramatic, radiant glow from the open doorway and temple environment, ensuring the protagonist's face remains clearly visible.\", \"style\": \"high-quality, photorealistic digital rendering, maintaining a consistent and realistic depiction of the protagonist across the scene change.\", \"texture\": \"Matte stone textures of columns and steps, flowing fabric textures of robes, and intricate carved details on temple structures and tomb entrances.\", \"depth_of_field\": \"Shallow focus on the central figure, ensuring her features remain sharp and clear, with the background elements slightly softer to emphasize her.\"}, \"visual_details\": {\"objects\": [\"The Protagonist: The central female figure carried over from the initial frame. Her specific facial identity must be preserved with absolute fidelity.\", \"Attire Change: The protagonist's toga robe is replaced by an elaborate Egyptian pharaonic costume, including a Nemes headdress, a decorative collar, an Ankh, and a scepter. 
Any makeup is applied subtly to complement her existing features, not alter them.\", \"Bronze Door: A large, ornate bronze door at the end of the hall.\", \"Karnak Temple Elements: Massive stone pillars, and a tomb entrance with hieroglyphic carvings located to the right of the protagonist.\"], \"positioning\": [\"The protagonist is initially centered, seated on steps, then rises and moves towards the Bronze door.\", \"The Bronze door is in the midground, centered, acting as the transition point.\", \"In the final scene, the protagonist is positioned centrally among the Karnak Temple pillars, with the tomb entrance to her right.\"], \"text_elements\": []}, \"motion\": \"The video begins with the protagonist seated thoughtfully on stone steps. She rises, turns right, and moves towards the Bronze door. As she pushes the door open, a radiant glow is revealed. Stepping through the doorway, her toga robe smoothly transforms into a complete Egyptian pharaonic costume. She emerges into the Karnak Temple setting, standing tall among the towering stone pillars. She then adopts the classic pharaonic pose: one hand holding an Ankh, the other a scepter, her body side-turned with a commanding forward gaze. Her personal expression remains visible, showing her own reaction to this role. The scene solidifies into a powerful, photorealistic composition. 
She holds this majestic stance, a living person in historical attire, as her movements cease.\", \"audio\": {\"music\": \"A grand, orchestral score with a mix of classical and ancient Egyptian-inspired motifs, building in intensity.\", \"ambient\": \"Echoing footsteps on stone, transitioning to the sound of wind through temple pillars.\", \"sound_effect\": \"The resonant push of the bronze door, followed by a shimmering sound effect for the attire change.\", \"mix_level\": \"Music is dominant, with ambient and effects layered underneath.\"}, \"dialogue\": [], \"tags\": [\"historical transformation\", \"ancient architecture\", \"pharaoh\", \"Karnak Temple\", \"classical to Egyptian transition\", \"regal pose\", \"photorealistic character\", \"ceremonial\"]}"
}
04

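Ancient Babylon

The authority of the Code of Hammurabi

Initial prompt

You walk into the hieroglyph-covered entrance of the tomb passage. In the dark corridor, your pharaonic dress turns into the robes of an ancient Babylonian priest.
You arrive before a temple, where the famous stele of the Code of Hammurabi stands. You freeze in the pose from the relief at the top of the stele, in which King Hammurabi receives the scepter from the sun god: hand outstretched, solemn and dignified. In the background are the magnificent Hanging Gardens of Babylon, with an arched aqueduct in the gardens.

Grok's first-draft prompt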

"videoPrompt": "{"shot": {"motion_level": "medium", "camera_depth": "medium shot followed by a full shot", "camera_view": "eye level", "camera_movement": "smooth tracking shot following the subject forward, then panning up and around to capture the full scene and the final pose"}, "scene": {"location": "ancient Egyptian temple transitioning to a Babylonian temple and hanging gardens", "environment": "The scene begins in a sunlit ancient Egyptian temple with tall columns, hieroglyphs, and a warm, golden atmosphere. It transitions to a dark, narrow passage and then opens up to a grand Babylonian temple front with the Hanging Gardens in the background, featuring lush greenery, an arched aqueduct, and a mystical, warm-toned ambiance under natural daylight."}, "cinematography": {"lighting": "natural daylight with warm, golden hues transitioning to mystical, warm lighting in the Babylonian setting", "style": "high-quality digital rendering with a blend of realistic and fantastical elements, evoking ancient historical and mythological aesthetics", "texture": "smooth stone textures on columns and monuments, flowing fabric textures on clothing, intricate engravings on stone surfaces", "depth_of_field": "shallow focus on the subject initially, transitioning to a deeper focus encompassing the background elements"}, "visual_details": {"objects": ["Pharaoh-priest figure: A regal figure initially in Egyptian pharaoh attire with a striped headdress, white and blue draped garments, holding a staff, which transforms into Babylonian priestly robes with long, layered garments and ornate details.", "Hammurabi Code Stele: A large, engraved stone stele depicting the famous legal code, with detailed carvings of figures and text.", "Babylonian temple and Hanging Gardens: A grand architectural setting with columns, a temple front, lush gardens, and an arched aqueduct in the background."], "positioning": ["The Pharaoh-priest figure is initially centered in the foreground of the Egyptian 
temple, moving forward into the dark passage and emerging at the Babylonian temple front, centered before the Hammurabi Code Stele.", "The Hammurabi Code Stele is positioned directly in front of the figure in the midground, with the Babylonian temple and Hanging Gardens extending into the background."], "text_elements": ["Hieroglyphs: Visible on the Egyptian temple walls and entrance, in traditional ancient Egyptian script.", "Cuneiform inscriptions: Engraved on the Hammurabi Code Stele, detailed and prominent in the Babylonian setting."]}, "motion": "The video opens with the Pharaoh-priest figure standing regally in the foreground center of a sunlit Egyptian temple, holding a staff and gazing forward with a composed expression. The figure begins to walk forward at a steady pace toward the hieroglyph-covered tomb entrance on the right, entering the dark, narrow passage where the camera follows smoothly, tracking from behind. As the figure moves deeper, the Egyptian pharaoh vestments fluidly transform into ancient Babylonian priestly robes, with fabrics shifting and layering in a seamless, mystical transition, colors changing from white and blue to richer, earth-toned layers. Emerging from the passage into the open air, the figure approaches a grand Babylonian temple front in the midground, where the Hammurabi Code Stele stands prominently. The figure slows to a deliberate halt before the stele, extending their right arm forward and upward in a solemn, reaching gesture toward an imagined sun god, mimicking the stele's top relief of Hammurabi receiving a staff, holding the pose with a fixed, reverent expression. The camera pans up and around the figure in a smooth, clockwise arc, capturing the full transformation and the outstretched hand in the foreground, while revealing the background of the majestic Babylonian Hanging Gardens, including lush foliage and an arched aqueduct, which come into view with a sense of grandeur. 
The figure's body stiffens slightly into the pose, emulating stone-like stillness, as the warm, golden light shifts to highlight the intricate details of the robes and the stele's engravings, creating a final tableau of historical reverence against the expansive, verdant backdrop.", "audio": {"music": "A grand, orchestral score with mystical and historical tones, featuring strings, flutes, and percussion that evoke ancient civilizations, building from serene to majestic.", "ambient": "Echoing footsteps in the passage, transitioning to the sound of flowing water from the aqueduct and distant rustling of leaves in the gardens.", "sound_effect": "Subtle fabric rustling during the transformation, a soft chime as the figure reaches toward the stele, and a faint, reverent hum in the final pose.", "mix_level": "Music is dominant, with ambient sounds and effects layered underneath to create a rich, immersive atmosphere."}, "dialogue": [], "tags": ["ancient history", "Egyptian pharaoh", "Babylonian priest", "Hammurabi Code", "Hanging Gardens", "transformation", "historical fantasy", "digital art", "regal procession", "mythological"]}"

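Meta Prompt request

There are two problems: the character never entered the tomb passage entrance on the left of the frame, and after the transition the costume did not change to ancient Babylonian dress but stayed Egyptian pharaonic. Here is what I entered and what the system executed. Following the logic we discussed earlier of keeping the character reference consistent, please revise the system prompt and give me an explanation along with the full prompt.

Final prompt submitted to the system

(can be copied and used as-is)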

"videoPrompt": "{\"shot\": {\"motion_level\": \"medium\", \"camera_depth\": \"medium shot followed by a full shot\", \"camera_view\": \"eye level\", \"camera_movement\": \"smooth tracking shot following the protagonist into the passage, then panning to capture the full scene and the final pose\"}, \"scene\": {\"location\": \"ancient Egyptian temple transitioning to a Babylonian temple and Hanging Gardens\", \"environment\": \"The scene begins in a sunlit ancient Egyptian temple. It transitions through a dark, narrow passage to a grand Babylonian temple front with the Hanging Gardens in the background, featuring lush greenery, an arched aqueduct, and natural daylight.\"}, \"cinematography\": {\"lighting\": \"Natural daylight transitioning through darkness to warm, natural lighting in the Babylonian setting, ensuring the protagonist is clearly distinguishable.\", \"style\": \"high-quality, photorealistic digital rendering, maintaining a consistent and realistic depiction of the protagonist across the scene change.\", \"texture\": \"smooth stone textures on columns and stele, realistic fabric textures on clothing, intricate engravings on stone surfaces.\", \"depth_of_field\": \"shallow focus on the protagonist initially, transitioning to a deeper focus encompassing the background elements in the final pose.\"}, \"visual_details\": {\"objects\": [\"The Protagonist: The central figure carried over from the initial frame. 
Her specific facial identity must be preserved with absolute fidelity.\", \"Attire Change: Her Egyptian pharaonic costume is replaced by ancient Babylonian priestly robes, which are long and layered.\", \"Hammurabi Code Stele: A large, engraved stone stele depicting the famous legal code.\", \"Babylonian Setting: A grand temple front, lush Hanging Gardens, and a prominent arched aqueduct in the background.\"], \"positioning\": [\"The protagonist moves from the center towards and enters the tomb entrance located on the right side of the frame.\", \"She emerges from the passage at the Babylonian temple front, centered before the Hammurabi Code Stele.\", \"The arched aqueduct is visible in the background of the final composition.\"], \"text_elements\": []}, \"motion\": \"The video opens with the protagonist standing in the Egyptian temple. She turns and walks decisively towards the hieroglyph-covered tomb entrance on the right side of the frame, entering the dark, narrow passage. The camera follows her into the darkness. In the darkness of the passage, her Egyptian pharaonic costume is completely replaced by ancient Babylonian priestly robes. Emerging from the passage into the open air of the Babylonian setting, she approaches the Hammurabi Code Stele in the midground. She stops before the stele and adopts the classic pose depicted on its top relief: extending her right arm forward and upward as if receiving a staff from a god, holding the pose with a solemn, dignified expression. Her personal identity remains clear. The camera pans to capture the final tableau, revealing the majestic Babylonian Hanging Gardens and the arched aqueduct in the background. 
She holds this historical stance, a living person recreating a moment in time, as her movements cease.\", \"audio\": {\"music\": \"A grand, orchestral score with mystical and historical tones.\", \"ambient\": \"Echoing footsteps in the passage, transitioning to the sound of flowing water from the aqueduct.\", \"sound_effect\": \"Subtle sound of fabric changing in the darkness.\", \"mix_level\": \"Music is dominant, with ambient sounds and effects layered underneath.\"}, \"dialogue\": [], \"tags\": [\"ancient history\", \"Babylon\", \"Hammurabi Code\", \"Hanging Gardens\", \"transformation\", \"photorealistic character\", \"historical pose\"]}"
05

维京时代

龙船战士的呐喊

初始提示词

你走向并穿过水渠。水流环绕中,你的祭司长袍被粗犷的维京锁子甲和毛皮取代。
之后你站在一艘龙头长船的船头,船正乘风破浪。你定格成一个经典的维京战士姿势,一手持斧,一手持盾,张口发出无声的呐喊,充满了力量感。远方风暴的云层中,撕开了一个如同巨大漩涡的口子。

Grok's First-Draft Prompt

"videoPrompt": "{"shot": {"motion_level": "medium", "camera_depth": "medium shot followed by a full shot", "camera_view": "eye level", "camera_movement": "tracking shot following the subject, then panning to capture the longship and the storm"}, "scene": {"location": "A transformation from an ancient temple garden to a stormy sea with a Viking longship", "environment": "The scene begins in a sunlit, manicured garden with terraced greenery, water channels, and classical architecture. It transitions to a turbulent, stormy sea under dark, swirling clouds, with wind and waves."}, "cinematography": {"lighting": "Natural daylight transitioning to dramatic, stormy lighting with contrasts between dark clouds and lighter openings", "style": "high-quality digital rendering with a realistic and cinematic style", "texture": "Smooth stone and flowing water textures in the garden, transitioning to rough, metallic, and organic textures of armor, fur, and churning sea water", "depth_of_field": "Shallow focus on the woman initially, deepening to include the longship and storm in the background"}, "visual_details": {"objects": ["Woman in priestess robe: A woman with dark hair, initially wearing a flowing, beige priestess robe with ornate trim, which transforms into rugged Viking chainmail and fur armor. 
She holds an axe and shield.", "Dragon-headed longship: A traditional Viking longship with a carved dragon head prow, wooden structure, and furled sails, riding the waves."], "positioning": ["Initially, the Woman in priestess robe is in the foreground, center, next to a stone stele and water channel, with the temple structure in the background.", "After transformation, she is at the bow of the Dragon-headed longship, in the foreground, moving across the midground sea towards the background storm."], "text_elements": []}, "motion": "The video opens with the Woman in priestess robe standing in the foreground of a sunlit temple garden, her right arm raised in a gesture towards a stone stele. She begins to walk forward at a steady pace, moving towards and through a water channel. As she wades into the flowing water, her flowing beige robe is progressively replaced by the rough textures of Viking chainmail and fur, with water currents swirling around her legs and the transformation unfolding in a seamless, magical transition. The camera tracks her movement, following her as she emerges from the water channel onto the deck of a Dragon-headed longship positioned in the midground. The scene shifts to a stormy sea, with the longship now riding the waves, moving from left to right across the frame. The Woman in priestess robe, now in full Viking attire, positions herself at the bow of the Dragon-headed longship. She strikes a powerful, classic Viking warrior pose: her left hand grips a shield, her right hand raises an axe, and she opens her mouth wide in a silent, forceful shout. Her long hair and the fur elements of her armor whip in the wind. The camera pans slightly to capture the full scope of the Dragon-headed longship cutting through the waves, with water splashing against its hull. In the background, dark storm clouds swirl dramatically, and a massive, vortex-like opening tears through the cloud layer, revealing lighter sky beyond. 
The Woman in priestess robe holds her pose with intense energy, embodying strength and defiance as the longship surges forward towards the storm's eye, with the camera maintaining a dynamic tracking shot that emphasizes the motion and power of the scene until the end.", "audio": {"music": "Epic, orchestral score with building tension, transitioning from mystical, ambient tones to powerful, rhythmic percussion evoking Viking themes.", "ambient": "Flowing water sounds in the garden, transitioning to crashing waves, howling wind, and the creaking of the wooden longship.", "sound_effect": "Splashing water during the transformation, metallic clinking of armor, and the rhythmic thudding of waves against the ship.", "mix_level": "Music dominates the mix, with ambient sounds and effects layered underneath to create a cohesive, immersive soundscape."}, "dialogue": [], "tags": ["transformation", "Viking warrior", "ancient temple", "stormy sea", "epic fantasy", "digital art", "cinematic"]}"

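Meta Prompt Requirements

Problems:

1. The previous first frame did not leave the right distance to the water channel. It was too far away, so the character's transition failed completely and turned into a strange morph, and the character's identifying traits were lost entirely.
2. With the next transition in mind, the opening in the clouds reserved in this Viking scene does not work either. How exactly is the character supposed to fly up there to make the transition?

Adjustments required:

3. The character touching the stele also came out as a morph, which looks terrible. Change it so that the character walks behind the stele, entering from the left, and changes outfit there. Note that this is a costume change, not a morph. She then comes out from the right in a close-up, which becomes the full-body shot of her shouting on the Viking ship.
4. Do not start writing the next segment's transition inside this segment. Having the shield is fine, but given the earlier transition failures, it is probably better to reserve a cabin door in the frame than to emphasize a transition through the shield.
5. Do not reveal the next stage's change in this segment. It is enough that the final tail frame, the image that will serve as the new first frame, includes the cabin door and the character.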
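
The revision round above (failed draft, observed problems, required adjustments) can be packaged into a single request for the LLM that rewrites the prompt. The following is a minimal sketch of that packaging, not the author's actual tooling: the class `MetaPromptRequest`, the function `build_meta_prompt`, and the instruction wording are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MetaPromptRequest:
    """Inputs for one revision round (hypothetical structure)."""
    previous_prompt: str                               # the draft prompt that failed
    problems: list[str] = field(default_factory=list)  # what went wrong in the render
    adjustments: list[str] = field(default_factory=list)  # what the rewrite must do


def build_meta_prompt(req: MetaPromptRequest) -> str:
    """Assemble a plain-text request asking an LLM to rewrite the draft."""
    lines = ["Rewrite the video prompt below. Keep the same JSON keys.", ""]
    lines.append("Problems observed in the last render:")
    lines += [f"{i}. {p}" for i, p in enumerate(req.problems, start=1)]
    lines.append("")
    lines.append("Required adjustments:")
    # Continue the numbering after the problems, as the list above does.
    start = len(req.problems) + 1
    lines += [f"{i}. {a}" for i, a in enumerate(req.adjustments, start=start)]
    lines += ["", "Previous prompt:", req.previous_prompt]
    return "\n".join(lines)
```

Keeping problems and adjustments as separate lists mirrors the split used above: the first list tells the model what failed, the second constrains what the rewrite may do.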

Final Prompt Submitted to the System

(Ready to copy and use)

"videoPrompt": {
    "shot": {
      "motion_level": "medium",
      "camera_depth": "medium shot, close-up, then full shot",
      "camera_view": "eye level, dynamic",
      "camera_movement": "Camera pushes into the stele, tracks protagonist behind it, emerges with a close-up, then pulls back to full shot of longship, finally settling on the ship's stern, showing both protagonist and cabin door."
    },
    "scene": {
      "location": "Transition from ancient temple garden to stormy North Sea with Viking longship",
      "environment": "Sunlit temple garden with stone stele. Transitions through stele to turbulent, stormy sea under dark clouds with a Viking longship.",
      "setting_details": "Temple garden features cascading water. Stormy sea includes dramatic waves and sea spray."
    },
    "cinematography": {
      "lighting": "Bright daylight in garden, transitioning to dramatic, high-contrast stormy lighting with lightning flashes over sea.",
      "style": "high-quality, photorealistic digital rendering, cinematic, absolute facial consistency, realistic depiction of protagonist.",
      "texture": "Smooth stone of stele and temple, transitioning to rough wood, leather, metal (chainmail), fur, and churning sea water.",
      "depth_of_field": "Focus on protagonist/stele, brief shallow focus during transformation, then deepens to longship and stormy sea, before settling on the ship's stern with protagonist and door in focus."
    },
    "visual_details": {
      "objects": [
        "The Protagonist: Central female figure from initial frame. Facial identity preserved with absolute fidelity.",
        "Attire Change: Protagonist's Babylonian priestly robe is replaced by rugged, authentic-looking female Viking chainmail, leather, and fur armor.",
        "Stone Stele: Large, carved monument from initial frame, acting as a two-sided magical gateway.",
        "Dragon-headed Longship: Traditional Viking longship with carved dragon prow, riding stormy waves.",
        "Viking Shield: Prominent, circular wooden shield with a distinct Viking design, held by protagonist.",
        "Viking Axe: Hand-axe wielded by protagonist.",
        "Ship's Cabin/Door: Clearly defined wooden cabin or visible door at the stern of the longship, visible in the final frame alongside the protagonist."
      ],
      "positioning": [
        "Protagonist initially in foreground, approaches front of stele.",
        "She walks to the left side of the stele, moving out of view behind it.",
        "After transformation, she emerges from the right side of the stele in a close-up.",
        "Then, powerfully on the bow of the Dragon-headed longship, facing the storm.",
        "Final shot frames the protagonist, standing on the longship, with the ship's stern cabin/door clearly visible in the background."
      ],
      "text_elements": []
    },
    "motion": "The video opens with the protagonist in her Babylonian robe, in the sunlit temple garden, arm raised towards the stele. She then walks left, behind the stele, out of sight. The camera tracks around the back of the stele. A close-up emerges from the right side, revealing her in rugged female Viking armor, face consistent and determined. This expands to a full shot of her standing heroically on the bow of a Dragon-headed longship, navigating a stormy sea. She strikes a classic Viking warrior pose: shield in one hand, axe raised in the other, a silent, commanding shout. Waves crash, lightning flashes. For the final moments, the camera subtly tracks towards the stern of the longship, settling on a frame where the protagonist is visible on the ship, with the wooden cabin or a distinct door at the stern clearly visible in the background.",
    "audio": {
      "music": "Epic, orchestral score, transitioning from mystical, ambient tones to powerful, rhythmic percussion and horns. Builds to a crescendo during longship scene.",
      "ambient": "Garden sounds/water transitioning to crashing waves, howling wind, ship creaking, distant thunder.",
      "sound_effect": "Hum/energy as stele activates, whoosh during transformation, metallic armor clink, sea roar, ship groans.",
      "mix_level": "Music and powerful sound effects dominate, creating immersive, high-energy soundscape."
    },
    "dialogue": [],
    "tags": [
      "transformation",
      "Viking warrior",
      "magical portal",
      "stormy sea",
      "epic fantasy",
      "photorealistic character",
      "cinematic",
      "gender_consistent_attire"
    ]
  }
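
Before pasting a long hand-edited prompt like the one above, it can help to check that it still parses and that its keys match the layout used elsewhere in this article. The sketch below assumes that layout is the expected one; the key sets are inferred from the examples shown here, not taken from an official Grok schema, and `check_video_prompt` is a hypothetical helper name.

```python
import json

# Key layout inferred from the examples in this article (an assumption,
# not an official schema).
EXPECTED_KEYS = {
    "shot": {"motion_level", "camera_depth", "camera_view", "camera_movement"},
    "scene": {"location", "environment"},
    "cinematography": {"lighting", "style", "texture", "depth_of_field"},
    "visual_details": {"objects", "positioning", "text_elements"},
    "audio": {"music", "ambient", "sound_effect", "mix_level"},
}
TOP_LEVEL_KEYS = set(EXPECTED_KEYS) | {"motion", "dialogue", "tags"}


def check_video_prompt(fragment: str) -> dict:
    """Parse a '"videoPrompt": {...}' fragment and report key differences."""
    text = fragment.strip()
    if not text.startswith("{"):
        # The fragment is printed without outer braces; add them so it parses.
        text = "{" + text + "}"
    prompt = json.loads(text)["videoPrompt"]

    present_top = set(prompt)
    report = {
        "missing": sorted(TOP_LEVEL_KEYS - present_top),
        "extra": sorted(present_top - TOP_LEVEL_KEYS),
    }
    for group, expected in EXPECTED_KEYS.items():
        value = prompt.get(group)
        present = set(value) if isinstance(value, dict) else set()
        report["missing"] += [f"{group}.{k}" for k in sorted(expected - present)]
        report["extra"] += [f"{group}.{k}" for k in sorted(present - expected)]
    return report
```

Run against the final prompt above, this check flags `scene.setting_details` as a key that does not appear in the earlier examples. Whether Grok uses or silently ignores such a key is not something this article establishes, so it is worth testing rather than assuming.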
Section 04

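Join In: Create Your Own Era Narrative

The method has now been shared with you; it is your turn to create your own version

Recommended Tools and Resources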

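Lessons Learned and Reflections

The Language Challenge

Long passages of English are hard for me to handle, and I lack the ability to review and rewrite them carefully. It would be great if Chinese were supported the way it is with Nano Banana. Alternatively, readers could test how well injecting a Chinese version directly works.

Planning the Tail-Frame Handoff

At first I did not carefully plan the distance and plausibility reserved for each jump between eras. The 6-second video window is tight, so the mid-clip crossing had to be redesigned. It is best to place the object the character crosses through closer to the character.

When a later segment runs into many problems, you can send the tail frame together with the list of problems to the AI as a meta prompt, as in the switch from Babylon to the Vikings.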
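
The two planning failures described here (too much action for a 6-second clip, and a tail frame that lacks the object the next segment needs) can be caught on paper before rendering. The following is a rough planning sketch under stated assumptions: `Segment`, `plan_issues`, and the per-beat durations are hypothetical, and the durations are guesses a creator would supply, not values reported by Grok.

```python
from dataclasses import dataclass
from typing import Optional

CLIP_SECONDS = 6.0  # clip length mentioned in this article


@dataclass
class Segment:
    name: str
    beats: dict[str, float]             # beat description -> estimated seconds (guesses)
    tail_frame_objects: set[str]        # what is visible in the last frame
    entry_object: Optional[str] = None  # object the character crosses through to enter


def plan_issues(segments: list[Segment]) -> list[str]:
    """List time-budget and handoff problems across consecutive segments."""
    issues = []
    for i, seg in enumerate(segments):
        total = sum(seg.beats.values())
        if total > CLIP_SECONDS:
            issues.append(
                f"{seg.name}: beats need {total:.1f}s, clip is {CLIP_SECONDS:.1f}s"
            )
        if i + 1 < len(segments):
            nxt = segments[i + 1]
            # The next segment starts from this segment's tail frame, so the
            # object it enters through must already be in that frame.
            if nxt.entry_object and nxt.entry_object not in seg.tail_frame_objects:
                issues.append(
                    f"{seg.name}: tail frame lacks '{nxt.entry_object}' "
                    f"needed by {nxt.name}"
                )
    return issues
```

For example, a Viking segment whose tail frame lists only the protagonist, followed by a segment whose `entry_object` is `"cabin door"`, would be reported as missing the door, which is the same gap that adjustment 5 in the meta prompt was written to close.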

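When you are done, don't forget to @ me. Twitter: Fisheam / Weibo: 神楽坂泉

I look forward to seeing the era narratives you create! If you are interested, you are welcome to try making one yourself and share it.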

Take Away

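Final Thoughts

The Boundaries of AI Capability Are a Feature

Only by knowing the boundaries can you put a tool to good use. Grok's first-frame behavior limits how many different videos a single image can produce, but it is precisely what preserves the pipeline capability shown throughout this example. Understanding a tool's limitations is how you make better use of its strengths.

Learning in the AI Era

To achieve creative results beyond the ordinary, you need a knowledge base beyond the ordinary. The AI era is not a time when nothing is worth learning. On the contrary, everything is worth learning, and everything can be understood and learned quickly. The depth of your knowledge sets the ceiling for your work.

Information Bandwidth Determines Creative Capability

Whether in programming, video making, or any other AI-assisted creation, more information bandwidth is better. Be the person in charge, not just the person doing the work. Mastering knowledge across more fields is what lets you create more valuable work with AI's help.

Master the boundaries, expand your knowledge, raise your bandwidth

In the AI era, understanding how your tools behave, learning continuously, and broadening your horizons are the keys to creating excellent work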