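What is SCAIL-2?
SCAIL-2 is a Wan2.1-based model specialized in motion transfer for people and characters.
Its biggest difference from Wan-Animate and from its predecessor SCAIL-1 is that it does not first convert the motion into an intermediate representation such as a stick figure.
The older approach builds a stick figure with ViTPose or OpenPose and uses it as the condition that drives the character. That used to be the natural idea, but converting to a stick figure throws away a lot of information:
depth, contact, interleaved motion between multiple people, the motion of non-human characters, and so on.
So SCAIL-2 passes the reference image and the motion video to the DiT almost as they are.
Rather than hand-building a complex processing pipeline, you prepare a suitable dataset and let the AI learn the task. The result is often more flexible and easier to use, and this approach will probably become more and more common.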
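Model downloads
- checkpoints
  - [sam3.1_multiplex_fp16.safetensors](https://huggingface.co/Comfy-Org/sam3.1/blob/main/checkpoints/sam3.1_multiplex_fp16.safetensors)
- clip_vision
  - [clip_vision_h.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors)
- diffusion_models
  - [wan2.1_14B_SCAIL_2_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/SCAIL-2/blob/main/diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors)
- loras
  - [Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors)
- text_encoders
  - [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)
- vae
  - [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors)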
📂ComfyUI/
└── 📂models/
    ├── 📂checkpoints/
    │   └── sam3.1_multiplex_fp16.safetensors
    ├── 📂clip_vision/
    │   └── clip_vision_h.safetensors
    ├── 📂diffusion_models/
    │   └── wan2.1_14B_SCAIL_2_fp8_scaled.safetensors
    ├── 📂loras/
    │   └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors
    ├── 📂text_encoders/
    │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
    └── 📂vae/
        └── wan_2.1_vae.safetensors
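Before loading a workflow, it can save time to confirm that every file is where the folder layout above expects it. The following is a minimal sketch, not part of ComfyUI: the function name `missing_models` and the `"ComfyUI"` root path are placeholders chosen for illustration. One thing to watch for: the workflow JSON in this article references the LoRA as `Wan2.1/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors`, that is, inside a `Wan2.1/` subfolder, so with the flat layout above you will probably need to reselect the file in the LoraLoaderModelOnly node.

```python
from pathlib import Path

# Relative paths under ComfyUI/models/, copied from the folder layout above.
EXPECTED_MODELS = [
    "checkpoints/sam3.1_multiplex_fp16.safetensors",
    "clip_vision/clip_vision_h.safetensors",
    "diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors",
    "loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
    "text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "vae/wan_2.1_vae.safetensors",
]


def missing_models(comfyui_root):
    """Return the expected model files that are not present on disk."""
    models_dir = Path(comfyui_root) / "models"
    return [rel for rel in EXPECTED_MODELS if not (models_dir / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point it at your own install directory.
    for rel in missing_models("ComfyUI"):
        print("missing:", rel)
```

An empty result means all six files were found; anything printed is a file to download or move.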
Animation mode
Drive the reference image with a motion video.

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 123,
"last_link_id": 250,
"nodes": [
{
"id": 57,
"type": "CLIPVisionLoader",
"pos": [
251.28673034667955,
1235.5962450561526
],
"size": [
270,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP_VISION",
"type": "CLIP_VISION",
"links": [
106
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"clip_vision_h.safetensors"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
190.560140854834
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"wan",
"default"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1831.305390050556,
583.6899778882896
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 27,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 218
},
{
"name": "vae",
"type": "VAE",
"link": 245
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
96
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
417.8738708496094,
266.8154509134282
],
"size": [
419.3189392089844,
138.8924560546875
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
199
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走 "
]
},
{
"id": 49,
"type": "VHS_VideoCombine",
"pos": [
2021.0685736443058,
583.6899778882896
],
"size": [
372.2688903808594,
876.4033355712891
],
"flags": {},
"order": 28,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 96
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_VideoCombine",
"cnr_id": "comfyui-videohelpersuite",
"ver": "a7ce59e381934733bfae03b1be029756d6ce936d"
},
"widgets_values": {
"frame_rate": 16,
"loop_count": 0,
"filename_prefix": "SCAIL-2",
"format": "video/h264-mp4",
"pix_fmt": "yuv420p",
"crf": 19,
"save_metadata": true,
"trim_to_audio": false,
"pingpong": false,
"save_output": true,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "SCAIL-2_00016.mp4",
"subfolder": "",
"type": "output",
"format": "video/h264-mp4",
"frame_rate": 16,
"workflow": "SCAIL-2_00016.png",
"fullpath": "/home/nomax/working-linux/ComfyUI-dev/output/SCAIL-2_00016.mp4"
}
}
}
},
{
"id": 48,
"type": "ModelSamplingSD3",
"pos": [
621.2813720703125,
-84.75058912563465
],
"size": [
210,
58
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 184
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
120
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
5
]
},
{
"id": 102,
"type": "ResizeImageMaskNode",
"pos": [
-461.91705157795747,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 204
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
203
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
0.5,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
-59.809774398803675,
-84.75058912563465
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
183
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan2.1_14B_SCAIL_2_fp8_scaled.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 96,
"type": "LoraLoaderModelOnly",
"pos": [
276.2320190445451,
-84.75058912563465
],
"size": [
314.38576392812183,
82
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 183
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
184
]
}
],
"properties": {
"Node name for S&R": "LoraLoaderModelOnly",
"cnr_id": "comfy-core",
"ver": "0.3.60"
},
"widgets_values": [
"Wan2.1/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 120,
"type": "Reroute",
"pos": [
1715.7130963493837,
468.5779949512076
],
"size": [
75,
26
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 244
}
],
"outputs": [
{
"name": "",
"type": "VAE",
"links": [
245
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 39,
"type": "VAELoader",
"pos": [
525.0368322106934,
468.5779949512076
],
"size": [
306.36004638671875,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
200,
244
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan_2.1_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 56,
"type": "CLIPVisionEncode",
"pos": [
577.9174829101565,
1301.870476013184
],
"size": [
271.6761474609375,
78
],
"flags": {},
"order": 21,
"mode": 0,
"inputs": [
{
"name": "clip_vision",
"type": "CLIP_VISION",
"link": 106
},
{
"name": "image",
"type": "IMAGE",
"link": 242
}
],
"outputs": [
{
"name": "CLIP_VISION_OUTPUT",
"type": "CLIP_VISION_OUTPUT",
"links": [
202
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"none"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 58,
"type": "LoadImage",
"pos": [
-800.6210036236067,
1277.5627543619055
],
"size": [
308.07680913429937,
543.642446368963
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
204
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pexels-photo-31438123.jpg",
"image"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 101,
"type": "WanSCAILToVideo",
"pos": [
1082.8079278436192,
604.0725314640773
],
"size": [
344.02734375,
434
],
"flags": {},
"order": 24,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 198
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 199
},
{
"name": "vae",
"type": "VAE",
"link": 200
},
{
"name": "pose_video",
"shape": 7,
"type": "IMAGE",
"link": 241
},
{
"name": "pose_video_mask",
"shape": 7,
"type": "IMAGE",
"link": 230
},
{
"name": "reference_image",
"shape": 7,
"type": "IMAGE",
"link": 209
},
{
"name": "reference_image_mask",
"shape": 7,
"type": "IMAGE",
"link": 231
},
{
"name": "clip_vision_output",
"shape": 7,
"type": "CLIP_VISION_OUTPUT",
"link": 202
},
{
"name": "previous_frames",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 213
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 214
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
215
]
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
216
]
},
{
"name": "latent",
"type": "LATENT",
"links": [
217
]
},
{
"name": "video_frame_offset",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "WanSCAILToVideo"
},
"widgets_values": [
512,
896,
81,
1,
1,
0,
1,
0,
5,
false
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 119,
"type": "Reroute",
"pos": [
25.7972264472059,
669.0146817503652
],
"size": [
75,
26
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 239
}
],
"outputs": [
{
"name": "",
"type": "IMAGE",
"links": [
240,
241
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 123,
"type": "MarkdownNote",
"pos": [
-548.6500412597653,
-84.75058912563465
],
"size": [
461.4852607760207,
466.4268689388148
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n- checkpoints\n - [sam3.1_multiplex_fp16.safetensors](https://huggingface.co/Comfy-Org/sam3.1/blob/main/checkpoints/sam3.1_multiplex_fp16.safetensors)\n- clip_vision\n - [clip_vision_h.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors)\n- diffusion_models\n - [wan2.1_14B_SCAIL_2_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/SCAIL-2/blob/main/diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors)\n- loras\n - [Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors)\n- text_encoders\n - [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)\n- vae\n - [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors)\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂checkpoints/\n │ └── sam3.1_multiplex_fp16.safetensors\n ├── 📂clip_vision/\n │ └── clip_vision_h.safetensors\n ├── 📂diffusion_models/\n │ └── wan2.1_14B_SCAIL_2_fp8_scaled.safetensors\n ├── 📂loras/\n │ └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors\n ├── 📂text_encoders/\n │ └── umt5_xxl_fp8_e4m3fn_scaled.safetensors\n └── 📂vae/\n └── wan_2.1_vae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 103,
"type": "ResizeImageMaskNode",
"pos": [
-154.42166660971037,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 203
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
209,
212,
233,
242
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale to multiple",
32,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 104,
"type": "GetImageSize",
"pos": [
630.4935595703129,
1448.041027605726
],
"size": [
219.10007080078117,
136
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 212
}
],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
213
]
},
{
"name": "height",
"type": "INT",
"links": [
214
]
},
{
"name": "batch_size",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "GetImageSize"
},
"widgets_values": []
},
{
"id": 113,
"type": "VHS_LoadVideo",
"pos": [
-800.6210036236067,
455.00750561396774
],
"size": [
261.6533203125,
753.272357822205
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
239
]
},
{
"name": "frame_count",
"type": "INT",
"links": null
},
{
"name": "audio",
"type": "AUDIO",
"links": null
},
{
"name": "video_info",
"type": "VHS_VIDEOINFO",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_LoadVideo",
"cnr_id": "comfyui-videohelpersuite",
"ver": "2984ec4c4b93292421888f38db74a5e8802a8ff8"
},
"widgets_values": {
"video": "14637751_2160_3840_30fps.mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 0,
"select_every_nth": 1,
"format": "None",
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "14637751_2160_3840_30fps.mp4",
"type": "input",
"format": "video/mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 0,
"select_every_nth": 1
}
}
},
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 110,
"type": "CheckpointLoaderSimple",
"pos": [
-472.9277129444554,
773.4790318695161
],
"size": [
297.3094587159344,
98
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
223,
238
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
220,
243
]
},
{
"name": "VAE",
"type": "VAE",
"links": null
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"sam3.1_multiplex_fp16.safetensors"
]
},
{
"id": 109,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
847.1714690725746
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 220
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
224
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 115,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
1068.66630403893
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 243
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
232
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 116,
"type": "SAM3_VideoTrack",
"pos": [
165.3271034107571,
1007.8240766656425
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 233
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 238
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 232
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
234
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 112,
"type": "SAM3_VideoTrack",
"pos": [
159.95932129876255,
754.1190318695158
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 240
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 223
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 224
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
229
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 107,
"type": "SCAIL2ColoredMask",
"pos": [
506.50542810498916,
905.2770955134855
],
"size": [
339.45703125,
126
],
"flags": {},
"order": 22,
"mode": 0,
"inputs": [
{
"name": "driving_track_data",
"type": "SAM3_TRACK_DATA",
"link": 229
},
{
"name": "ref_track_data",
"shape": 7,
"type": "SAM3_TRACK_DATA",
"link": 234
}
],
"outputs": [
{
"name": "pose_video_mask",
"type": "IMAGE",
"links": [
230,
236
]
},
{
"name": "reference_image_mask",
"type": "IMAGE",
"links": [
231,
250
]
}
],
"properties": {
"Node name for S&R": "SCAIL2ColoredMask"
},
"widgets_values": [
"",
"area",
false
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 117,
"type": "PreviewImage",
"pos": [
1287.634225650107,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 25,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 250
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
417.9232177734375,
63.8154509134279
],
"size": [
419.26959228515625,
148.8194122314453
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
198
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A short-haired man wearing a striped shirt, hands on his hips, touching his hair.full body"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
1475.7130963493837,
583.6899778882896
],
"size": [
315,
262
],
"flags": {},
"order": 26,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 120
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 215
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 216
},
{
"name": "latent_image",
"type": "LATENT",
"link": 217
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
218
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
123,
"fixed",
6,
1,
"euler",
"simple",
1
]
},
{
"id": 118,
"type": "PreviewImage",
"pos": [
1041.5040075988006,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 23,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 236
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
96,
8,
0,
49,
0,
"IMAGE"
],
[
106,
57,
0,
56,
0,
"CLIP_VISION"
],
[
120,
48,
0,
3,
0,
"MODEL"
],
[
183,
37,
0,
96,
0,
"MODEL"
],
[
184,
96,
0,
48,
0,
"MODEL"
],
[
198,
6,
0,
101,
0,
"CONDITIONING"
],
[
199,
7,
0,
101,
1,
"CONDITIONING"
],
[
200,
39,
0,
101,
2,
"VAE"
],
[
202,
56,
0,
101,
7,
"CLIP_VISION_OUTPUT"
],
[
203,
102,
0,
103,
0,
"IMAGE"
],
[
204,
58,
0,
102,
0,
"IMAGE"
],
[
209,
103,
0,
101,
5,
"IMAGE"
],
[
212,
103,
0,
104,
0,
"IMAGE"
],
[
213,
104,
0,
101,
9,
"INT"
],
[
214,
104,
1,
101,
10,
"INT"
],
[
215,
101,
0,
3,
1,
"CONDITIONING"
],
[
216,
101,
1,
3,
2,
"CONDITIONING"
],
[
217,
101,
2,
3,
3,
"LATENT"
],
[
218,
3,
0,
8,
0,
"LATENT"
],
[
220,
110,
1,
109,
0,
"CLIP"
],
[
223,
110,
0,
112,
1,
"MODEL"
],
[
224,
109,
0,
112,
3,
"CONDITIONING"
],
[
229,
112,
0,
107,
0,
"SAM3_TRACK_DATA"
],
[
230,
107,
0,
101,
4,
"IMAGE"
],
[
231,
107,
1,
101,
6,
"IMAGE"
],
[
232,
115,
0,
116,
3,
"CONDITIONING"
],
[
233,
103,
0,
116,
0,
"IMAGE"
],
[
234,
116,
0,
107,
1,
"SAM3_TRACK_DATA"
],
[
236,
107,
0,
118,
0,
"IMAGE"
],
[
238,
110,
0,
116,
1,
"MODEL"
],
[
239,
113,
0,
119,
0,
"IMAGE"
],
[
240,
119,
0,
112,
0,
"IMAGE"
],
[
241,
119,
0,
101,
3,
"IMAGE"
],
[
242,
103,
0,
56,
1,
"IMAGE"
],
[
243,
110,
1,
115,
0,
"CLIP"
],
[
244,
39,
0,
120,
0,
"VAE"
],
[
245,
120,
0,
8,
1,
"VAE"
],
[
250,
107,
1,
117,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.8264462809917366,
"offset": [
782.6885946831094,
-564.1607851229405
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
The basic workflow is close to Wan-Animate's, but it is much simpler here, so you can follow along without much effort.
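If the raw JSON is hard to read, listing its connections as text can help. Judging from the workflow JSON above, each entry in `links` has the form `[link_id, source_node, source_slot, target_node, target_slot, type]`. The sketch below relies on that reading; `describe_links` is a hypothetical helper name, and the `excerpt` dictionary contains only a few nodes and links copied from the Animation workflow.

```python
def describe_links(workflow):
    """Return one 'source -> target (type)' string per link in a workflow dict."""
    node_type = {node["id"]: node["type"] for node in workflow["nodes"]}
    lines = []
    for _link_id, src, _src_slot, dst, _dst_slot, data_type in workflow["links"]:
        lines.append(f"{node_type[src]}#{src} -> {node_type[dst]}#{dst} ({data_type})")
    return lines


# Excerpt of the Animation workflow: the model-loading chain only.
excerpt = {
    "nodes": [
        {"id": 37, "type": "UNETLoader"},
        {"id": 96, "type": "LoraLoaderModelOnly"},
        {"id": 48, "type": "ModelSamplingSD3"},
        {"id": 3, "type": "KSampler"},
    ],
    "links": [
        [183, 37, 0, 96, 0, "MODEL"],
        [184, 96, 0, 48, 0, "MODEL"],
        [120, 48, 0, 3, 0, "MODEL"],
    ],
}

for line in describe_links(excerpt):
    print(line)
```

For a saved workflow file, load it with `json.load` and pass the resulting dictionary to the same function.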
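Reference image and motion video
The reference image and the motion video are resized internally, so they do not need to be the same size to begin with.
- Similar aspect ratios are easier to handle.
- The poses in the image and the video do not need to match exactly.
- If they differ too much, though, generation fails.
- Ideally, choose a reference image that is close to the first frame of the motion video.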
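To get a rough feel for how far apart two aspect ratios are, a small helper like the one below can be used. It is only an illustration: `aspect_ratio_gap` is a made-up name, the sizes in the example are hypothetical, and the article gives no threshold at which the mismatch becomes a problem.

```python
def aspect_ratio_gap(image_size, video_size):
    """Relative difference between two (width, height) aspect ratios.

    Returns 0.0 when the ratios are identical and approaches 1.0 as they diverge.
    """
    image_w, image_h = image_size
    video_w, video_h = video_size
    image_ratio = image_w / image_h
    video_ratio = video_w / video_h
    return abs(image_ratio - video_ratio) / max(image_ratio, video_ratio)


# Hypothetical sizes: a portrait photo against a portrait video, then a square photo.
print(aspect_ratio_gap((1080, 1920), (2160, 3840)))
print(aspect_ratio_gap((1024, 1024), (2160, 3840)))
```

The first pair has the same shape, so the gap is zero; the square image against a portrait video gives a much larger value, which is the kind of pairing to be careful with.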
Prompt
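Since this only transfers motion, the prompt does not need to be very detailed.
- However, a prompt that is too short makes failures more likely, especially in Replacement mode.
- For this example, something like "a man in a shirt with one hand on his hip and the other touching his hair" is enough: write just enough to describe the video you want to generate.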
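Resolution and frame count
The output size and the frame count are set on WanSCAILToVideo.
- The recommended resolution ranges from 480p (864×480) up to roughly 720p (1280×704), with both sides being multiples of 32.
- The maximum frame count is 81.
- This workflow resizes the reference image and uses the resulting size as the generation resolution.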
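The resize step can be sketched in plain Python. In the workflow, the reference image passes through two ResizeImageMaskNode instances, set to "scale total pixels" with 0.5 and "scale to multiple" with 32. The code below is an approximation under stated assumptions, not the node's actual implementation: it assumes one megapixel means 1024 × 1024 pixels and that snapping rounds down. The function names and the input size are hypothetical.

```python
import math

MAX_FRAMES = 81  # maximum frame count stated above


def snapped_size(width, height, megapixels=0.5, multiple=32):
    """Scale to a pixel budget, then snap both sides to a multiple of `multiple`."""
    # Step 1: keep the aspect ratio while matching the total pixel budget.
    # Assumption: 1 megapixel = 1024 * 1024 pixels.
    scale = math.sqrt(megapixels * 1024 * 1024 / (width * height))
    scaled_w = round(width * scale)
    scaled_h = round(height * scale)
    # Step 2: round each side down to a multiple (assumed rounding direction).
    snapped_w = max(multiple, (scaled_w // multiple) * multiple)
    snapped_h = max(multiple, (scaled_h // multiple) * multiple)
    return snapped_w, snapped_h


def clamp_frames(requested):
    """Limit a requested frame count to the 1..MAX_FRAMES range."""
    return max(1, min(requested, MAX_FRAMES))


def clip_seconds(frames, fps=16):
    """Clip length in seconds at the workflow's 16 fps output rate."""
    return frames / fps


print(snapped_size(2160, 3840))        # hypothetical portrait input -> (512, 960)
print(clip_seconds(clamp_frames(120)))  # 81 frames at 16 fps -> 5.0625
```

At 16 fps, the 81-frame limit works out to about five seconds of video per generation.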
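Generating masks with SAM3.1
Use SAM 3 / 3.1 to generate masks for the people in the reference image and in the motion video.
- These are not strict masks like the ones used for inpainting. They are auxiliary information that tells SCAIL-2 which person corresponds to which, so small inaccuracies are fine.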
Create SCAIL-2 Colored Mask
The generated masks are then given suitable colors.
- This matters a bit more in multi-person scenes, which are covered later.
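The idea of a colored mask can be illustrated with a few lines of Python. This article does not describe how the SCAIL2ColoredMask node chooses its colors, so everything below is an assumption made for illustration: one color per tracked person, with the same index getting the same color in both the reference image and the motion video. `PALETTE` and `color_masks` are hypothetical names.

```python
# Hypothetical palette; the real node's colors are not documented in this article.
PALETTE = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]


def color_masks(masks):
    """Merge binary masks (lists of rows of 0/1) into one RGB image.

    masks[i] is painted with PALETTE[i]; pixels outside every mask stay black.
    """
    height, width = len(masks[0]), len(masks[0][0])
    image = [[(0, 0, 0)] * width for _ in range(height)]
    for index, mask in enumerate(masks):
        color = PALETTE[index % len(PALETTE)]
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    image[y][x] = color
    return image


# Two tiny 2x2 masks standing in for two tracked people.
person_a = [[1, 0], [0, 0]]
person_b = [[0, 0], [0, 1]]
colored = color_masks([person_a, person_b])
print(colored)
```

If person A is red in the reference image and red in the motion video, the shared color is what signals the correspondence, which is why the ordering becomes important once several people are in frame.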
Example output

Replacement mode
Replace the person in the video with the person in the reference image.

{
"id": "37e9470f-e8a2-4649-85ba-c52ea13698d7",
"revision": 0,
"last_node_id": 128,
"last_link_id": 259,
"nodes": [
{
"id": 57,
"type": "CLIPVisionLoader",
"pos": [
249.51499634267563,
1322.4027070242782
],
"size": [
270,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP_VISION",
"type": "CLIP_VISION",
"links": [
106
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"clip_vision_h.safetensors"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
190.560140854834
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"wan",
"default"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1831.305390050556,
583.6899778882896
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 29,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 218
},
{
"name": "vae",
"type": "VAE",
"link": 245
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
96
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 49,
"type": "VHS_VideoCombine",
"pos": [
2021.0685736443058,
583.6899778882896
],
"size": [
372.2688903808594,
988.4888916015625
],
"flags": {},
"order": 30,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 96
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_VideoCombine",
"cnr_id": "comfyui-videohelpersuite",
"ver": "a7ce59e381934733bfae03b1be029756d6ce936d"
},
"widgets_values": {
"frame_rate": 16,
"loop_count": 0,
"filename_prefix": "SCAIL-2",
"format": "video/h264-mp4",
"pix_fmt": "yuv420p",
"crf": 19,
"save_metadata": true,
"trim_to_audio": false,
"pingpong": false,
"save_output": true,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "SCAIL-2_00014.mp4",
"subfolder": "",
"type": "output",
"format": "video/h264-mp4",
"frame_rate": 16,
"workflow": "SCAIL-2_00014.png",
"fullpath": "/home/nomax/working-linux/ComfyUI-dev/output/SCAIL-2_00014.mp4"
}
}
}
},
{
"id": 48,
"type": "ModelSamplingSD3",
"pos": [
621.2813720703125,
-84.75058912563465
],
"size": [
210,
58
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 184
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
120
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
5
]
},
{
"id": 109,
"type": "CLIPTextEncode",
"pos": [
-89.07294054914254,
932.3954062570297
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 220
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
224
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 115,
"type": "CLIPTextEncode",
"pos": [
-82.0947074157867,
1153.890241223387
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 243
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
232
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 112,
"type": "SAM3_VideoTrack",
"pos": [
211.178864135982,
839.3429690539708
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 240
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 223
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 224
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
229
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 110,
"type": "CheckpointLoaderSimple",
"pos": [
-421.7081701072348,
858.7029690539712
],
"size": [
297.3094587159344,
98
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
223,
238
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
220,
243
]
},
{
"name": "VAE",
"type": "VAE",
"links": null
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"sam3.1_multiplex_fp16.safetensors"
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
-59.809774398803675,
-84.75058912563465
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
183
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan2.1_14B_SCAIL_2_fp8_scaled.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 120,
"type": "Reroute",
"pos": [
1715.7130963493837,
468.5779949512076
],
"size": [
75,
26
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 244
}
],
"outputs": [
{
"name": "",
"type": "VAE",
"links": [
245
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 126,
"type": "ImageFromBatch",
"pos": [
-454.9257483587344,
514.9424306127546
],
"size": [
270,
82
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 253
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
252
]
}
],
"properties": {
"Node name for S&R": "ImageFromBatch"
},
"widgets_values": [
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 113,
"type": "VHS_LoadVideo",
"pos": [
-749.4014607863867,
514.9424306127546
],
"size": [
261.6533203125,
753.272357822205
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
239,
253
]
},
{
"name": "frame_count",
"type": "INT",
"links": null
},
{
"name": "audio",
"type": "AUDIO",
"links": null
},
{
"name": "video_info",
"type": "VHS_VIDEOINFO",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_LoadVideo",
"cnr_id": "comfyui-videohelpersuite",
"ver": "2984ec4c4b93292421888f38db74a5e8802a8ff8"
},
"widgets_values": {
"video": "14637751_2160_3840_30fps.mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 0,
"select_every_nth": 1,
"format": "None",
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "14637751_2160_3840_30fps.mp4",
"type": "input",
"format": "video/mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 0,
"select_every_nth": 1
}
}
},
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 123,
"type": "ResizeImageMaskNode",
"pos": [
-155.7273543312722,
514.9424306127546
],
"size": [
270,
106
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 252
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
250
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
0.5,
"nearest-exact"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 127,
"type": "Reroute",
"pos": [
90.39715479542042,
1419.8866138867568
],
"size": [
75,
26
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 255
}
],
"outputs": [
{
"name": "",
"type": "IMAGE",
"links": [
256,
257,
258
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 39,
"type": "VAELoader",
"pos": [
530.832763671875,
468.5779949512076
],
"size": [
306.36004638671875,
58
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
200,
244
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan_2.1_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
417.8738708496094,
266.8154509134282
],
"size": [
419.3189392089844,
138.8924560546875
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
199
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走 "
]
},
{
"id": 56,
"type": "CLIPVisionEncode",
"pos": [
565.5166625976562,
1392.2200491685596
],
"size": [
271.6761474609375,
78
],
"flags": {},
"order": 21,
"mode": 0,
"inputs": [
{
"name": "clip_vision",
"type": "CLIP_VISION",
"link": 106
},
{
"name": "image",
"type": "IMAGE",
"link": 258
}
],
"outputs": [
{
"name": "CLIP_VISION_OUTPUT",
"type": "CLIP_VISION_OUTPUT",
"links": [
202
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"none"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 116,
"type": "SAM3_VideoTrack",
"pos": [
216.54664624797653,
1093.0480138500996
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 257
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 238
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 232
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
234
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 96,
"type": "LoraLoaderModelOnly",
"pos": [
276.2320190445451,
-84.75058912563465
],
"size": [
314.38576392812183,
82
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 183
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
184
]
}
],
"properties": {
"Node name for S&R": "LoraLoaderModelOnly",
"cnr_id": "comfy-core",
"ver": "0.3.60"
},
"widgets_values": [
"Wan2.1/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 58,
"type": "LoadImage",
"pos": [
-752.3296518502294,
1419.8866138867568
],
"size": [
308.07680913429937,
543.642446368963
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
255
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pexels-photo-31438123.jpg",
"image"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
417.9232177734375,
63.8154509134279
],
"size": [
419.26959228515625,
148.8194122314453
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
198
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A short-haired man wearing a striped shirt is standing in the park, hands on his hips, touching his hair."
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
1475.7130963493837,
583.6899778882896
],
"size": [
315,
262
],
"flags": {},
"order": 28,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 120
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 215
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 216
},
{
"name": "latent_image",
"type": "LATENT",
"link": 217
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
218
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
1234,
"fixed",
6,
1,
"euler",
"simple",
1
]
},
{
"id": 124,
"type": "ResizeImageMaskNode",
"pos": [
151.7680306369739,
514.9424306127546
],
"size": [
270,
106
],
"flags": {},
"order": 22,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 250
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
254
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale to multiple",
32,
"nearest-exact"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 128,
"type": "MarkdownNote",
"pos": [
-567.3573478819718,
-84.75058912563465
],
"size": [
461.4852607760207,
466.4268689388148
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n- checkpoints\n - [sam3.1_multiplex_fp16.safetensors](https://huggingface.co/Comfy-Org/sam3.1/blob/main/checkpoints/sam3.1_multiplex_fp16.safetensors)\n- clip_vision\n - [clip_vision_h.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors)\n- diffusion_models\n - [wan2.1_14B_SCAIL_2_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/SCAIL-2/blob/main/diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors)\n- loras\n - [Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors)\n- text_encoders\n - [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)\n- vae\n - [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors)\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂checkpoints/\n │ └── sam3.1_multiplex_fp16.safetensors\n ├── 📂clip_vision/\n │ └── clip_vision_h.safetensors\n ├── 📂diffusion_models/\n │ └── wan2.1_14B_SCAIL_2_fp8_scaled.safetensors\n ├── 📂loras/\n │ └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors\n ├── 📂text_encoders/\n │ └── umt5_xxl_fp8_e4m3fn_scaled.safetensors\n └── 📂vae/\n └── wan_2.1_vae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 119,
"type": "Reroute",
"pos": [
71.0720574680444,
667.8860270852282
],
"size": [
75,
26
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 239
}
],
"outputs": [
{
"name": "",
"type": "IMAGE",
"links": [
240,
241
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 104,
"type": "GetImageSize",
"pos": [
618.0927392578126,
793.1637633142003
],
"size": [
219.10007080078117,
136
],
"flags": {},
"order": 24,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 254
}
],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
213
]
},
{
"name": "height",
"type": "INT",
"links": [
214
]
},
{
"name": "batch_size",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "GetImageSize"
},
"widgets_values": []
},
{
"id": 107,
"type": "SCAIL2ColoredMask",
"pos": [
497.73577880859375,
995.6266686688589
],
"size": [
339.45703125,
126
],
"flags": {},
"order": 23,
"mode": 0,
"inputs": [
{
"name": "driving_track_data",
"type": "SAM3_TRACK_DATA",
"link": 229
},
{
"name": "ref_track_data",
"shape": 7,
"type": "SAM3_TRACK_DATA",
"link": 234
}
],
"outputs": [
{
"name": "pose_video_mask",
"type": "IMAGE",
"links": [
230,
236
]
},
{
"name": "reference_image_mask",
"type": "IMAGE",
"links": [
231,
237
]
}
],
"properties": {
"Node name for S&R": "SCAIL2ColoredMask"
},
"widgets_values": [
"",
"area",
true
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 117,
"type": "PreviewImage",
"pos": [
1292.2580838705644,
1145.2233455980718
],
"size": [
210,
258
],
"flags": {},
"order": 26,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 237
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 101,
"type": "WanSCAILToVideo",
"pos": [
1082.8079278436192,
603.56829029821
],
"size": [
344.02734375,
434
],
"flags": {},
"order": 27,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 198
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 199
},
{
"name": "vae",
"type": "VAE",
"link": 200
},
{
"name": "pose_video",
"shape": 7,
"type": "IMAGE",
"link": 241
},
{
"name": "pose_video_mask",
"shape": 7,
"type": "IMAGE",
"link": 230
},
{
"name": "reference_image",
"shape": 7,
"type": "IMAGE",
"link": 256
},
{
"name": "reference_image_mask",
"shape": 7,
"type": "IMAGE",
"link": 231
},
{
"name": "clip_vision_output",
"shape": 7,
"type": "CLIP_VISION_OUTPUT",
"link": 202
},
{
"name": "previous_frames",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 213
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 214
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
215
]
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
216
]
},
{
"name": "latent",
"type": "LATENT",
"links": [
217
]
},
{
"name": "video_frame_offset",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "WanSCAILToVideo"
},
"widgets_values": [
512,
896,
81,
1,
1,
0,
1,
0,
5,
true
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 118,
"type": "PreviewImage",
"pos": [
1045.8199574761238,
1144.9217621465461
],
"size": [
210,
258
],
"flags": {},
"order": 25,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 236
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
96,
8,
0,
49,
0,
"IMAGE"
],
[
106,
57,
0,
56,
0,
"CLIP_VISION"
],
[
120,
48,
0,
3,
0,
"MODEL"
],
[
183,
37,
0,
96,
0,
"MODEL"
],
[
184,
96,
0,
48,
0,
"MODEL"
],
[
198,
6,
0,
101,
0,
"CONDITIONING"
],
[
199,
7,
0,
101,
1,
"CONDITIONING"
],
[
200,
39,
0,
101,
2,
"VAE"
],
[
202,
56,
0,
101,
7,
"CLIP_VISION_OUTPUT"
],
[
213,
104,
0,
101,
9,
"INT"
],
[
214,
104,
1,
101,
10,
"INT"
],
[
215,
101,
0,
3,
1,
"CONDITIONING"
],
[
216,
101,
1,
3,
2,
"CONDITIONING"
],
[
217,
101,
2,
3,
3,
"LATENT"
],
[
218,
3,
0,
8,
0,
"LATENT"
],
[
220,
110,
1,
109,
0,
"CLIP"
],
[
223,
110,
0,
112,
1,
"MODEL"
],
[
224,
109,
0,
112,
3,
"CONDITIONING"
],
[
229,
112,
0,
107,
0,
"SAM3_TRACK_DATA"
],
[
230,
107,
0,
101,
4,
"IMAGE"
],
[
231,
107,
1,
101,
6,
"IMAGE"
],
[
232,
115,
0,
116,
3,
"CONDITIONING"
],
[
234,
116,
0,
107,
1,
"SAM3_TRACK_DATA"
],
[
236,
107,
0,
118,
0,
"IMAGE"
],
[
237,
107,
1,
117,
0,
"IMAGE"
],
[
238,
110,
0,
116,
1,
"MODEL"
],
[
239,
113,
0,
119,
0,
"IMAGE"
],
[
240,
119,
0,
112,
0,
"IMAGE"
],
[
241,
119,
0,
101,
3,
"IMAGE"
],
[
243,
110,
1,
115,
0,
"CLIP"
],
[
244,
39,
0,
120,
0,
"VAE"
],
[
245,
120,
0,
8,
1,
"VAE"
],
[
250,
123,
0,
124,
0,
"IMAGE"
],
[
252,
126,
0,
123,
0,
"IMAGE"
],
[
253,
113,
0,
126,
0,
"IMAGE"
],
[
254,
124,
0,
104,
0,
"IMAGE"
],
[
255,
58,
0,
127,
0,
"IMAGE"
],
[
256,
127,
0,
101,
5,
"IMAGE"
],
[
257,
127,
0,
116,
0,
"IMAGE"
],
[
258,
127,
0,
56,
1,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.5222486944025904,
"offset": [
648.8591196607266,
210.7355447246406
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
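In most cases, all you need to do is set replacement_mode to true on both Create SCAIL-2 Colored Mask and WanSCAILToVideo.
Resolution
Replacement mode uses the video's dimensions as the reference.
- This workflow resizes the first frame of the video, reads the resulting size, and uses it as the width and height.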
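The size handling described above can be sketched roughly as follows. The workflow JSON shows a ResizeImageMaskNode set to "scale to multiple" with a value of 32, followed by GetImageSize. The function names here are hypothetical, and rounding down is an assumption; the actual node may round differently.

```python
def snap_to_multiple(size: int, multiple: int = 32) -> int:
    """Round a side length down to a multiple (assumed rounding direction)."""
    return max(multiple, (size // multiple) * multiple)


def resolution_from_first_frame(frame_sizes, multiple=32):
    """Take the (width, height) of the first frame and snap both sides."""
    width, height = frame_sizes[0]
    return snap_to_multiple(width, multiple), snap_to_multiple(height, multiple)


# Hypothetical driving video: every frame is 1080x1920.
frames = [(1080, 1920)] * 81
print(resolution_from_first_frame(frames))
```

Only the first frame matters here because every frame of a video shares the same dimensions.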
Create SCAIL-2 Colored Mask and WanSCAILToVideo
Set replacement_mode to true.
- As a side note, on the Create SCAIL-2 Colored Mask output this only turns the background on the pose_video side white.
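If you would rather flip the setting in a saved workflow file than in the UI, a minimal sketch could look like this. It assumes that the last entry of widgets_values on both nodes is replacement_mode, which is what the workflows on this page suggest (false in Animation, true in Replacement) but is not confirmed by the node definitions. The helper name set_replacement_mode is hypothetical.

```python
TARGET_TYPES = {"SCAIL2ColoredMask", "WanSCAILToVideo"}


def set_replacement_mode(workflow: dict, enabled: bool) -> list:
    """Set the trailing boolean widget on the two SCAIL-2 nodes.

    Assumption: the last widgets_values entry is replacement_mode.
    Returns the ids of the nodes that were changed.
    """
    changed = []
    for node in workflow.get("nodes", []):
        values = node.get("widgets_values")
        if node.get("type") not in TARGET_TYPES:
            continue
        # Only touch list-style widget values that end in a boolean.
        if isinstance(values, list) and values and isinstance(values[-1], bool):
            values[-1] = enabled
            changed.append(node["id"])
    return changed


# Trimmed-down stand-in for a workflow loaded with json.load().
workflow = {
    "nodes": [
        {"id": 107, "type": "SCAIL2ColoredMask", "widgets_values": ["", "area", False]},
        {"id": 101, "type": "WanSCAILToVideo",
         "widgets_values": [512, 896, 81, 1, 1, 0, 1, 0, 5, False]},
        {"id": 3, "type": "KSampler",
         "widgets_values": [1234, "fixed", 6, 1, "euler", "simple", 1]},
    ]
}
changed = set_replacement_mode(workflow, True)
print(changed)
```

Checking that the last value is a boolean is a small guard so that a node with a different widget layout is left alone.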
Example output

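Animation mode (multiple people)
SCAIL-2 also supports videos and images that contain multiple people.
No special steps are required. As before, just provide the input video and the reference image.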

{
"id": "30265d26-6d42-46e0-9a45-d84500678056",
"revision": 0,
"last_node_id": 123,
"last_link_id": 250,
"nodes": [
{
"id": 57,
"type": "CLIPVisionLoader",
"pos": [
251.28673034667955,
1235.5962450561526
],
"size": [
270,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP_VISION",
"type": "CLIP_VISION",
"links": [
106
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"clip_vision_h.safetensors"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
190.560140854834
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"wan",
"default"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1831.305390050556,
583.6899778882896
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 27,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 218
},
{
"name": "vae",
"type": "VAE",
"link": 245
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
96
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
417.8738708496094,
266.8154509134282
],
"size": [
419.3189392089844,
138.8924560546875
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
199
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走 "
]
},
{
"id": 49,
"type": "VHS_VideoCombine",
"pos": [
2021.0685736443058,
583.6899778882896
],
"size": [
372.2688903808594,
876.4033355712891
],
"flags": {},
"order": 28,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 96
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_VideoCombine",
"cnr_id": "comfyui-videohelpersuite",
"ver": "a7ce59e381934733bfae03b1be029756d6ce936d"
},
"widgets_values": {
"frame_rate": 16,
"loop_count": 0,
"filename_prefix": "SCAIL-2",
"format": "video/h264-mp4",
"pix_fmt": "yuv420p",
"crf": 19,
"save_metadata": true,
"trim_to_audio": false,
"pingpong": false,
"save_output": true,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "SCAIL-2_00019.mp4",
"subfolder": "",
"type": "output",
"format": "video/h264-mp4",
"frame_rate": 16,
"workflow": "SCAIL-2_00019.png",
"fullpath": "/home/nomax/working-linux/ComfyUI-dev/output/SCAIL-2_00019.mp4"
}
}
}
},
{
"id": 48,
"type": "ModelSamplingSD3",
"pos": [
621.2813720703125,
-84.75058912563465
],
"size": [
210,
58
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 184
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
120
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
5
]
},
{
"id": 102,
"type": "ResizeImageMaskNode",
"pos": [
-461.91705157795747,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 204
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
203
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
0.5,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
-59.809774398803675,
-84.75058912563465
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
183
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan2.1_14B_SCAIL_2_fp8_scaled.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 96,
"type": "LoraLoaderModelOnly",
"pos": [
276.2320190445451,
-84.75058912563465
],
"size": [
314.38576392812183,
82
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 183
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
184
]
}
],
"properties": {
"Node name for S&R": "LoraLoaderModelOnly",
"cnr_id": "comfy-core",
"ver": "0.3.60"
},
"widgets_values": [
"Wan2.1/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 120,
"type": "Reroute",
"pos": [
1715.7130963493837,
468.5779949512076
],
"size": [
75,
26
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 244
}
],
"outputs": [
{
"name": "",
"type": "VAE",
"links": [
245
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 39,
"type": "VAELoader",
"pos": [
525.0368322106934,
468.5779949512076
],
"size": [
306.36004638671875,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
200,
244
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan_2.1_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 56,
"type": "CLIPVisionEncode",
"pos": [
577.9174829101565,
1301.870476013184
],
"size": [
271.6761474609375,
78
],
"flags": {},
"order": 21,
"mode": 0,
"inputs": [
{
"name": "clip_vision",
"type": "CLIP_VISION",
"link": 106
},
{
"name": "image",
"type": "IMAGE",
"link": 242
}
],
"outputs": [
{
"name": "CLIP_VISION_OUTPUT",
"type": "CLIP_VISION_OUTPUT",
"links": [
202
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"none"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 119,
"type": "Reroute",
"pos": [
25.7972264472059,
669.0146817503652
],
"size": [
75,
26
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 239
}
],
"outputs": [
{
"name": "",
"type": "IMAGE",
"links": [
240,
241
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 123,
"type": "MarkdownNote",
"pos": [
-548.6500412597653,
-84.75058912563465
],
"size": [
461.4852607760207,
466.4268689388148
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n- checkpoints\n - [sam3.1_multiplex_fp16.safetensors](https://huggingface.co/Comfy-Org/sam3.1/blob/main/checkpoints/sam3.1_multiplex_fp16.safetensors)\n- clip_vision\n - [clip_vision_h.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors)\n- diffusion_models\n - [wan2.1_14B_SCAIL_2_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/SCAIL-2/blob/main/diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors)\n- loras\n - [Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors)\n- text_encoders\n - [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)\n- vae\n - [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors)\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂checkpoints/\n │ └── sam3.1_multiplex_fp16.safetensors\n ├── 📂clip_vision/\n │ └── clip_vision_h.safetensors\n ├── 📂diffusion_models/\n │ └── wan2.1_14B_SCAIL_2_fp8_scaled.safetensors\n ├── 📂loras/\n │ └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors\n ├── 📂text_encoders/\n │ └── umt5_xxl_fp8_e4m3fn_scaled.safetensors\n └── 📂vae/\n └── wan_2.1_vae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 103,
"type": "ResizeImageMaskNode",
"pos": [
-154.42166660971037,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 203
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
209,
212,
233,
242
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale to multiple",
32,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 104,
"type": "GetImageSize",
"pos": [
630.4935595703129,
1448.041027605726
],
"size": [
219.10007080078117,
136
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 212
}
],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
213
]
},
{
"name": "height",
"type": "INT",
"links": [
214
]
},
{
"name": "batch_size",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "GetImageSize"
},
"widgets_values": []
},
{
"id": 110,
"type": "CheckpointLoaderSimple",
"pos": [
-472.9277129444554,
773.4790318695161
],
"size": [
297.3094587159344,
98
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
223,
238
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
220,
243
]
},
{
"name": "VAE",
"type": "VAE",
"links": null
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"sam3.1_multiplex_fp16.safetensors"
]
},
{
"id": 109,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
847.1714690725746
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 220
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
224
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 115,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
1068.66630403893
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 243
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
232
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 116,
"type": "SAM3_VideoTrack",
"pos": [
165.3271034107571,
1007.8240766656425
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 233
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 238
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 232
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
234
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 112,
"type": "SAM3_VideoTrack",
"pos": [
159.95932129876255,
754.1190318695158
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 240
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 223
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 224
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
229
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 117,
"type": "PreviewImage",
"pos": [
1287.634225650107,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 25,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 250
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 118,
"type": "PreviewImage",
"pos": [
1041.5040075988006,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 23,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 236
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 58,
"type": "LoadImage",
"pos": [
-800.6210036236067,
1277.5627543619055
],
"size": [
308.07680913429937,
543.642446368963
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
204
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pasted/image (4).png",
"image"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
417.9232177734375,
63.8154509134279
],
"size": [
419.26959228515625,
148.8194122314453
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
198
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A black dog mascot character and a green-and-cream bird mascot character, with a similar build, are holding hands and performing a ballroom dance on a white stage."
]
},
{
"id": 113,
"type": "VHS_LoadVideo",
"pos": [
-800.6210036236067,
455.00750561396774
],
"size": [
261.6533203125,
753.272357822205
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
239
]
},
{
"name": "frame_count",
"type": "INT",
"links": null
},
{
"name": "audio",
"type": "AUDIO",
"links": null
},
{
"name": "video_info",
"type": "VHS_VIDEOINFO",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_LoadVideo",
"cnr_id": "comfyui-videohelpersuite",
"ver": "2984ec4c4b93292421888f38db74a5e8802a8ff8"
},
"widgets_values": {
"video": "8281169-hd_1080_1920_24fps.mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 30,
"select_every_nth": 1,
"format": "None",
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "8281169-hd_1080_1920_24fps.mp4",
"type": "input",
"format": "video/mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 30,
"select_every_nth": 1
}
}
},
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 3,
"type": "KSampler",
"pos": [
1475.7130963493837,
583.6899778882896
],
"size": [
315,
262
],
"flags": {},
"order": 26,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 120
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 215
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 216
},
{
"name": "latent_image",
"type": "LATENT",
"link": 217
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
218
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
1234,
"fixed",
6,
1,
"euler",
"simple",
1
]
},
{
"id": 101,
"type": "WanSCAILToVideo",
"pos": [
1082.8079278436192,
604.0725314640773
],
"size": [
344.02734375,
434
],
"flags": {},
"order": 24,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 198
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 199
},
{
"name": "vae",
"type": "VAE",
"link": 200
},
{
"name": "pose_video",
"shape": 7,
"type": "IMAGE",
"link": 241
},
{
"name": "pose_video_mask",
"shape": 7,
"type": "IMAGE",
"link": 230
},
{
"name": "reference_image",
"shape": 7,
"type": "IMAGE",
"link": 209
},
{
"name": "reference_image_mask",
"shape": 7,
"type": "IMAGE",
"link": 231
},
{
"name": "clip_vision_output",
"shape": 7,
"type": "CLIP_VISION_OUTPUT",
"link": 202
},
{
"name": "previous_frames",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 213
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 214
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
215
]
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
216
]
},
{
"name": "latent",
"type": "LATENT",
"links": [
217
]
},
{
"name": "video_frame_offset",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "WanSCAILToVideo"
},
"widgets_values": [
512,
896,
81,
1,
1,
0,
1,
0,
5,
false
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 107,
"type": "SCAIL2ColoredMask",
"pos": [
506.50542810498916,
905.2770955134855
],
"size": [
339.45703125,
126
],
"flags": {},
"order": 22,
"mode": 0,
"inputs": [
{
"name": "driving_track_data",
"type": "SAM3_TRACK_DATA",
"link": 229
},
{
"name": "ref_track_data",
"shape": 7,
"type": "SAM3_TRACK_DATA",
"link": 234
}
],
"outputs": [
{
"name": "pose_video_mask",
"type": "IMAGE",
"links": [
230,
236
]
},
{
"name": "reference_image_mask",
"type": "IMAGE",
"links": [
231,
250
]
}
],
"properties": {
"Node name for S&R": "SCAIL2ColoredMask"
},
"widgets_values": [
"",
"area",
false
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
96,
8,
0,
49,
0,
"IMAGE"
],
[
106,
57,
0,
56,
0,
"CLIP_VISION"
],
[
120,
48,
0,
3,
0,
"MODEL"
],
[
183,
37,
0,
96,
0,
"MODEL"
],
[
184,
96,
0,
48,
0,
"MODEL"
],
[
198,
6,
0,
101,
0,
"CONDITIONING"
],
[
199,
7,
0,
101,
1,
"CONDITIONING"
],
[
200,
39,
0,
101,
2,
"VAE"
],
[
202,
56,
0,
101,
7,
"CLIP_VISION_OUTPUT"
],
[
203,
102,
0,
103,
0,
"IMAGE"
],
[
204,
58,
0,
102,
0,
"IMAGE"
],
[
209,
103,
0,
101,
5,
"IMAGE"
],
[
212,
103,
0,
104,
0,
"IMAGE"
],
[
213,
104,
0,
101,
9,
"INT"
],
[
214,
104,
1,
101,
10,
"INT"
],
[
215,
101,
0,
3,
1,
"CONDITIONING"
],
[
216,
101,
1,
3,
2,
"CONDITIONING"
],
[
217,
101,
2,
3,
3,
"LATENT"
],
[
218,
3,
0,
8,
0,
"LATENT"
],
[
220,
110,
1,
109,
0,
"CLIP"
],
[
223,
110,
0,
112,
1,
"MODEL"
],
[
224,
109,
0,
112,
3,
"CONDITIONING"
],
[
229,
112,
0,
107,
0,
"SAM3_TRACK_DATA"
],
[
230,
107,
0,
101,
4,
"IMAGE"
],
[
231,
107,
1,
101,
6,
"IMAGE"
],
[
232,
115,
0,
116,
3,
"CONDITIONING"
],
[
233,
103,
0,
116,
0,
"IMAGE"
],
[
234,
116,
0,
107,
1,
"SAM3_TRACK_DATA"
],
[
236,
107,
0,
118,
0,
"IMAGE"
],
[
238,
110,
0,
116,
1,
"MODEL"
],
[
239,
113,
0,
119,
0,
"IMAGE"
],
[
240,
119,
0,
112,
0,
"IMAGE"
],
[
241,
119,
0,
101,
3,
"IMAGE"
],
[
242,
103,
0,
56,
1,
"IMAGE"
],
[
243,
110,
1,
115,
0,
"CLIP"
],
[
244,
39,
0,
120,
0,
"VAE"
],
[
245,
120,
0,
8,
1,
"VAE"
],
[
250,
107,
1,
117,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.4665073802097337,
"offset": [
1011.0979865375539,
-44.36445395816628
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
Create SCAIL-2 Colored Mask
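With multiple people, it becomes important which character maps to which motion. SCAIL-2 uses colored masks to control this.
- When SAM3.1 segments multiple targets, Create SCAIL-2 Colored Mask paints them in different colors, in order.
- Targets with the same color are generally associated with each other, so use sort_by or similar settings to line up the colors.
However, as the example output below shows, the color correspondence and the motion do not always match. It is only a weak condition, and the model may simply pick whichever subject is closer in terms of composition.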
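The idea of matching subjects by color can be illustrated with a small sketch. This is not the node's implementation: the palette, the track dictionaries, and the descending sort are all assumptions made for illustration. The only detail taken from the workflow is that "area" appears as a widget value on the node.

```python
# Hypothetical palette; the colors the real node uses are not documented here.
PALETTE = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]


def assign_colors(tracks, sort_by="area"):
    """Sort tracked subjects and give each the palette color at its rank."""
    ordered = sorted(tracks, key=lambda t: t[sort_by], reverse=True)
    return {t["name"]: PALETTE[i % len(PALETTE)] for i, t in enumerate(ordered)}


def pair_by_color(driving_colors, reference_colors):
    """Pair each driving subject with the reference subject of the same color."""
    by_color = {color: name for name, color in reference_colors.items()}
    return {name: by_color.get(color) for name, color in driving_colors.items()}


# Made-up subjects and mask areas (in pixels).
driving = [{"name": "dancer_left", "area": 5200}, {"name": "dancer_right", "area": 6100}]
reference = [{"name": "dog", "area": 7000}, {"name": "bird", "area": 6400}]

pairs = pair_by_color(assign_colors(driving), assign_colors(reference))
print(pairs)
```

Because the pairing depends only on rank within each input, changing the sort key on one side is enough to swap which subject drives which character.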
Example output

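Animation mode (81 frames or more)
SCAIL-2 basically generates up to 81 frames, but with WAN Context Windows (Manual) you can generate longer videos by splitting the generation into segments along the time axis.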
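To get a feel for what splitting along the time axis means, here is a generic sliding-window sketch. It is not the algorithm used by WAN Context Windows (Manual). The function name and the overlap of 16 frames are assumptions, and only the 81-frame window length comes from the text above.

```python
def context_windows(total_frames, window=81, overlap=16):
    """Split a frame range into overlapping (start, end) windows, end exclusive."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    if total_frames <= window:
        return [(0, total_frames)]

    stride = window - overlap
    windows = []
    start = 0
    while start + window < total_frames:
        windows.append((start, start + window))
        start += stride
    # Final window is aligned to the end so that every window has full length.
    windows.append((total_frames - window, total_frames))
    return windows


for start, end in context_windows(161):
    print(start, end)
```

Overlapping windows share frames with their neighbors, which is what gives a segmented generation a chance to stay consistent across the seams.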

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 124,
"last_link_id": 252,
"nodes": [
{
"id": 57,
"type": "CLIPVisionLoader",
"pos": [
251.28673034667955,
1235.5962450561526
],
"size": [
270,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP_VISION",
"type": "CLIP_VISION",
"links": [
106
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"clip_vision_h.safetensors"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
190.560140854834
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"wan",
"default"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1831.305390050556,
583.6899778882896
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 28,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 218
},
{
"name": "vae",
"type": "VAE",
"link": 245
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
96
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
417.8738708496094,
266.8154509134282
],
"size": [
419.3189392089844,
138.8924560546875
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
199
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走 "
]
},
{
"id": 49,
"type": "VHS_VideoCombine",
"pos": [
2021.0685736443058,
583.6899778882896
],
"size": [
372.2688903808594,
876.4033355712891
],
"flags": {},
"order": 29,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 96
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_VideoCombine",
"cnr_id": "comfyui-videohelpersuite",
"ver": "a7ce59e381934733bfae03b1be029756d6ce936d"
},
"widgets_values": {
"frame_rate": 16,
"loop_count": 0,
"filename_prefix": "SCAIL-2",
"format": "video/h264-mp4",
"pix_fmt": "yuv420p",
"crf": 19,
"save_metadata": true,
"trim_to_audio": false,
"pingpong": false,
"save_output": true,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "SCAIL-2_00025.mp4",
"subfolder": "",
"type": "output",
"format": "video/h264-mp4",
"frame_rate": 16,
"workflow": "SCAIL-2_00025.png",
"fullpath": "/home/nomax/working-linux/ComfyUI-dev/output/SCAIL-2_00025.mp4"
}
}
}
},
{
"id": 102,
"type": "ResizeImageMaskNode",
"pos": [
-461.91705157795747,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 204
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
203
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
0.5,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
-59.809774398803675,
-84.75058912563465
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
183
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan2.1_14B_SCAIL_2_fp8_scaled.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 96,
"type": "LoraLoaderModelOnly",
"pos": [
276.2320190445451,
-84.75058912563465
],
"size": [
314.38576392812183,
82
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 183
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
184
]
}
],
"properties": {
"Node name for S&R": "LoraLoaderModelOnly",
"cnr_id": "comfy-core",
"ver": "0.3.60"
},
"widgets_values": [
"Wan2.1/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 120,
"type": "Reroute",
"pos": [
1715.7130963493837,
468.5779949512076
],
"size": [
75,
26
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 244
}
],
"outputs": [
{
"name": "",
"type": "VAE",
"links": [
245
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 39,
"type": "VAELoader",
"pos": [
525.0368322106934,
468.5779949512076
],
"size": [
306.36004638671875,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
200,
244
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan_2.1_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 56,
"type": "CLIPVisionEncode",
"pos": [
577.9174829101565,
1301.870476013184
],
"size": [
271.6761474609375,
78
],
"flags": {},
"order": 22,
"mode": 0,
"inputs": [
{
"name": "clip_vision",
"type": "CLIP_VISION",
"link": 106
},
{
"name": "image",
"type": "IMAGE",
"link": 242
}
],
"outputs": [
{
"name": "CLIP_VISION_OUTPUT",
"type": "CLIP_VISION_OUTPUT",
"links": [
202
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"none"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 58,
"type": "LoadImage",
"pos": [
-800.6210036236067,
1277.5627543619055
],
"size": [
308.07680913429937,
543.642446368963
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
204
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pexels-photo-31438123.jpg",
"image"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 119,
"type": "Reroute",
"pos": [
25.7972264472059,
669.0146817503652
],
"size": [
75,
26
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 239
}
],
"outputs": [
{
"name": "",
"type": "IMAGE",
"links": [
240,
241
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 123,
"type": "MarkdownNote",
"pos": [
-548.6500412597653,
-84.75058912563465
],
"size": [
461.4852607760207,
466.4268689388148
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n- checkpoints\n - [sam3.1_multiplex_fp16.safetensors](https://huggingface.co/Comfy-Org/sam3.1/blob/main/checkpoints/sam3.1_multiplex_fp16.safetensors)\n- clip_vision\n - [clip_vision_h.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors)\n- diffusion_models\n - [wan2.1_14B_SCAIL_2_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/SCAIL-2/blob/main/diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors)\n- loras\n - [Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors)\n- text_encoders\n - [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)\n- vae\n - [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors)\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂checkpoints/\n │ └── sam3.1_multiplex_fp16.safetensors\n ├── 📂clip_vision/\n │ └── clip_vision_h.safetensors\n ├── 📂diffusion_models/\n │ └── wan2.1_14B_SCAIL_2_fp8_scaled.safetensors\n ├── 📂loras/\n │ └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors\n ├── 📂text_encoders/\n │ └── umt5_xxl_fp8_e4m3fn_scaled.safetensors\n └── 📂vae/\n └── wan_2.1_vae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 103,
"type": "ResizeImageMaskNode",
"pos": [
-154.42166660971037,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 203
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
209,
212,
233,
242
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale to multiple",
32,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 104,
"type": "GetImageSize",
"pos": [
630.4935595703129,
1448.041027605726
],
"size": [
219.10007080078117,
136
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 212
}
],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
213
]
},
{
"name": "height",
"type": "INT",
"links": [
214
]
},
{
"name": "batch_size",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "GetImageSize"
},
"widgets_values": []
},
{
"id": 110,
"type": "CheckpointLoaderSimple",
"pos": [
-472.9277129444554,
773.4790318695161
],
"size": [
297.3094587159344,
98
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
223,
238
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
220,
243
]
},
{
"name": "VAE",
"type": "VAE",
"links": null
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"sam3.1_multiplex_fp16.safetensors"
]
},
{
"id": 109,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
847.1714690725746
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 220
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
224
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 115,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
1068.66630403893
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 243
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
232
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 116,
"type": "SAM3_VideoTrack",
"pos": [
165.3271034107571,
1007.8240766656425
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 21,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 233
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 238
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 232
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
234
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 112,
"type": "SAM3_VideoTrack",
"pos": [
159.95932129876255,
754.1190318695158
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 240
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 223
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 224
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
229
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 107,
"type": "SCAIL2ColoredMask",
"pos": [
506.50542810498916,
905.2770955134855
],
"size": [
339.45703125,
126
],
"flags": {},
"order": 23,
"mode": 0,
"inputs": [
{
"name": "driving_track_data",
"type": "SAM3_TRACK_DATA",
"link": 229
},
{
"name": "ref_track_data",
"shape": 7,
"type": "SAM3_TRACK_DATA",
"link": 234
}
],
"outputs": [
{
"name": "pose_video_mask",
"type": "IMAGE",
"links": [
230,
236
]
},
{
"name": "reference_image_mask",
"type": "IMAGE",
"links": [
231,
250
]
}
],
"properties": {
"Node name for S&R": "SCAIL2ColoredMask"
},
"widgets_values": [
"",
"area",
false
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 117,
"type": "PreviewImage",
"pos": [
1287.634225650107,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 26,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 250
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
417.9232177734375,
63.8154509134279
],
"size": [
419.26959228515625,
148.8194122314453
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
198
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A short-haired man wearing a striped shirt, hands on his hips, touching his hair.full body"
]
},
{
"id": 118,
"type": "PreviewImage",
"pos": [
1041.5040075988006,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 24,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 236
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 101,
"type": "WanSCAILToVideo",
"pos": [
1082.8079278436192,
604.0725314640773
],
"size": [
344.02734375,
434
],
"flags": {},
"order": 25,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 198
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 199
},
{
"name": "vae",
"type": "VAE",
"link": 200
},
{
"name": "pose_video",
"shape": 7,
"type": "IMAGE",
"link": 241
},
{
"name": "pose_video_mask",
"shape": 7,
"type": "IMAGE",
"link": 230
},
{
"name": "reference_image",
"shape": 7,
"type": "IMAGE",
"link": 209
},
{
"name": "reference_image_mask",
"shape": 7,
"type": "IMAGE",
"link": 231
},
{
"name": "clip_vision_output",
"shape": 7,
"type": "CLIP_VISION_OUTPUT",
"link": 202
},
{
"name": "previous_frames",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 213
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 214
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
215
]
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
216
]
},
{
"name": "latent",
"type": "LATENT",
"links": [
217
]
},
{
"name": "video_frame_offset",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "WanSCAILToVideo"
},
"widgets_values": [
512,
896,
133,
1,
1,
0,
1,
0,
5,
false
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 113,
"type": "VHS_LoadVideo",
"pos": [
-800.6210036236067,
455.00750561396774
],
"size": [
261.6533203125,
753.272357822205
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
239
]
},
{
"name": "frame_count",
"type": "INT",
"links": null
},
{
"name": "audio",
"type": "AUDIO",
"links": null
},
{
"name": "video_info",
"type": "VHS_VIDEOINFO",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_LoadVideo",
"cnr_id": "comfyui-videohelpersuite",
"ver": "2984ec4c4b93292421888f38db74a5e8802a8ff8"
},
"widgets_values": {
"video": "14637751_2160_3840_30fps.mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 133,
"skip_first_frames": 0,
"select_every_nth": 1,
"format": "None",
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "14637751_2160_3840_30fps.mp4",
"type": "input",
"format": "video/mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 133,
"skip_first_frames": 0,
"select_every_nth": 1
}
}
},
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 3,
"type": "KSampler",
"pos": [
1475.7130963493837,
583.6899778882896
],
"size": [
315,
262
],
"flags": {},
"order": 27,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 252
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 215
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 216
},
{
"name": "latent_image",
"type": "LATENT",
"link": 217
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
218
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
123456,
"fixed",
6,
1,
"euler",
"simple",
1
]
},
{
"id": 48,
"type": "ModelSamplingSD3",
"pos": [
621.2813720703125,
-84.75058912563465
],
"size": [
210,
58
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 184
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
251
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
5
]
},
{
"id": 124,
"type": "WanContextWindowsManual",
"pos": [
921.0647655256324,
-84.75058912563465
],
"size": [
316.1412109375,
202
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 251
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
252
]
}
],
"properties": {
"Node name for S&R": "WanContextWindowsManual"
},
"widgets_values": [
81,
29,
"standard_static",
1,
false,
"pyramid",
true
],
"color": "#223",
"bgcolor": "#335"
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
96,
8,
0,
49,
0,
"IMAGE"
],
[
106,
57,
0,
56,
0,
"CLIP_VISION"
],
[
183,
37,
0,
96,
0,
"MODEL"
],
[
184,
96,
0,
48,
0,
"MODEL"
],
[
198,
6,
0,
101,
0,
"CONDITIONING"
],
[
199,
7,
0,
101,
1,
"CONDITIONING"
],
[
200,
39,
0,
101,
2,
"VAE"
],
[
202,
56,
0,
101,
7,
"CLIP_VISION_OUTPUT"
],
[
203,
102,
0,
103,
0,
"IMAGE"
],
[
204,
58,
0,
102,
0,
"IMAGE"
],
[
209,
103,
0,
101,
5,
"IMAGE"
],
[
212,
103,
0,
104,
0,
"IMAGE"
],
[
213,
104,
0,
101,
9,
"INT"
],
[
214,
104,
1,
101,
10,
"INT"
],
[
215,
101,
0,
3,
1,
"CONDITIONING"
],
[
216,
101,
1,
3,
2,
"CONDITIONING"
],
[
217,
101,
2,
3,
3,
"LATENT"
],
[
218,
3,
0,
8,
0,
"LATENT"
],
[
220,
110,
1,
109,
0,
"CLIP"
],
[
223,
110,
0,
112,
1,
"MODEL"
],
[
224,
109,
0,
112,
3,
"CONDITIONING"
],
[
229,
112,
0,
107,
0,
"SAM3_TRACK_DATA"
],
[
230,
107,
0,
101,
4,
"IMAGE"
],
[
231,
107,
1,
101,
6,
"IMAGE"
],
[
232,
115,
0,
116,
3,
"CONDITIONING"
],
[
233,
103,
0,
116,
0,
"IMAGE"
],
[
234,
116,
0,
107,
1,
"SAM3_TRACK_DATA"
],
[
236,
107,
0,
118,
0,
"IMAGE"
],
[
238,
110,
0,
116,
1,
"MODEL"
],
[
239,
113,
0,
119,
0,
"IMAGE"
],
[
240,
119,
0,
112,
0,
"IMAGE"
],
[
241,
119,
0,
101,
3,
"IMAGE"
],
[
242,
103,
0,
56,
1,
"IMAGE"
],
[
243,
110,
1,
115,
0,
"CLIP"
],
[
244,
39,
0,
120,
0,
"VAE"
],
[
245,
120,
0,
8,
1,
"VAE"
],
[
250,
107,
1,
117,
0,
"IMAGE"
],
[
251,
48,
0,
124,
0,
"MODEL"
],
[
252,
124,
0,
3,
0,
"MODEL"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.5644739300537773,
"offset": [
420.4411568691918,
395.94500319536957
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
WAN Context Windows (Manual)
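Think of it as tiling along the time axis, or as context sliding.
- With `context_length` set to 81, generation internally proceeds in segments of 81 frames.
- Splitting it this way with no overlap leaves visible seams, so set a suitable number of overlapping frames with `context_overlap`.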
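The arithmetic behind the two settings can be sketched as a plain sliding window. This is only an illustration under assumptions: the function name is hypothetical, the node may work on latent frames and use a different schedule internally, and the values 81 and 29 are read from the workflow's widget values on the assumption that they are `context_length` and `context_overlap`.

```python
def context_windows(total_frames, context_length=81, context_overlap=29):
    """Return (start, end) frame ranges, end exclusive, covering the video.

    A simplified sliding-window layout, not the node's actual scheduler.
    """
    if context_overlap >= context_length:
        raise ValueError("context_overlap must be smaller than context_length")
    if total_frames <= context_length:
        return [(0, total_frames)]

    # Each new window advances by the part that does not overlap.
    stride = context_length - context_overlap
    windows = []
    start = 0
    while start + context_length < total_frames:
        windows.append((start, start + context_length))
        start += stride
    # Align the last window to the end of the video so no frame is left out.
    windows.append((total_frames - context_length, total_frames))
    return windows


windows_133 = context_windows(133)
print(windows_133)
```

With these numbers, 133 frames split into exactly two windows, because 81 + (81 - 29) = 133. The 29 shared frames are where the two segments get blended, which is what hides the seam.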
Output example
