What is SCAIL-2?
SCAIL-2 is a Wan2.1-based model specialized for motion transfer to people and characters.
The major difference from Wan-Animate and the previous SCAIL-1 is that it does not convert the input into an intermediate representation such as a stick figure.
The usual approach has been to extract a stick figure with ViTPose or OpenPose, then use it as the condition for moving the person. But once you convert the video into a stick figure, a lot of information is lost: depth, contact, intertwined multi-person motion, non-human character motion, and so on.
So SCAIL-2 passes the reference image and motion video almost directly to the DiT.
Rather than humans building a complicated processing pipeline by hand, it is often more flexible to prepare the right dataset and let the AI understand the task. That way of thinking will probably become more common from here.
Model Download
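- checkpoints
  - [sam3.1_multiplex_fp16.safetensors](https://huggingface.co/Comfy-Org/sam3.1/blob/main/checkpoints/sam3.1_multiplex_fp16.safetensors)
- clip_vision
  - [clip_vision_h.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors)
- diffusion_models
  - [wan2.1_14B_SCAIL_2_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/SCAIL-2/blob/main/diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors)
- loras
  - [Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors)
- text_encoders
  - [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)
- vae
  - [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors)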
📂ComfyUI/
└── 📂models/
    ├── 📂checkpoints/
    │   └── sam3.1_multiplex_fp16.safetensors
    ├── 📂clip_vision/
    │   └── clip_vision_h.safetensors
    ├── 📂diffusion_models/
    │   └── wan2.1_14B_SCAIL_2_fp8_scaled.safetensors
    ├── 📂loras/
    │   └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors
    ├── 📂text_encoders/
    │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
    └── 📂vae/
        └── wan_2.1_vae.safetensors
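Before loading the workflow, it can help to confirm that every file is where the tree above expects it. The following is a minimal sketch, not part of ComfyUI: the helper name `missing_models` and the default path `ComfyUI/models` are assumptions made for illustration.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_FILES = [
    "checkpoints/sam3.1_multiplex_fp16.safetensors",
    "clip_vision/clip_vision_h.safetensors",
    "diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors",
    "loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
    "text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "vae/wan_2.1_vae.safetensors",
]


def missing_models(models_dir):
    """Return the expected files that are not present under models_dir."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI/models" is an assumed install location; adjust to your setup.
    for rel in missing_models("ComfyUI/models"):
        print("missing:", rel)
```

Note that the LoraLoaderModelOnly node in the workflows on this page stores the LoRA as `Wan2.1/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors`, that is, inside a `Wan2.1` subfolder. If you place the file directly in `loras/` as in the tree, you will probably need to reselect it in that node.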
Animation Mode
Move a reference image using a motion video.

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 123,
"last_link_id": 250,
"nodes": [
{
"id": 57,
"type": "CLIPVisionLoader",
"pos": [
251.28673034667955,
1235.5962450561526
],
"size": [
270,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP_VISION",
"type": "CLIP_VISION",
"links": [
106
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"clip_vision_h.safetensors"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
190.560140854834
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"wan",
"default"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1831.305390050556,
583.6899778882896
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 27,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 218
},
{
"name": "vae",
"type": "VAE",
"link": 245
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
96
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
417.8738708496094,
266.8154509134282
],
"size": [
419.3189392089844,
138.8924560546875
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
199
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走 "
]
},
{
"id": 49,
"type": "VHS_VideoCombine",
"pos": [
2021.0685736443058,
583.6899778882896
],
"size": [
372.2688903808594,
876.4033355712891
],
"flags": {},
"order": 28,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 96
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_VideoCombine",
"cnr_id": "comfyui-videohelpersuite",
"ver": "a7ce59e381934733bfae03b1be029756d6ce936d"
},
"widgets_values": {
"frame_rate": 16,
"loop_count": 0,
"filename_prefix": "SCAIL-2",
"format": "video/h264-mp4",
"pix_fmt": "yuv420p",
"crf": 19,
"save_metadata": true,
"trim_to_audio": false,
"pingpong": false,
"save_output": true,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "SCAIL-2_00016.mp4",
"subfolder": "",
"type": "output",
"format": "video/h264-mp4",
"frame_rate": 16,
"workflow": "SCAIL-2_00016.png",
"fullpath": "/home/nomax/working-linux/ComfyUI-dev/output/SCAIL-2_00016.mp4"
}
}
}
},
{
"id": 48,
"type": "ModelSamplingSD3",
"pos": [
621.2813720703125,
-84.75058912563465
],
"size": [
210,
58
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 184
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
120
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
5
]
},
{
"id": 102,
"type": "ResizeImageMaskNode",
"pos": [
-461.91705157795747,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 204
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
203
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
0.5,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
-59.809774398803675,
-84.75058912563465
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
183
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan2.1_14B_SCAIL_2_fp8_scaled.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 96,
"type": "LoraLoaderModelOnly",
"pos": [
276.2320190445451,
-84.75058912563465
],
"size": [
314.38576392812183,
82
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 183
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
184
]
}
],
"properties": {
"Node name for S&R": "LoraLoaderModelOnly",
"cnr_id": "comfy-core",
"ver": "0.3.60"
},
"widgets_values": [
"Wan2.1/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 120,
"type": "Reroute",
"pos": [
1715.7130963493837,
468.5779949512076
],
"size": [
75,
26
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 244
}
],
"outputs": [
{
"name": "",
"type": "VAE",
"links": [
245
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 39,
"type": "VAELoader",
"pos": [
525.0368322106934,
468.5779949512076
],
"size": [
306.36004638671875,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
200,
244
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan_2.1_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 56,
"type": "CLIPVisionEncode",
"pos": [
577.9174829101565,
1301.870476013184
],
"size": [
271.6761474609375,
78
],
"flags": {},
"order": 21,
"mode": 0,
"inputs": [
{
"name": "clip_vision",
"type": "CLIP_VISION",
"link": 106
},
{
"name": "image",
"type": "IMAGE",
"link": 242
}
],
"outputs": [
{
"name": "CLIP_VISION_OUTPUT",
"type": "CLIP_VISION_OUTPUT",
"links": [
202
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"none"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 58,
"type": "LoadImage",
"pos": [
-800.6210036236067,
1277.5627543619055
],
"size": [
308.07680913429937,
543.642446368963
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
204
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pexels-photo-31438123.jpg",
"image"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 101,
"type": "WanSCAILToVideo",
"pos": [
1082.8079278436192,
604.0725314640773
],
"size": [
344.02734375,
434
],
"flags": {},
"order": 24,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 198
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 199
},
{
"name": "vae",
"type": "VAE",
"link": 200
},
{
"name": "pose_video",
"shape": 7,
"type": "IMAGE",
"link": 241
},
{
"name": "pose_video_mask",
"shape": 7,
"type": "IMAGE",
"link": 230
},
{
"name": "reference_image",
"shape": 7,
"type": "IMAGE",
"link": 209
},
{
"name": "reference_image_mask",
"shape": 7,
"type": "IMAGE",
"link": 231
},
{
"name": "clip_vision_output",
"shape": 7,
"type": "CLIP_VISION_OUTPUT",
"link": 202
},
{
"name": "previous_frames",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 213
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 214
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
215
]
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
216
]
},
{
"name": "latent",
"type": "LATENT",
"links": [
217
]
},
{
"name": "video_frame_offset",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "WanSCAILToVideo"
},
"widgets_values": [
512,
896,
81,
1,
1,
0,
1,
0,
5,
false
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 119,
"type": "Reroute",
"pos": [
25.7972264472059,
669.0146817503652
],
"size": [
75,
26
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 239
}
],
"outputs": [
{
"name": "",
"type": "IMAGE",
"links": [
240,
241
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 123,
"type": "MarkdownNote",
"pos": [
-548.6500412597653,
-84.75058912563465
],
"size": [
461.4852607760207,
466.4268689388148
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n- checkpoints\n - [sam3.1_multiplex_fp16.safetensors](https://huggingface.co/Comfy-Org/sam3.1/blob/main/checkpoints/sam3.1_multiplex_fp16.safetensors)\n- clip_vision\n - [clip_vision_h.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors)\n- diffusion_models\n - [wan2.1_14B_SCAIL_2_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/SCAIL-2/blob/main/diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors)\n- loras\n - [Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors)\n- text_encoders\n - [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)\n- vae\n - [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors)\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂checkpoints/\n │ └── sam3.1_multiplex_fp16.safetensors\n ├── 📂clip_vision/\n │ └── clip_vision_h.safetensors\n ├── 📂diffusion_models/\n │ └── wan2.1_14B_SCAIL_2_fp8_scaled.safetensors\n ├── 📂loras/\n │ └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors\n ├── 📂text_encoders/\n │ └── umt5_xxl_fp8_e4m3fn_scaled.safetensors\n └── 📂vae/\n └── wan_2.1_vae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 103,
"type": "ResizeImageMaskNode",
"pos": [
-154.42166660971037,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 203
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
209,
212,
233,
242
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale to multiple",
32,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 104,
"type": "GetImageSize",
"pos": [
630.4935595703129,
1448.041027605726
],
"size": [
219.10007080078117,
136
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 212
}
],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
213
]
},
{
"name": "height",
"type": "INT",
"links": [
214
]
},
{
"name": "batch_size",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "GetImageSize"
},
"widgets_values": []
},
{
"id": 113,
"type": "VHS_LoadVideo",
"pos": [
-800.6210036236067,
455.00750561396774
],
"size": [
261.6533203125,
753.272357822205
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
239
]
},
{
"name": "frame_count",
"type": "INT",
"links": null
},
{
"name": "audio",
"type": "AUDIO",
"links": null
},
{
"name": "video_info",
"type": "VHS_VIDEOINFO",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_LoadVideo",
"cnr_id": "comfyui-videohelpersuite",
"ver": "2984ec4c4b93292421888f38db74a5e8802a8ff8"
},
"widgets_values": {
"video": "14637751_2160_3840_30fps.mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 0,
"select_every_nth": 1,
"format": "None",
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "14637751_2160_3840_30fps.mp4",
"type": "input",
"format": "video/mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 0,
"select_every_nth": 1
}
}
},
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 110,
"type": "CheckpointLoaderSimple",
"pos": [
-472.9277129444554,
773.4790318695161
],
"size": [
297.3094587159344,
98
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
223,
238
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
220,
243
]
},
{
"name": "VAE",
"type": "VAE",
"links": null
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"sam3.1_multiplex_fp16.safetensors"
]
},
{
"id": 109,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
847.1714690725746
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 220
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
224
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 115,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
1068.66630403893
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 243
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
232
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 116,
"type": "SAM3_VideoTrack",
"pos": [
165.3271034107571,
1007.8240766656425
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 233
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 238
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 232
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
234
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 112,
"type": "SAM3_VideoTrack",
"pos": [
159.95932129876255,
754.1190318695158
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 240
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 223
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 224
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
229
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 107,
"type": "SCAIL2ColoredMask",
"pos": [
506.50542810498916,
905.2770955134855
],
"size": [
339.45703125,
126
],
"flags": {},
"order": 22,
"mode": 0,
"inputs": [
{
"name": "driving_track_data",
"type": "SAM3_TRACK_DATA",
"link": 229
},
{
"name": "ref_track_data",
"shape": 7,
"type": "SAM3_TRACK_DATA",
"link": 234
}
],
"outputs": [
{
"name": "pose_video_mask",
"type": "IMAGE",
"links": [
230,
236
]
},
{
"name": "reference_image_mask",
"type": "IMAGE",
"links": [
231,
250
]
}
],
"properties": {
"Node name for S&R": "SCAIL2ColoredMask"
},
"widgets_values": [
"",
"area",
false
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 117,
"type": "PreviewImage",
"pos": [
1287.634225650107,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 25,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 250
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
417.9232177734375,
63.8154509134279
],
"size": [
419.26959228515625,
148.8194122314453
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
198
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A short-haired man wearing a striped shirt, hands on his hips, touching his hair.full body"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
1475.7130963493837,
583.6899778882896
],
"size": [
315,
262
],
"flags": {},
"order": 26,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 120
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 215
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 216
},
{
"name": "latent_image",
"type": "LATENT",
"link": 217
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
218
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
123,
"fixed",
6,
1,
"euler",
"simple",
1
]
},
{
"id": 118,
"type": "PreviewImage",
"pos": [
1041.5040075988006,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 23,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 236
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
96,
8,
0,
49,
0,
"IMAGE"
],
[
106,
57,
0,
56,
0,
"CLIP_VISION"
],
[
120,
48,
0,
3,
0,
"MODEL"
],
[
183,
37,
0,
96,
0,
"MODEL"
],
[
184,
96,
0,
48,
0,
"MODEL"
],
[
198,
6,
0,
101,
0,
"CONDITIONING"
],
[
199,
7,
0,
101,
1,
"CONDITIONING"
],
[
200,
39,
0,
101,
2,
"VAE"
],
[
202,
56,
0,
101,
7,
"CLIP_VISION_OUTPUT"
],
[
203,
102,
0,
103,
0,
"IMAGE"
],
[
204,
58,
0,
102,
0,
"IMAGE"
],
[
209,
103,
0,
101,
5,
"IMAGE"
],
[
212,
103,
0,
104,
0,
"IMAGE"
],
[
213,
104,
0,
101,
9,
"INT"
],
[
214,
104,
1,
101,
10,
"INT"
],
[
215,
101,
0,
3,
1,
"CONDITIONING"
],
[
216,
101,
1,
3,
2,
"CONDITIONING"
],
[
217,
101,
2,
3,
3,
"LATENT"
],
[
218,
3,
0,
8,
0,
"LATENT"
],
[
220,
110,
1,
109,
0,
"CLIP"
],
[
223,
110,
0,
112,
1,
"MODEL"
],
[
224,
109,
0,
112,
3,
"CONDITIONING"
],
[
229,
112,
0,
107,
0,
"SAM3_TRACK_DATA"
],
[
230,
107,
0,
101,
4,
"IMAGE"
],
[
231,
107,
1,
101,
6,
"IMAGE"
],
[
232,
115,
0,
116,
3,
"CONDITIONING"
],
[
233,
103,
0,
116,
0,
"IMAGE"
],
[
234,
116,
0,
107,
1,
"SAM3_TRACK_DATA"
],
[
236,
107,
0,
118,
0,
"IMAGE"
],
[
238,
110,
0,
116,
1,
"MODEL"
],
[
239,
113,
0,
119,
0,
"IMAGE"
],
[
240,
119,
0,
112,
0,
"IMAGE"
],
[
241,
119,
0,
101,
3,
"IMAGE"
],
[
242,
103,
0,
56,
1,
"IMAGE"
],
[
243,
110,
1,
115,
0,
"CLIP"
],
[
244,
39,
0,
120,
0,
"VAE"
],
[
245,
120,
0,
8,
1,
"VAE"
],
[
250,
107,
1,
117,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.8264462809917366,
"offset": [
782.6885946831094,
-564.1607851229405
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
The base workflow is similar to Wan-Animate, but this one is much simpler, so let's look through it.
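If you prefer to read the graph as text before opening it in ComfyUI, the workflow JSON can be summarized with a few lines of Python. This is only a sketch: the helper names and the file name passed to `load_workflow` are hypothetical. It relies on what the JSON itself shows, namely that each entry in `links` is `[link_id, from_node, from_slot, to_node, to_slot, type]`, and it assumes the `order` field reflects the execution order computed by the frontend.

```python
import json


def load_workflow(path):
    """Load a workflow JSON file saved from ComfyUI."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def nodes_in_order(workflow):
    """Return (order, id, type) tuples sorted by the "order" field."""
    return sorted((n["order"], n["id"], n["type"]) for n in workflow["nodes"])


def connections(workflow):
    """Yield readable edges built from the "links" array."""
    types = {n["id"]: n["type"] for n in workflow["nodes"]}
    for _link_id, src, _src_slot, dst, _dst_slot, kind in workflow["links"]:
        yield f"{types[src]} -> {types[dst]} ({kind})"


# A three-node excerpt of the Animation Mode workflow, kept tiny for illustration.
excerpt = {
    "nodes": [
        {"id": 3, "type": "KSampler", "order": 26},
        {"id": 8, "type": "VAEDecode", "order": 27},
        {"id": 49, "type": "VHS_VideoCombine", "order": 28},
    ],
    "links": [
        [218, 3, 0, 8, 0, "LATENT"],
        [96, 8, 0, 49, 0, "IMAGE"],
    ],
}

for edge in connections(excerpt):
    print(edge)
```

For the full graph, replace `excerpt` with `load_workflow("your_workflow.json")`, where the file name is whatever you saved the JSON as.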
Reference Image / Motion Video
The reference image and motion video are resized internally, so they do not need to be the same size.
- Similar aspect ratios are easier to handle.
- The pose in the image and the pose in the video do not need to match perfectly.
- However, if they are too different, generation will fail.
- It is usually safer to choose a reference image close to the first frame of the motion video.
Prompt
Since this is just motion transfer, you do not need a detailed prompt.
- However, if the prompt is too short, generation can fail more easily, especially in Replacement Mode.
- For this example, write enough to describe the intended video, such as "a man in a shirt is standing with one hand on his waist and touching his hair."
Resolution / Frame Count
Set the generation size and frame count in WanSCAILToVideo.
- The recommended resolution ranges from 480p (864×480) to roughly 720p (1280×704), with both width and height a multiple of 32
- Maximum frame count is 81
- In this workflow, the reference image is resized and that size is used as the generation resolution.
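The resizing arithmetic can be sketched as follows. The workflow chains two ResizeImageMaskNode steps, "scale total pixels" at 0.5 and then "scale to multiple" at 32. The function below approximates that idea and does not reproduce the node's code: treating one megapixel as 1,000,000 pixels and rounding to the nearest multiple are both assumptions, so the node's actual output may differ by a few pixels.

```python
import math


def fit_resolution(width, height, megapixels=0.5, multiple=32):
    """Scale (width, height) to about `megapixels`, then snap to `multiple`."""
    # Uniform scale factor that brings the pixel count to the target.
    scale = math.sqrt(megapixels * 1_000_000 / (width * height))
    # Snap each side to the nearest multiple, never below one multiple.
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h


def clip_seconds(frames=81, fps=16):
    """Duration of a clip with the given frame count and frame rate."""
    return frames / fps


if __name__ == "__main__":
    # A 2160x3840 portrait source, like the sample motion video's file name suggests.
    print(fit_resolution(2160, 3840))
    # 81 frames at the workflow's 16 fps.
    print(clip_seconds())
```

At the workflow's 16 fps, the 81-frame maximum corresponds to a clip of about five seconds.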
Mask Generation with SAM3.1
Mask the people in the reference image and motion video with SAM 3 / 3.1.
- This is not a strict inpainting mask. It is just a helper that tells SCAIL-2 which people correspond to each other, so a little misalignment is fine.
Create SCAIL-2 Colored Mask
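The SCAIL2ColoredMask node takes the SAM3 track data for the motion video and the reference image and outputs a colored mask for each (pose_video_mask and reference_image_mask).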
- This becomes a little more important when there are multiple people. More on that later.
6-Step Generation
SCAIL-2 can also use the Lightx2v LoRA for fast Wan2.1 generation.
- cfg is 1.0
- steps is 6
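These sampler settings can be read straight from the workflow JSON. The KSampler node stores its widgets as a positional list, so the sketch below pairs the values with names. The name order in `KSAMPLER_WIDGETS` is an assumption based on the usual layout of the KSampler node; the JSON itself contains only the values.

```python
# Assumed widget order for KSampler; the workflow JSON stores values only.
KSAMPLER_WIDGETS = [
    "seed",
    "control_after_generate",
    "steps",
    "cfg",
    "sampler_name",
    "scheduler",
    "denoise",
]


def ksampler_settings(node):
    """Pair a KSampler node's positional widget values with names."""
    return dict(zip(KSAMPLER_WIDGETS, node["widgets_values"]))


# widgets_values copied from the KSampler node in the workflow on this page.
ksampler_node = {
    "type": "KSampler",
    "widgets_values": [123, "fixed", 6, 1, "euler", "simple", 1],
}

settings = ksampler_settings(ksampler_node)
print(settings["steps"], settings["cfg"], settings["sampler_name"])
```

With the distillation LoRA loaded, cfg stays at 1 and the step count stays low. If you remove the LoRA, these two values would most likely need to be raised.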
Output Example

Replacement Mode
Replace the person in the video with the person in the reference image.

{
"id": "37e9470f-e8a2-4649-85ba-c52ea13698d7",
"revision": 0,
"last_node_id": 128,
"last_link_id": 259,
"nodes": [
{
"id": 57,
"type": "CLIPVisionLoader",
"pos": [
249.51499634267563,
1322.4027070242782
],
"size": [
270,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP_VISION",
"type": "CLIP_VISION",
"links": [
106
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"clip_vision_h.safetensors"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
190.560140854834
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"wan",
"default"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1831.305390050556,
583.6899778882896
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 29,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 218
},
{
"name": "vae",
"type": "VAE",
"link": 245
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
96
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 49,
"type": "VHS_VideoCombine",
"pos": [
2021.0685736443058,
583.6899778882896
],
"size": [
372.2688903808594,
988.4888916015625
],
"flags": {},
"order": 30,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 96
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_VideoCombine",
"cnr_id": "comfyui-videohelpersuite",
"ver": "a7ce59e381934733bfae03b1be029756d6ce936d"
},
"widgets_values": {
"frame_rate": 16,
"loop_count": 0,
"filename_prefix": "SCAIL-2",
"format": "video/h264-mp4",
"pix_fmt": "yuv420p",
"crf": 19,
"save_metadata": true,
"trim_to_audio": false,
"pingpong": false,
"save_output": true,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "SCAIL-2_00014.mp4",
"subfolder": "",
"type": "output",
"format": "video/h264-mp4",
"frame_rate": 16,
"workflow": "SCAIL-2_00014.png",
"fullpath": "/home/nomax/working-linux/ComfyUI-dev/output/SCAIL-2_00014.mp4"
}
}
}
},
{
"id": 48,
"type": "ModelSamplingSD3",
"pos": [
621.2813720703125,
-84.75058912563465
],
"size": [
210,
58
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 184
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
120
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
5
]
},
{
"id": 109,
"type": "CLIPTextEncode",
"pos": [
-89.07294054914254,
932.3954062570297
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 220
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
224
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 115,
"type": "CLIPTextEncode",
"pos": [
-82.0947074157867,
1153.890241223387
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 243
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
232
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 112,
"type": "SAM3_VideoTrack",
"pos": [
211.178864135982,
839.3429690539708
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 240
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 223
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 224
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
229
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 110,
"type": "CheckpointLoaderSimple",
"pos": [
-421.7081701072348,
858.7029690539712
],
"size": [
297.3094587159344,
98
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
223,
238
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
220,
243
]
},
{
"name": "VAE",
"type": "VAE",
"links": null
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"sam3.1_multiplex_fp16.safetensors"
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
-59.809774398803675,
-84.75058912563465
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
183
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan2.1_14B_SCAIL_2_fp8_scaled.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 120,
"type": "Reroute",
"pos": [
1715.7130963493837,
468.5779949512076
],
"size": [
75,
26
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 244
}
],
"outputs": [
{
"name": "",
"type": "VAE",
"links": [
245
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 126,
"type": "ImageFromBatch",
"pos": [
-454.9257483587344,
514.9424306127546
],
"size": [
270,
82
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 253
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
252
]
}
],
"properties": {
"Node name for S&R": "ImageFromBatch"
},
"widgets_values": [
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 113,
"type": "VHS_LoadVideo",
"pos": [
-749.4014607863867,
514.9424306127546
],
"size": [
261.6533203125,
753.272357822205
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
239,
253
]
},
{
"name": "frame_count",
"type": "INT",
"links": null
},
{
"name": "audio",
"type": "AUDIO",
"links": null
},
{
"name": "video_info",
"type": "VHS_VIDEOINFO",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_LoadVideo",
"cnr_id": "comfyui-videohelpersuite",
"ver": "2984ec4c4b93292421888f38db74a5e8802a8ff8"
},
"widgets_values": {
"video": "14637751_2160_3840_30fps.mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 0,
"select_every_nth": 1,
"format": "None",
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "14637751_2160_3840_30fps.mp4",
"type": "input",
"format": "video/mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 0,
"select_every_nth": 1
}
}
},
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 123,
"type": "ResizeImageMaskNode",
"pos": [
-155.7273543312722,
514.9424306127546
],
"size": [
270,
106
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 252
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
250
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
0.5,
"nearest-exact"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 127,
"type": "Reroute",
"pos": [
90.39715479542042,
1419.8866138867568
],
"size": [
75,
26
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 255
}
],
"outputs": [
{
"name": "",
"type": "IMAGE",
"links": [
256,
257,
258
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 39,
"type": "VAELoader",
"pos": [
530.832763671875,
468.5779949512076
],
"size": [
306.36004638671875,
58
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
200,
244
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan_2.1_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
417.8738708496094,
266.8154509134282
],
"size": [
419.3189392089844,
138.8924560546875
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
199
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走 "
]
},
{
"id": 56,
"type": "CLIPVisionEncode",
"pos": [
565.5166625976562,
1392.2200491685596
],
"size": [
271.6761474609375,
78
],
"flags": {},
"order": 21,
"mode": 0,
"inputs": [
{
"name": "clip_vision",
"type": "CLIP_VISION",
"link": 106
},
{
"name": "image",
"type": "IMAGE",
"link": 258
}
],
"outputs": [
{
"name": "CLIP_VISION_OUTPUT",
"type": "CLIP_VISION_OUTPUT",
"links": [
202
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"none"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 116,
"type": "SAM3_VideoTrack",
"pos": [
216.54664624797653,
1093.0480138500996
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 257
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 238
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 232
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
234
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 96,
"type": "LoraLoaderModelOnly",
"pos": [
276.2320190445451,
-84.75058912563465
],
"size": [
314.38576392812183,
82
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 183
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
184
]
}
],
"properties": {
"Node name for S&R": "LoraLoaderModelOnly",
"cnr_id": "comfy-core",
"ver": "0.3.60"
},
"widgets_values": [
"Wan2.1/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 58,
"type": "LoadImage",
"pos": [
-752.3296518502294,
1419.8866138867568
],
"size": [
308.07680913429937,
543.642446368963
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
255
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pexels-photo-31438123.jpg",
"image"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
417.9232177734375,
63.8154509134279
],
"size": [
419.26959228515625,
148.8194122314453
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
198
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A short-haired man wearing a striped shirt is standing in the park, hands on his hips, touching his hair."
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
1475.7130963493837,
583.6899778882896
],
"size": [
315,
262
],
"flags": {},
"order": 28,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 120
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 215
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 216
},
{
"name": "latent_image",
"type": "LATENT",
"link": 217
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
218
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
1234,
"fixed",
6,
1,
"euler",
"simple",
1
]
},
{
"id": 124,
"type": "ResizeImageMaskNode",
"pos": [
151.7680306369739,
514.9424306127546
],
"size": [
270,
106
],
"flags": {},
"order": 22,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 250
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
254
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale to multiple",
32,
"nearest-exact"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 128,
"type": "MarkdownNote",
"pos": [
-567.3573478819718,
-84.75058912563465
],
"size": [
461.4852607760207,
466.4268689388148
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n- checkpoints\n - [sam3.1_multiplex_fp16.safetensors](https://huggingface.co/Comfy-Org/sam3.1/blob/main/checkpoints/sam3.1_multiplex_fp16.safetensors)\n- clip_vision\n - [clip_vision_h.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors)\n- diffusion_models\n - [wan2.1_14B_SCAIL_2_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/SCAIL-2/blob/main/diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors)\n- loras\n - [Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors)\n- text_encoders\n - [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)\n- vae\n - [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors)\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂checkpoints/\n │ └── sam3.1_multiplex_fp16.safetensors\n ├── 📂clip_vision/\n │ └── clip_vision_h.safetensors\n ├── 📂diffusion_models/\n │ └── wan2.1_14B_SCAIL_2_fp8_scaled.safetensors\n ├── 📂loras/\n │ └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors\n ├── 📂text_encoders/\n │ └── umt5_xxl_fp8_e4m3fn_scaled.safetensors\n └── 📂vae/\n └── wan_2.1_vae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 119,
"type": "Reroute",
"pos": [
71.0720574680444,
667.8860270852282
],
"size": [
75,
26
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 239
}
],
"outputs": [
{
"name": "",
"type": "IMAGE",
"links": [
240,
241
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 104,
"type": "GetImageSize",
"pos": [
618.0927392578126,
793.1637633142003
],
"size": [
219.10007080078117,
136
],
"flags": {},
"order": 24,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 254
}
],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
213
]
},
{
"name": "height",
"type": "INT",
"links": [
214
]
},
{
"name": "batch_size",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "GetImageSize"
},
"widgets_values": []
},
{
"id": 107,
"type": "SCAIL2ColoredMask",
"pos": [
497.73577880859375,
995.6266686688589
],
"size": [
339.45703125,
126
],
"flags": {},
"order": 23,
"mode": 0,
"inputs": [
{
"name": "driving_track_data",
"type": "SAM3_TRACK_DATA",
"link": 229
},
{
"name": "ref_track_data",
"shape": 7,
"type": "SAM3_TRACK_DATA",
"link": 234
}
],
"outputs": [
{
"name": "pose_video_mask",
"type": "IMAGE",
"links": [
230,
236
]
},
{
"name": "reference_image_mask",
"type": "IMAGE",
"links": [
231,
237
]
}
],
"properties": {
"Node name for S&R": "SCAIL2ColoredMask"
},
"widgets_values": [
"",
"area",
true
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 117,
"type": "PreviewImage",
"pos": [
1292.2580838705644,
1145.2233455980718
],
"size": [
210,
258
],
"flags": {},
"order": 26,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 237
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 101,
"type": "WanSCAILToVideo",
"pos": [
1082.8079278436192,
603.56829029821
],
"size": [
344.02734375,
434
],
"flags": {},
"order": 27,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 198
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 199
},
{
"name": "vae",
"type": "VAE",
"link": 200
},
{
"name": "pose_video",
"shape": 7,
"type": "IMAGE",
"link": 241
},
{
"name": "pose_video_mask",
"shape": 7,
"type": "IMAGE",
"link": 230
},
{
"name": "reference_image",
"shape": 7,
"type": "IMAGE",
"link": 256
},
{
"name": "reference_image_mask",
"shape": 7,
"type": "IMAGE",
"link": 231
},
{
"name": "clip_vision_output",
"shape": 7,
"type": "CLIP_VISION_OUTPUT",
"link": 202
},
{
"name": "previous_frames",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 213
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 214
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
215
]
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
216
]
},
{
"name": "latent",
"type": "LATENT",
"links": [
217
]
},
{
"name": "video_frame_offset",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "WanSCAILToVideo"
},
"widgets_values": [
512,
896,
81,
1,
1,
0,
1,
0,
5,
true
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 118,
"type": "PreviewImage",
"pos": [
1045.8199574761238,
1144.9217621465461
],
"size": [
210,
258
],
"flags": {},
"order": 25,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 236
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
96,
8,
0,
49,
0,
"IMAGE"
],
[
106,
57,
0,
56,
0,
"CLIP_VISION"
],
[
120,
48,
0,
3,
0,
"MODEL"
],
[
183,
37,
0,
96,
0,
"MODEL"
],
[
184,
96,
0,
48,
0,
"MODEL"
],
[
198,
6,
0,
101,
0,
"CONDITIONING"
],
[
199,
7,
0,
101,
1,
"CONDITIONING"
],
[
200,
39,
0,
101,
2,
"VAE"
],
[
202,
56,
0,
101,
7,
"CLIP_VISION_OUTPUT"
],
[
213,
104,
0,
101,
9,
"INT"
],
[
214,
104,
1,
101,
10,
"INT"
],
[
215,
101,
0,
3,
1,
"CONDITIONING"
],
[
216,
101,
1,
3,
2,
"CONDITIONING"
],
[
217,
101,
2,
3,
3,
"LATENT"
],
[
218,
3,
0,
8,
0,
"LATENT"
],
[
220,
110,
1,
109,
0,
"CLIP"
],
[
223,
110,
0,
112,
1,
"MODEL"
],
[
224,
109,
0,
112,
3,
"CONDITIONING"
],
[
229,
112,
0,
107,
0,
"SAM3_TRACK_DATA"
],
[
230,
107,
0,
101,
4,
"IMAGE"
],
[
231,
107,
1,
101,
6,
"IMAGE"
],
[
232,
115,
0,
116,
3,
"CONDITIONING"
],
[
234,
116,
0,
107,
1,
"SAM3_TRACK_DATA"
],
[
236,
107,
0,
118,
0,
"IMAGE"
],
[
237,
107,
1,
117,
0,
"IMAGE"
],
[
238,
110,
0,
116,
1,
"MODEL"
],
[
239,
113,
0,
119,
0,
"IMAGE"
],
[
240,
119,
0,
112,
0,
"IMAGE"
],
[
241,
119,
0,
101,
3,
"IMAGE"
],
[
243,
110,
1,
115,
0,
"CLIP"
],
[
244,
39,
0,
120,
0,
"VAE"
],
[
245,
120,
0,
8,
1,
"VAE"
],
[
250,
123,
0,
124,
0,
"IMAGE"
],
[
252,
126,
0,
123,
0,
"IMAGE"
],
[
253,
113,
0,
126,
0,
"IMAGE"
],
[
254,
124,
0,
104,
0,
"IMAGE"
],
[
255,
58,
0,
127,
0,
"IMAGE"
],
[
256,
127,
0,
101,
5,
"IMAGE"
],
[
257,
127,
0,
116,
0,
"IMAGE"
],
[
258,
127,
0,
56,
1,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.5222486944025904,
"offset": [
648.8591196607266,
210.7355447246406
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
Basically, all you need to do is set replacement_mode to true in both Create SCAIL-2 Colored Mask and WanSCAILToVideo.
Resolution
Replacement uses the size of the driving video as the base.
- This workflow resizes the first frame of the video, reads the resulting size, and uses it as the output size.
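The resize chain in this workflow (ResizeImageMaskNode set to "scale total pixels" with 0.5, then "scale to multiple" with 32, then GetImageSize) can be approximated as follows. This is only a rough sketch, not the nodes' actual code: it assumes that 0.5 means 0.5 megapixels counted as 1024×1024 pixels each, and that "scale to multiple" rounds down. `output_size` is a hypothetical helper name.

```python
import math


def output_size(src_w, src_h, megapixels=0.5, multiple=32):
    """Estimate the output size derived from the first video frame.

    Assumptions (not confirmed by the node source):
    - one megapixel is 1024 * 1024 pixels
    - "scale to multiple" rounds each side down to the multiple
    """
    budget = megapixels * 1024 * 1024
    # Uniform scale factor so that width * height is close to the pixel budget.
    scale = math.sqrt(budget / (src_w * src_h))
    w = round(src_w * scale)
    h = round(src_h * scale)
    # Snap both sides down to a multiple of 32, never below one multiple.
    w = max(multiple, (w // multiple) * multiple)
    h = max(multiple, (h // multiple) * multiple)
    return w, h


# A 2160x3840 portrait video, like the one loaded in this workflow.
width, height = output_size(2160, 3840)
print(width, height)
```

The point is that the aspect ratio of the video, not of the reference image, decides the output shape in replacement mode.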
Create SCAIL-2 Colored Mask and WanSCAILToVideo
Set replacement_mode to true.
- By the way, the output of Create SCAIL-2 Colored Mask only makes the pose_video background white.
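If you would rather flip the switch in an exported workflow file than in the UI, something like the following could work. It is a sketch under one assumption: that replacement_mode is stored as the trailing boolean in `widgets_values` of both nodes. In the workflows on this page that trailing value is true for replacement and false for animation, but the position is not documented here. `set_replacement_mode` is a hypothetical helper.

```python
import json

# Node types that carry the replacement_mode switch.
TARGET_TYPES = {"SCAIL2ColoredMask", "WanSCAILToVideo"}


def set_replacement_mode(workflow, enabled):
    """Set the trailing boolean widget on the SCAIL-2 nodes.

    Assumption: replacement_mode is the last entry of widgets_values.
    Returns the ids of the nodes that were changed.
    """
    changed = []
    for node in workflow.get("nodes", []):
        values = node.get("widgets_values")
        if (
            node.get("type") in TARGET_TYPES
            and isinstance(values, list)
            and values
            and isinstance(values[-1], bool)
        ):
            values[-1] = enabled
            changed.append(node["id"])
    return changed


# Minimal stand-in for a workflow; a real file would be read with json.load().
demo = {
    "nodes": [
        {"id": 107, "type": "SCAIL2ColoredMask", "widgets_values": ["", "area", False]},
        {"id": 101, "type": "WanSCAILToVideo",
         "widgets_values": [512, 896, 81, 1, 1, 0, 1, 0, 5, False]},
        {"id": 3, "type": "KSampler",
         "widgets_values": [1234, "fixed", 6, 1, "euler", "simple", 1]},
    ]
}

changed_ids = set_replacement_mode(demo, True)
print(json.dumps(changed_ids))
```

Both nodes need the same setting, which is why the helper updates them together.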
Output Example

Animation Mode (Multiple People)
SCAIL-2 also supports videos and images with multiple people.
No special operation is required. As before, just input the video and reference image.

{
"id": "30265d26-6d42-46e0-9a45-d84500678056",
"revision": 0,
"last_node_id": 123,
"last_link_id": 250,
"nodes": [
{
"id": 57,
"type": "CLIPVisionLoader",
"pos": [
251.28673034667955,
1235.5962450561526
],
"size": [
270,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP_VISION",
"type": "CLIP_VISION",
"links": [
106
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"clip_vision_h.safetensors"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
190.560140854834
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"wan",
"default"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1831.305390050556,
583.6899778882896
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 27,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 218
},
{
"name": "vae",
"type": "VAE",
"link": 245
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
96
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
417.8738708496094,
266.8154509134282
],
"size": [
419.3189392089844,
138.8924560546875
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
199
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走 "
]
},
{
"id": 49,
"type": "VHS_VideoCombine",
"pos": [
2021.0685736443058,
583.6899778882896
],
"size": [
372.2688903808594,
876.4033355712891
],
"flags": {},
"order": 28,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 96
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_VideoCombine",
"cnr_id": "comfyui-videohelpersuite",
"ver": "a7ce59e381934733bfae03b1be029756d6ce936d"
},
"widgets_values": {
"frame_rate": 16,
"loop_count": 0,
"filename_prefix": "SCAIL-2",
"format": "video/h264-mp4",
"pix_fmt": "yuv420p",
"crf": 19,
"save_metadata": true,
"trim_to_audio": false,
"pingpong": false,
"save_output": true,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "SCAIL-2_00019.mp4",
"subfolder": "",
"type": "output",
"format": "video/h264-mp4",
"frame_rate": 16,
"workflow": "SCAIL-2_00019.png",
"fullpath": "/home/nomax/working-linux/ComfyUI-dev/output/SCAIL-2_00019.mp4"
}
}
}
},
{
"id": 48,
"type": "ModelSamplingSD3",
"pos": [
621.2813720703125,
-84.75058912563465
],
"size": [
210,
58
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 184
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
120
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
5
]
},
{
"id": 102,
"type": "ResizeImageMaskNode",
"pos": [
-461.91705157795747,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 204
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
203
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
0.5,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
-59.809774398803675,
-84.75058912563465
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
183
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan2.1_14B_SCAIL_2_fp8_scaled.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 96,
"type": "LoraLoaderModelOnly",
"pos": [
276.2320190445451,
-84.75058912563465
],
"size": [
314.38576392812183,
82
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 183
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
184
]
}
],
"properties": {
"Node name for S&R": "LoraLoaderModelOnly",
"cnr_id": "comfy-core",
"ver": "0.3.60"
},
"widgets_values": [
"Wan2.1/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 120,
"type": "Reroute",
"pos": [
1715.7130963493837,
468.5779949512076
],
"size": [
75,
26
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 244
}
],
"outputs": [
{
"name": "",
"type": "VAE",
"links": [
245
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 39,
"type": "VAELoader",
"pos": [
525.0368322106934,
468.5779949512076
],
"size": [
306.36004638671875,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
200,
244
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan_2.1_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 56,
"type": "CLIPVisionEncode",
"pos": [
577.9174829101565,
1301.870476013184
],
"size": [
271.6761474609375,
78
],
"flags": {},
"order": 21,
"mode": 0,
"inputs": [
{
"name": "clip_vision",
"type": "CLIP_VISION",
"link": 106
},
{
"name": "image",
"type": "IMAGE",
"link": 242
}
],
"outputs": [
{
"name": "CLIP_VISION_OUTPUT",
"type": "CLIP_VISION_OUTPUT",
"links": [
202
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"none"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 119,
"type": "Reroute",
"pos": [
25.7972264472059,
669.0146817503652
],
"size": [
75,
26
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 239
}
],
"outputs": [
{
"name": "",
"type": "IMAGE",
"links": [
240,
241
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 123,
"type": "MarkdownNote",
"pos": [
-548.6500412597653,
-84.75058912563465
],
"size": [
461.4852607760207,
466.4268689388148
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n- checkpoints\n - [sam3.1_multiplex_fp16.safetensors](https://huggingface.co/Comfy-Org/sam3.1/blob/main/checkpoints/sam3.1_multiplex_fp16.safetensors)\n- clip_vision\n - [clip_vision_h.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors)\n- diffusion_models\n - [wan2.1_14B_SCAIL_2_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/SCAIL-2/blob/main/diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors)\n- loras\n - [Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors)\n- text_encoders\n - [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)\n- vae\n - [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors)\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂checkpoints/\n │ └── sam3.1_multiplex_fp16.safetensors\n ├── 📂clip_vision/\n │ └── clip_vision_h.safetensors\n ├── 📂diffusion_models/\n │ └── wan2.1_14B_SCAIL_2_fp8_scaled.safetensors\n ├── 📂loras/\n │ └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors\n ├── 📂text_encoders/\n │ └── umt5_xxl_fp8_e4m3fn_scaled.safetensors\n └── 📂vae/\n └── wan_2.1_vae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 103,
"type": "ResizeImageMaskNode",
"pos": [
-154.42166660971037,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 203
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
209,
212,
233,
242
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale to multiple",
32,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 104,
"type": "GetImageSize",
"pos": [
630.4935595703129,
1448.041027605726
],
"size": [
219.10007080078117,
136
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 212
}
],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
213
]
},
{
"name": "height",
"type": "INT",
"links": [
214
]
},
{
"name": "batch_size",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "GetImageSize"
},
"widgets_values": []
},
{
"id": 110,
"type": "CheckpointLoaderSimple",
"pos": [
-472.9277129444554,
773.4790318695161
],
"size": [
297.3094587159344,
98
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
223,
238
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
220,
243
]
},
{
"name": "VAE",
"type": "VAE",
"links": null
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"sam3.1_multiplex_fp16.safetensors"
]
},
{
"id": 109,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
847.1714690725746
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 220
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
224
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 115,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
1068.66630403893
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 243
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
232
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 116,
"type": "SAM3_VideoTrack",
"pos": [
165.3271034107571,
1007.8240766656425
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 233
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 238
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 232
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
234
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 112,
"type": "SAM3_VideoTrack",
"pos": [
159.95932129876255,
754.1190318695158
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 240
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 223
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 224
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
229
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 117,
"type": "PreviewImage",
"pos": [
1287.634225650107,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 25,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 250
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 118,
"type": "PreviewImage",
"pos": [
1041.5040075988006,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 23,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 236
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 58,
"type": "LoadImage",
"pos": [
-800.6210036236067,
1277.5627543619055
],
"size": [
308.07680913429937,
543.642446368963
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
204
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pasted/image (4).png",
"image"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
417.9232177734375,
63.8154509134279
],
"size": [
419.26959228515625,
148.8194122314453
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
198
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A black dog mascot character and a green-and-cream bird mascot character, with a similar build, are holding hands and performing a ballroom dance on a white stage."
]
},
{
"id": 113,
"type": "VHS_LoadVideo",
"pos": [
-800.6210036236067,
455.00750561396774
],
"size": [
261.6533203125,
753.272357822205
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
239
]
},
{
"name": "frame_count",
"type": "INT",
"links": null
},
{
"name": "audio",
"type": "AUDIO",
"links": null
},
{
"name": "video_info",
"type": "VHS_VIDEOINFO",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_LoadVideo",
"cnr_id": "comfyui-videohelpersuite",
"ver": "2984ec4c4b93292421888f38db74a5e8802a8ff8"
},
"widgets_values": {
"video": "8281169-hd_1080_1920_24fps.mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 30,
"select_every_nth": 1,
"format": "None",
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "8281169-hd_1080_1920_24fps.mp4",
"type": "input",
"format": "video/mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 30,
"select_every_nth": 1
}
}
},
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 3,
"type": "KSampler",
"pos": [
1475.7130963493837,
583.6899778882896
],
"size": [
315,
262
],
"flags": {},
"order": 26,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 120
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 215
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 216
},
{
"name": "latent_image",
"type": "LATENT",
"link": 217
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
218
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
1234,
"fixed",
6,
1,
"euler",
"simple",
1
]
},
{
"id": 101,
"type": "WanSCAILToVideo",
"pos": [
1082.8079278436192,
604.0725314640773
],
"size": [
344.02734375,
434
],
"flags": {},
"order": 24,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 198
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 199
},
{
"name": "vae",
"type": "VAE",
"link": 200
},
{
"name": "pose_video",
"shape": 7,
"type": "IMAGE",
"link": 241
},
{
"name": "pose_video_mask",
"shape": 7,
"type": "IMAGE",
"link": 230
},
{
"name": "reference_image",
"shape": 7,
"type": "IMAGE",
"link": 209
},
{
"name": "reference_image_mask",
"shape": 7,
"type": "IMAGE",
"link": 231
},
{
"name": "clip_vision_output",
"shape": 7,
"type": "CLIP_VISION_OUTPUT",
"link": 202
},
{
"name": "previous_frames",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 213
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 214
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
215
]
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
216
]
},
{
"name": "latent",
"type": "LATENT",
"links": [
217
]
},
{
"name": "video_frame_offset",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "WanSCAILToVideo"
},
"widgets_values": [
512,
896,
81,
1,
1,
0,
1,
0,
5,
false
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 107,
"type": "SCAIL2ColoredMask",
"pos": [
506.50542810498916,
905.2770955134855
],
"size": [
339.45703125,
126
],
"flags": {},
"order": 22,
"mode": 0,
"inputs": [
{
"name": "driving_track_data",
"type": "SAM3_TRACK_DATA",
"link": 229
},
{
"name": "ref_track_data",
"shape": 7,
"type": "SAM3_TRACK_DATA",
"link": 234
}
],
"outputs": [
{
"name": "pose_video_mask",
"type": "IMAGE",
"links": [
230,
236
]
},
{
"name": "reference_image_mask",
"type": "IMAGE",
"links": [
231,
250
]
}
],
"properties": {
"Node name for S&R": "SCAIL2ColoredMask"
},
"widgets_values": [
"",
"area",
false
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
96,
8,
0,
49,
0,
"IMAGE"
],
[
106,
57,
0,
56,
0,
"CLIP_VISION"
],
[
120,
48,
0,
3,
0,
"MODEL"
],
[
183,
37,
0,
96,
0,
"MODEL"
],
[
184,
96,
0,
48,
0,
"MODEL"
],
[
198,
6,
0,
101,
0,
"CONDITIONING"
],
[
199,
7,
0,
101,
1,
"CONDITIONING"
],
[
200,
39,
0,
101,
2,
"VAE"
],
[
202,
56,
0,
101,
7,
"CLIP_VISION_OUTPUT"
],
[
203,
102,
0,
103,
0,
"IMAGE"
],
[
204,
58,
0,
102,
0,
"IMAGE"
],
[
209,
103,
0,
101,
5,
"IMAGE"
],
[
212,
103,
0,
104,
0,
"IMAGE"
],
[
213,
104,
0,
101,
9,
"INT"
],
[
214,
104,
1,
101,
10,
"INT"
],
[
215,
101,
0,
3,
1,
"CONDITIONING"
],
[
216,
101,
1,
3,
2,
"CONDITIONING"
],
[
217,
101,
2,
3,
3,
"LATENT"
],
[
218,
3,
0,
8,
0,
"LATENT"
],
[
220,
110,
1,
109,
0,
"CLIP"
],
[
223,
110,
0,
112,
1,
"MODEL"
],
[
224,
109,
0,
112,
3,
"CONDITIONING"
],
[
229,
112,
0,
107,
0,
"SAM3_TRACK_DATA"
],
[
230,
107,
0,
101,
4,
"IMAGE"
],
[
231,
107,
1,
101,
6,
"IMAGE"
],
[
232,
115,
0,
116,
3,
"CONDITIONING"
],
[
233,
103,
0,
116,
0,
"IMAGE"
],
[
234,
116,
0,
107,
1,
"SAM3_TRACK_DATA"
],
[
236,
107,
0,
118,
0,
"IMAGE"
],
[
238,
110,
0,
116,
1,
"MODEL"
],
[
239,
113,
0,
119,
0,
"IMAGE"
],
[
240,
119,
0,
112,
0,
"IMAGE"
],
[
241,
119,
0,
101,
3,
"IMAGE"
],
[
242,
103,
0,
56,
1,
"IMAGE"
],
[
243,
110,
1,
115,
0,
"CLIP"
],
[
244,
39,
0,
120,
0,
"VAE"
],
[
245,
120,
0,
8,
1,
"VAE"
],
[
250,
107,
1,
117,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.4665073802097337,
"offset": [
1011.0979865375539,
-44.36445395816628
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
Create SCAIL-2 Colored Mask
When there are multiple people, it becomes important to control which person should follow which motion. SCAIL-2 uses colored masks for this.
- When SAM3.1 segments multiple targets, Create SCAIL-2 Colored Mask paints them in different colors in order.
- Basically, matching colors are linked together, so use options such as sort_by to align the colors.
However, as in the output example below, the color correspondence and the motion do not always match. The colored mask is only a weak condition, and the model may simply choose the closer composition.
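The color-matching idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the node's actual implementation: the palette, the function names, and the "left" ordering are hypothetical, and only the "area" ordering corresponds to a value that appears in the workflow above. Each target's binary mask is sorted by the chosen key, then painted with the next palette color, so the same rank gets the same color in both the driving video and the reference image.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 2D binary mask, 1 = pixel belongs to the target
Color = Tuple[int, int, int]

# Hypothetical palette; the colors the real node uses are not stated in the text.
PALETTE: List[Color] = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]


def mask_area(mask: Mask) -> int:
    """Number of pixels covered by the mask."""
    return sum(sum(row) for row in mask)


def mask_left_edge(mask: Mask) -> int:
    """Smallest x index containing a target pixel (image width if the mask is empty)."""
    width = len(mask[0])
    return min((x for row in mask for x, v in enumerate(row) if v), default=width)


def paint_colored_mask(masks: List[Mask], sort_by: str = "area") -> List[List[Color]]:
    """Paint each mask with a palette color chosen by its rank under sort_by."""
    if not masks:
        raise ValueError("at least one mask is required")
    if sort_by == "area":
        ordered = sorted(masks, key=mask_area, reverse=True)  # largest first
    elif sort_by == "left":  # hypothetical option: leftmost target first
        ordered = sorted(masks, key=mask_left_edge)
    else:
        raise ValueError(f"unknown sort_by: {sort_by}")

    height, width = len(masks[0]), len(masks[0][0])
    canvas: List[List[Color]] = [[(0, 0, 0)] * width for _ in range(height)]
    for rank, mask in enumerate(ordered):
        color = PALETTE[rank % len(PALETTE)]
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    canvas[y][x] = color  # later ranks overwrite earlier ones
    return canvas
```

If the largest person in the driving video should drive the largest person in the reference image, sorting both sides by area gives them the same color. When the two sides disagree under one ordering, switching the ordering is the knob to try.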
Output Example

Animation Mode (Over 81 Frames)
SCAIL-2 normally generates up to 81 frames, but with WAN Context Windows (Manual) you can generate longer videos by splitting the generation along the time axis.

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 124,
"last_link_id": 252,
"nodes": [
{
"id": 57,
"type": "CLIPVisionLoader",
"pos": [
251.28673034667955,
1235.5962450561526
],
"size": [
270,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP_VISION",
"type": "CLIP_VISION",
"links": [
106
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"clip_vision_h.safetensors"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
190.560140854834
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"wan",
"default"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1831.305390050556,
583.6899778882896
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 28,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 218
},
{
"name": "vae",
"type": "VAE",
"link": 245
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
96
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
417.8738708496094,
266.8154509134282
],
"size": [
419.3189392089844,
138.8924560546875
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
199
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走 "
]
},
{
"id": 49,
"type": "VHS_VideoCombine",
"pos": [
2021.0685736443058,
583.6899778882896
],
"size": [
372.2688903808594,
876.4033355712891
],
"flags": {},
"order": 29,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 96
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_VideoCombine",
"cnr_id": "comfyui-videohelpersuite",
"ver": "a7ce59e381934733bfae03b1be029756d6ce936d"
},
"widgets_values": {
"frame_rate": 16,
"loop_count": 0,
"filename_prefix": "SCAIL-2",
"format": "video/h264-mp4",
"pix_fmt": "yuv420p",
"crf": 19,
"save_metadata": true,
"trim_to_audio": false,
"pingpong": false,
"save_output": true,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "SCAIL-2_00025.mp4",
"subfolder": "",
"type": "output",
"format": "video/h264-mp4",
"frame_rate": 16,
"workflow": "SCAIL-2_00025.png",
"fullpath": "/home/nomax/working-linux/ComfyUI-dev/output/SCAIL-2_00025.mp4"
}
}
}
},
{
"id": 102,
"type": "ResizeImageMaskNode",
"pos": [
-461.91705157795747,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 204
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
203
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
0.5,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
-59.809774398803675,
-84.75058912563465
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
183
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan2.1_14B_SCAIL_2_fp8_scaled.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 96,
"type": "LoraLoaderModelOnly",
"pos": [
276.2320190445451,
-84.75058912563465
],
"size": [
314.38576392812183,
82
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 183
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
184
]
}
],
"properties": {
"Node name for S&R": "LoraLoaderModelOnly",
"cnr_id": "comfy-core",
"ver": "0.3.60"
},
"widgets_values": [
"Wan2.1/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 120,
"type": "Reroute",
"pos": [
1715.7130963493837,
468.5779949512076
],
"size": [
75,
26
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 244
}
],
"outputs": [
{
"name": "",
"type": "VAE",
"links": [
245
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 39,
"type": "VAELoader",
"pos": [
525.0368322106934,
468.5779949512076
],
"size": [
306.36004638671875,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
200,
244
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan_2.1_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 56,
"type": "CLIPVisionEncode",
"pos": [
577.9174829101565,
1301.870476013184
],
"size": [
271.6761474609375,
78
],
"flags": {},
"order": 22,
"mode": 0,
"inputs": [
{
"name": "clip_vision",
"type": "CLIP_VISION",
"link": 106
},
{
"name": "image",
"type": "IMAGE",
"link": 242
}
],
"outputs": [
{
"name": "CLIP_VISION_OUTPUT",
"type": "CLIP_VISION_OUTPUT",
"links": [
202
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"none"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 58,
"type": "LoadImage",
"pos": [
-800.6210036236067,
1277.5627543619055
],
"size": [
308.07680913429937,
543.642446368963
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
204
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pexels-photo-31438123.jpg",
"image"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 119,
"type": "Reroute",
"pos": [
25.7972264472059,
669.0146817503652
],
"size": [
75,
26
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 239
}
],
"outputs": [
{
"name": "",
"type": "IMAGE",
"links": [
240,
241
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 123,
"type": "MarkdownNote",
"pos": [
-548.6500412597653,
-84.75058912563465
],
"size": [
461.4852607760207,
466.4268689388148
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n- checkpoints\n - [sam3.1_multiplex_fp16.safetensors](https://huggingface.co/Comfy-Org/sam3.1/blob/main/checkpoints/sam3.1_multiplex_fp16.safetensors)\n- clip_vision\n - [clip_vision_h.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors)\n- diffusion_models\n - [wan2.1_14B_SCAIL_2_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/SCAIL-2/blob/main/diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors)\n- loras\n - [Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors)\n- text_encoders\n - [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)\n- vae\n - [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors)\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂checkpoints/\n │ └── sam3.1_multiplex_fp16.safetensors\n ├── 📂clip_vision/\n │ └── clip_vision_h.safetensors\n ├── 📂diffusion_models/\n │ └── wan2.1_14B_SCAIL_2_fp8_scaled.safetensors\n ├── 📂loras/\n │ └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors\n ├── 📂text_encoders/\n │ └── umt5_xxl_fp8_e4m3fn_scaled.safetensors\n └── 📂vae/\n └── wan_2.1_vae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 103,
"type": "ResizeImageMaskNode",
"pos": [
-154.42166660971037,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 203
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
209,
212,
233,
242
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale to multiple",
32,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 104,
"type": "GetImageSize",
"pos": [
630.4935595703129,
1448.041027605726
],
"size": [
219.10007080078117,
136
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 212
}
],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
213
]
},
{
"name": "height",
"type": "INT",
"links": [
214
]
},
{
"name": "batch_size",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "GetImageSize"
},
"widgets_values": []
},
{
"id": 110,
"type": "CheckpointLoaderSimple",
"pos": [
-472.9277129444554,
773.4790318695161
],
"size": [
297.3094587159344,
98
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
223,
238
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
220,
243
]
},
{
"name": "VAE",
"type": "VAE",
"links": null
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"sam3.1_multiplex_fp16.safetensors"
]
},
{
"id": 109,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
847.1714690725746
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 220
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
224
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 115,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
1068.66630403893
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 243
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
232
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 116,
"type": "SAM3_VideoTrack",
"pos": [
165.3271034107571,
1007.8240766656425
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 21,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 233
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 238
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 232
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
234
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 112,
"type": "SAM3_VideoTrack",
"pos": [
159.95932129876255,
754.1190318695158
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 240
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 223
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 224
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
229
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 107,
"type": "SCAIL2ColoredMask",
"pos": [
506.50542810498916,
905.2770955134855
],
"size": [
339.45703125,
126
],
"flags": {},
"order": 23,
"mode": 0,
"inputs": [
{
"name": "driving_track_data",
"type": "SAM3_TRACK_DATA",
"link": 229
},
{
"name": "ref_track_data",
"shape": 7,
"type": "SAM3_TRACK_DATA",
"link": 234
}
],
"outputs": [
{
"name": "pose_video_mask",
"type": "IMAGE",
"links": [
230,
236
]
},
{
"name": "reference_image_mask",
"type": "IMAGE",
"links": [
231,
250
]
}
],
"properties": {
"Node name for S&R": "SCAIL2ColoredMask"
},
"widgets_values": [
"",
"area",
false
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 117,
"type": "PreviewImage",
"pos": [
1287.634225650107,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 26,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 250
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
417.9232177734375,
63.8154509134279
],
"size": [
419.26959228515625,
148.8194122314453
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
198
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A short-haired man wearing a striped shirt, hands on his hips, touching his hair.full body"
]
},
{
"id": 118,
"type": "PreviewImage",
"pos": [
1041.5040075988006,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 24,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 236
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 101,
"type": "WanSCAILToVideo",
"pos": [
1082.8079278436192,
604.0725314640773
],
"size": [
344.02734375,
434
],
"flags": {},
"order": 25,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 198
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 199
},
{
"name": "vae",
"type": "VAE",
"link": 200
},
{
"name": "pose_video",
"shape": 7,
"type": "IMAGE",
"link": 241
},
{
"name": "pose_video_mask",
"shape": 7,
"type": "IMAGE",
"link": 230
},
{
"name": "reference_image",
"shape": 7,
"type": "IMAGE",
"link": 209
},
{
"name": "reference_image_mask",
"shape": 7,
"type": "IMAGE",
"link": 231
},
{
"name": "clip_vision_output",
"shape": 7,
"type": "CLIP_VISION_OUTPUT",
"link": 202
},
{
"name": "previous_frames",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 213
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 214
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
215
]
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
216
]
},
{
"name": "latent",
"type": "LATENT",
"links": [
217
]
},
{
"name": "video_frame_offset",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "WanSCAILToVideo"
},
"widgets_values": [
512,
896,
133,
1,
1,
0,
1,
0,
5,
false
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 113,
"type": "VHS_LoadVideo",
"pos": [
-800.6210036236067,
455.00750561396774
],
"size": [
261.6533203125,
753.272357822205
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
239
]
},
{
"name": "frame_count",
"type": "INT",
"links": null
},
{
"name": "audio",
"type": "AUDIO",
"links": null
},
{
"name": "video_info",
"type": "VHS_VIDEOINFO",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_LoadVideo",
"cnr_id": "comfyui-videohelpersuite",
"ver": "2984ec4c4b93292421888f38db74a5e8802a8ff8"
},
"widgets_values": {
"video": "14637751_2160_3840_30fps.mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 133,
"skip_first_frames": 0,
"select_every_nth": 1,
"format": "None",
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "14637751_2160_3840_30fps.mp4",
"type": "input",
"format": "video/mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 133,
"skip_first_frames": 0,
"select_every_nth": 1
}
}
},
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 3,
"type": "KSampler",
"pos": [
1475.7130963493837,
583.6899778882896
],
"size": [
315,
262
],
"flags": {},
"order": 27,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 252
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 215
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 216
},
{
"name": "latent_image",
"type": "LATENT",
"link": 217
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
218
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
123456,
"fixed",
6,
1,
"euler",
"simple",
1
]
},
{
"id": 48,
"type": "ModelSamplingSD3",
"pos": [
621.2813720703125,
-84.75058912563465
],
"size": [
210,
58
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 184
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
251
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
5
]
},
{
"id": 124,
"type": "WanContextWindowsManual",
"pos": [
921.0647655256324,
-84.75058912563465
],
"size": [
316.1412109375,
202
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 251
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
252
]
}
],
"properties": {
"Node name for S&R": "WanContextWindowsManual"
},
"widgets_values": [
81,
29,
"standard_static",
1,
false,
"pyramid",
true
],
"color": "#223",
"bgcolor": "#335"
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
96,
8,
0,
49,
0,
"IMAGE"
],
[
106,
57,
0,
56,
0,
"CLIP_VISION"
],
[
183,
37,
0,
96,
0,
"MODEL"
],
[
184,
96,
0,
48,
0,
"MODEL"
],
[
198,
6,
0,
101,
0,
"CONDITIONING"
],
[
199,
7,
0,
101,
1,
"CONDITIONING"
],
[
200,
39,
0,
101,
2,
"VAE"
],
[
202,
56,
0,
101,
7,
"CLIP_VISION_OUTPUT"
],
[
203,
102,
0,
103,
0,
"IMAGE"
],
[
204,
58,
0,
102,
0,
"IMAGE"
],
[
209,
103,
0,
101,
5,
"IMAGE"
],
[
212,
103,
0,
104,
0,
"IMAGE"
],
[
213,
104,
0,
101,
9,
"INT"
],
[
214,
104,
1,
101,
10,
"INT"
],
[
215,
101,
0,
3,
1,
"CONDITIONING"
],
[
216,
101,
1,
3,
2,
"CONDITIONING"
],
[
217,
101,
2,
3,
3,
"LATENT"
],
[
218,
3,
0,
8,
0,
"LATENT"
],
[
220,
110,
1,
109,
0,
"CLIP"
],
[
223,
110,
0,
112,
1,
"MODEL"
],
[
224,
109,
0,
112,
3,
"CONDITIONING"
],
[
229,
112,
0,
107,
0,
"SAM3_TRACK_DATA"
],
[
230,
107,
0,
101,
4,
"IMAGE"
],
[
231,
107,
1,
101,
6,
"IMAGE"
],
[
232,
115,
0,
116,
3,
"CONDITIONING"
],
[
233,
103,
0,
116,
0,
"IMAGE"
],
[
234,
116,
0,
107,
1,
"SAM3_TRACK_DATA"
],
[
236,
107,
0,
118,
0,
"IMAGE"
],
[
238,
110,
0,
116,
1,
"MODEL"
],
[
239,
113,
0,
119,
0,
"IMAGE"
],
[
240,
119,
0,
112,
0,
"IMAGE"
],
[
241,
119,
0,
101,
3,
"IMAGE"
],
[
242,
103,
0,
56,
1,
"IMAGE"
],
[
243,
110,
1,
115,
0,
"CLIP"
],
[
244,
39,
0,
120,
0,
"VAE"
],
[
245,
120,
0,
8,
1,
"VAE"
],
[
250,
107,
1,
117,
0,
"IMAGE"
],
[
251,
48,
0,
124,
0,
"MODEL"
],
[
252,
124,
0,
3,
0,
"MODEL"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.5644739300537773,
"offset": [
420.4411568691918,
395.94500319536957
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
WAN Context Windows (Manual)
It works like tiling along the time axis, in other words a sliding context window.
- Set context_length to 81, and it generates internally in chunks of 81 frames.
- If you leave it as-is, the seams will be obvious, so set an appropriate number of frames in context_overlap as overlap.
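To see how the two settings interact, here is a minimal Python sketch of splitting a frame range into overlapping windows. It is a simplified illustration, not ComfyUI's actual scheduling code: the function name and the rule of shifting the last window back to the final frame are assumptions. The example values 133, 81, and 29 are taken from the workflow above, where 81 and 29 appear to be the context_length and context_overlap widgets.

```python
from typing import List, Tuple


def context_windows(
    total_frames: int, context_length: int, context_overlap: int
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) frame ranges that cover total_frames."""
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    if not 0 <= context_overlap < context_length:
        raise ValueError("context_overlap must be in [0, context_length)")
    if total_frames <= context_length:
        return [(0, total_frames)]  # short clip: a single window is enough

    stride = context_length - context_overlap  # new frames added per window
    windows: List[Tuple[int, int]] = []
    start = 0
    while start + context_length < total_frames:
        windows.append((start, start + context_length))
        start += stride
    # Assumption: the last window is shifted back so it ends on the final frame.
    windows.append((total_frames - context_length, total_frames))
    return windows


print(context_windows(133, 81, 29))  # [(0, 81), (52, 133)]
```

With these values the two windows share frames 52 to 80, which is the region where the results are blended to hide the seam. A larger context_overlap gives a smoother transition but adds fewer new frames per window, so longer videos need more windows.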
Output Example
