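What is SCAIL-2?
SCAIL-2 is a Wan2.1-based model specialized in transferring motion onto people and characters.
The big difference from Wan-Animate and the previous SCAIL-1 is that it does not convert the input into an intermediate representation such as a stick figure.
Build a stick figure with ViTPose or OpenPose, then use it as the condition to drive the person. That has been the obvious approach until now, but a lot of information is lost once you convert to a stick figure:
depth, contact, multiple people tangled together, the motion of non-human characters, and so on.
So SCAIL-2 passes the reference image and the motion video to the DiT almost as-is.
Rather than having humans hand-craft a complicated processing pipeline, you build a suitable dataset and let the AI learn the task, and the result is more flexible and easier to use. This way of thinking will probably become more common.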
Downloading the models
- checkpoints
- clip_vision
- diffusion_models
- loras
- text_encoders
- vae
📂ComfyUI/
└── 📂models/
    ├── 📂checkpoints/
    │   └── sam3.1_multiplex_fp16.safetensors
    ├── 📂clip_vision/
    │   └── clip_vision_h.safetensors
    ├── 📂diffusion_models/
    │   └── wan2.1_14B_SCAIL_2_fp8_scaled.safetensors
    ├── 📂loras/
    │   └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors
    ├── 📂text_encoders/
    │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
    └── 📂vae/
        └── wan_2.1_vae.safetensors
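Before loading the workflow, it can help to confirm that every file from the folder tree above is in place. The following is a minimal sketch, not part of ComfyUI itself: the function name `missing_models` and the `"ComfyUI"` root path are assumptions for illustration, and the file list is copied from the tree.

```python
from pathlib import Path

# Relative paths copied from the folder tree above.
EXPECTED_FILES = [
    "checkpoints/sam3.1_multiplex_fp16.safetensors",
    "clip_vision/clip_vision_h.safetensors",
    "diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors",
    "loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
    "text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "vae/wan_2.1_vae.safetensors",
]


def missing_models(comfy_root):
    """Return the expected files that are not present under <comfy_root>/models."""
    models_dir = Path(comfy_root) / "models"
    return [rel for rel in EXPECTED_FILES if not (models_dir / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; change it to your own path.
    for rel in missing_models("ComfyUI"):
        print("missing:", rel)
```

Note that the LoRA loader in the embedded workflow points at `Wan2.1/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors`, so if the file sits directly under `loras/` as in the tree, you will likely need to reselect it in the node.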
Animation mode
Animates the reference image using the motion video.

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 123,
"last_link_id": 250,
"nodes": [
{
"id": 57,
"type": "CLIPVisionLoader",
"pos": [
251.28673034667955,
1235.5962450561526
],
"size": [
270,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP_VISION",
"type": "CLIP_VISION",
"links": [
106
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"clip_vision_h.safetensors"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
190.560140854834
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"wan",
"default"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1831.305390050556,
583.6899778882896
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 27,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 218
},
{
"name": "vae",
"type": "VAE",
"link": 245
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
96
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
417.8738708496094,
266.8154509134282
],
"size": [
419.3189392089844,
138.8924560546875
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
199
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走 "
]
},
{
"id": 49,
"type": "VHS_VideoCombine",
"pos": [
2021.0685736443058,
583.6899778882896
],
"size": [
372.2688903808594,
876.4033355712891
],
"flags": {},
"order": 28,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 96
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_VideoCombine",
"cnr_id": "comfyui-videohelpersuite",
"ver": "a7ce59e381934733bfae03b1be029756d6ce936d"
},
"widgets_values": {
"frame_rate": 16,
"loop_count": 0,
"filename_prefix": "SCAIL-2",
"format": "video/h264-mp4",
"pix_fmt": "yuv420p",
"crf": 19,
"save_metadata": true,
"trim_to_audio": false,
"pingpong": false,
"save_output": true,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "SCAIL-2_00016.mp4",
"subfolder": "",
"type": "output",
"format": "video/h264-mp4",
"frame_rate": 16,
"workflow": "SCAIL-2_00016.png",
"fullpath": "/home/nomax/working-linux/ComfyUI-dev/output/SCAIL-2_00016.mp4"
}
}
}
},
{
"id": 48,
"type": "ModelSamplingSD3",
"pos": [
621.2813720703125,
-84.75058912563465
],
"size": [
210,
58
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 184
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
120
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
5
]
},
{
"id": 102,
"type": "ResizeImageMaskNode",
"pos": [
-461.91705157795747,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 204
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
203
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
0.5,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
-59.809774398803675,
-84.75058912563465
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
183
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan2.1_14B_SCAIL_2_fp8_scaled.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 96,
"type": "LoraLoaderModelOnly",
"pos": [
276.2320190445451,
-84.75058912563465
],
"size": [
314.38576392812183,
82
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 183
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
184
]
}
],
"properties": {
"Node name for S&R": "LoraLoaderModelOnly",
"cnr_id": "comfy-core",
"ver": "0.3.60"
},
"widgets_values": [
"Wan2.1/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 120,
"type": "Reroute",
"pos": [
1715.7130963493837,
468.5779949512076
],
"size": [
75,
26
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 244
}
],
"outputs": [
{
"name": "",
"type": "VAE",
"links": [
245
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 39,
"type": "VAELoader",
"pos": [
525.0368322106934,
468.5779949512076
],
"size": [
306.36004638671875,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
200,
244
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan_2.1_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 56,
"type": "CLIPVisionEncode",
"pos": [
577.9174829101565,
1301.870476013184
],
"size": [
271.6761474609375,
78
],
"flags": {},
"order": 21,
"mode": 0,
"inputs": [
{
"name": "clip_vision",
"type": "CLIP_VISION",
"link": 106
},
{
"name": "image",
"type": "IMAGE",
"link": 242
}
],
"outputs": [
{
"name": "CLIP_VISION_OUTPUT",
"type": "CLIP_VISION_OUTPUT",
"links": [
202
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"none"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 58,
"type": "LoadImage",
"pos": [
-800.6210036236067,
1277.5627543619055
],
"size": [
308.07680913429937,
543.642446368963
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
204
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pexels-photo-31438123.jpg",
"image"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 101,
"type": "WanSCAILToVideo",
"pos": [
1082.8079278436192,
604.0725314640773
],
"size": [
344.02734375,
434
],
"flags": {},
"order": 24,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 198
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 199
},
{
"name": "vae",
"type": "VAE",
"link": 200
},
{
"name": "pose_video",
"shape": 7,
"type": "IMAGE",
"link": 241
},
{
"name": "pose_video_mask",
"shape": 7,
"type": "IMAGE",
"link": 230
},
{
"name": "reference_image",
"shape": 7,
"type": "IMAGE",
"link": 209
},
{
"name": "reference_image_mask",
"shape": 7,
"type": "IMAGE",
"link": 231
},
{
"name": "clip_vision_output",
"shape": 7,
"type": "CLIP_VISION_OUTPUT",
"link": 202
},
{
"name": "previous_frames",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 213
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 214
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
215
]
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
216
]
},
{
"name": "latent",
"type": "LATENT",
"links": [
217
]
},
{
"name": "video_frame_offset",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "WanSCAILToVideo"
},
"widgets_values": [
512,
896,
81,
1,
1,
0,
1,
0,
5,
false
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 119,
"type": "Reroute",
"pos": [
25.7972264472059,
669.0146817503652
],
"size": [
75,
26
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 239
}
],
"outputs": [
{
"name": "",
"type": "IMAGE",
"links": [
240,
241
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 123,
"type": "MarkdownNote",
"pos": [
-548.6500412597653,
-84.75058912563465
],
"size": [
461.4852607760207,
466.4268689388148
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n- checkpoints\n - [sam3.1_multiplex_fp16.safetensors](https://huggingface.co/Comfy-Org/sam3.1/blob/main/checkpoints/sam3.1_multiplex_fp16.safetensors)\n- clip_vision\n - [clip_vision_h.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors)\n- diffusion_models\n - [wan2.1_14B_SCAIL_2_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/SCAIL-2/blob/main/diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors)\n- loras\n - [Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors)\n- text_encoders\n - [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)\n- vae\n - [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors)\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂checkpoints/\n │ └── sam3.1_multiplex_fp16.safetensors\n ├── 📂clip_vision/\n │ └── clip_vision_h.safetensors\n ├── 📂diffusion_models/\n │ └── wan2.1_14B_SCAIL_2_fp8_scaled.safetensors\n ├── 📂loras/\n │ └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors\n ├── 📂text_encoders/\n │ └── umt5_xxl_fp8_e4m3fn_scaled.safetensors\n └── 📂vae/\n └── wan_2.1_vae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 103,
"type": "ResizeImageMaskNode",
"pos": [
-154.42166660971037,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 203
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
209,
212,
233,
242
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale to multiple",
32,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 104,
"type": "GetImageSize",
"pos": [
630.4935595703129,
1448.041027605726
],
"size": [
219.10007080078117,
136
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 212
}
],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
213
]
},
{
"name": "height",
"type": "INT",
"links": [
214
]
},
{
"name": "batch_size",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "GetImageSize"
},
"widgets_values": []
},
{
"id": 113,
"type": "VHS_LoadVideo",
"pos": [
-800.6210036236067,
455.00750561396774
],
"size": [
261.6533203125,
753.272357822205
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
239
]
},
{
"name": "frame_count",
"type": "INT",
"links": null
},
{
"name": "audio",
"type": "AUDIO",
"links": null
},
{
"name": "video_info",
"type": "VHS_VIDEOINFO",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_LoadVideo",
"cnr_id": "comfyui-videohelpersuite",
"ver": "2984ec4c4b93292421888f38db74a5e8802a8ff8"
},
"widgets_values": {
"video": "14637751_2160_3840_30fps.mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 0,
"select_every_nth": 1,
"format": "None",
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "14637751_2160_3840_30fps.mp4",
"type": "input",
"format": "video/mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 0,
"select_every_nth": 1
}
}
},
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 110,
"type": "CheckpointLoaderSimple",
"pos": [
-472.9277129444554,
773.4790318695161
],
"size": [
297.3094587159344,
98
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
223,
238
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
220,
243
]
},
{
"name": "VAE",
"type": "VAE",
"links": null
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"sam3.1_multiplex_fp16.safetensors"
]
},
{
"id": 109,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
847.1714690725746
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 220
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
224
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 115,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
1068.66630403893
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 243
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
232
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 116,
"type": "SAM3_VideoTrack",
"pos": [
165.3271034107571,
1007.8240766656425
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 233
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 238
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 232
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
234
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 112,
"type": "SAM3_VideoTrack",
"pos": [
159.95932129876255,
754.1190318695158
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 240
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 223
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 224
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
229
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 107,
"type": "SCAIL2ColoredMask",
"pos": [
506.50542810498916,
905.2770955134855
],
"size": [
339.45703125,
126
],
"flags": {},
"order": 22,
"mode": 0,
"inputs": [
{
"name": "driving_track_data",
"type": "SAM3_TRACK_DATA",
"link": 229
},
{
"name": "ref_track_data",
"shape": 7,
"type": "SAM3_TRACK_DATA",
"link": 234
}
],
"outputs": [
{
"name": "pose_video_mask",
"type": "IMAGE",
"links": [
230,
236
]
},
{
"name": "reference_image_mask",
"type": "IMAGE",
"links": [
231,
250
]
}
],
"properties": {
"Node name for S&R": "SCAIL2ColoredMask"
},
"widgets_values": [
"",
"area",
false
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 117,
"type": "PreviewImage",
"pos": [
1287.634225650107,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 25,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 250
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
417.9232177734375,
63.8154509134279
],
"size": [
419.26959228515625,
148.8194122314453
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
198
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A short-haired man wearing a striped shirt, hands on his hips, touching his hair.full body"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
1475.7130963493837,
583.6899778882896
],
"size": [
315,
262
],
"flags": {},
"order": 26,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 120
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 215
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 216
},
{
"name": "latent_image",
"type": "LATENT",
"link": 217
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
218
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
123,
"fixed",
6,
1,
"euler",
"simple",
1
]
},
{
"id": 118,
"type": "PreviewImage",
"pos": [
1041.5040075988006,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 23,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 236
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
96,
8,
0,
49,
0,
"IMAGE"
],
[
106,
57,
0,
56,
0,
"CLIP_VISION"
],
[
120,
48,
0,
3,
0,
"MODEL"
],
[
183,
37,
0,
96,
0,
"MODEL"
],
[
184,
96,
0,
48,
0,
"MODEL"
],
[
198,
6,
0,
101,
0,
"CONDITIONING"
],
[
199,
7,
0,
101,
1,
"CONDITIONING"
],
[
200,
39,
0,
101,
2,
"VAE"
],
[
202,
56,
0,
101,
7,
"CLIP_VISION_OUTPUT"
],
[
203,
102,
0,
103,
0,
"IMAGE"
],
[
204,
58,
0,
102,
0,
"IMAGE"
],
[
209,
103,
0,
101,
5,
"IMAGE"
],
[
212,
103,
0,
104,
0,
"IMAGE"
],
[
213,
104,
0,
101,
9,
"INT"
],
[
214,
104,
1,
101,
10,
"INT"
],
[
215,
101,
0,
3,
1,
"CONDITIONING"
],
[
216,
101,
1,
3,
2,
"CONDITIONING"
],
[
217,
101,
2,
3,
3,
"LATENT"
],
[
218,
3,
0,
8,
0,
"LATENT"
],
[
220,
110,
1,
109,
0,
"CLIP"
],
[
223,
110,
0,
112,
1,
"MODEL"
],
[
224,
109,
0,
112,
3,
"CONDITIONING"
],
[
229,
112,
0,
107,
0,
"SAM3_TRACK_DATA"
],
[
230,
107,
0,
101,
4,
"IMAGE"
],
[
231,
107,
1,
101,
6,
"IMAGE"
],
[
232,
115,
0,
116,
3,
"CONDITIONING"
],
[
233,
103,
0,
116,
0,
"IMAGE"
],
[
234,
116,
0,
107,
1,
"SAM3_TRACK_DATA"
],
[
236,
107,
0,
118,
0,
"IMAGE"
],
[
238,
110,
0,
116,
1,
"MODEL"
],
[
239,
113,
0,
119,
0,
"IMAGE"
],
[
240,
119,
0,
112,
0,
"IMAGE"
],
[
241,
119,
0,
101,
3,
"IMAGE"
],
[
242,
103,
0,
56,
1,
"IMAGE"
],
[
243,
110,
1,
115,
0,
"CLIP"
],
[
244,
39,
0,
120,
0,
"VAE"
],
[
245,
120,
0,
8,
1,
"VAE"
],
[
250,
107,
1,
117,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.8264462809917366,
"offset": [
782.6885946831094,
-564.1607851229405
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
The base workflow is the same as Wan-Animate, but it is considerably simpler, so let's walk through it at a relaxed pace.
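If the graph is hard to follow in the editor, the workflow JSON itself can be read directly. Judging from the JSON above, each entry in `links` has the form `[link_id, source_node, source_slot, target_node, target_slot, data_type]`. The sketch below (the helper name `describe_links` is made up for illustration) prints each connection by node type, using a tiny excerpt whose ids are copied from the workflow.

```python
import json


def describe_links(workflow):
    """Turn each link entry into 'SourceType -> TargetType (DATA_TYPE)'."""
    node_type = {node["id"]: node["type"] for node in workflow["nodes"]}
    lines = []
    for _link_id, src, _src_slot, dst, _dst_slot, data_type in workflow["links"]:
        lines.append(f"{node_type[src]} -> {node_type[dst]} ({data_type})")
    return lines


# Small excerpt with the same shape as the workflow above.
sample = json.loads("""
{
  "nodes": [
    {"id": 101, "type": "WanSCAILToVideo"},
    {"id": 3, "type": "KSampler"},
    {"id": 8, "type": "VAEDecode"}
  ],
  "links": [
    [217, 101, 2, 3, 3, "LATENT"],
    [218, 3, 0, 8, 0, "LATENT"]
  ]
}
""")

for line in describe_links(sample):
    print(line)
```

To inspect the full graph, replace `sample` with the result of `json.load` on a saved workflow file.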
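Reference image and motion video
The reference image and the motion video are resized internally, so they do not need to be the same size.
- A similar aspect ratio is easier to work with.
- The poses in the image and the video do not need to match exactly.
- However, if they differ too much, generation fails.
- It is best to pick a reference image that is close to the first frame of the motion video.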
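Prompt
Since this only transfers motion, a detailed prompt is not needed.
- However, if the prompt is too short, generation tends to fail, especially in Replacement mode.
- In this case, write a prompt that conveys clearly enough what kind of video you want, such as "a man wearing a shirt, hands on his hips, touching his hair".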
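Resolution and frame count
The generation size and frame count are set on WanSCAILToVideo.
- The recommended resolution is from 480p (864×480) to roughly 720p (1280×704), with both sides a multiple of 32.
- The maximum frame count is 81.
- Here, the reference image is resized and its size is used as the generation resolution.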
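The workflow derives the generation size by resizing the reference image twice: first with "scale total pixels" set to 0.5, then with "scale to multiple" set to 32. The sketch below approximates that arithmetic. It is not the node's actual code: treating 1 megapixel as 1024 × 1024 pixels and rounding each side down are assumptions, and the function names are made up for illustration.

```python
import math


def generation_size(width, height, megapixels=0.5, multiple=32):
    """Scale (width, height) to about `megapixels` total pixels while keeping
    the aspect ratio, then snap each side down to a multiple of `multiple`."""
    target_pixels = megapixels * 1024 * 1024  # assumption: 1 MP = 1024 * 1024
    scale = math.sqrt(target_pixels / (width * height))
    # Assumption: sides are rounded down; the real node may round differently.
    snapped_w = max(multiple, int(width * scale) // multiple * multiple)
    snapped_h = max(multiple, int(height * scale) // multiple * multiple)
    return snapped_w, snapped_h


def clip_seconds(frames=81, fps=16):
    """Length of the generated clip in seconds at the given frame rate."""
    return frames / fps


if __name__ == "__main__":
    print(generation_size(2160, 3840))
    print(clip_seconds())
```

With the settings in the workflow (81 frames saved at 16 fps), the clip comes out at roughly 5 seconds.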
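Mask generation with SAM3.1
The people in the reference image and the motion video are masked with SAM 3 / 3.1.
- This is not a precise inpainting mask but an aid that tells SCAIL-2 which person corresponds to which, so it is fine if it is slightly off.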
Create SCAIL-2 Colored Mask
The masks you created are colored appropriately.
- This matters a bit more when there are multiple people. More on that later.
Output example
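To build some intuition for why coloring matters with multiple people, here is a toy sketch of the general idea: each person's mask gets its own color, in a consistent order, so the same color can mark the same person in both the reference image and the motion video. This is not the implementation of the Create SCAIL-2 Colored Mask node. The palette, the ordering by mask area, and the function name `color_masks` are all assumptions for illustration.

```python
# Hypothetical palette; the colors the actual node uses are not stated here.
PALETTE = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]


def color_masks(masks):
    """Paint binary masks (lists of rows of 0/1) into one RGB image.

    Masks are ordered by area, largest first, and the i-th mask is painted
    with PALETTE[i]. Pixels covered by no mask stay black.
    """
    height, width = len(masks[0]), len(masks[0][0])
    image = [[(0, 0, 0)] * width for _ in range(height)]
    ordered = sorted(masks, key=lambda m: sum(map(sum, m)), reverse=True)
    for index, mask in enumerate(ordered):
        color = PALETTE[index % len(PALETTE)]
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    image[y][x] = color
    return image


# Two toy "people": a small mask and a larger one.
small = [[1, 0, 0],
         [0, 0, 0]]
large = [[0, 1, 1],
         [0, 1, 1]]
colored = color_masks([small, large])
```

Applying the same ordering rule to the reference image and to each video frame is what keeps the color-to-person assignment consistent in this toy version.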

Replacement mode
Replaces the person in the video with the person in the reference image.

{
"id": "37e9470f-e8a2-4649-85ba-c52ea13698d7",
"revision": 0,
"last_node_id": 128,
"last_link_id": 259,
"nodes": [
{
"id": 57,
"type": "CLIPVisionLoader",
"pos": [
249.51499634267563,
1322.4027070242782
],
"size": [
270,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP_VISION",
"type": "CLIP_VISION",
"links": [
106
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"clip_vision_h.safetensors"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
190.560140854834
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"wan",
"default"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1831.305390050556,
583.6899778882896
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 29,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 218
},
{
"name": "vae",
"type": "VAE",
"link": 245
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
96
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 49,
"type": "VHS_VideoCombine",
"pos": [
2021.0685736443058,
583.6899778882896
],
"size": [
372.2688903808594,
988.4888916015625
],
"flags": {},
"order": 30,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 96
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_VideoCombine",
"cnr_id": "comfyui-videohelpersuite",
"ver": "a7ce59e381934733bfae03b1be029756d6ce936d"
},
"widgets_values": {
"frame_rate": 16,
"loop_count": 0,
"filename_prefix": "SCAIL-2",
"format": "video/h264-mp4",
"pix_fmt": "yuv420p",
"crf": 19,
"save_metadata": true,
"trim_to_audio": false,
"pingpong": false,
"save_output": true,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "SCAIL-2_00014.mp4",
"subfolder": "",
"type": "output",
"format": "video/h264-mp4",
"frame_rate": 16,
"workflow": "SCAIL-2_00014.png",
"fullpath": "/home/nomax/working-linux/ComfyUI-dev/output/SCAIL-2_00014.mp4"
}
}
}
},
{
"id": 48,
"type": "ModelSamplingSD3",
"pos": [
621.2813720703125,
-84.75058912563465
],
"size": [
210,
58
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 184
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
120
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
5
]
},
{
"id": 109,
"type": "CLIPTextEncode",
"pos": [
-89.07294054914254,
932.3954062570297
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 220
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
224
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 115,
"type": "CLIPTextEncode",
"pos": [
-82.0947074157867,
1153.890241223387
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 243
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
232
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 112,
"type": "SAM3_VideoTrack",
"pos": [
211.178864135982,
839.3429690539708
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 240
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 223
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 224
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
229
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 110,
"type": "CheckpointLoaderSimple",
"pos": [
-421.7081701072348,
858.7029690539712
],
"size": [
297.3094587159344,
98
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
223,
238
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
220,
243
]
},
{
"name": "VAE",
"type": "VAE",
"links": null
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"sam3.1_multiplex_fp16.safetensors"
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
-59.809774398803675,
-84.75058912563465
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
183
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan2.1_14B_SCAIL_2_fp8_scaled.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 120,
"type": "Reroute",
"pos": [
1715.7130963493837,
468.5779949512076
],
"size": [
75,
26
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 244
}
],
"outputs": [
{
"name": "",
"type": "VAE",
"links": [
245
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 126,
"type": "ImageFromBatch",
"pos": [
-454.9257483587344,
514.9424306127546
],
"size": [
270,
82
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 253
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
252
]
}
],
"properties": {
"Node name for S&R": "ImageFromBatch"
},
"widgets_values": [
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 113,
"type": "VHS_LoadVideo",
"pos": [
-749.4014607863867,
514.9424306127546
],
"size": [
261.6533203125,
753.272357822205
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
239,
253
]
},
{
"name": "frame_count",
"type": "INT",
"links": null
},
{
"name": "audio",
"type": "AUDIO",
"links": null
},
{
"name": "video_info",
"type": "VHS_VIDEOINFO",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_LoadVideo",
"cnr_id": "comfyui-videohelpersuite",
"ver": "2984ec4c4b93292421888f38db74a5e8802a8ff8"
},
"widgets_values": {
"video": "14637751_2160_3840_30fps.mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 0,
"select_every_nth": 1,
"format": "None",
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "14637751_2160_3840_30fps.mp4",
"type": "input",
"format": "video/mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 0,
"select_every_nth": 1
}
}
},
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 123,
"type": "ResizeImageMaskNode",
"pos": [
-155.7273543312722,
514.9424306127546
],
"size": [
270,
106
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 252
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
250
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
0.5,
"nearest-exact"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 127,
"type": "Reroute",
"pos": [
90.39715479542042,
1419.8866138867568
],
"size": [
75,
26
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 255
}
],
"outputs": [
{
"name": "",
"type": "IMAGE",
"links": [
256,
257,
258
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 39,
"type": "VAELoader",
"pos": [
530.832763671875,
468.5779949512076
],
"size": [
306.36004638671875,
58
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
200,
244
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan_2.1_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
417.8738708496094,
266.8154509134282
],
"size": [
419.3189392089844,
138.8924560546875
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
199
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走 "
]
},
{
"id": 56,
"type": "CLIPVisionEncode",
"pos": [
565.5166625976562,
1392.2200491685596
],
"size": [
271.6761474609375,
78
],
"flags": {},
"order": 21,
"mode": 0,
"inputs": [
{
"name": "clip_vision",
"type": "CLIP_VISION",
"link": 106
},
{
"name": "image",
"type": "IMAGE",
"link": 258
}
],
"outputs": [
{
"name": "CLIP_VISION_OUTPUT",
"type": "CLIP_VISION_OUTPUT",
"links": [
202
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"none"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 116,
"type": "SAM3_VideoTrack",
"pos": [
216.54664624797653,
1093.0480138500996
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 257
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 238
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 232
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
234
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 96,
"type": "LoraLoaderModelOnly",
"pos": [
276.2320190445451,
-84.75058912563465
],
"size": [
314.38576392812183,
82
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 183
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
184
]
}
],
"properties": {
"Node name for S&R": "LoraLoaderModelOnly",
"cnr_id": "comfy-core",
"ver": "0.3.60"
},
"widgets_values": [
"Wan2.1/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 58,
"type": "LoadImage",
"pos": [
-752.3296518502294,
1419.8866138867568
],
"size": [
308.07680913429937,
543.642446368963
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
255
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pexels-photo-31438123.jpg",
"image"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
417.9232177734375,
63.8154509134279
],
"size": [
419.26959228515625,
148.8194122314453
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
198
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A short-haired man wearing a striped shirt is standing in the park, hands on his hips, touching his hair."
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
1475.7130963493837,
583.6899778882896
],
"size": [
315,
262
],
"flags": {},
"order": 28,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 120
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 215
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 216
},
{
"name": "latent_image",
"type": "LATENT",
"link": 217
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
218
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
1234,
"fixed",
6,
1,
"euler",
"simple",
1
]
},
{
"id": 124,
"type": "ResizeImageMaskNode",
"pos": [
151.7680306369739,
514.9424306127546
],
"size": [
270,
106
],
"flags": {},
"order": 22,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 250
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
254
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale to multiple",
32,
"nearest-exact"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 128,
"type": "MarkdownNote",
"pos": [
-567.3573478819718,
-84.75058912563465
],
"size": [
461.4852607760207,
466.4268689388148
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n- checkpoints\n - [sam3.1_multiplex_fp16.safetensors](https://huggingface.co/Comfy-Org/sam3.1/blob/main/checkpoints/sam3.1_multiplex_fp16.safetensors)\n- clip_vision\n - [clip_vision_h.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors)\n- diffusion_models\n - [wan2.1_14B_SCAIL_2_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/SCAIL-2/blob/main/diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors)\n- loras\n - [Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors)\n- text_encoders\n - [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)\n- vae\n - [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors)\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂checkpoints/\n │ └── sam3.1_multiplex_fp16.safetensors\n ├── 📂clip_vision/\n │ └── clip_vision_h.safetensors\n ├── 📂diffusion_models/\n │ └── wan2.1_14B_SCAIL_2_fp8_scaled.safetensors\n ├── 📂loras/\n │ └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors\n ├── 📂text_encoders/\n │ └── umt5_xxl_fp8_e4m3fn_scaled.safetensors\n └── 📂vae/\n └── wan_2.1_vae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 119,
"type": "Reroute",
"pos": [
71.0720574680444,
667.8860270852282
],
"size": [
75,
26
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 239
}
],
"outputs": [
{
"name": "",
"type": "IMAGE",
"links": [
240,
241
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 104,
"type": "GetImageSize",
"pos": [
618.0927392578126,
793.1637633142003
],
"size": [
219.10007080078117,
136
],
"flags": {},
"order": 24,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 254
}
],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
213
]
},
{
"name": "height",
"type": "INT",
"links": [
214
]
},
{
"name": "batch_size",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "GetImageSize"
},
"widgets_values": []
},
{
"id": 107,
"type": "SCAIL2ColoredMask",
"pos": [
497.73577880859375,
995.6266686688589
],
"size": [
339.45703125,
126
],
"flags": {},
"order": 23,
"mode": 0,
"inputs": [
{
"name": "driving_track_data",
"type": "SAM3_TRACK_DATA",
"link": 229
},
{
"name": "ref_track_data",
"shape": 7,
"type": "SAM3_TRACK_DATA",
"link": 234
}
],
"outputs": [
{
"name": "pose_video_mask",
"type": "IMAGE",
"links": [
230,
236
]
},
{
"name": "reference_image_mask",
"type": "IMAGE",
"links": [
231,
237
]
}
],
"properties": {
"Node name for S&R": "SCAIL2ColoredMask"
},
"widgets_values": [
"",
"area",
true
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 117,
"type": "PreviewImage",
"pos": [
1292.2580838705644,
1145.2233455980718
],
"size": [
210,
258
],
"flags": {},
"order": 26,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 237
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 101,
"type": "WanSCAILToVideo",
"pos": [
1082.8079278436192,
603.56829029821
],
"size": [
344.02734375,
434
],
"flags": {},
"order": 27,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 198
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 199
},
{
"name": "vae",
"type": "VAE",
"link": 200
},
{
"name": "pose_video",
"shape": 7,
"type": "IMAGE",
"link": 241
},
{
"name": "pose_video_mask",
"shape": 7,
"type": "IMAGE",
"link": 230
},
{
"name": "reference_image",
"shape": 7,
"type": "IMAGE",
"link": 256
},
{
"name": "reference_image_mask",
"shape": 7,
"type": "IMAGE",
"link": 231
},
{
"name": "clip_vision_output",
"shape": 7,
"type": "CLIP_VISION_OUTPUT",
"link": 202
},
{
"name": "previous_frames",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 213
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 214
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
215
]
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
216
]
},
{
"name": "latent",
"type": "LATENT",
"links": [
217
]
},
{
"name": "video_frame_offset",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "WanSCAILToVideo"
},
"widgets_values": [
512,
896,
81,
1,
1,
0,
1,
0,
5,
true
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 118,
"type": "PreviewImage",
"pos": [
1045.8199574761238,
1144.9217621465461
],
"size": [
210,
258
],
"flags": {},
"order": 25,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 236
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
96,
8,
0,
49,
0,
"IMAGE"
],
[
106,
57,
0,
56,
0,
"CLIP_VISION"
],
[
120,
48,
0,
3,
0,
"MODEL"
],
[
183,
37,
0,
96,
0,
"MODEL"
],
[
184,
96,
0,
48,
0,
"MODEL"
],
[
198,
6,
0,
101,
0,
"CONDITIONING"
],
[
199,
7,
0,
101,
1,
"CONDITIONING"
],
[
200,
39,
0,
101,
2,
"VAE"
],
[
202,
56,
0,
101,
7,
"CLIP_VISION_OUTPUT"
],
[
213,
104,
0,
101,
9,
"INT"
],
[
214,
104,
1,
101,
10,
"INT"
],
[
215,
101,
0,
3,
1,
"CONDITIONING"
],
[
216,
101,
1,
3,
2,
"CONDITIONING"
],
[
217,
101,
2,
3,
3,
"LATENT"
],
[
218,
3,
0,
8,
0,
"LATENT"
],
[
220,
110,
1,
109,
0,
"CLIP"
],
[
223,
110,
0,
112,
1,
"MODEL"
],
[
224,
109,
0,
112,
3,
"CONDITIONING"
],
[
229,
112,
0,
107,
0,
"SAM3_TRACK_DATA"
],
[
230,
107,
0,
101,
4,
"IMAGE"
],
[
231,
107,
1,
101,
6,
"IMAGE"
],
[
232,
115,
0,
116,
3,
"CONDITIONING"
],
[
234,
116,
0,
107,
1,
"SAM3_TRACK_DATA"
],
[
236,
107,
0,
118,
0,
"IMAGE"
],
[
237,
107,
1,
117,
0,
"IMAGE"
],
[
238,
110,
0,
116,
1,
"MODEL"
],
[
239,
113,
0,
119,
0,
"IMAGE"
],
[
240,
119,
0,
112,
0,
"IMAGE"
],
[
241,
119,
0,
101,
3,
"IMAGE"
],
[
243,
110,
1,
115,
0,
"CLIP"
],
[
244,
39,
0,
120,
0,
"VAE"
],
[
245,
120,
0,
8,
1,
"VAE"
],
[
250,
123,
0,
124,
0,
"IMAGE"
],
[
252,
126,
0,
123,
0,
"IMAGE"
],
[
253,
113,
0,
126,
0,
"IMAGE"
],
[
254,
124,
0,
104,
0,
"IMAGE"
],
[
255,
58,
0,
127,
0,
"IMAGE"
],
[
256,
127,
0,
101,
5,
"IMAGE"
],
[
257,
127,
0,
116,
0,
"IMAGE"
],
[
258,
127,
0,
56,
1,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.5222486944025904,
"offset": [
648.8591196607266,
210.7355447246406
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
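Basically, all you need to do is set replacement_mode to true on both Create SCAIL-2 Colored Mask and WanSCAILToVideo.
Resolution
In Replacement mode, the video size is the reference for the output resolution.
- In this workflow, the first frame of the video is resized, and the size of that resized frame is read and used as the width and height.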
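As a rough illustration of what this sizing step amounts to, the sketch below snaps a frame size to multiples of 32, the value set on the resize node in this workflow. The function name `snap_to_multiple` is hypothetical, and rounding down is an assumption; the actual node may round differently.

```python
def snap_to_multiple(width: int, height: int, multiple: int = 32) -> tuple[int, int]:
    """Round each dimension down to the nearest multiple (assumed behaviour)."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    # Keep at least one full multiple per side so the result is never 0.
    snapped_w = max(multiple, (width // multiple) * multiple)
    snapped_h = max(multiple, (height // multiple) * multiple)
    return snapped_w, snapped_h


# Example input: a 1080x1920 portrait clip scaled down to 540x960.
print(snap_to_multiple(540, 960))  # (512, 960)
```

The resulting pair is what would be passed on as the width and height of the generation node.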
Create SCAIL-2 Colored Mask and WanSCAILToVideo
Set replacement_mode to true.
- Incidentally, the only difference in the Create SCAIL-2 Colored Mask output is that the background on the pose_video side becomes white.
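If you would rather flip the switch in an exported workflow file than in the UI, something like the following could work. This is only a sketch: it assumes that replacement_mode is stored as the last boolean entry of `widgets_values` on both node types, which is how it appears in the JSON on this page but is not a documented guarantee. The helper name `enable_replacement_mode` is made up for this example.

```python
# Node types taken from the workflow JSON on this page.
REPLACEMENT_NODE_TYPES = {"SCAIL2ColoredMask", "WanSCAILToVideo"}


def enable_replacement_mode(workflow: dict) -> int:
    """Set the trailing boolean widget to True on both nodes; return how many changed."""
    changed = 0
    for node in workflow.get("nodes", []):
        if node.get("type") not in REPLACEMENT_NODE_TYPES:
            continue
        values = node.get("widgets_values")
        # Assumption: replacement_mode is the last widget and is stored as a bool.
        if isinstance(values, list) and values and values[-1] is False:
            values[-1] = True
            changed += 1
    return changed


# Minimal stand-in for an Animation workflow (only the fields used here).
workflow = {
    "nodes": [
        {"type": "SCAIL2ColoredMask", "widgets_values": ["", "area", False]},
        {"type": "WanSCAILToVideo", "widgets_values": [512, 896, 81, 1, 1, 0, 1, 0, 5, False]},
        {"type": "KSampler", "widgets_values": [1234, "fixed", 6, 1, "euler", "simple", 1]},
    ]
}
print(enable_replacement_mode(workflow))  # 2
```

For a real file, load it with `json.load`, call the helper, and write it back with `json.dump`.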
Example output

Animation mode (multiple people)
SCAIL-2 also supports videos and images containing multiple people.
No special steps are needed. Just input the video and the reference image as before.

{
"id": "30265d26-6d42-46e0-9a45-d84500678056",
"revision": 0,
"last_node_id": 123,
"last_link_id": 250,
"nodes": [
{
"id": 57,
"type": "CLIPVisionLoader",
"pos": [
251.28673034667955,
1235.5962450561526
],
"size": [
270,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP_VISION",
"type": "CLIP_VISION",
"links": [
106
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"clip_vision_h.safetensors"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
190.560140854834
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"wan",
"default"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1831.305390050556,
583.6899778882896
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 27,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 218
},
{
"name": "vae",
"type": "VAE",
"link": 245
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
96
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
417.8738708496094,
266.8154509134282
],
"size": [
419.3189392089844,
138.8924560546875
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
199
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走 "
]
},
{
"id": 49,
"type": "VHS_VideoCombine",
"pos": [
2021.0685736443058,
583.6899778882896
],
"size": [
372.2688903808594,
876.4033355712891
],
"flags": {},
"order": 28,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 96
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_VideoCombine",
"cnr_id": "comfyui-videohelpersuite",
"ver": "a7ce59e381934733bfae03b1be029756d6ce936d"
},
"widgets_values": {
"frame_rate": 16,
"loop_count": 0,
"filename_prefix": "SCAIL-2",
"format": "video/h264-mp4",
"pix_fmt": "yuv420p",
"crf": 19,
"save_metadata": true,
"trim_to_audio": false,
"pingpong": false,
"save_output": true,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "SCAIL-2_00019.mp4",
"subfolder": "",
"type": "output",
"format": "video/h264-mp4",
"frame_rate": 16,
"workflow": "SCAIL-2_00019.png",
"fullpath": "/home/nomax/working-linux/ComfyUI-dev/output/SCAIL-2_00019.mp4"
}
}
}
},
{
"id": 48,
"type": "ModelSamplingSD3",
"pos": [
621.2813720703125,
-84.75058912563465
],
"size": [
210,
58
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 184
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
120
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
5
]
},
{
"id": 102,
"type": "ResizeImageMaskNode",
"pos": [
-461.91705157795747,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 204
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
203
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
0.5,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
-59.809774398803675,
-84.75058912563465
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
183
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan2.1_14B_SCAIL_2_fp8_scaled.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 96,
"type": "LoraLoaderModelOnly",
"pos": [
276.2320190445451,
-84.75058912563465
],
"size": [
314.38576392812183,
82
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 183
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
184
]
}
],
"properties": {
"Node name for S&R": "LoraLoaderModelOnly",
"cnr_id": "comfy-core",
"ver": "0.3.60"
},
"widgets_values": [
"Wan2.1/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 120,
"type": "Reroute",
"pos": [
1715.7130963493837,
468.5779949512076
],
"size": [
75,
26
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 244
}
],
"outputs": [
{
"name": "",
"type": "VAE",
"links": [
245
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 39,
"type": "VAELoader",
"pos": [
525.0368322106934,
468.5779949512076
],
"size": [
306.36004638671875,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
200,
244
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan_2.1_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 56,
"type": "CLIPVisionEncode",
"pos": [
577.9174829101565,
1301.870476013184
],
"size": [
271.6761474609375,
78
],
"flags": {},
"order": 21,
"mode": 0,
"inputs": [
{
"name": "clip_vision",
"type": "CLIP_VISION",
"link": 106
},
{
"name": "image",
"type": "IMAGE",
"link": 242
}
],
"outputs": [
{
"name": "CLIP_VISION_OUTPUT",
"type": "CLIP_VISION_OUTPUT",
"links": [
202
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"none"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 119,
"type": "Reroute",
"pos": [
25.7972264472059,
669.0146817503652
],
"size": [
75,
26
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 239
}
],
"outputs": [
{
"name": "",
"type": "IMAGE",
"links": [
240,
241
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 123,
"type": "MarkdownNote",
"pos": [
-548.6500412597653,
-84.75058912563465
],
"size": [
461.4852607760207,
466.4268689388148
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n- checkpoints\n - [sam3.1_multiplex_fp16.safetensors](https://huggingface.co/Comfy-Org/sam3.1/blob/main/checkpoints/sam3.1_multiplex_fp16.safetensors)\n- clip_vision\n - [clip_vision_h.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors)\n- diffusion_models\n - [wan2.1_14B_SCAIL_2_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/SCAIL-2/blob/main/diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors)\n- loras\n - [Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors)\n- text_encoders\n - [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)\n- vae\n - [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors)\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂checkpoints/\n │ └── sam3.1_multiplex_fp16.safetensors\n ├── 📂clip_vision/\n │ └── clip_vision_h.safetensors\n ├── 📂diffusion_models/\n │ └── wan2.1_14B_SCAIL_2_fp8_scaled.safetensors\n ├── 📂loras/\n │ └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors\n ├── 📂text_encoders/\n │ └── umt5_xxl_fp8_e4m3fn_scaled.safetensors\n └── 📂vae/\n └── wan_2.1_vae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 103,
"type": "ResizeImageMaskNode",
"pos": [
-154.42166660971037,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 203
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
209,
212,
233,
242
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale to multiple",
32,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 104,
"type": "GetImageSize",
"pos": [
630.4935595703129,
1448.041027605726
],
"size": [
219.10007080078117,
136
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 212
}
],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
213
]
},
{
"name": "height",
"type": "INT",
"links": [
214
]
},
{
"name": "batch_size",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "GetImageSize"
},
"widgets_values": []
},
{
"id": 110,
"type": "CheckpointLoaderSimple",
"pos": [
-472.9277129444554,
773.4790318695161
],
"size": [
297.3094587159344,
98
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
223,
238
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
220,
243
]
},
{
"name": "VAE",
"type": "VAE",
"links": null
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"sam3.1_multiplex_fp16.safetensors"
]
},
{
"id": 109,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
847.1714690725746
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 220
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
224
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 115,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
1068.66630403893
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 243
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
232
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 116,
"type": "SAM3_VideoTrack",
"pos": [
165.3271034107571,
1007.8240766656425
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 233
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 238
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 232
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
234
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 112,
"type": "SAM3_VideoTrack",
"pos": [
159.95932129876255,
754.1190318695158
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 240
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 223
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 224
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
229
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 117,
"type": "PreviewImage",
"pos": [
1287.634225650107,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 25,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 250
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 118,
"type": "PreviewImage",
"pos": [
1041.5040075988006,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 23,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 236
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 58,
"type": "LoadImage",
"pos": [
-800.6210036236067,
1277.5627543619055
],
"size": [
308.07680913429937,
543.642446368963
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
204
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pasted/image (4).png",
"image"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
417.9232177734375,
63.8154509134279
],
"size": [
419.26959228515625,
148.8194122314453
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
198
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A black dog mascot character and a green-and-cream bird mascot character, with a similar build, are holding hands and performing a ballroom dance on a white stage."
]
},
{
"id": 113,
"type": "VHS_LoadVideo",
"pos": [
-800.6210036236067,
455.00750561396774
],
"size": [
261.6533203125,
753.272357822205
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
239
]
},
{
"name": "frame_count",
"type": "INT",
"links": null
},
{
"name": "audio",
"type": "AUDIO",
"links": null
},
{
"name": "video_info",
"type": "VHS_VIDEOINFO",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_LoadVideo",
"cnr_id": "comfyui-videohelpersuite",
"ver": "2984ec4c4b93292421888f38db74a5e8802a8ff8"
},
"widgets_values": {
"video": "8281169-hd_1080_1920_24fps.mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 30,
"select_every_nth": 1,
"format": "None",
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "8281169-hd_1080_1920_24fps.mp4",
"type": "input",
"format": "video/mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 81,
"skip_first_frames": 30,
"select_every_nth": 1
}
}
},
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 3,
"type": "KSampler",
"pos": [
1475.7130963493837,
583.6899778882896
],
"size": [
315,
262
],
"flags": {},
"order": 26,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 120
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 215
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 216
},
{
"name": "latent_image",
"type": "LATENT",
"link": 217
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
218
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
1234,
"fixed",
6,
1,
"euler",
"simple",
1
]
},
{
"id": 101,
"type": "WanSCAILToVideo",
"pos": [
1082.8079278436192,
604.0725314640773
],
"size": [
344.02734375,
434
],
"flags": {},
"order": 24,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 198
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 199
},
{
"name": "vae",
"type": "VAE",
"link": 200
},
{
"name": "pose_video",
"shape": 7,
"type": "IMAGE",
"link": 241
},
{
"name": "pose_video_mask",
"shape": 7,
"type": "IMAGE",
"link": 230
},
{
"name": "reference_image",
"shape": 7,
"type": "IMAGE",
"link": 209
},
{
"name": "reference_image_mask",
"shape": 7,
"type": "IMAGE",
"link": 231
},
{
"name": "clip_vision_output",
"shape": 7,
"type": "CLIP_VISION_OUTPUT",
"link": 202
},
{
"name": "previous_frames",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 213
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 214
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
215
]
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
216
]
},
{
"name": "latent",
"type": "LATENT",
"links": [
217
]
},
{
"name": "video_frame_offset",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "WanSCAILToVideo"
},
"widgets_values": [
512,
896,
81,
1,
1,
0,
1,
0,
5,
false
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 107,
"type": "SCAIL2ColoredMask",
"pos": [
506.50542810498916,
905.2770955134855
],
"size": [
339.45703125,
126
],
"flags": {},
"order": 22,
"mode": 0,
"inputs": [
{
"name": "driving_track_data",
"type": "SAM3_TRACK_DATA",
"link": 229
},
{
"name": "ref_track_data",
"shape": 7,
"type": "SAM3_TRACK_DATA",
"link": 234
}
],
"outputs": [
{
"name": "pose_video_mask",
"type": "IMAGE",
"links": [
230,
236
]
},
{
"name": "reference_image_mask",
"type": "IMAGE",
"links": [
231,
250
]
}
],
"properties": {
"Node name for S&R": "SCAIL2ColoredMask"
},
"widgets_values": [
"",
"area",
false
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
96,
8,
0,
49,
0,
"IMAGE"
],
[
106,
57,
0,
56,
0,
"CLIP_VISION"
],
[
120,
48,
0,
3,
0,
"MODEL"
],
[
183,
37,
0,
96,
0,
"MODEL"
],
[
184,
96,
0,
48,
0,
"MODEL"
],
[
198,
6,
0,
101,
0,
"CONDITIONING"
],
[
199,
7,
0,
101,
1,
"CONDITIONING"
],
[
200,
39,
0,
101,
2,
"VAE"
],
[
202,
56,
0,
101,
7,
"CLIP_VISION_OUTPUT"
],
[
203,
102,
0,
103,
0,
"IMAGE"
],
[
204,
58,
0,
102,
0,
"IMAGE"
],
[
209,
103,
0,
101,
5,
"IMAGE"
],
[
212,
103,
0,
104,
0,
"IMAGE"
],
[
213,
104,
0,
101,
9,
"INT"
],
[
214,
104,
1,
101,
10,
"INT"
],
[
215,
101,
0,
3,
1,
"CONDITIONING"
],
[
216,
101,
1,
3,
2,
"CONDITIONING"
],
[
217,
101,
2,
3,
3,
"LATENT"
],
[
218,
3,
0,
8,
0,
"LATENT"
],
[
220,
110,
1,
109,
0,
"CLIP"
],
[
223,
110,
0,
112,
1,
"MODEL"
],
[
224,
109,
0,
112,
3,
"CONDITIONING"
],
[
229,
112,
0,
107,
0,
"SAM3_TRACK_DATA"
],
[
230,
107,
0,
101,
4,
"IMAGE"
],
[
231,
107,
1,
101,
6,
"IMAGE"
],
[
232,
115,
0,
116,
3,
"CONDITIONING"
],
[
233,
103,
0,
116,
0,
"IMAGE"
],
[
234,
116,
0,
107,
1,
"SAM3_TRACK_DATA"
],
[
236,
107,
0,
118,
0,
"IMAGE"
],
[
238,
110,
0,
116,
1,
"MODEL"
],
[
239,
113,
0,
119,
0,
"IMAGE"
],
[
240,
119,
0,
112,
0,
"IMAGE"
],
[
241,
119,
0,
101,
3,
"IMAGE"
],
[
242,
103,
0,
56,
1,
"IMAGE"
],
[
243,
110,
1,
115,
0,
"CLIP"
],
[
244,
39,
0,
120,
0,
"VAE"
],
[
245,
120,
0,
8,
1,
"VAE"
],
[
250,
107,
1,
117,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.4665073802097337,
"offset": [
1011.0979865375539,
-44.36445395816628
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
Create SCAIL-2 Colored Mask
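With multiple people, which person is matched to which motion becomes important, and SCAIL-2 controls this with colored masks.
- When SAM3.1 segments multiple targets, Create SCAIL-2 Colored Mask paints each one a different color, in order.
- Regions with the same color are generally linked to each other, so use sort_by and similar settings to make the colors match.
However, as in the example output below, the color correspondence and the motion do not always match. The colors are only a weak condition, and the model sometimes simply picks whichever subject is compositionally closer.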
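To make the idea of "paint each subject a different color, in order" concrete, here is a small pure-Python sketch. It is not the node's implementation: the palette is invented, and ordering subjects from largest to smallest area is just one plausible reading of the `area` setting seen in the workflow.

```python
# Hypothetical palette; the colors the real node uses are not stated in this article.
PALETTE = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]

Mask = list[list[int]]  # 2D grid of 0/1 values, one grid per segmented subject
Color = tuple[int, int, int]


def mask_area(mask: Mask) -> int:
    """Number of pixels that belong to the subject."""
    return sum(sum(row) for row in mask)


def colorize_by_area(masks: list[Mask]) -> list[list[Color]]:
    """Paint each mask with a palette color, largest subject first (assumed order)."""
    if not masks:
        return []
    height, width = len(masks[0]), len(masks[0][0])
    canvas: list[list[Color]] = [[(0, 0, 0)] * width for _ in range(height)]
    ordered = sorted(masks, key=mask_area, reverse=True)
    for index, mask in enumerate(ordered):
        color = PALETTE[index % len(PALETTE)]
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    canvas[y][x] = color
    return canvas


small = [[1, 0, 0], [0, 0, 0]]
large = [[0, 1, 1], [0, 1, 1]]
canvas = colorize_by_area([small, large])
print(canvas[0][0], canvas[0][1])  # (0, 255, 0) (255, 0, 0)
```

Because the driving video and the reference image are colored with the same rule, the largest subject gets the first color on both sides regardless of the order in which the subjects were detected, which is the kind of alignment the sort setting is meant to give you.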
Example output

Animation mode (81 frames or more)
SCAIL-2 basically generates up to 81 frames, but with WAN Context Windows (Manual) you can generate longer videos by splitting the work along the time axis.
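The general idea of splitting along the time axis can be sketched as overlapping frame windows. The code below is only an illustration of that idea, not the node's actual algorithm: the function name, the 16-frame overlap, and the way the last window is aligned to the end are all assumptions.

```python
def context_windows(total_frames: int, window: int = 81, overlap: int = 16) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, that cover total_frames."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if total_frames <= window:
        return [(0, total_frames)]
    stride = window - overlap
    windows = []
    start = 0
    while start + window < total_frames:
        windows.append((start, start + window))
        start += stride
    # The final window is aligned to the end so every frame is covered.
    windows.append((total_frames - window, total_frames))
    return windows


print(context_windows(211))  # [(0, 81), (65, 146), (130, 211)]
```

Each window stays within the 81-frame limit, and the shared frames between neighbouring windows are what allow the pieces to be blended into one longer clip.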

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 124,
"last_link_id": 252,
"nodes": [
{
"id": 57,
"type": "CLIPVisionLoader",
"pos": [
251.28673034667955,
1235.5962450561526
],
"size": [
270,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP_VISION",
"type": "CLIP_VISION",
"links": [
106
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"clip_vision_h.safetensors"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
190.560140854834
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"wan",
"default"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1831.305390050556,
583.6899778882896
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 28,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 218
},
{
"name": "vae",
"type": "VAE",
"link": 245
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
96
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
417.8738708496094,
266.8154509134282
],
"size": [
419.3189392089844,
138.8924560546875
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
199
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走 "
]
},
{
"id": 49,
"type": "VHS_VideoCombine",
"pos": [
2021.0685736443058,
583.6899778882896
],
"size": [
372.2688903808594,
876.4033355712891
],
"flags": {},
"order": 29,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 96
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_VideoCombine",
"cnr_id": "comfyui-videohelpersuite",
"ver": "a7ce59e381934733bfae03b1be029756d6ce936d"
},
"widgets_values": {
"frame_rate": 16,
"loop_count": 0,
"filename_prefix": "SCAIL-2",
"format": "video/h264-mp4",
"pix_fmt": "yuv420p",
"crf": 19,
"save_metadata": true,
"trim_to_audio": false,
"pingpong": false,
"save_output": true,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "SCAIL-2_00025.mp4",
"subfolder": "",
"type": "output",
"format": "video/h264-mp4",
"frame_rate": 16,
"workflow": "SCAIL-2_00025.png",
"fullpath": "/home/nomax/working-linux/ComfyUI-dev/output/SCAIL-2_00025.mp4"
}
}
}
},
{
"id": 102,
"type": "ResizeImageMaskNode",
"pos": [
-461.91705157795747,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 204
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
203
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
0.5,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
-59.809774398803675,
-84.75058912563465
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
183
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan2.1_14B_SCAIL_2_fp8_scaled.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 96,
"type": "LoraLoaderModelOnly",
"pos": [
276.2320190445451,
-84.75058912563465
],
"size": [
314.38576392812183,
82
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 183
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
184
]
}
],
"properties": {
"Node name for S&R": "LoraLoaderModelOnly",
"cnr_id": "comfy-core",
"ver": "0.3.60"
},
"widgets_values": [
"Wan2.1/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 120,
"type": "Reroute",
"pos": [
1715.7130963493837,
468.5779949512076
],
"size": [
75,
26
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 244
}
],
"outputs": [
{
"name": "",
"type": "VAE",
"links": [
245
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 39,
"type": "VAELoader",
"pos": [
525.0368322106934,
468.5779949512076
],
"size": [
306.36004638671875,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
200,
244
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"wan_2.1_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 56,
"type": "CLIPVisionEncode",
"pos": [
577.9174829101565,
1301.870476013184
],
"size": [
271.6761474609375,
78
],
"flags": {},
"order": 22,
"mode": 0,
"inputs": [
{
"name": "clip_vision",
"type": "CLIP_VISION",
"link": 106
},
{
"name": "image",
"type": "IMAGE",
"link": 242
}
],
"outputs": [
{
"name": "CLIP_VISION_OUTPUT",
"type": "CLIP_VISION_OUTPUT",
"links": [
202
]
}
],
"properties": {
"Node name for S&R": "CLIPVisionEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"none"
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 58,
"type": "LoadImage",
"pos": [
-800.6210036236067,
1277.5627543619055
],
"size": [
308.07680913429937,
543.642446368963
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
204
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pexels-photo-31438123.jpg",
"image"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 119,
"type": "Reroute",
"pos": [
25.7972264472059,
669.0146817503652
],
"size": [
75,
26
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "",
"type": "*",
"link": 239
}
],
"outputs": [
{
"name": "",
"type": "IMAGE",
"links": [
240,
241
]
}
],
"properties": {
"showOutputText": false,
"horizontal": false
}
},
{
"id": 123,
"type": "MarkdownNote",
"pos": [
-548.6500412597653,
-84.75058912563465
],
"size": [
461.4852607760207,
466.4268689388148
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n- checkpoints\n - [sam3.1_multiplex_fp16.safetensors](https://huggingface.co/Comfy-Org/sam3.1/blob/main/checkpoints/sam3.1_multiplex_fp16.safetensors)\n- clip_vision\n - [clip_vision_h.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors)\n- diffusion_models\n - [wan2.1_14B_SCAIL_2_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/SCAIL-2/blob/main/diffusion_models/wan2.1_14B_SCAIL_2_fp8_scaled.safetensors)\n- loras\n - [Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors)\n- text_encoders\n - [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)\n- vae\n - [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors)\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂checkpoints/\n │ └── sam3.1_multiplex_fp16.safetensors\n ├── 📂clip_vision/\n │ └── clip_vision_h.safetensors\n ├── 📂diffusion_models/\n │ └── wan2.1_14B_SCAIL_2_fp8_scaled.safetensors\n ├── 📂loras/\n │ └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors\n ├── 📂text_encoders/\n │ └── umt5_xxl_fp8_e4m3fn_scaled.safetensors\n └── 📂vae/\n └── wan_2.1_vae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 103,
"type": "ResizeImageMaskNode",
"pos": [
-154.42166660971037,
1277.5627543619055
],
"size": [
270,
106
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 203
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
209,
212,
233,
242
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale to multiple",
32,
"nearest-exact"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 104,
"type": "GetImageSize",
"pos": [
630.4935595703129,
1448.041027605726
],
"size": [
219.10007080078117,
136
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 212
}
],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
213
]
},
{
"name": "height",
"type": "INT",
"links": [
214
]
},
{
"name": "batch_size",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "GetImageSize"
},
"widgets_values": []
},
{
"id": 110,
"type": "CheckpointLoaderSimple",
"pos": [
-472.9277129444554,
773.4790318695161
],
"size": [
297.3094587159344,
98
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
223,
238
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
220,
243
]
},
{
"name": "VAE",
"type": "VAE",
"links": null
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"sam3.1_multiplex_fp16.safetensors"
]
},
{
"id": 109,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
847.1714690725746
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 220
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
224
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 115,
"type": "CLIPTextEncode",
"pos": [
-140.2924833863619,
1068.66630403893
],
"size": [
250.42005327504967,
109.84222389181082
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 243
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
232
]
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
"human"
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 116,
"type": "SAM3_VideoTrack",
"pos": [
165.3271034107571,
1007.8240766656425
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 21,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 233
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 238
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 232
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
234
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#233",
"bgcolor": "#355"
},
{
"id": 112,
"type": "SAM3_VideoTrack",
"pos": [
159.95932129876255,
754.1190318695158
],
"size": [
215.80859375,
166
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"label": "images",
"name": "images",
"type": "IMAGE",
"link": 240
},
{
"label": "model",
"name": "model",
"type": "MODEL",
"link": 223
},
{
"label": "initial_mask",
"name": "initial_mask",
"shape": 7,
"type": "MASK",
"link": null
},
{
"label": "conditioning",
"name": "conditioning",
"shape": 7,
"type": "CONDITIONING",
"link": 224
}
],
"outputs": [
{
"name": "track_data",
"type": "SAM3_TRACK_DATA",
"links": [
229
]
}
],
"properties": {
"Node name for S&R": "SAM3_VideoTrack",
"cnr_id": "comfy-core",
"ver": "0.19.3"
},
"widgets_values": [
0.5,
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 107,
"type": "SCAIL2ColoredMask",
"pos": [
506.50542810498916,
905.2770955134855
],
"size": [
339.45703125,
126
],
"flags": {},
"order": 23,
"mode": 0,
"inputs": [
{
"name": "driving_track_data",
"type": "SAM3_TRACK_DATA",
"link": 229
},
{
"name": "ref_track_data",
"shape": 7,
"type": "SAM3_TRACK_DATA",
"link": 234
}
],
"outputs": [
{
"name": "pose_video_mask",
"type": "IMAGE",
"links": [
230,
236
]
},
{
"name": "reference_image_mask",
"type": "IMAGE",
"links": [
231,
250
]
}
],
"properties": {
"Node name for S&R": "SCAIL2ColoredMask"
},
"widgets_values": [
"",
"area",
false
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 117,
"type": "PreviewImage",
"pos": [
1287.634225650107,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 26,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 250
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
417.9232177734375,
63.8154509134279
],
"size": [
419.26959228515625,
148.8194122314453
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
198
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A short-haired man wearing a striped shirt, hands on his hips, touching his hair.full body"
]
},
{
"id": 118,
"type": "PreviewImage",
"pos": [
1041.5040075988006,
1137.316846620137
],
"size": [
210,
258
],
"flags": {},
"order": 24,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 236
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 101,
"type": "WanSCAILToVideo",
"pos": [
1082.8079278436192,
604.0725314640773
],
"size": [
344.02734375,
434
],
"flags": {},
"order": 25,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 198
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 199
},
{
"name": "vae",
"type": "VAE",
"link": 200
},
{
"name": "pose_video",
"shape": 7,
"type": "IMAGE",
"link": 241
},
{
"name": "pose_video_mask",
"shape": 7,
"type": "IMAGE",
"link": 230
},
{
"name": "reference_image",
"shape": 7,
"type": "IMAGE",
"link": 209
},
{
"name": "reference_image_mask",
"shape": 7,
"type": "IMAGE",
"link": 231
},
{
"name": "clip_vision_output",
"shape": 7,
"type": "CLIP_VISION_OUTPUT",
"link": 202
},
{
"name": "previous_frames",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 213
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 214
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
215
]
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
216
]
},
{
"name": "latent",
"type": "LATENT",
"links": [
217
]
},
{
"name": "video_frame_offset",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "WanSCAILToVideo"
},
"widgets_values": [
512,
896,
133,
1,
1,
0,
1,
0,
5,
false
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 113,
"type": "VHS_LoadVideo",
"pos": [
-800.6210036236067,
455.00750561396774
],
"size": [
261.6533203125,
753.272357822205
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
239
]
},
{
"name": "frame_count",
"type": "INT",
"links": null
},
{
"name": "audio",
"type": "AUDIO",
"links": null
},
{
"name": "video_info",
"type": "VHS_VIDEOINFO",
"links": null
}
],
"properties": {
"Node name for S&R": "VHS_LoadVideo",
"cnr_id": "comfyui-videohelpersuite",
"ver": "2984ec4c4b93292421888f38db74a5e8802a8ff8"
},
"widgets_values": {
"video": "14637751_2160_3840_30fps.mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 133,
"skip_first_frames": 0,
"select_every_nth": 1,
"format": "None",
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "14637751_2160_3840_30fps.mp4",
"type": "input",
"format": "video/mp4",
"force_rate": 16,
"custom_width": 0,
"custom_height": 0,
"frame_load_cap": 133,
"skip_first_frames": 0,
"select_every_nth": 1
}
}
},
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 3,
"type": "KSampler",
"pos": [
1475.7130963493837,
583.6899778882896
],
"size": [
315,
262
],
"flags": {},
"order": 27,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 252
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 215
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 216
},
{
"name": "latent_image",
"type": "LATENT",
"link": 217
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
218
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
123456,
"fixed",
6,
1,
"euler",
"simple",
1
]
},
{
"id": 48,
"type": "ModelSamplingSD3",
"pos": [
621.2813720703125,
-84.75058912563465
],
"size": [
210,
58
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 184
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
251
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
5
]
},
{
"id": 124,
"type": "WanContextWindowsManual",
"pos": [
921.0647655256324,
-84.75058912563465
],
"size": [
316.1412109375,
202
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 251
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
252
]
}
],
"properties": {
"Node name for S&R": "WanContextWindowsManual"
},
"widgets_values": [
81,
29,
"standard_static",
1,
false,
"pyramid",
true
],
"color": "#223",
"bgcolor": "#335"
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
96,
8,
0,
49,
0,
"IMAGE"
],
[
106,
57,
0,
56,
0,
"CLIP_VISION"
],
[
183,
37,
0,
96,
0,
"MODEL"
],
[
184,
96,
0,
48,
0,
"MODEL"
],
[
198,
6,
0,
101,
0,
"CONDITIONING"
],
[
199,
7,
0,
101,
1,
"CONDITIONING"
],
[
200,
39,
0,
101,
2,
"VAE"
],
[
202,
56,
0,
101,
7,
"CLIP_VISION_OUTPUT"
],
[
203,
102,
0,
103,
0,
"IMAGE"
],
[
204,
58,
0,
102,
0,
"IMAGE"
],
[
209,
103,
0,
101,
5,
"IMAGE"
],
[
212,
103,
0,
104,
0,
"IMAGE"
],
[
213,
104,
0,
101,
9,
"INT"
],
[
214,
104,
1,
101,
10,
"INT"
],
[
215,
101,
0,
3,
1,
"CONDITIONING"
],
[
216,
101,
1,
3,
2,
"CONDITIONING"
],
[
217,
101,
2,
3,
3,
"LATENT"
],
[
218,
3,
0,
8,
0,
"LATENT"
],
[
220,
110,
1,
109,
0,
"CLIP"
],
[
223,
110,
0,
112,
1,
"MODEL"
],
[
224,
109,
0,
112,
3,
"CONDITIONING"
],
[
229,
112,
0,
107,
0,
"SAM3_TRACK_DATA"
],
[
230,
107,
0,
101,
4,
"IMAGE"
],
[
231,
107,
1,
101,
6,
"IMAGE"
],
[
232,
115,
0,
116,
3,
"CONDITIONING"
],
[
233,
103,
0,
116,
0,
"IMAGE"
],
[
234,
116,
0,
107,
1,
"SAM3_TRACK_DATA"
],
[
236,
107,
0,
118,
0,
"IMAGE"
],
[
238,
110,
0,
116,
1,
"MODEL"
],
[
239,
113,
0,
119,
0,
"IMAGE"
],
[
240,
119,
0,
112,
0,
"IMAGE"
],
[
241,
119,
0,
101,
3,
"IMAGE"
],
[
242,
103,
0,
56,
1,
"IMAGE"
],
[
243,
110,
1,
115,
0,
"CLIP"
],
[
244,
39,
0,
120,
0,
"VAE"
],
[
245,
120,
0,
8,
1,
"VAE"
],
[
250,
107,
1,
117,
0,
"IMAGE"
],
[
251,
48,
0,
124,
0,
"MODEL"
],
[
252,
124,
0,
3,
0,
"MODEL"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.5644739300537773,
"offset": [
420.4411568691918,
395.94500319536957
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
WAN Context Windows (Manual)
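This is tiling along the time axis, or something like context sliding.
- Setting context_length to 81 makes the node generate internally in chunks of 81 frames.
- Left as is, the seams between chunks would be clearly visible, so set context_overlap to a suitable number of frames as a blending margin.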
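To get a feel for how the two settings interact, here is a minimal sketch that lists the frame ranges a sliding window would cover. It is a simplification under stated assumptions: the function name and the rule of aligning the last window to the end of the clip are made up for this example, and the real node's scheduling (and how it blends the overlapping frames) may differ. The numbers 81, 29, and 133 are the values stored in the workflow above, where 81 and 29 appear to be context_length and context_overlap.

```python
from typing import List, Tuple


def context_windows(total_frames: int, context_length: int,
                    context_overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, for a simple sliding window."""
    if not 0 <= context_overlap < context_length:
        raise ValueError("context_overlap must be in [0, context_length)")
    if total_frames <= context_length:
        # Short clips fit in a single window.
        return [(0, total_frames)]
    stride = context_length - context_overlap
    windows = []
    start = 0
    while start + context_length < total_frames:
        windows.append((start, start + context_length))
        start += stride
    # Assumption: the final window is aligned to the end of the clip.
    windows.append((total_frames - context_length, total_frames))
    return windows


print(context_windows(133, 81, 29))  # [(0, 81), (52, 133)]
```

With these values each window advances by 81 - 29 = 52 frames, so two windows cover the 133-frame clip and frames 52 to 80 are generated twice and can be blended to hide the seam.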
Output example
