What is Z-Image?
Z-Image is a family of image-generation models from Alibaba / Tongyi-MAI.

The name Z-Image is itself an umbrella term for the whole model family, which can be a little confusing; this page covers Z-Image as the base model the variants are derived from. (It is sometimes called Z-Image-Base to distinguish it.)
As a base model (the starting point for fine-tuning), Z-Image has straightforward, unopinionated behavior.
Since it lacks the stabilization that distillation and reinforcement learning give Z-Image-Turbo, differences in seed and initial noise show up readily in the output. That makes it highly creative with wide variation, but it is also sensitive to parameters and results can swing widely, so it is a demanding model to drive.
Downloading the models
- diffusion_models
  - z_image_bf16.safetensors (12.3 GB)
- text_encoders
  - qwen_3_4b.safetensors (8.04 GB)
- vae
  - ae.safetensors (335 MB)
📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── z_image_bf16.safetensors
    ├── 📂text_encoders/
    │   └── qwen_3_4b.safetensors
    └── 📂vae/
        └── ae.safetensors
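Before loading a workflow, it can save a failed run to check that each file landed in the expected folder. A minimal sketch — the `missing_models` helper and the `REQUIRED` table are illustrative, not part of ComfyUI:

```python
from pathlib import Path

# Files expected by the workflows on this page, keyed by models/ subfolder.
REQUIRED = {
    "diffusion_models": "z_image_bf16.safetensors",
    "text_encoders": "qwen_3_4b.safetensors",
    "vae": "ae.safetensors",
}

def missing_models(comfy_root: str) -> list[str]:
    """Return the relative paths under models/ that are not present."""
    root = Path(comfy_root) / "models"
    return [f"{d}/{f}" for d, f in REQUIRED.items() if not (root / d / f).is_file()]

# missing_models("ComfyUI") returns [] once all three files are in place
```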
text2image

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 59,
"last_link_id": 102,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1252.432861328125,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
603.9390258789062,
45.71437377929687
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
100
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
267.6552734375,
45.71437377929687
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
977.9548217773436,
69.71437377929689
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 53,
"type": "EmptySD3LatentImage",
"pos": [
597.2695922851562,
584.737218645886
],
"size": [
237,
106
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
98
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1104,
1472,
1
]
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1443.3798111474612,
192.6578574704594
],
"size": [
535.0608199082301,
683.4737593989388
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-127.09132385253906,
-13.402286529541016
],
"size": [
349.13103718118725,
214.5148968572393
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image/blob/main/split_files/diffusion_models/z_image_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── z_image_bf16.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 3,
"type": "KSampler",
"pos": [
898.7548217773438,
188.1918182373047
],
"size": [
315,
262
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 100
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 98
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
30,
4,
"euler",
"simple",
1
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
415,
405.392333984375
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": false
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"bad quality, oversaturated, visual artifacts, bad anatomy, deformed hands, facial distortion"
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.00001525878906,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"A lone figure walking through dense morning fog in a pine forest, strong backlight piercing through trees, visible volumetric light beams, soft haze layering, atmospheric perspective. High dynamic range but gentle roll-off in highlights, rich shadow detail, filmic color grading. 35mm lens, slight handheld feel, cinematic realism, no text, no extra objects."
]
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
98,
53,
0,
3,
3,
"LATENT"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
100,
54,
0,
3,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.7513148009015777,
"offset": [
156.43924904699273,
391.3474029631308
]
},
"frontendVersion": "1.37.11",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
steps: it depends on the sampler, but a slightly higher count, around 30–40, tends to be more stable
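When sweeping steps values to find a stable setting, you can patch the workflow JSON programmatically instead of editing the node by hand. A rough sketch, assuming the UI-format JSON shown above, where the KSampler `widgets_values` order is seed, seed control, steps, cfg, sampler, scheduler, denoise:

```python
def set_ksampler_steps(workflow: dict, steps: int) -> dict:
    """Set the steps widget on every KSampler node in a UI-format workflow."""
    for node in workflow["nodes"]:
        if node["type"] == "KSampler":
            # widgets_values: [seed, seed_control, steps, cfg, sampler, scheduler, denoise]
            node["widgets_values"][2] = steps
    return workflow

# Minimal example mirroring the KSampler node from the workflow above.
wf = {"nodes": [{"type": "KSampler",
                 "widgets_values": [1234, "fixed", 30, 4, "euler", "simple", 1]}]}
set_ksampler_steps(wf, 40)
```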
Refining with Z-Image-Turbo
This method refines a Z-Image result with a short run of Z-Image-Turbo, aiming to get both Z-Image's creativity and Z-Image-Turbo's stable quality.
Plain image2image would also work, but here we will be a little fancier and split the sampling into two stages.

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 71,
"last_link_id": 126,
"nodes": [
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
]
},
{
"id": 53,
"type": "EmptySD3LatentImage",
"pos": [
597.2695922851562,
584.737218645886
],
"size": [
237,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
105
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1104,
1472,
1
]
},
{
"id": 63,
"type": "ModelSamplingAuraFlow",
"pos": [
983.4242401123047,
-103.90322308435528
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 110
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
112
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 64,
"type": "UNETLoader",
"pos": [
636.4279720527976,
-103.90322308435528
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
110
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_turbo_bf16.safetensors",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
267.6552734375,
45.714373779296864
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
603.9390258789062,
45.71437377929687
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
111
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.00001525878906,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
107,
108
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"A candid, high-end documentary photograph of an elderly man seated in the cool shade beneath a large tree, gently playing an acoustic guitar, relaxed posture with slightly hunched shoulders and weathered hands on the strings, a calm content expression and soft smile, sun-dappled light filtering through leaves creating natural mottled patterns across his face and clothing, warm late-afternoon ambience with subtle rim light along his hair and shoulders, shallow depth of field isolating him from a softly blurred park background, realistic skin texture and fine wrinkles, detailed wood grain on the guitar body with tasteful specular highlights, muted earthy color palette, filmic contrast with smooth highlight roll-off, natural bokeh, quiet peaceful mood, clean composition with the subject placed slightly off-center, no text, no logos, no extra people, ultra-realistic photographic detail.\n"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
415,
405.392333984375
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": false
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
106,
109
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"bad quality, oversaturated, visual artifacts, bad anatomy, deformed hands, facial distortion"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1616.8647959733044,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 113
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1818.4798111474565,
188.1918182373047
],
"size": [
618.2016653999137,
726.9413389038397
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 60,
"type": "KSamplerAdvanced",
"pos": [
898.7548217773438,
188.1918182373047
],
"size": [
315,
334
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 111
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 107
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 106
},
{
"name": "latent_image",
"type": "LATENT",
"link": 105
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
103
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.11.1",
"Node name for S&R": "KSamplerAdvanced"
},
"widgets_values": [
"enable",
1234,
"fixed",
30,
4,
"euler",
"simple",
0,
15,
"enable"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 62,
"type": "KSamplerAdvanced",
"pos": [
1257.809808875324,
188.1918182373047
],
"size": [
315,
334
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 112
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 108
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 109
},
{
"name": "latent_image",
"type": "LATENT",
"link": 103
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
113
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.11.1",
"Node name for S&R": "KSamplerAdvanced"
},
"widgets_values": [
"disable",
0,
"fixed",
8,
1,
"euler",
"simple",
4,
10000,
"disable"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
1337.0098088753239,
69.71437377929686
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-131.18940458472943,
-27.062555636842433
],
"size": [
330.23245000298687,
242.5974748774147
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image/blob/main/split_files/diffusion_models/z_image_bf16.safetensors)\n- [z_image_turbo_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ ├── z_image_bf16.safetensors\n │ └── z_image_turbo_bf16.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
]
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
],
[
103,
60,
0,
62,
3,
"LATENT"
],
[
105,
53,
0,
60,
3,
"LATENT"
],
[
106,
7,
0,
60,
2,
"CONDITIONING"
],
[
107,
6,
0,
60,
1,
"CONDITIONING"
],
[
108,
6,
0,
62,
1,
"CONDITIONING"
],
[
109,
7,
0,
62,
2,
"CONDITIONING"
],
[
110,
64,
0,
63,
0,
"MODEL"
],
[
111,
54,
0,
60,
0,
"MODEL"
],
[
112,
63,
0,
62,
0,
"MODEL"
],
[
113,
62,
0,
8,
0,
"LATENT"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.9090909090909092,
"offset": [
71.64929504493259,
442.37738257756666
]
},
"frontendVersion": "1.37.11",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
Here the sampling is split 50% for the first half and 50% for the second half. (cf. Splitting sampling)
- 🟪 Z-Image: 15 of 30 steps
- 🟨 Z-Image-Turbo: 4 of 8 steps
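The two samplers can cover 50% each despite their different step counts because KSamplerAdvanced interprets `start_at_step` / `end_at_step` as positions on its own schedule, i.e. as fractions of its total `steps`. A quick sanity check — the helper is illustrative, not a ComfyUI function:

```python
def stage_fraction(start_at_step: int, end_at_step: int, steps: int) -> tuple[float, float]:
    """Fraction of the denoising schedule a KSamplerAdvanced stage covers."""
    return start_at_step / steps, min(end_at_step, steps) / steps

# 🟪 Z-Image:       steps=30, start=0, end=15    -> covers 0% to 50%
# 🟨 Z-Image-Turbo: steps=8,  start=4, end=10000 -> covers 50% to 100%
```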
Comparison


Z-Image-Fun-Controlnet-Union-2.1
A ControlNet-style patch for Z-Image.
Downloading the model
- model_patches
  - Z-Image-Fun-Controlnet-Union-2.1.safetensors
📂ComfyUI/
└── 📂models/
    └── 📂model_patches/
        └── Z-Image-Fun-Controlnet-Union-2.1.safetensors
workflow

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 70,
"last_link_id": 124,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1543.4527151869986,
186
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 114
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1739.4158111474596,
186
],
"size": [
535.0608199082301,
683.4737593989388
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
603.9390258789062,
45.71437377929687
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
108
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
]
},
{
"id": 62,
"type": "VAEEncode",
"pos": [
681.8294099357819,
843.6709899023072
],
"size": [
148.78459999999995,
46
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 113
},
{
"name": "vae",
"type": "VAE",
"link": 104
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
110
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "VAEEncode"
},
"widgets_values": []
},
{
"id": 65,
"type": "QwenImageDiffsynthControlnet",
"pos": [
872.6726754282345,
186
],
"size": [
278.97390399018593,
138
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 108
},
{
"name": "model_patch",
"type": "MODEL_PATCH",
"link": 105
},
{
"name": "vae",
"type": "VAE",
"link": 106
},
{
"name": "image",
"type": "IMAGE",
"link": 123
},
{
"name": "mask",
"shape": 7,
"type": "MASK",
"link": null
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
109
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "QwenImageDiffsynthControlnet"
},
"widgets_values": [
0.8
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
267.6552734375,
45.714373779296864
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 67,
"type": "LoadImage",
"pos": [
-94.28508725933216,
698.0254172619354
],
"size": [
359.21847812500005,
533.241
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
111
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pasted/image (138).png",
"image"
]
},
{
"id": 60,
"type": "PreviewImage",
"pos": [
872.6726754282345,
698.0254172619354
],
"size": [
254.1998000000001,
361.313
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 124
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 61,
"type": "VAELoader",
"pos": [
301.5928496741561,
868.703383195522
],
"size": [
235.45454545454538,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
104,
106,
114
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 63,
"type": "ModelPatchLoader",
"pos": [
552.2443630537383,
576.3798446215637
],
"size": [
278.3696468820435,
58
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL_PATCH",
"type": "MODEL_PATCH",
"links": [
105
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "ModelPatchLoader"
},
"widgets_values": [
"Z-Image\\Z-Image-Fun-Controlnet-Union-2.1.safetensors"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 68,
"type": "ResizeImageMaskNode",
"pos": [
301.5928496741561,
698.0254172619354
],
"size": [
236.556640625,
106
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 111
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
112,
113
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.11.1",
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
1.5,
"area"
]
},
{
"id": 64,
"type": "DepthAnythingV2Preprocessor",
"pos": [
571.9494396232819,
698.0254172619354
],
"size": [
258.6645703124999,
82
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 112
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
123,
124
]
}
],
"properties": {
"cnr_id": "comfyui_controlnet_aux",
"ver": "12f35647f0d510e03b45a47fb420fe1245a575df",
"Node name for S&R": "DepthAnythingV2Preprocessor"
},
"widgets_values": [
"depth_anything_v2_vitl.pth",
512
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.00001525878906,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"semi-3D toon illustration, clean studio look, smooth shading, soft global illumination, crisp outlines (subtle), high readability, simple but not flat, minimal background, white backdrop. a black cat peeking out from a blue shopping bag, one paw resting on the bag edge, a human hand holding the bag handles. cute face, large eyes, glossy but controlled highlights, natural proportions, clean materials"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
415,
405.6492042321686
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": false
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"photorealistic, text, logo, watermark, signature, noise, jpeg artifacts"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
1190.0496473027094,
186
],
"size": [
315,
262
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 109
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 110
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
12345,
"fixed",
30,
4,
"euler",
"simple",
1
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-159.02895116299885,
-24.088770293079595
],
"size": [
372.9441184528023,
255.0671111260163
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image/blob/main/split_files/diffusion_models/z_image_bf16.safetensors)\n- [Z-Image-Fun-Controlnet-Union-2.1.safetensors](https://huggingface.co/alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1/blob/main/Z-Image-Fun-Controlnet-Union-2.1.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── z_image_bf16.safetensors\n ├── 📂model_patches/\n │ └── Z-Image-Fun-Controlnet-Union-2.1.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
],
[
104,
61,
0,
62,
1,
"VAE"
],
[
105,
63,
0,
65,
1,
"MODEL_PATCH"
],
[
106,
61,
0,
65,
2,
"VAE"
],
[
108,
54,
0,
65,
0,
"MODEL"
],
[
109,
65,
0,
3,
0,
"MODEL"
],
[
110,
62,
0,
3,
3,
"LATENT"
],
[
111,
67,
0,
68,
0,
"IMAGE"
],
[
112,
68,
0,
64,
0,
"IMAGE"
],
[
113,
68,
0,
62,
0,
"IMAGE"
],
[
114,
61,
0,
8,
1,
"VAE"
],
[
123,
64,
0,
65,
3,
"IMAGE"
],
[
124,
64,
0,
60,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.6830134553650705,
"offset": [
373.20923781815407,
471.9741601249983
]
},
"frontendVersion": "1.37.11",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
- 🟩 Add the patch model and the control image to QwenImageDiffsynthControlnet
- 🟩 In this workflow, the depth map is created with Depth Anything V2.
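ResizeImageMaskNode here uses "scale total pixels" with a value of 1.5, i.e. the control image is rescaled toward roughly 1.5 megapixels while keeping its aspect ratio. An approximate sketch of that calculation — ComfyUI's exact rounding may differ; snapping to a multiple of 8 keeps the size latent-friendly:

```python
import math

def scale_to_total_pixels(w: int, h: int, megapixels: float, multiple: int = 8) -> tuple[int, int]:
    """Rescale (w, h) so w*h is approximately megapixels * 1e6, snapped to a multiple."""
    s = math.sqrt(megapixels * 1_000_000 / (w * h))
    return (round(w * s / multiple) * multiple,
            round(h * s / multiple) * multiple)

# e.g. a 1104x1472 (~1.6 MP) input is brought down close to 1.5 MP
```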