What is Z-Image?
Z-Image is a family of image-generation models developed by Alibaba / Tongyi-MAI.

The name Z-Image refers to the model family as a whole, which can be a little confusing; this page covers the base model from which the variants are derived (sometimes called Z-Image-Base to distinguish it).
As a base model intended for fine-tuning, Z-Image has a fairly unvarnished character.
Because it has none of the distillation- or reinforcement-learning-based stabilization applied to Z-Image-Turbo, differences in seed and initial noise are readily reflected in the output: it is creative and produces wide variation, but the flip side is that it is a demanding model, sensitive to its parameters, whose results swing heavily.
Model downloads
- diffusion_models
- z_image_bf16.safetensors (12.3 GB)
- text_encoders
- qwen_3_4b.safetensors (8.04 GB)
- vae
- ae.safetensors (335 MB)
📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── z_image_bf16.safetensors
    ├── 📂text_encoders/
    │   └── qwen_3_4b.safetensors
    └── 📂vae/
        └── ae.safetensors
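If you prefer scripting the downloads, the files above can be fetched from the direct-download ("resolve") form of the Hugging Face links given in the workflow's note. A minimal sketch that only assembles the destination paths and URLs (repo paths taken from that note):

```python
# Map each ComfyUI/models/ destination to its Hugging Face "resolve" path
# (the direct-download form of the "blob" links in the note).
BASE = "https://huggingface.co"
FILES = {
    "diffusion_models/z_image_bf16.safetensors":
        "Comfy-Org/z_image/resolve/main/split_files/diffusion_models/z_image_bf16.safetensors",
    "text_encoders/qwen_3_4b.safetensors":
        "Comfy-Org/z_image_turbo/resolve/main/split_files/text_encoders/qwen_3_4b.safetensors",
    "vae/ae.safetensors":
        "Comfy-Org/z_image_turbo/resolve/main/split_files/vae/ae.safetensors",
}

for dest, path in FILES.items():
    print(f"ComfyUI/models/{dest}  <-  {BASE}/{path}")
```

You can feed these URLs to any downloader (browser, curl, etc.); the left-hand side is where the file should end up.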
text2image

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 59,
"last_link_id": 102,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1252.432861328125,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
603.9390258789062,
45.71437377929687
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
100
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
267.6552734375,
45.71437377929687
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
977.9548217773436,
69.71437377929689
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 53,
"type": "EmptySD3LatentImage",
"pos": [
597.2695922851562,
584.737218645886
],
"size": [
237,
106
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
98
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1104,
1472,
1
]
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1443.3798111474612,
192.6578574704594
],
"size": [
535.0608199082301,
683.4737593989388
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-127.09132385253906,
-13.402286529541016
],
"size": [
349.13103718118725,
214.5148968572393
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image/blob/main/split_files/diffusion_models/z_image_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── z_image_bf16.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 3,
"type": "KSampler",
"pos": [
898.7548217773438,
188.1918182373047
],
"size": [
315,
262
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 100
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 98
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
30,
4,
"euler",
"simple",
1
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
415,
405.392333984375
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": false
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"bad quality, oversaturated, visual artifacts, bad anatomy, deformed hands, facial distortion"
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.00001525878906,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"A lone figure walking through dense morning fog in a pine forest, strong backlight piercing through trees, visible volumetric light beams, soft haze layering, atmospheric perspective. High dynamic range but gentle roll-off in highlights, rich shadow detail, filmic color grading. 35mm lens, slight handheld feel, cinematic realism, no text, no extra objects."
]
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
98,
53,
0,
3,
3,
"LATENT"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
100,
54,
0,
3,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.7513148009015777,
"offset": [
156.43924904699273,
391.3474029631308
]
},
"frontendVersion": "1.37.11",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
steps: it depends on the sampler, but a somewhat higher count, around 30-40, tends to be more stable.
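The workflow above sets ModelSamplingAuraFlow's shift to 3.1. As a rough illustration of what that does (this is the commonly used flow-shift mapping; it is a sketch, not necessarily ComfyUI's exact internals), the shift remaps the schedule so that more of it is spent at high noise:

```python
# Hedged sketch: AuraFlow-style timestep/sigma shift.
# shift = 1.0 leaves the schedule unchanged; larger values (3.1 here)
# push mid-schedule sigmas upward, i.e. more time at high noise.
def shift_sigma(sigma: float, shift: float = 3.1) -> float:
    return shift * sigma / (1 + (shift - 1) * sigma)

# Endpoints are preserved; the middle of the schedule moves up.
print(shift_sigma(0.0), shift_sigma(0.5), shift_sigma(1.0))
```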
Refining with Z-Image-Turbo
This approach refines the output of Z-Image with Z-Image-Turbo over a small number of steps, aiming to combine Z-Image's creativity with Z-Image-Turbo's stable quality.
Plain image2image would also work, but here we try something slightly fancier and split the sampling into two stages.

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 71,
"last_link_id": 126,
"nodes": [
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
]
},
{
"id": 53,
"type": "EmptySD3LatentImage",
"pos": [
597.2695922851562,
584.737218645886
],
"size": [
237,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
105
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1104,
1472,
1
]
},
{
"id": 63,
"type": "ModelSamplingAuraFlow",
"pos": [
983.4242401123047,
-103.90322308435528
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 110
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
112
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 64,
"type": "UNETLoader",
"pos": [
636.4279720527976,
-103.90322308435528
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
110
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_turbo_bf16.safetensors",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
267.6552734375,
45.714373779296864
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
603.9390258789062,
45.71437377929687
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
111
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.00001525878906,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
107,
108
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"A candid, high-end documentary photograph of an elderly man seated in the cool shade beneath a large tree, gently playing an acoustic guitar, relaxed posture with slightly hunched shoulders and weathered hands on the strings, a calm content expression and soft smile, sun-dappled light filtering through leaves creating natural mottled patterns across his face and clothing, warm late-afternoon ambience with subtle rim light along his hair and shoulders, shallow depth of field isolating him from a softly blurred park background, realistic skin texture and fine wrinkles, detailed wood grain on the guitar body with tasteful specular highlights, muted earthy color palette, filmic contrast with smooth highlight roll-off, natural bokeh, quiet peaceful mood, clean composition with the subject placed slightly off-center, no text, no logos, no extra people, ultra-realistic photographic detail.\n"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
415,
405.392333984375
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": false
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
106,
109
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"bad quality, oversaturated, visual artifacts, bad anatomy, deformed hands, facial distortion"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1616.8647959733044,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 113
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1818.4798111474565,
188.1918182373047
],
"size": [
618.2016653999137,
726.9413389038397
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 60,
"type": "KSamplerAdvanced",
"pos": [
898.7548217773438,
188.1918182373047
],
"size": [
315,
334
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 111
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 107
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 106
},
{
"name": "latent_image",
"type": "LATENT",
"link": 105
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
103
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.11.1",
"Node name for S&R": "KSamplerAdvanced"
},
"widgets_values": [
"enable",
1234,
"fixed",
30,
4,
"euler",
"simple",
0,
15,
"enable"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 62,
"type": "KSamplerAdvanced",
"pos": [
1257.809808875324,
188.1918182373047
],
"size": [
315,
334
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 112
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 108
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 109
},
{
"name": "latent_image",
"type": "LATENT",
"link": 103
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
113
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.11.1",
"Node name for S&R": "KSamplerAdvanced"
},
"widgets_values": [
"disable",
0,
"fixed",
8,
1,
"euler",
"simple",
4,
10000,
"disable"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
1337.0098088753239,
69.71437377929686
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-131.18940458472943,
-27.062555636842433
],
"size": [
330.23245000298687,
242.5974748774147
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image/blob/main/split_files/diffusion_models/z_image_bf16.safetensors)\n- [z_image_turbo_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ ├── z_image_bf16.safetensors\n │ └── z_image_turbo_bf16.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
]
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
],
[
103,
60,
0,
62,
3,
"LATENT"
],
[
105,
53,
0,
60,
3,
"LATENT"
],
[
106,
7,
0,
60,
2,
"CONDITIONING"
],
[
107,
6,
0,
60,
1,
"CONDITIONING"
],
[
108,
6,
0,
62,
1,
"CONDITIONING"
],
[
109,
7,
0,
62,
2,
"CONDITIONING"
],
[
110,
64,
0,
63,
0,
"MODEL"
],
[
111,
54,
0,
60,
0,
"MODEL"
],
[
112,
63,
0,
62,
0,
"MODEL"
],
[
113,
62,
0,
8,
0,
"LATENT"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.9090909090909092,
"offset": [
71.64929504493259,
442.37738257756666
]
},
"frontendVersion": "1.37.11",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
Here the schedule is split 50% for the first half and 50% for the second. (cf. split sampling)
- 🟪 Z-Image : 15 of 30 steps
- 🟨 Z-Image-Turbo : 4 of 8 steps
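The 50/50 handoff maps onto the start_at_step / end_at_step widgets of the two KSamplerAdvanced nodes. A minimal sketch of the arithmetic (values and the "10000 = run to the end" convention come from the workflow above):

```python
# Each KSamplerAdvanced runs only a slice of its OWN schedule, so a 50/50
# split means "first half of the base schedule" followed by
# "second half of the turbo schedule".
def handoff_step(total_steps: int, fraction: float) -> int:
    """Step index at which `fraction` of the denoising is done."""
    return round(total_steps * fraction)

base_steps, turbo_steps = 30, 8
fraction = 0.5  # 50% base, 50% turbo

# Stage 1 (Z-Image):       start_at_step=0, end_at_step=15,
#                          return_with_leftover_noise=enable
base_end = handoff_step(base_steps, fraction)     # 15
# Stage 2 (Z-Image-Turbo): start_at_step=4, end_at_step=10000 (to the end),
#                          add_noise=disable
turbo_start = handoff_step(turbo_steps, fraction)  # 4
```

The key detail is that stage 1 hands off a still-noisy latent (leftover noise enabled) and stage 2 does not add fresh noise on top of it.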
Comparison


Z-Image-Fun-Controlnet-Union-2.1
A ControlNet-style patch for Z-Image.
Model downloads
- model_patches
- Z-Image-Fun-Controlnet-Union-2.1.safetensors
📂ComfyUI/
└── 📂models/
    └── 📂model_patches/
        └── Z-Image-Fun-Controlnet-Union-2.1.safetensors
Workflow

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 70,
"last_link_id": 124,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1543.4527151869986,
186
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 114
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1739.4158111474596,
186
],
"size": [
535.0608199082301,
683.4737593989388
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
603.9390258789062,
45.71437377929687
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
108
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
]
},
{
"id": 62,
"type": "VAEEncode",
"pos": [
681.8294099357819,
843.6709899023072
],
"size": [
148.78459999999995,
46
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 113
},
{
"name": "vae",
"type": "VAE",
"link": 104
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
110
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "VAEEncode"
},
"widgets_values": []
},
{
"id": 65,
"type": "QwenImageDiffsynthControlnet",
"pos": [
872.6726754282345,
186
],
"size": [
278.97390399018593,
138
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 108
},
{
"name": "model_patch",
"type": "MODEL_PATCH",
"link": 105
},
{
"name": "vae",
"type": "VAE",
"link": 106
},
{
"name": "image",
"type": "IMAGE",
"link": 123
},
{
"name": "mask",
"shape": 7,
"type": "MASK",
"link": null
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
109
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "QwenImageDiffsynthControlnet"
},
"widgets_values": [
0.8
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
267.6552734375,
45.714373779296864
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 67,
"type": "LoadImage",
"pos": [
-94.28508725933216,
698.0254172619354
],
"size": [
359.21847812500005,
533.241
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
111
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pasted/image (138).png",
"image"
]
},
{
"id": 60,
"type": "PreviewImage",
"pos": [
872.6726754282345,
698.0254172619354
],
"size": [
254.1998000000001,
361.313
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 124
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 61,
"type": "VAELoader",
"pos": [
301.5928496741561,
868.703383195522
],
"size": [
235.45454545454538,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
104,
106,
114
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 63,
"type": "ModelPatchLoader",
"pos": [
552.2443630537383,
576.3798446215637
],
"size": [
278.3696468820435,
58
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL_PATCH",
"type": "MODEL_PATCH",
"links": [
105
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "ModelPatchLoader"
},
"widgets_values": [
"Z-Image\\Z-Image-Fun-Controlnet-Union-2.1.safetensors"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 68,
"type": "ResizeImageMaskNode",
"pos": [
301.5928496741561,
698.0254172619354
],
"size": [
236.556640625,
106
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 111
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
112,
113
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.11.1",
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
1.5,
"area"
]
},
{
"id": 64,
"type": "DepthAnythingV2Preprocessor",
"pos": [
571.9494396232819,
698.0254172619354
],
"size": [
258.6645703124999,
82
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 112
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
123,
124
]
}
],
"properties": {
"cnr_id": "comfyui_controlnet_aux",
"ver": "12f35647f0d510e03b45a47fb420fe1245a575df",
"Node name for S&R": "DepthAnythingV2Preprocessor"
},
"widgets_values": [
"depth_anything_v2_vitl.pth",
512
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.00001525878906,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"semi-3D toon illustration, clean studio look, smooth shading, soft global illumination, crisp outlines (subtle), high readability, simple but not flat, minimal background, white backdrop. a black cat peeking out from a blue shopping bag, one paw resting on the bag edge, a human hand holding the bag handles. cute face, large eyes, glossy but controlled highlights, natural proportions, clean materials"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
415,
405.6492042321686
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": false
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"photorealistic, text, logo, watermark, signature, noise, jpeg artifacts"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
1190.0496473027094,
186
],
"size": [
315,
262
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 109
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 110
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
12345,
"fixed",
30,
4,
"euler",
"simple",
1
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-159.02895116299885,
-24.088770293079595
],
"size": [
372.9441184528023,
255.0671111260163
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image/blob/main/split_files/diffusion_models/z_image_bf16.safetensors)\n- [Z-Image-Fun-Controlnet-Union-2.1.safetensors](https://huggingface.co/alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1/blob/main/Z-Image-Fun-Controlnet-Union-2.1.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── z_image_bf16.safetensors\n ├── 📂model_patches/\n │ └── Z-Image-Fun-Controlnet-Union-2.1.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
],
[
104,
61,
0,
62,
1,
"VAE"
],
[
105,
63,
0,
65,
1,
"MODEL_PATCH"
],
[
106,
61,
0,
65,
2,
"VAE"
],
[
108,
54,
0,
65,
0,
"MODEL"
],
[
109,
65,
0,
3,
0,
"MODEL"
],
[
110,
62,
0,
3,
3,
"LATENT"
],
[
111,
67,
0,
68,
0,
"IMAGE"
],
[
112,
68,
0,
64,
0,
"IMAGE"
],
[
113,
68,
0,
62,
0,
"IMAGE"
],
[
114,
61,
0,
8,
1,
"VAE"
],
[
123,
64,
0,
65,
3,
"IMAGE"
],
[
124,
64,
0,
60,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.6830134553650705,
"offset": [
373.20923781815407,
471.9741601249983
]
},
"frontendVersion": "1.37.11",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
- 🟩 QwenImageDiffsynthControlnet takes the model and the control image as additional inputs.
- 🟩 In this workflow, the depth map is produced with Depth Anything V2.
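Before the depth pass, ResizeImageMaskNode is set to "scale total pixels" with a value of 1.5, i.e. the control image is resized to roughly 1.5 megapixels while keeping its aspect ratio. A sketch of that computation, assuming rounding to a multiple of 8 for latent-friendly sizes (the node's exact rounding may differ):

```python
import math

# Hedged sketch of "scale total pixels": resize so width * height is about
# `megapixels` million pixels, preserving aspect ratio. Rounding to a
# multiple of 8 is an assumption, not necessarily the node's behavior.
def scale_total_pixels(w: int, h: int, megapixels: float = 1.5,
                       multiple: int = 8) -> tuple[int, int]:
    scale = math.sqrt(megapixels * 1_000_000 / (w * h))
    new_w = round(w * scale / multiple) * multiple
    new_h = round(h * scale / multiple) * multiple
    return new_w, new_h

print(scale_total_pixels(768, 1024))
```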