What is Z-Image?
Z-Image is a family of image generation models by Alibaba / Tongyi-MAI.

The name Z-Image refers to the entire model family, which can be confusing; this page covers Z-Image the base model (sometimes called Z-Image-Base to distinguish it from the family).
As a base model, i.e. a starting point for fine-tuning, Z-Image behaves much as you would expect a raw base model to.
Unlike Z-Image-Turbo, which is stabilized through distillation and reinforcement learning, Z-Image reflects differences in seeds and initial noise directly in its output. That makes it highly creative and varied, but also demanding: results can swing widely and it is sensitive to sampling parameters.
Model Download
- diffusion_models
  - [z_image_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image/blob/main/split_files/diffusion_models/z_image_bf16.safetensors) (12.3 GB)
- text_encoders
  - [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors) (8.04 GB)
- vae
  - [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors) (335 MB)
```
📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── z_image_bf16.safetensors
    ├── 📂text_encoders/
    │   └── qwen_3_4b.safetensors
    └── 📂vae/
        └── ae.safetensors
```
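If you prefer scripting the setup, here is a minimal download sketch using `huggingface_hub` (repo IDs taken from the links above; `COMFYUI_ROOT` is an assumption, point it at your install):

```python
# Minimal sketch: fetch the three files and copy them into the
# ComfyUI model folders shown above. COMFYUI_ROOT is an assumed path.
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

COMFYUI_ROOT = Path("ComfyUI")  # assumption: adjust to your installation

FILES = [
    ("Comfy-Org/z_image", "split_files/diffusion_models/z_image_bf16.safetensors", "diffusion_models"),
    ("Comfy-Org/z_image_turbo", "split_files/text_encoders/qwen_3_4b.safetensors", "text_encoders"),
    ("Comfy-Org/z_image_turbo", "split_files/vae/ae.safetensors", "vae"),
]

for repo_id, filename, subdir in FILES:
    cached = hf_hub_download(repo_id=repo_id, filename=filename)  # lands in the HF cache first
    dest = COMFYUI_ROOT / "models" / subdir / Path(filename).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached, dest)
    print(f"placed {dest}")
```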
text2image

```json
{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 59,
"last_link_id": 102,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1252.432861328125,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
603.9390258789062,
45.71437377929687
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
100
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
267.6552734375,
45.71437377929687
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
977.9548217773436,
69.71437377929689
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 53,
"type": "EmptySD3LatentImage",
"pos": [
597.2695922851562,
584.737218645886
],
"size": [
237,
106
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
98
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1104,
1472,
1
]
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1443.3798111474612,
192.6578574704594
],
"size": [
535.0608199082301,
683.4737593989388
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-127.09132385253906,
-13.402286529541016
],
"size": [
349.13103718118725,
214.5148968572393
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image/blob/main/split_files/diffusion_models/z_image_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── z_image_bf16.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 3,
"type": "KSampler",
"pos": [
898.7548217773438,
188.1918182373047
],
"size": [
315,
262
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 100
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 98
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
30,
4,
"euler",
"simple",
1
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
415,
405.392333984375
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": false
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"bad quality, oversaturated, visual artifacts, bad anatomy, deformed hands, facial distortion"
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.00001525878906,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"A lone figure walking through dense morning fog in a pine forest, strong backlight piercing through trees, visible volumetric light beams, soft haze layering, atmospheric perspective. High dynamic range but gentle roll-off in highlights, rich shadow detail, filmic color grading. 35mm lens, slight handheld feel, cinematic realism, no text, no extra objects."
]
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
98,
53,
0,
3,
3,
"LATENT"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
100,
54,
0,
3,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.7513148009015777,
"offset": [
156.43924904699273,
391.3474029631308
]
},
"frontendVersion": "1.37.11",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
```
steps: depending on the sampler, going slightly higher to 30-40 steps tends to be more stable.
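If you want to queue this workflow from a script instead of the UI, ComfyUI's HTTP API accepts it via POST /prompt. Note that the endpoint expects the workflow re-exported in API format, not the editor JSON embedded above; a minimal sketch against a local server on the default port:

```python
# Minimal sketch: queue an API-format export of the workflow above on a
# local ComfyUI server. "workflow_api.json" is an assumed filename for the
# API-format export; the editor JSON shown above will not work directly.
import json
import urllib.request

with open("workflow_api.json") as f:
    prompt = json.load(f)

# Node id "3" is the KSampler in this workflow; changing the seed between
# runs is how you explore Z-Image's seed-to-seed variation.
prompt["3"]["inputs"]["seed"] = 1234

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": prompt}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())  # returns the queued prompt id
```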
Refine with Z-Image-Turbo
This method uses Z-Image-Turbo to refine Z-Image's output in a few extra steps, aiming to combine the creativity of Z-Image with the stability of Z-Image-Turbo.
You could do this as a plain image2image pass, but splitting the sampling into two stages is a smarter approach.

```json
{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 71,
"last_link_id": 126,
"nodes": [
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
]
},
{
"id": 53,
"type": "EmptySD3LatentImage",
"pos": [
597.2695922851562,
584.737218645886
],
"size": [
237,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
105
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1104,
1472,
1
]
},
{
"id": 63,
"type": "ModelSamplingAuraFlow",
"pos": [
983.4242401123047,
-103.90322308435528
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 110
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
112
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 64,
"type": "UNETLoader",
"pos": [
636.4279720527976,
-103.90322308435528
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
110
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_turbo_bf16.safetensors",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
267.6552734375,
45.714373779296864
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
603.9390258789062,
45.71437377929687
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
111
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.00001525878906,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
107,
108
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"A candid, high-end documentary photograph of an elderly man seated in the cool shade beneath a large tree, gently playing an acoustic guitar, relaxed posture with slightly hunched shoulders and weathered hands on the strings, a calm content expression and soft smile, sun-dappled light filtering through leaves creating natural mottled patterns across his face and clothing, warm late-afternoon ambience with subtle rim light along his hair and shoulders, shallow depth of field isolating him from a softly blurred park background, realistic skin texture and fine wrinkles, detailed wood grain on the guitar body with tasteful specular highlights, muted earthy color palette, filmic contrast with smooth highlight roll-off, natural bokeh, quiet peaceful mood, clean composition with the subject placed slightly off-center, no text, no logos, no extra people, ultra-realistic photographic detail.\n"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
415,
405.392333984375
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": false
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
106,
109
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"bad quality, oversaturated, visual artifacts, bad anatomy, deformed hands, facial distortion"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1616.8647959733044,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 113
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1818.4798111474565,
188.1918182373047
],
"size": [
618.2016653999137,
726.9413389038397
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 60,
"type": "KSamplerAdvanced",
"pos": [
898.7548217773438,
188.1918182373047
],
"size": [
315,
334
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 111
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 107
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 106
},
{
"name": "latent_image",
"type": "LATENT",
"link": 105
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
103
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.11.1",
"Node name for S&R": "KSamplerAdvanced"
},
"widgets_values": [
"enable",
1234,
"fixed",
30,
4,
"euler",
"simple",
0,
15,
"enable"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 62,
"type": "KSamplerAdvanced",
"pos": [
1257.809808875324,
188.1918182373047
],
"size": [
315,
334
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 112
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 108
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 109
},
{
"name": "latent_image",
"type": "LATENT",
"link": 103
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
113
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.11.1",
"Node name for S&R": "KSamplerAdvanced"
},
"widgets_values": [
"disable",
0,
"fixed",
8,
1,
"euler",
"simple",
4,
10000,
"disable"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
1337.0098088753239,
69.71437377929686
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-131.18940458472943,
-27.062555636842433
],
"size": [
330.23245000298687,
242.5974748774147
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image/blob/main/split_files/diffusion_models/z_image_bf16.safetensors)\n- [z_image_turbo_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ ├── z_image_bf16.safetensors\n │ └── z_image_turbo_bf16.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
]
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
],
[
103,
60,
0,
62,
3,
"LATENT"
],
[
105,
53,
0,
60,
3,
"LATENT"
],
[
106,
7,
0,
60,
2,
"CONDITIONING"
],
[
107,
6,
0,
60,
1,
"CONDITIONING"
],
[
108,
6,
0,
62,
1,
"CONDITIONING"
],
[
109,
7,
0,
62,
2,
"CONDITIONING"
],
[
110,
64,
0,
63,
0,
"MODEL"
],
[
111,
54,
0,
60,
0,
"MODEL"
],
[
112,
63,
0,
62,
0,
"MODEL"
],
[
113,
62,
0,
8,
0,
"LATENT"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.9090909090909092,
"offset": [
71.64929504493259,
442.37738257756666
]
},
"frontendVersion": "1.37.11",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
```
Here the sampling is split into the first 50% and the last 50% (cf. Split Sampling); the exact sampler settings are sketched after this list.
- 🟪 Z-Image: 15 steps out of 30
- 🟨 Z-Image-Turbo: 4 steps out of 8
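For reference, the two KSamplerAdvanced configurations, with values copied from the workflow JSON above (written as plain Python dicts for illustration, not runnable ComfyUI code):

```python
# Values from the workflow above; comments explain the 50/50 handoff.
stage1_base = {                      # 🟪 Z-Image (node 60)
    "add_noise": "enable",           # fresh noise is injected only here
    "noise_seed": 1234,
    "steps": 30, "cfg": 4.0,
    "sampler_name": "euler", "scheduler": "simple",
    "start_at_step": 0, "end_at_step": 15,   # first 50% of a 30-step schedule
    "return_with_leftover_noise": "enable",  # hand off a still-noisy latent
}
stage2_turbo = {                     # 🟨 Z-Image-Turbo (node 62)
    "add_noise": "disable",          # continue from stage 1's leftover noise
    "noise_seed": 0,                 # unused while add_noise is disabled
    "steps": 8, "cfg": 1.0,          # Turbo runs at low CFG
    "sampler_name": "euler", "scheduler": "simple",
    "start_at_step": 4, "end_at_step": 10000,  # last 50%: steps 4..8 of 8
    "return_with_leftover_noise": "disable",   # fully denoise to the final latent
}
```

Because 15/30 and 4/8 both sit at the halfway point of their schedules, the Turbo stage picks up at roughly the noise level where the base stage stopped, despite running on a much shorter schedule.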
Comparison


Z-Image-Fun-Controlnet-Union-2.1
A ControlNet-style patch for Z-Image, loaded through ComfyUI's model_patches mechanism (ModelPatchLoader) rather than as a standalone ControlNet.
Model Download
- model_patches
  - [Z-Image-Fun-Controlnet-Union-2.1.safetensors](https://huggingface.co/alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1/blob/main/Z-Image-Fun-Controlnet-Union-2.1.safetensors)
```
📂ComfyUI/
└── 📂models/
    └── 📂model_patches/
        └── Z-Image-Fun-Controlnet-Union-2.1.safetensors
```
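The patch can be fetched the same way as the other models; a minimal sketch (repo ID from the note embedded in the workflow below, destination path an assumption):

```python
# Minimal sketch: fetch the control patch into ComfyUI/models/model_patches.
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

cached = hf_hub_download(
    repo_id="alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1",
    filename="Z-Image-Fun-Controlnet-Union-2.1.safetensors",
)
dest = Path("ComfyUI/models/model_patches") / Path(cached).name  # assumed install path
dest.parent.mkdir(parents=True, exist_ok=True)
shutil.copy(cached, dest)
```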
workflow

```json
{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 70,
"last_link_id": 124,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1543.4527151869986,
186
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 114
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1739.4158111474596,
186
],
"size": [
535.0608199082301,
683.4737593989388
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
603.9390258789062,
45.71437377929687
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
108
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
]
},
{
"id": 62,
"type": "VAEEncode",
"pos": [
681.8294099357819,
843.6709899023072
],
"size": [
148.78459999999995,
46
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 113
},
{
"name": "vae",
"type": "VAE",
"link": 104
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
110
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "VAEEncode"
},
"widgets_values": []
},
{
"id": 65,
"type": "QwenImageDiffsynthControlnet",
"pos": [
872.6726754282345,
186
],
"size": [
278.97390399018593,
138
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 108
},
{
"name": "model_patch",
"type": "MODEL_PATCH",
"link": 105
},
{
"name": "vae",
"type": "VAE",
"link": 106
},
{
"name": "image",
"type": "IMAGE",
"link": 123
},
{
"name": "mask",
"shape": 7,
"type": "MASK",
"link": null
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
109
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "QwenImageDiffsynthControlnet"
},
"widgets_values": [
0.8
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
267.6552734375,
45.714373779296864
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 67,
"type": "LoadImage",
"pos": [
-94.28508725933216,
698.0254172619354
],
"size": [
359.21847812500005,
533.241
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
111
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pasted/image (138).png",
"image"
]
},
{
"id": 60,
"type": "PreviewImage",
"pos": [
872.6726754282345,
698.0254172619354
],
"size": [
254.1998000000001,
361.313
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 124
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 61,
"type": "VAELoader",
"pos": [
301.5928496741561,
868.703383195522
],
"size": [
235.45454545454538,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
104,
106,
114
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 63,
"type": "ModelPatchLoader",
"pos": [
552.2443630537383,
576.3798446215637
],
"size": [
278.3696468820435,
58
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL_PATCH",
"type": "MODEL_PATCH",
"links": [
105
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "ModelPatchLoader"
},
"widgets_values": [
"Z-Image\\Z-Image-Fun-Controlnet-Union-2.1.safetensors"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 68,
"type": "ResizeImageMaskNode",
"pos": [
301.5928496741561,
698.0254172619354
],
"size": [
236.556640625,
106
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 111
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
112,
113
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.11.1",
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
1.5,
"area"
]
},
{
"id": 64,
"type": "DepthAnythingV2Preprocessor",
"pos": [
571.9494396232819,
698.0254172619354
],
"size": [
258.6645703124999,
82
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 112
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
123,
124
]
}
],
"properties": {
"cnr_id": "comfyui_controlnet_aux",
"ver": "12f35647f0d510e03b45a47fb420fe1245a575df",
"Node name for S&R": "DepthAnythingV2Preprocessor"
},
"widgets_values": [
"depth_anything_v2_vitl.pth",
512
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.00001525878906,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"semi-3D toon illustration, clean studio look, smooth shading, soft global illumination, crisp outlines (subtle), high readability, simple but not flat, minimal background, white backdrop. a black cat peeking out from a blue shopping bag, one paw resting on the bag edge, a human hand holding the bag handles. cute face, large eyes, glossy but controlled highlights, natural proportions, clean materials"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
415,
405.6492042321686
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": false
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"photorealisti, text, logo, watermark, signature, noise, jpeg artifacts"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
1190.0496473027094,
186
],
"size": [
315,
262
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 109
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 110
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
12345,
"fixed",
30,
4,
"euler",
"simple",
1
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-159.02895116299885,
-24.088770293079595
],
"size": [
372.9441184528023,
255.0671111260163
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image/blob/main/split_files/diffusion_models/z_image_bf16.safetensors)\n- [Z-Image-Fun-Controlnet-Union-2.1.safetensors](https://huggingface.co/alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1/blob/main/Z-Image-Fun-Controlnet-Union-2.1.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── z_image_bf16.safetensors\n ├── 📂model_patches/\n │ └── Z-Image-Fun-Controlnet-Union-2.1.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
],
[
104,
61,
0,
62,
1,
"VAE"
],
[
105,
63,
0,
65,
1,
"MODEL_PATCH"
],
[
106,
61,
0,
65,
2,
"VAE"
],
[
108,
54,
0,
65,
0,
"MODEL"
],
[
109,
65,
0,
3,
0,
"MODEL"
],
[
110,
62,
0,
3,
3,
"LATENT"
],
[
111,
67,
0,
68,
0,
"IMAGE"
],
[
112,
68,
0,
64,
0,
"IMAGE"
],
[
113,
68,
0,
62,
0,
"IMAGE"
],
[
114,
61,
0,
8,
1,
"VAE"
],
[
123,
64,
0,
65,
3,
"IMAGE"
],
[
124,
64,
0,
60,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.6830134553650705,
"offset": [
373.20923781815407,
471.9741601249983
]
},
"frontendVersion": "1.37.11",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
```
- 🟩 Add the model and the control image to QwenImageDiffsynthControlnet.
- 🟩 In this workflow, Depth Anything V2 is used to create the depth map; a standalone sketch follows below.
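If you want to prepare the depth map outside ComfyUI (the workflow uses the DepthAnythingV2Preprocessor node from comfyui_controlnet_aux), a minimal sketch using the `transformers` depth-estimation pipeline; the checkpoint ID and filenames are assumptions:

```python
# Minimal sketch: produce a depth map comparable to the workflow's
# DepthAnythingV2Preprocessor node, via the transformers pipeline instead.
from PIL import Image
from transformers import pipeline

# Assumed checkpoint id; any Depth Anything V2 checkpoint on the Hub works.
depth = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Large-hf")

image = Image.open("control_source.png")  # hypothetical input photo
result = depth(image)
result["depth"].save("depth_map.png")     # PIL image, usable as the control image
```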