What is Z-Image?
Z-Image is a family of image generation models by Alibaba / Tongyi-MAI.

The name Z-Image refers to the entire model family, which can be confusing; this page covers Z-Image the base model (sometimes called Z-Image-Base to distinguish it from the family).
As a base model, i.e. a starting point for fine-tuning, Z-Image behaves much as you would expect a raw base model to.
Unlike Z-Image-Turbo, which is stabilized through distillation and reinforcement learning, Z-Image reflects differences in seeds and initial noise directly in its output. That makes it highly creative and varied, but also demanding: results can swing widely and it is sensitive to sampling parameters.
Model Download
- diffusion_models
  - [z_image_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image/blob/main/split_files/diffusion_models/z_image_bf16.safetensors) (12.3 GB)
- text_encoders
  - [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors) (8.04 GB)
- vae
  - [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors) (335 MB)
```
📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── z_image_bf16.safetensors
    ├── 📂text_encoders/
    │   └── qwen_3_4b.safetensors
    └── 📂vae/
        └── ae.safetensors
```
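If you prefer scripting the setup, here is a minimal download sketch using `huggingface_hub` (repo IDs taken from the links above; `COMFYUI_ROOT` is an assumption, point it at your install):

```python
# Minimal sketch: fetch the three files and copy them into the
# ComfyUI model folders shown above. COMFYUI_ROOT is an assumed path.
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

COMFYUI_ROOT = Path("ComfyUI")  # assumption: adjust to your installation

FILES = [
    ("Comfy-Org/z_image", "split_files/diffusion_models/z_image_bf16.safetensors", "diffusion_models"),
    ("Comfy-Org/z_image_turbo", "split_files/text_encoders/qwen_3_4b.safetensors", "text_encoders"),
    ("Comfy-Org/z_image_turbo", "split_files/vae/ae.safetensors", "vae"),
]

for repo_id, filename, subdir in FILES:
    cached = hf_hub_download(repo_id=repo_id, filename=filename)  # lands in the HF cache first
    dest = COMFYUI_ROOT / "models" / subdir / Path(filename).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached, dest)
    print(f"placed {dest}")
```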
text2image

```json
{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 59,
"last_link_id": 102,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1252.432861328125,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
603.9390258789062,
45.71437377929687
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
100
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
267.6552734375,
45.71437377929687
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
977.9548217773436,
69.71437377929689
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 53,
"type": "EmptySD3LatentImage",
"pos": [
597.2695922851562,
584.737218645886
],
"size": [
237,
106
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
98
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1104,
1472,
1
]
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1443.3798111474612,
192.6578574704594
],
"size": [
535.0608199082301,
683.4737593989388
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-127.09132385253906,
-13.402286529541016
],
"size": [
349.13103718118725,
214.5148968572393
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image/blob/main/split_files/diffusion_models/z_image_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── z_image_bf16.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 3,
"type": "KSampler",
"pos": [
898.7548217773438,
188.1918182373047
],
"size": [
315,
262
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 100
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 98
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
30,
4,
"euler",
"simple",
1
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
415,
405.392333984375
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": false
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"bad quality, oversaturated, visual artifacts, bad anatomy, deformed hands, facial distortion"
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.00001525878906,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"A lone figure walking through dense morning fog in a pine forest, strong backlight piercing through trees, visible volumetric light beams, soft haze layering, atmospheric perspective. High dynamic range but gentle roll-off in highlights, rich shadow detail, filmic color grading. 35mm lens, slight handheld feel, cinematic realism, no text, no extra objects."
]
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
98,
53,
0,
3,
3,
"LATENT"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
100,
54,
0,
3,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.7513148009015777,
"offset": [
156.43924904699273,
391.3474029631308
]
},
"frontendVersion": "1.37.11",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
```
steps: depending on the sampler, going slightly higher to 30-40 steps tends to be more stable.
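If you want to queue this workflow from a script instead of the UI, ComfyUI's HTTP API accepts it via POST /prompt. Note that the endpoint expects the workflow re-exported in API format, not the editor JSON embedded above; a minimal sketch against a local server on the default port:

```python
# Minimal sketch: queue an API-format export of the workflow above on a
# local ComfyUI server. "workflow_api.json" is an assumed filename for the
# API-format export; the editor JSON shown above will not work directly.
import json
import urllib.request

with open("workflow_api.json") as f:
    prompt = json.load(f)

# Node id "3" is the KSampler in this workflow; changing the seed between
# runs is how you explore Z-Image's seed-to-seed variation.
prompt["3"]["inputs"]["seed"] = 1234

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": prompt}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())  # returns the queued prompt id
```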
Refine with Z-Image-Turbo
This method uses Z-Image-Turbo to refine Z-Image's output in a few extra steps, aiming to combine the creativity of Z-Image with the stability of Z-Image-Turbo.
You could do this as a plain image2image pass, but splitting the sampling into two stages is a smarter approach.

```json
{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 71,
"last_link_id": 126,
"nodes": [
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
]
},
{
"id": 53,
"type": "EmptySD3LatentImage",
"pos": [
597.2695922851562,
584.737218645886
],
"size": [
237,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
105
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1104,
1472,
1
]
},
{
"id": 63,
"type": "ModelSamplingAuraFlow",
"pos": [
983.4242401123047,
-103.90322308435528
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 110
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
112
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 64,
"type": "UNETLoader",
"pos": [
636.4279720527976,
-103.90322308435528
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
110
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_turbo_bf16.safetensors",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
267.6552734375,
45.714373779296864
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
603.9390258789062,
45.71437377929687
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
111
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.00001525878906,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
107,
108
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"A candid, high-end documentary photograph of an elderly man seated in the cool shade beneath a large tree, gently playing an acoustic guitar, relaxed posture with slightly hunched shoulders and weathered hands on the strings, a calm content expression and soft smile, sun-dappled light filtering through leaves creating natural mottled patterns across his face and clothing, warm late-afternoon ambience with subtle rim light along his hair and shoulders, shallow depth of field isolating him from a softly blurred park background, realistic skin texture and fine wrinkles, detailed wood grain on the guitar body with tasteful specular highlights, muted earthy color palette, filmic contrast with smooth highlight roll-off, natural bokeh, quiet peaceful mood, clean composition with the subject placed slightly off-center, no text, no logos, no extra people, ultra-realistic photographic detail.\n"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
415,
405.392333984375
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": false
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
106,
109
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"bad quality, oversaturated, visual artifacts, bad anatomy, deformed hands, facial distortion"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1616.8647959733044,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 113
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1818.4798111474565,
188.1918182373047
],
"size": [
618.2016653999137,
726.9413389038397
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 60,
"type": "KSamplerAdvanced",
"pos": [
898.7548217773438,
188.1918182373047
],
"size": [
315,
334
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 111
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 107
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 106
},
{
"name": "latent_image",
"type": "LATENT",
"link": 105
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
103
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.11.1",
"Node name for S&R": "KSamplerAdvanced"
},
"widgets_values": [
"enable",
1234,
"fixed",
30,
4,
"euler",
"simple",
0,
15,
"enable"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 62,
"type": "KSamplerAdvanced",
"pos": [
1257.809808875324,
188.1918182373047
],
"size": [
315,
334
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 112
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 108
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 109
},
{
"name": "latent_image",
"type": "LATENT",
"link": 103
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
113
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.11.1",
"Node name for S&R": "KSamplerAdvanced"
},
"widgets_values": [
"disable",
0,
"fixed",
8,
1,
"euler",
"simple",
4,
10000,
"disable"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
1337.0098088753239,
69.71437377929686
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-131.18940458472943,
-27.062555636842433
],
"size": [
330.23245000298687,
242.5974748774147
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image/blob/main/split_files/diffusion_models/z_image_bf16.safetensors)\n- [z_image_turbo_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ ├── z_image_bf16.safetensors\n │ └── z_image_turbo_bf16.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
]
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
],
[
103,
60,
0,
62,
3,
"LATENT"
],
[
105,
53,
0,
60,
3,
"LATENT"
],
[
106,
7,
0,
60,
2,
"CONDITIONING"
],
[
107,
6,
0,
60,
1,
"CONDITIONING"
],
[
108,
6,
0,
62,
1,
"CONDITIONING"
],
[
109,
7,
0,
62,
2,
"CONDITIONING"
],
[
110,
64,
0,
63,
0,
"MODEL"
],
[
111,
54,
0,
60,
0,
"MODEL"
],
[
112,
63,
0,
62,
0,
"MODEL"
],
[
113,
62,
0,
8,
0,
"LATENT"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.9090909090909092,
"offset": [
71.64929504493259,
442.37738257756666
]
},
"frontendVersion": "1.37.11",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
```
Here the sampling is split into the first 50% and the last 50% (cf. Split Sampling); the exact sampler settings are sketched after this list.
- 🟪 Z-Image: 15 steps out of 30
- 🟨 Z-Image-Turbo: 4 steps out of 8
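For reference, the two KSamplerAdvanced configurations, with values copied from the workflow JSON above (written as plain Python dicts for illustration, not runnable ComfyUI code):

```python
# Values from the workflow above; comments explain the 50/50 handoff.
stage1_base = {                      # 🟪 Z-Image (node 60)
    "add_noise": "enable",           # fresh noise is injected only here
    "noise_seed": 1234,
    "steps": 30, "cfg": 4.0,
    "sampler_name": "euler", "scheduler": "simple",
    "start_at_step": 0, "end_at_step": 15,   # first 50% of a 30-step schedule
    "return_with_leftover_noise": "enable",  # hand off a still-noisy latent
}
stage2_turbo = {                     # 🟨 Z-Image-Turbo (node 62)
    "add_noise": "disable",          # continue from stage 1's leftover noise
    "noise_seed": 0,                 # unused while add_noise is disabled
    "steps": 8, "cfg": 1.0,          # Turbo runs at low CFG
    "sampler_name": "euler", "scheduler": "simple",
    "start_at_step": 4, "end_at_step": 10000,  # last 50%: steps 4..8 of 8
    "return_with_leftover_noise": "disable",   # fully denoise to the final latent
}
```

Because 15/30 and 4/8 both sit at the halfway point of their schedules, the Turbo stage picks up at roughly the noise level where the base stage stopped, despite running on a much shorter schedule.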
Comparison


Z-Image-Fun-Controlnet-Union-2.1
A ControlNet-style patch for Z-Image, loaded through ComfyUI's model_patches mechanism (ModelPatchLoader) rather than as a standalone ControlNet.
Model Download
- model_patches
  - [Z-Image-Fun-Controlnet-Union-2.1.safetensors](https://huggingface.co/alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1/blob/main/Z-Image-Fun-Controlnet-Union-2.1.safetensors)
```
📂ComfyUI/
└── 📂models/
    └── 📂model_patches/
        └── Z-Image-Fun-Controlnet-Union-2.1.safetensors
```
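The patch can be fetched the same way as the other models; a minimal sketch (repo ID from the note embedded in the workflow below, destination path an assumption):

```python
# Minimal sketch: fetch the control patch into ComfyUI/models/model_patches.
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

cached = hf_hub_download(
    repo_id="alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1",
    filename="Z-Image-Fun-Controlnet-Union-2.1.safetensors",
)
dest = Path("ComfyUI/models/model_patches") / Path(cached).name  # assumed install path
dest.parent.mkdir(parents=True, exist_ok=True)
shutil.copy(cached, dest)
```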
workflow

```json
{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 70,
"last_link_id": 124,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1543.4527151869986,
186
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 114
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1739.4158111474596,
186
],
"size": [
535.0608199082301,
683.4737593989388
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
603.9390258789062,
45.71437377929687
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
108
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
]
},
{
"id": 62,
"type": "VAEEncode",
"pos": [
681.8294099357819,
843.6709899023072
],
"size": [
148.78459999999995,
46
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 113
},
{
"name": "vae",
"type": "VAE",
"link": 104
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
110
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "VAEEncode"
},
"widgets_values": []
},
{
"id": 65,
"type": "QwenImageDiffsynthControlnet",
"pos": [
872.6726754282345,
186
],
"size": [
278.97390399018593,
138
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 108
},
{
"name": "model_patch",
"type": "MODEL_PATCH",
"link": 105
},
{
"name": "vae",
"type": "VAE",
"link": 106
},
{
"name": "image",
"type": "IMAGE",
"link": 123
},
{
"name": "mask",
"shape": 7,
"type": "MASK",
"link": null
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
109
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "QwenImageDiffsynthControlnet"
},
"widgets_values": [
0.8
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
267.6552734375,
45.714373779296864
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 67,
"type": "LoadImage",
"pos": [
-94.28508725933216,
698.0254172619354
],
"size": [
359.21847812500005,
533.241
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
111
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pasted/image (138).png",
"image"
]
},
{
"id": 60,
"type": "PreviewImage",
"pos": [
872.6726754282345,
698.0254172619354
],
"size": [
254.1998000000001,
361.313
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 124
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 61,
"type": "VAELoader",
"pos": [
301.5928496741561,
868.703383195522
],
"size": [
235.45454545454538,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
104,
106,
114
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 63,
"type": "ModelPatchLoader",
"pos": [
552.2443630537383,
576.3798446215637
],
"size": [
278.3696468820435,
58
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL_PATCH",
"type": "MODEL_PATCH",
"links": [
105
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "ModelPatchLoader"
},
"widgets_values": [
"Z-Image\\Z-Image-Fun-Controlnet-Union-2.1.safetensors"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 68,
"type": "ResizeImageMaskNode",
"pos": [
301.5928496741561,
698.0254172619354
],
"size": [
236.556640625,
106
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 111
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
112,
113
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.11.1",
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
1.5,
"area"
]
},
{
"id": 64,
"type": "DepthAnythingV2Preprocessor",
"pos": [
571.9494396232819,
698.0254172619354
],
"size": [
258.6645703124999,
82
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 112
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
123,
124
]
}
],
"properties": {
"cnr_id": "comfyui_controlnet_aux",
"ver": "12f35647f0d510e03b45a47fb420fe1245a575df",
"Node name for S&R": "DepthAnythingV2Preprocessor"
},
"widgets_values": [
"depth_anything_v2_vitl.pth",
512
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.00001525878906,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"semi-3D toon illustration, clean studio look, smooth shading, soft global illumination, crisp outlines (subtle), high readability, simple but not flat, minimal background, white backdrop. a black cat peeking out from a blue shopping bag, one paw resting on the bag edge, a human hand holding the bag handles. cute face, large eyes, glossy but controlled highlights, natural proportions, clean materials"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
415,
405.6492042321686
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": false
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"photorealisti, text, logo, watermark, signature, noise, jpeg artifacts"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
1190.0496473027094,
186
],
"size": [
315,
262
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 109
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 110
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
12345,
"fixed",
30,
4,
"euler",
"simple",
1
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-159.02895116299885,
-24.088770293079595
],
"size": [
372.9441184528023,
255.0671111260163
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image/blob/main/split_files/diffusion_models/z_image_bf16.safetensors)\n- [Z-Image-Fun-Controlnet-Union-2.1.safetensors](https://huggingface.co/alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1/blob/main/Z-Image-Fun-Controlnet-Union-2.1.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── z_image_bf16.safetensors\n ├── 📂model_patches/\n │ └── Z-Image-Fun-Controlnet-Union-2.1.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
],
[
104,
61,
0,
62,
1,
"VAE"
],
[
105,
63,
0,
65,
1,
"MODEL_PATCH"
],
[
106,
61,
0,
65,
2,
"VAE"
],
[
108,
54,
0,
65,
0,
"MODEL"
],
[
109,
65,
0,
3,
0,
"MODEL"
],
[
110,
62,
0,
3,
3,
"LATENT"
],
[
111,
67,
0,
68,
0,
"IMAGE"
],
[
112,
68,
0,
64,
0,
"IMAGE"
],
[
113,
68,
0,
62,
0,
"IMAGE"
],
[
114,
61,
0,
8,
1,
"VAE"
],
[
123,
64,
0,
65,
3,
"IMAGE"
],
[
124,
64,
0,
60,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.6830134553650705,
"offset": [
373.20923781815407,
471.9741601249983
]
},
"frontendVersion": "1.37.11",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
```
- 🟩 Add the model and the control image to QwenImageDiffsynthControlnet.
- 🟩 In this workflow, Depth Anything V2 is used to create the depth map; a standalone sketch follows below.
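If you want to prepare the depth map outside ComfyUI (the workflow uses the DepthAnythingV2Preprocessor node from comfyui_controlnet_aux), a minimal sketch using the `transformers` depth-estimation pipeline; the checkpoint ID and filenames are assumptions:

```python
# Minimal sketch: produce a depth map comparable to the workflow's
# DepthAnythingV2Preprocessor node, via the transformers pipeline instead.
from PIL import Image
from transformers import pipeline

# Assumed checkpoint id; any Depth Anything V2 checkpoint on the Hub works.
depth = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Large-hf")

image = Image.open("control_source.png")  # hypothetical input photo
result = depth(image)
result["depth"].save("depth_map.png")     # PIL image, usable as the control image
```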