What is Z-Image?

Z-Image is a family of image generation models by Alibaba / Tongyi-MAI.

The name Z-Image can be confusing because it also refers to the entire model family. This page covers Z-Image the base model, sometimes called Z-Image-Base to distinguish it.

As a base model (a starting point for fine-tuning), Z-Image behaves in a straightforward way.

Unlike Z-Image-Turbo, which is stabilized by distillation and reinforcement learning, Z-Image directly reflects differences in seed and initial noise in its output. This gives it high creativity and variation, but it also makes the model harder to use: results can vary significantly, and it is sensitive to parameter settings.


Model Download

📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── z_image_bf16.safetensors
    ├── 📂text_encoders/
    │   └── qwen_3_4b.safetensors
    └── 📂vae/
        └── ae.safetensors
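
Before loading the workflow, it can help to confirm that the three files are where the tree above expects them. The following is a minimal sketch, not part of ComfyUI itself: the function name `missing_models` and the default root `"ComfyUI"` are assumptions for illustration, and only the relative paths come from the tree.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_FILES = [
    "models/diffusion_models/z_image_bf16.safetensors",
    "models/text_encoders/qwen_3_4b.safetensors",
    "models/vae/ae.safetensors",
]


def missing_models(comfyui_root):
    """Return the expected model files that are absent under comfyui_root.

    Hypothetical helper for illustration; it only checks that each path
    exists as a regular file and does not validate the file contents.
    """
    root = Path(comfyui_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; adjust it to your setup.
    for rel in missing_models("ComfyUI"):
        print(f"missing: {rel}")
```

An empty result means all three files were found under the given root.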

text2image

Z-Image.json
  • steps : 30-40 steps, slightly on the high side, tends to be more stable, though the best value depends on the sampler.

Refine with Z-Image-Turbo

This method uses Z-Image-Turbo to refine the generation results of Z-Image in a few steps. It aims to combine the creativity of Z-Image with the stability of Z-Image-Turbo.

image2image would also work, but splitting the sampling into two stages is a cleaner approach.

Z-Image_refine-turbo.json

Here the sampling is split into the first 50% and the last 50% (cf. Split Sampling).

  • 🟪 Z-Image : 15 steps out of 30 steps
  • 🟨 Z-Image-Turbo : 4 steps out of 8 steps
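
The arithmetic behind the two bullets above can be sketched as follows. This is an illustrative helper, not code from the workflow: the function name `two_stage_plan` is hypothetical, and the field names `steps`, `start_at_step`, and `end_at_step` mirror the inputs of ComfyUI's KSampler (Advanced) node on the assumption that the workflow uses that node for each stage.

```python
def two_stage_plan(base_steps=30, turbo_steps=8, split=0.5):
    """Compute step ranges for a two-stage split at the same fraction.

    Hypothetical helper: the base model runs from step 0 up to the split
    point of its own schedule, and the Turbo model picks up at the same
    fraction of its (shorter) schedule and runs to the end.
    """
    base_end = round(base_steps * split)
    turbo_start = round(turbo_steps * split)
    return {
        "base": {
            "steps": base_steps,
            "start_at_step": 0,
            "end_at_step": base_end,
        },
        "turbo": {
            "steps": turbo_steps,
            "start_at_step": turbo_start,
            "end_at_step": turbo_steps,
        },
    }


plan = two_stage_plan()
# Number of steps each stage actually executes.
base_run = plan["base"]["end_at_step"] - plan["base"]["start_at_step"]
turbo_run = plan["turbo"]["end_at_step"] - plan["turbo"]["start_at_step"]
print(base_run, turbo_run)  # prints "15 4"
```

With the defaults, the base stage covers steps 0 to 15 of a 30-step schedule and the Turbo stage covers steps 4 to 8 of an 8-step schedule, matching the 50/50 split described here. Changing `split` moves the handover point for both stages together.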

Comparison

Z-Image only
Z-Image + Turbo

Z-Image-Fun-Controlnet-Union-2.1

A ControlNet-like patch for Z-Image.

Model Download

📂ComfyUI/
└── 📂models/
    └── 📂model_patches/
        └── Z-Image-Fun-Controlnet-Union-2.1.safetensors

workflow

Z-Image-Fun-Controlnet-Union-2.1.json
  • 🟩 Connect the model and the control image to the QwenImageDiffsynthControlnet node.
  • 🟩 In this workflow, Depth Anything V2 is used to create a depth map.

Reference