What is SDXL?

SDXL (specifically SDXL 1.0) is the official successor model from Stability AI, the developer of Stable Diffusion 1.5. (There was also the Stable Diffusion 2.1 lineage, but, well... its performance was...)

Broadly speaking, there are two main differences from Stable Diffusion 1.5:

Two-stage configuration of base and refiner

  • Basic text2image is completed with the base model alone.
  • The refiner model is then designed to "finish" the result, adjusting details and texture via image2image.

Change in training resolution

  • Stable Diffusion 1.5
    • Trained mainly on 512 x 512px square images
  • SDXL
    • Trained mainly at around 1024 x 1024px, across various aspect ratios
    • Handles high-resolution generation and portrait/landscape compositions more naturally from the start.

Model Download

📂ComfyUI/
  └── 📂models/
      └── 📂checkpoints/
          ├── sd_xl_base_1.0_0.9vae.safetensors
          └── sd_xl_refiner_1.0_0.9vae.safetensors
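
If you'd rather grab the files from a script than a browser, here is a minimal sketch using the huggingface_hub package. It assumes the two checkpoints are still hosted under the official Stability AI repositories with these exact file names, and that the local_dir path points at your ComfyUI install.

```python
# Sketch: download the SDXL base and refiner checkpoints into ComfyUI's
# checkpoints folder. Assumes `pip install huggingface_hub` and that the
# file names below still match the official Stability AI repositories.
from huggingface_hub import hf_hub_download

for repo_id, filename in [
    ("stabilityai/stable-diffusion-xl-base-1.0", "sd_xl_base_1.0_0.9vae.safetensors"),
    ("stabilityai/stable-diffusion-xl-refiner-1.0", "sd_xl_refiner_1.0_0.9vae.safetensors"),
]:
    hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        local_dir="ComfyUI/models/checkpoints",  # adjust to your install path
    )
```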

text2image with only base model

First, let's do a simple text2image with only the base model.

Basic generation works just by swapping the checkpoint in an SD1.5 text2image workflow for the SDXL base model.

SDXL_text2image_base.json
  • Set the resolution to approximately 1M pixels (around 1024 x 1024px).

    • Examples: 1024 x 1024 / 896 x 1152 / 1152 x 896, etc.
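
For reference, the same base-only text2image can be sketched outside of ComfyUI with the diffusers library. The prompt, step count, and resolution below are illustrative assumptions, not values taken from the workflow file.

```python
# Sketch: base-only SDXL text2image with diffusers (not ComfyUI).
# Assumes a CUDA GPU and `pip install diffusers transformers torch`.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Keep the output around 1M pixels, e.g. 1024x1024, 896x1152, 1152x896.
image = pipe(
    prompt="a watercolor lighthouse on a cliff at dusk",
    width=1024,
    height=1024,
    num_inference_steps=25,
).images[0]
image.save("sdxl_base_only.png")
```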

CLIPTextEncodeSDXL

The SDXL base model uses two CLIP models (OpenCLIP-ViT/G and CLIP-ViT/L) in combination as its text encoders.

ComfyUI has a node that lets you input separate text into each CLIP, but let me say this up front: you don't need to use it.

SDXL_text2image_base_CLIPTextEncodeSDXL.json
  • If you input the same prompt to both CLIPs, the behavior ends up almost identical to using the regular CLIP Text Encode node.
  • Empirically, the output tends to be most stable when both CLIPs receive the same text.
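
The diffusers pipeline exposes the same split, which makes the point easy to see in code: prompt feeds one encoder and prompt_2 the other, and leaving prompt_2 out simply reuses prompt for both. This sketch reuses the pipe object from the previous snippet (an assumption of these examples).

```python
# Sketch: same text to both SDXL encoders vs. split text.
# Omitting `prompt_2` reuses `prompt` for both encoders, which mirrors the
# advice above: identical text tends to give the most stable results.
same_text = pipe(prompt="a cozy cabin in a snowy forest").images[0]

split_text = pipe(
    prompt="a cozy cabin in a snowy forest",      # first text encoder
    prompt_2="oil painting, warm soft lighting",  # second text encoder
).images[0]
```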

base + refiner

Next, let's finish the image generated by the base model using the refiner.

Generate with base → image2image with refiner

SDXL base and SDXL refiner share the same latent representation, so the latent produced by the base model can be fed straight into the refiner's KSampler for image2image.

SDXL_text2image_base-refiner.json
    1. 🟪 text2image as usual with SDXL base (output the latent)
    2. 🟨 Connect that latent to a KSampler that uses the SDXL refiner
    3. 🟨 image2image with a low denoise (e.g., 0.2 to 0.3)
    • Since the refiner specializes in adding detail, a very light touch is enough.

The idea is to keep the original style of the base output and let the refiner adjust only the details and textures.
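
Expressed in code, the latent handoff looks roughly like the sketch below. This is again a diffusers illustration rather than the ComfyUI workflow itself; the strength value plays the role of the low denoise mentioned above.

```python
# Sketch: base text2image -> refiner image2image on the same latent.
# The two models share a latent space, so no decode/encode round trip is needed.
import torch
from diffusers import (
    StableDiffusionXLPipeline,
    StableDiffusionXLImg2ImgPipeline,
)

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")

prompt = "a portrait photo of an astronaut in a sunflower field"

# 1. text2image with base, keeping the result as a latent
latent = base(prompt=prompt, output_type="latent").images

# 2. light image2image with the refiner; low strength = low denoise
image = refiner(prompt=prompt, image=latent, strength=0.25).images[0]
image.save("sdxl_base_plus_refiner.png")
```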

Switching during sampling (KSampler Advanced)

As a slightly smarter approach, you can also switch from base to refiner partway through sampling, using the KSampler (Advanced) node.

SDXL_text2image_base-refiner_Advanced.json
  • 🟪 Sample with SDXL base up to a midpoint step
  • 🟨 Sample the remaining steps with SDXL refiner
  • 🟦 Set the switchover step with an int node.

Personally, I prefer the image2image approach because it is easier to understand, but it is worth remembering that you can also switch between base and refiner within a single sampling pass.
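
The diffusers counterpart of this handover is the denoising_end / denoising_start pair, which is one way to sketch what the KSampler (Advanced) setup does. The snippet reuses the base, refiner, and prompt objects from the previous sketch, and the 0.8 switch point is just an example value.

```python
# Sketch: switch from base to refiner partway through a single sampling run.
# `denoising_end` on the base and `denoising_start` on the refiner mark the
# handover point; here base does the first 80% of the steps.
switch_at = 0.8
steps = 30

latent = base(
    prompt=prompt,
    num_inference_steps=steps,
    denoising_end=switch_at,     # stop base sampling at 80%
    output_type="latent",
).images

image = refiner(
    prompt=prompt,
    num_inference_steps=steps,
    denoising_start=switch_at,   # refiner picks up from the same point
    image=latent,
).images[0]
image.save("sdxl_base_refiner_switch.png")
```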


refiner is not necessary, but "refiner-like thinking" is important

Refiner-less SDXL Models

There are many derivative models based on SDXL (both community and commercial models), and most of them are tuned to produce sufficient image quality without using the refiner.

To put it a bit bluntly, the "post-process with the refiner" design was partly a compromise to make up for what the base model alone could achieve at the time.

Refiner-like Thinking

However, the idea of finishing a single image across multiple models is still valid in itself.

  • Models whose style is great but which don't follow prompts very well
  • Conversely, models that follow prompts well but whose style isn't to your taste

There are plenty of such "not quite right" models.

In such situations, SDXL's refiner-like thinking is useful.

  • First, generate a base image with a model that excels at composition and prompt adherence
  • Then finish that image via image2image with a model whose style you prefer

With a two-stage setup like this, you can build a "best of both worlds" workflow where the composition comes from model A and the style from model B.
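
As a rough sketch of this two-stage pattern (the checkpoint paths, prompt, and strength here are hypothetical placeholders for "your composition model" and "your style model"):

```python
# Sketch: "composition from model A, style from model B" in two stages.
# MODEL_A_PATH / MODEL_B_PATH are hypothetical local SDXL checkpoints.
import torch
from diffusers import (
    StableDiffusionXLPipeline,
    StableDiffusionXLImg2ImgPipeline,
)

MODEL_A_PATH = "checkpoints/composition_model.safetensors"  # follows prompts well
MODEL_B_PATH = "checkpoints/style_model.safetensors"        # has the style you like

model_a = StableDiffusionXLPipeline.from_single_file(
    MODEL_A_PATH, torch_dtype=torch.float16
).to("cuda")
model_b = StableDiffusionXLImg2ImgPipeline.from_single_file(
    MODEL_B_PATH, torch_dtype=torch.float16
).to("cuda")

prompt = "two cats playing chess on a rooftop at sunset"

# Stage 1: nail the composition with model A
draft = model_a(prompt=prompt, width=1024, height=1024).images[0]

# Stage 2: restyle with model B at a moderate denoise so the layout survives
final = model_b(prompt=prompt, image=draft, strength=0.5).images[0]
final.save("two_model_finish.png")
```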

SDXL's base / refiner is just one concrete example of this. Go look for your own combinations of models to "multiply" together.