SDXL

SDXLとは？

SDXL（正確には SDXL 1.0）は、Stable Diffusion 1.5 を開発した Stability AI による正統後継モデルです。（Stable Diffusion 2.1 という系統もありましたが、まあ…性能がね……）

Stable Diffusion からの大きな違いとして、だいたい次の2点があります。

base と refiner の二段構成

基本的な text2image は base モデルだけで完結します。
その後、refiner モデルで image2image することで、ディテールや質感を整える「仕上げ」を行う設計になっています。

学習時の解像度の変更

Stable Diffusion 1.5
- 512 × 512px の正方形画像を中心に学習
SDXL
- 1024 × 1024px を中心に、さまざまなアスペクト比で学習
- 元から解像度の高い画像生成や、縦長・横長の構図にもある程度対応しやすくなっています。

モデルのダウンロード

📂ComfyUI/
  └── 📂models/
      └── 📂checkpoints/
          ├── sd_xl_base_1.0_0.9vae.safetensors
          └── sd_xl_refiner_1.0_0.9vae.safetensors

baseモデルだけで text2image

まずは base だけで、シンプルに text2image してみましょう。

SD1.5 の text2image の workflow で、Checkpoint を SDXL base に差し替えるだけで基本的な生成はできます。

SDXL_text2image_base.json

{
  "id": "8b9f7796-0873-4025-be3c-0f997f67f866",
  "revision": 0,
  "last_node_id": 10,
  "last_link_id": 9,
  "nodes": [
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1209,
        188
      ],
      "size": [
        210,
        46
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 7
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 8
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            9
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "VAEDecode"
      },
      "widgets_values": []
    },
    {
      "id": 9,
      "type": "SaveImage",
      "pos": [
        1451,
        189
      ],
      "size": [
        408.737603500472,
        456.22967321788406
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 9
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    },
    {
      "id": 5,
      "type": "EmptyLatentImage",
      "pos": [
        582.1350317382813,
        606.5799999999999
      ],
      "size": [
        244.81999999999994,
        106
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            2
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "EmptyLatentImage"
      },
      "widgets_values": [
        1024,
        1024,
        1
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        863,
        186
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 1
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 4
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 6
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 2
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            7
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        12345,
        "fixed",
        20,
        8,
        "euler",
        "normal",
        1
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        415,
        186
      ],
      "size": [
        411.95503173828126,
        151.0030493164063
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            4
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "RAW photo,vase,lily flower,brully background"
      ]
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        43.10000000000001,
        310.8900000000004
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            1
          ]
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 1,
          "links": [
            3,
            5
          ]
        },
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 2,
          "links": [
            8
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "sd_xl_base_1.0_0.9vae.safetensors"
      ]
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        416.1970166015625,
        392.37848510742185
      ],
      "size": [
        410.75801513671877,
        158.82607910156253
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            6
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "text, watermark, worst quality"
      ]
    }
  ],
  "links": [
    [
      1,
      4,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      2,
      5,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      4,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      6,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      7,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      8,
      4,
      2,
      8,
      1,
      "VAE"
    ],
    [
      9,
      8,
      0,
      9,
      0,
      "IMAGE"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 1,
      "offset": [
        57.89999999999999,
        -86
      ]
    },
    "frontendVersion": "1.33.10",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true,
    "workflowRendererVersion": "LG"
  },
  "version": 0.4
}

解像度は、おおよそ 1M ピクセル（1024 × 1024px 前後）を目安にします。
- 例：1024 × 1024 / 896 × 1152 / 1152 × 896 など

CLIPTextEncodeSDXL

SDXL base はテキストエンコーダとして、2種類の CLIP（OpenCLIP-ViT/G, CLIP-ViT/L）を組み合わせた構成になっています。

ComfyUI には、それぞれの CLIP に別々のテキストを入力できるノードもありますが、先に言っておくと 使う必要はありません。

SDXL_text2image_base_CLIPTextEncodeSDXL.json

{
  "id": "8b9f7796-0873-4025-be3c-0f997f67f866",
  "revision": 0,
  "last_node_id": 11,
  "last_link_id": 13,
  "nodes": [
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1209,
        188
      ],
      "size": [
        210,
        46
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 7
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 8
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            9
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "VAEDecode"
      },
      "widgets_values": []
    },
    {
      "id": 9,
      "type": "SaveImage",
      "pos": [
        1451,
        189
      ],
      "size": [
        408.737603500472,
        456.22967321788406
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 9
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        863.0000000000001,
        186
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 1
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 11
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 13
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 2
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            7
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        12345,
        "fixed",
        20,
        8,
        "euler",
        "normal",
        1
      ]
    },
    {
      "id": 5,
      "type": "EmptyLatentImage",
      "pos": [
        567.8709090909092,
        607.7089478601074
      ],
      "size": [
        244.81999999999994,
        106
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            2
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "EmptyLatentImage"
      },
      "widgets_values": [
        1024,
        1024,
        1
      ]
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        47.38181818181817,
        189.8718181818187
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            1
          ]
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 1,
          "links": [
            10,
            12
          ]
        },
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 2,
          "links": [
            8
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "sd_xl_base_1.0_0.9vae.safetensors"
      ]
    },
    {
      "id": 11,
      "type": "CLIPTextEncodeSDXL",
      "pos": [
        412.69090909090914,
        253.1908935375169
      ],
      "size": [
        400,
        286
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 12
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            13
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "CLIPTextEncodeSDXL"
      },
      "widgets_values": [
        1024,
        1024,
        0,
        0,
        1024,
        1024,
        "text, watermark, worst quality",
        "text, watermark, worst quality"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 10,
      "type": "CLIPTextEncodeSDXL",
      "pos": [
        412.69090909090914,
        -106.71203512855799
      ],
      "size": [
        400,
        286
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 10
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            11
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "CLIPTextEncodeSDXL"
      },
      "widgets_values": [
        1024,
        1024,
        0,
        0,
        1024,
        1024,
        "RAW photo,vase,lily flower,brully background",
        "RAW photo,vase,lily flower,brully background"
      ],
      "color": "#232",
      "bgcolor": "#353"
    }
  ],
  "links": [
    [
      1,
      4,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      2,
      5,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      7,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      8,
      4,
      2,
      8,
      1,
      "VAE"
    ],
    [
      9,
      8,
      0,
      9,
      0,
      "IMAGE"
    ],
    [
      10,
      4,
      1,
      10,
      0,
      "CLIP"
    ],
    [
      11,
      10,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      12,
      4,
      1,
      11,
      0,
      "CLIP"
    ],
    [
      13,
      11,
      0,
      3,
      2,
      "CONDITIONING"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.9090909090909091,
      "offset": [
        52.61818181818183,
        206.712035128558
      ]
    },
    "frontendVersion": "1.33.10",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true,
    "workflowRendererVersion": "LG"
  },
  "version": 0.4
}

両方の CLIP に同じプロンプトを入れた場合、結果としては CLIP Text Encode ノードを使ったときとほぼ同じ挙動になります。
実験の結果でも、両方の CLIP に同じテキストを入れたときが、もっとも安定した出力になりやすいことが分かっています。

base + refiner

次に、base が生成した画像を refiner で仕上げてみます。

base で生成 → refiner で image2image

SDXL base と SDXL refiner は、同じ latent 表現を使います。そのため、base で生成した latent を、そのまま refiner 側の KSampler に入力して image2image できます。

SDXL_text2image_base-refiner.json

{
  "id": "8b9f7796-0873-4025-be3c-0f997f67f866",
  "revision": 0,
  "last_node_id": 17,
  "last_link_id": 27,
  "nodes": [
    {
      "id": 9,
      "type": "SaveImage",
      "pos": [
        1323.7480000000007,
        892.7600000000001
      ],
      "size": [
        408.737603500472,
        456.22967321788406
      ],
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 9
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "ComfyUI"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 15,
      "type": "PrimitiveStringMultiline",
      "pos": [
        -7.033930879039095,
        515.072432907588
      ],
      "size": [
        365.394,
        120.13999999999999
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "STRING",
          "type": "STRING",
          "links": [
            19,
            21
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "PrimitiveStringMultiline"
      },
      "widgets_values": [
        "RAW photo,vase,lily flower,brully background"
      ]
    },
    {
      "id": 16,
      "type": "PrimitiveStringMultiline",
      "pos": [
        -5.702930879038938,
        697.419432907589
      ],
      "size": [
        365.394,
        120.13999999999999
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "STRING",
          "type": "STRING",
          "links": [
            20,
            22
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "PrimitiveStringMultiline"
      },
      "widgets_values": [
        "text, watermark, worst quality"
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        742.0110000000002,
        283.39399999999995
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 1
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 4
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 6
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 2
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            10
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        12345,
        "fixed",
        20,
        8,
        "euler",
        "normal",
        1
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1085.3795000000005,
        892.7600000000001
      ],
      "size": [
        210,
        46
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 13
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 18
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            9
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "VAEDecode"
      },
      "widgets_values": [],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 14,
      "type": "CheckpointLoaderSimple",
      "pos": [
        43.41999999999924,
        930.9400000000011
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            23
          ]
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 1,
          "links": [
            14,
            15
          ]
        },
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 2,
          "links": [
            18
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "sd_xl_refiner_1.0_0.9vae.safetensors"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 11,
      "type": "KSampler",
      "pos": [
        742.0110000000002,
        892.7600000000001
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 23
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 16
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 17
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 10
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            13
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        12345,
        "fixed",
        10,
        8,
        "euler",
        "normal",
        0.25
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        43.10000000000001,
        310.8900000000004
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            1
          ]
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 1,
          "links": [
            3,
            5
          ]
        },
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 2,
          "links": []
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "sd_xl_base_1.0_0.9vae.safetensors"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 5,
      "type": "EmptyLatentImage",
      "pos": [
        439.9365430750737,
        540.5799999999999
      ],
      "size": [
        244.81999999999994,
        106
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            2
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "EmptyLatentImage"
      },
      "widgets_values": [
        1024,
        1024,
        1
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        413.95113860092863,
        371.08248510742186
      ],
      "size": [
        270.805404474145,
        88
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        },
        {
          "name": "text",
          "type": "STRING",
          "widget": {
            "name": "text"
          },
          "link": 20
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            6
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        415.23966775646977,
        217
      ],
      "size": [
        269.51687531860387,
        88
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        },
        {
          "name": "text",
          "type": "STRING",
          "widget": {
            "name": "text"
          },
          "link": 19
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            4
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 12,
      "type": "CLIPTextEncode",
      "pos": [
        413.0448260445553,
        857.9590000000014
      ],
      "size": [
        271.71171703051834,
        88
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 14
        },
        {
          "name": "text",
          "type": "STRING",
          "widget": {
            "name": "text"
          },
          "link": 21
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            16
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 13,
      "type": "CLIPTextEncode",
      "pos": [
        413,
        1012.4284851074226
      ],
      "size": [
        271.75654307507364,
        88
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 15
        },
        {
          "name": "text",
          "type": "STRING",
          "widget": {
            "name": "text"
          },
          "link": 22
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            17
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ],
      "color": "#432",
      "bgcolor": "#653"
    }
  ],
  "links": [
    [
      1,
      4,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      2,
      5,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      4,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      6,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      9,
      8,
      0,
      9,
      0,
      "IMAGE"
    ],
    [
      10,
      3,
      0,
      11,
      3,
      "LATENT"
    ],
    [
      13,
      11,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      14,
      14,
      1,
      12,
      0,
      "CLIP"
    ],
    [
      15,
      14,
      1,
      13,
      0,
      "CLIP"
    ],
    [
      16,
      12,
      0,
      11,
      1,
      "CONDITIONING"
    ],
    [
      17,
      13,
      0,
      11,
      2,
      "CONDITIONING"
    ],
    [
      18,
      14,
      2,
      8,
      1,
      "VAE"
    ],
    [
      19,
      15,
      0,
      6,
      1,
      "STRING"
    ],
    [
      20,
      16,
      0,
      7,
      1,
      "STRING"
    ],
    [
      21,
      15,
      0,
      12,
      1,
      "STRING"
    ],
    [
      22,
      16,
      0,
      13,
      1,
      "STRING"
    ],
    [
      23,
      14,
      0,
      11,
      0,
      "MODEL"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.6830134553650711,
      "offset": [
        564.1167298218162,
        -39.22301019595235
      ]
    },
    "frontendVersion": "1.33.10",
    "reroutes": [
      {
        "id": 1,
        "pos": [
          1078.7664667739357,
          564.0676093271748
        ],
        "linkIds": [
          10
        ]
      },
      {
        "id": 2,
        "parentId": 1,
        "pos": [
          720.9764667739358,
          834.9976093271746
        ],
        "linkIds": [
          10
        ]
      }
    ],
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true,
    "workflowRendererVersion": "LG",
    "linkExtensions": [
      {
        "id": 10,
        "parentId": 2
      }
    ]
  },
  "version": 0.4
}

1. 🟪 SDXL base で通常どおり text2image（latent を出力）
1. 🟨 その latent を、SDXL refiner を使った KSampler に接続
1. 🟨 低い denoise（例：0.2〜0.3）で image2image
- ディテールを増やすことに特化しているため、本当に少しで十分です。

もともとの base の絵柄は活かしつつ、細部や質感だけを refiner に整えてもらうイメージです。

サンプリング途中で切り替え（KSampler Advanced）

もう少しスマートにやる方法として、サンプリング途中で base → refiner に切り替えるやり方もあります。 KSampler (Advanced)ノードを使います。

SDXL_text2image_base-refiner_Advanced.json

{
  "id": "8b9f7796-0873-4025-be3c-0f997f67f866",
  "revision": 0,
  "last_node_id": 21,
  "last_link_id": 40,
  "nodes": [
    {
      "id": 9,
      "type": "SaveImage",
      "pos": [
        1323.7480000000007,
        892.7600000000001
      ],
      "size": [
        408.737603500472,
        456.22967321788406
      ],
      "flags": {},
      "order": 13,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 9
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "ComfyUI"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 15,
      "type": "PrimitiveStringMultiline",
      "pos": [
        -7.033930879039095,
        515.072432907588
      ],
      "size": [
        365.394,
        120.13999999999999
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "STRING",
          "type": "STRING",
          "links": [
            19,
            21
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "PrimitiveStringMultiline"
      },
      "widgets_values": [
        "RAW photo,vase,lily flower,brully background"
      ]
    },
    {
      "id": 16,
      "type": "PrimitiveStringMultiline",
      "pos": [
        -5.702930879038938,
        697.419432907589
      ],
      "size": [
        365.394,
        120.13999999999999
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "STRING",
          "type": "STRING",
          "links": [
            20,
            22
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "PrimitiveStringMultiline"
      },
      "widgets_values": [
        "text, watermark, worst quality"
      ]
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1085.3795000000005,
        892.7600000000001
      ],
      "size": [
        210,
        46
      ],
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 37
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 18
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            9
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "VAEDecode"
      },
      "widgets_values": [],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        415.23966775646977,
        217
      ],
      "size": [
        269.51687531860387,
        88
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        },
        {
          "name": "text",
          "type": "STRING",
          "widget": {
            "name": "text"
          },
          "link": 19
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            28
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        413.95113860092863,
        371.08248510742186
      ],
      "size": [
        270.805404474145,
        88
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        },
        {
          "name": "text",
          "type": "STRING",
          "widget": {
            "name": "text"
          },
          "link": 20
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            29
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 5,
      "type": "EmptyLatentImage",
      "pos": [
        439.9365430750737,
        540.5799999999999
      ],
      "size": [
        244.81999999999994,
        106
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            30
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "EmptyLatentImage"
      },
      "widgets_values": [
        1024,
        1024,
        1
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 12,
      "type": "CLIPTextEncode",
      "pos": [
        413.0448260445553,
        857.9590000000014
      ],
      "size": [
        271.71171703051834,
        88
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 14
        },
        {
          "name": "text",
          "type": "STRING",
          "widget": {
            "name": "text"
          },
          "link": 21
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            33
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 13,
      "type": "CLIPTextEncode",
      "pos": [
        413,
        1012.4284851074226
      ],
      "size": [
        271.75654307507364,
        88
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 15
        },
        {
          "name": "text",
          "type": "STRING",
          "widget": {
            "name": "text"
          },
          "link": 22
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            34
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 14,
      "type": "CheckpointLoaderSimple",
      "pos": [
        43.41999999999924,
        930.9400000000011
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            38
          ]
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 1,
          "links": [
            14,
            15
          ]
        },
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 2,
          "links": [
            18
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "sd_xl_refiner_1.0_0.9vae.safetensors"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        43.10000000000001,
        310.8900000000004
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            40
          ]
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 1,
          "links": [
            3,
            5
          ]
        },
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 2,
          "links": []
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "sd_xl_base_1.0_0.9vae.safetensors"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 18,
      "type": "KSamplerAdvanced",
      "pos": [
        742.0110000000002,
        253.50849999999977
      ],
      "size": [
        304.748046875,
        334
      ],
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 40
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 28
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 29
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 30
        },
        {
          "name": "end_at_step",
          "type": "INT",
          "widget": {
            "name": "end_at_step"
          },
          "link": 31
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            35
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "KSamplerAdvanced"
      },
      "widgets_values": [
        "enable",
        12345,
        "fixed",
        20,
        8,
        "euler",
        "normal",
        0,
        10000,
        "enable"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 21,
      "type": "KSamplerAdvanced",
      "pos": [
        742.0110000000002,
        892.7600000000001
      ],
      "size": [
        304.748046875,
        334
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 38
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 33
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 34
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 35
        },
        {
          "name": "start_at_step",
          "type": "INT",
          "widget": {
            "name": "start_at_step"
          },
          "link": 39
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            37
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "KSamplerAdvanced"
      },
      "widgets_values": [
        "disable",
        0,
        "fixed",
        20,
        8,
        "euler",
        "normal",
        0,
        10000,
        "disable"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 19,
      "type": "PrimitiveInt",
      "pos": [
        474.75654307507364,
        702.5630454545455
      ],
      "size": [
        210,
        82
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "INT",
          "type": "INT",
          "links": [
            31,
            39
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "PrimitiveInt"
      },
      "widgets_values": [
        15,
        "fixed"
      ],
      "color": "#223",
      "bgcolor": "#335"
    }
  ],
  "links": [
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      9,
      8,
      0,
      9,
      0,
      "IMAGE"
    ],
    [
      14,
      14,
      1,
      12,
      0,
      "CLIP"
    ],
    [
      15,
      14,
      1,
      13,
      0,
      "CLIP"
    ],
    [
      18,
      14,
      2,
      8,
      1,
      "VAE"
    ],
    [
      19,
      15,
      0,
      6,
      1,
      "STRING"
    ],
    [
      20,
      16,
      0,
      7,
      1,
      "STRING"
    ],
    [
      21,
      15,
      0,
      12,
      1,
      "STRING"
    ],
    [
      22,
      16,
      0,
      13,
      1,
      "STRING"
    ],
    [
      28,
      6,
      0,
      18,
      1,
      "CONDITIONING"
    ],
    [
      29,
      7,
      0,
      18,
      2,
      "CONDITIONING"
    ],
    [
      30,
      5,
      0,
      18,
      3,
      "LATENT"
    ],
    [
      31,
      19,
      0,
      18,
      4,
      "INT"
    ],
    [
      33,
      12,
      0,
      21,
      1,
      "CONDITIONING"
    ],
    [
      34,
      13,
      0,
      21,
      2,
      "CONDITIONING"
    ],
    [
      35,
      18,
      0,
      21,
      3,
      "LATENT"
    ],
    [
      37,
      21,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      38,
      14,
      0,
      21,
      0,
      "MODEL"
    ],
    [
      39,
      19,
      0,
      21,
      4,
      "INT"
    ],
    [
      40,
      4,
      0,
      18,
      0,
      "MODEL"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.6830134553650705,
      "offset": [
        497.94863087903946,
        -35.01040000000002
      ]
    },
    "frontendVersion": "1.33.10",
    "reroutes": [
      {
        "id": 1,
        "pos": [
          1052.4388634681484,
          619.3758737899849
        ],
        "linkIds": [
          35
        ]
      },
      {
        "id": 2,
        "parentId": 1,
        "pos": [
          735.2514479910659,
          833.4949797253714
        ],
        "linkIds": [
          35
        ]
      }
    ],
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true,
    "workflowRendererVersion": "LG",
    "linkExtensions": [
      {
        "id": 35,
        "parentId": 2
      }
    ]
  },
  "version": 0.4
}

🟪 中盤までは SDXL base でサンプリング
🟨 残りのステップを SDXL refiner に切り替えサンプリング
🟦 int ノードで切り替えるタイミングを設定します。

個人的には image2image のほうが分かりやすいので好みではありますが、このように、1回のサンプリングパスの中で base と refiner の切り替えができるというのは覚えておいてもいいかもしれません。

refinerはいらないが、「refiner的な考え方」は大事

refiner レスな SDXLモデル

SDXL をベースにした派生モデル（コミュニティモデルや商用モデル）は数多くありますが、多くのモデルは refiner を使わなくても十分な画質が出るように調整 されています。

少し強い言い方をすれば、「refiner で後処理をする」という設計は、当時の base 単体の性能を補うための妥協策でもありました。

refiner的な考え方

とはいえ、「複数のモデルにまたがって1枚の画像を仕上げる」という考え方自体は、今でも十分通用する発想です。

絵柄は好みだが、プロンプトにはあまり従ってくれないモデル
逆に、プロンプトにはよく従うが、絵柄が好みではないモデル

といった「帯に短し」のようなモデルは、いくらでもあります。

こうした場面では、SDXL の refiner 的な考え方が役に立ちます。

構図やプロンプト再現性に優れたモデルで、まずベースとなる画像を生成する
その画像を、絵柄が好みのモデルで image2image して仕上げる

という二段構成にすることで、「構図は A モデル」「絵柄は B モデル」といった、いいとこ取りの workflow を組むことができます。

SDXL における base / refiner は、その一つの具体例に過ぎません。「複数モデルをどう掛け合わせるか」自分なりの組み合わせを探してみてください。

SDXLとは？

モデルのダウンロード

baseモデルだけで text2image

CLIPTextEncodeSDXL

base + refiner

base で生成 → refiner で image2image

サンプリング途中で切り替え（KSampler Advanced）

refinerはいらないが、「refiner的な考え方」は大事

refiner レスな SDXLモデル

refiner的な考え方

jsonコピーボタンとは？

修正・誤字報告

記事リクエスト

感想・その他

ありがとうございます