SDXL

什么是 SDXL？

SDXL（准确地说是 SDXL 1.0），是由开发了 Stable Diffusion 1.5 的 Stability AI 推出的正统后继模型。（虽说也有叫 Stable Diffusion 2.1 的系统，但那个……性能嘛……）

作为与 Stable Diffusion 的巨大区别，大概有以下 2 点。

base 和 refiner 的两段构成

基本的 text2image 仅用 base 模型就可以完成。
之后，设计为通过用 refiner 模型进行 image2image，来进行调整细节和质感的“完稿”。

学习时的分辨率的变更

Stable Diffusion 1.5
- 以 512 × 512px 的正方形图像为中心进行学习
SDXL
- 以 1024 × 1024px 为中心，以各种各样的纵横比进行学习
- 原本就更容易对应分辨率高的图像生成，和纵长・横长的构图。

模型的下载

📂ComfyUI/
  └── 📂models/
      └── 📂checkpoints/
          ├── sd_xl_base_1.0_0.9vae.safetensors
          └── sd_xl_refiner_1.0_0.9vae.safetensors

仅用 base 模型 text2image

首先仅用 base，简单地 text2image 看看吧。

在 SD1.5 的 text2image 的工作流中，只要将 Checkpoint 替换为 SDXL base 就能进行基本的生成。

SDXL_text2image_base.json

{
  "id": "8b9f7796-0873-4025-be3c-0f997f67f866",
  "revision": 0,
  "last_node_id": 10,
  "last_link_id": 9,
  "nodes": [
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1209,
        188
      ],
      "size": [
        210,
        46
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 7
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 8
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            9
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "VAEDecode"
      },
      "widgets_values": []
    },
    {
      "id": 9,
      "type": "SaveImage",
      "pos": [
        1451,
        189
      ],
      "size": [
        408.737603500472,
        456.22967321788406
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 9
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    },
    {
      "id": 5,
      "type": "EmptyLatentImage",
      "pos": [
        582.1350317382813,
        606.5799999999999
      ],
      "size": [
        244.81999999999994,
        106
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            2
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "EmptyLatentImage"
      },
      "widgets_values": [
        1024,
        1024,
        1
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        863,
        186
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 1
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 4
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 6
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 2
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            7
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        12345,
        "fixed",
        20,
        8,
        "euler",
        "normal",
        1
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        415,
        186
      ],
      "size": [
        411.95503173828126,
        151.0030493164063
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            4
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "RAW photo,vase,lily flower,brully background"
      ]
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        43.10000000000001,
        310.8900000000004
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            1
          ]
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 1,
          "links": [
            3,
            5
          ]
        },
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 2,
          "links": [
            8
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "sd_xl_base_1.0_0.9vae.safetensors"
      ]
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        416.1970166015625,
        392.37848510742185
      ],
      "size": [
        410.75801513671877,
        158.82607910156253
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            6
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "text, watermark, worst quality"
      ]
    }
  ],
  "links": [
    [
      1,
      4,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      2,
      5,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      4,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      6,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      7,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      8,
      4,
      2,
      8,
      1,
      "VAE"
    ],
    [
      9,
      8,
      0,
      9,
      0,
      "IMAGE"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 1,
      "offset": [
        57.89999999999999,
        -86
      ]
    },
    "frontendVersion": "1.33.10",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true,
    "workflowRendererVersion": "LG"
  },
  "version": 0.4
}

分辨率以大体 1M 像素（1024 × 1024px 前后）为标准。
- 例：1024 × 1024 / 896 × 1152 / 1152 × 896 等

CLIPTextEncodeSDXL

SDXL base 作为文本编码器，采用了组合 2 种 CLIP（OpenCLIP-ViT/G, CLIP-ViT/L）的构成。

ComfyUI 中也有可以向各个 CLIP 输入不同文本的节点，但先说好 没有使用的必要。

SDXL_text2image_base_CLIPTextEncodeSDXL.json

{
  "id": "8b9f7796-0873-4025-be3c-0f997f67f866",
  "revision": 0,
  "last_node_id": 11,
  "last_link_id": 13,
  "nodes": [
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1209,
        188
      ],
      "size": [
        210,
        46
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 7
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 8
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            9
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "VAEDecode"
      },
      "widgets_values": []
    },
    {
      "id": 9,
      "type": "SaveImage",
      "pos": [
        1451,
        189
      ],
      "size": [
        408.737603500472,
        456.22967321788406
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 9
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        863.0000000000001,
        186
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 1
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 11
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 13
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 2
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            7
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        12345,
        "fixed",
        20,
        8,
        "euler",
        "normal",
        1
      ]
    },
    {
      "id": 5,
      "type": "EmptyLatentImage",
      "pos": [
        567.8709090909092,
        607.7089478601074
      ],
      "size": [
        244.81999999999994,
        106
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            2
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "EmptyLatentImage"
      },
      "widgets_values": [
        1024,
        1024,
        1
      ]
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        47.38181818181817,
        189.8718181818187
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            1
          ]
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 1,
          "links": [
            10,
            12
          ]
        },
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 2,
          "links": [
            8
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "sd_xl_base_1.0_0.9vae.safetensors"
      ]
    },
    {
      "id": 11,
      "type": "CLIPTextEncodeSDXL",
      "pos": [
        412.69090909090914,
        253.1908935375169
      ],
      "size": [
        400,
        286
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 12
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            13
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "CLIPTextEncodeSDXL"
      },
      "widgets_values": [
        1024,
        1024,
        0,
        0,
        1024,
        1024,
        "text, watermark, worst quality",
        "text, watermark, worst quality"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 10,
      "type": "CLIPTextEncodeSDXL",
      "pos": [
        412.69090909090914,
        -106.71203512855799
      ],
      "size": [
        400,
        286
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 10
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            11
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "CLIPTextEncodeSDXL"
      },
      "widgets_values": [
        1024,
        1024,
        0,
        0,
        1024,
        1024,
        "RAW photo,vase,lily flower,brully background",
        "RAW photo,vase,lily flower,brully background"
      ],
      "color": "#232",
      "bgcolor": "#353"
    }
  ],
  "links": [
    [
      1,
      4,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      2,
      5,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      7,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      8,
      4,
      2,
      8,
      1,
      "VAE"
    ],
    [
      9,
      8,
      0,
      9,
      0,
      "IMAGE"
    ],
    [
      10,
      4,
      1,
      10,
      0,
      "CLIP"
    ],
    [
      11,
      10,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      12,
      4,
      1,
      11,
      0,
      "CLIP"
    ],
    [
      13,
      11,
      0,
      3,
      2,
      "CONDITIONING"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.9090909090909091,
      "offset": [
        52.61818181818183,
        206.712035128558
      ]
    },
    "frontendVersion": "1.33.10",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true,
    "workflowRendererVersion": "LG"
  },
  "version": 0.4
}

如果向两个 CLIP 输入了相同的提示词，结果将变为与使用了 CLIP Text Encode 节点时几乎相同的举动。
在实验结果中也明白，向两个 CLIP 输入相同文本时，最容易成为安定的输出。

base + refiner

接下来，试着用 refiner 完成 base 生成的图像。

用 base 生成 → 用 refiner image2image

SDXL base 和 SDXL refiner 使用相同的 latent 表现。因此，可以将通过 base 生成的 latent，原样输入到 refiner 侧的 KSampler 进行 image2image。

SDXL_text2image_base-refiner.json

{
  "id": "8b9f7796-0873-4025-be3c-0f997f67f866",
  "revision": 0,
  "last_node_id": 17,
  "last_link_id": 27,
  "nodes": [
    {
      "id": 9,
      "type": "SaveImage",
      "pos": [
        1323.7480000000007,
        892.7600000000001
      ],
      "size": [
        408.737603500472,
        456.22967321788406
      ],
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 9
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "ComfyUI"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 15,
      "type": "PrimitiveStringMultiline",
      "pos": [
        -7.033930879039095,
        515.072432907588
      ],
      "size": [
        365.394,
        120.13999999999999
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "STRING",
          "type": "STRING",
          "links": [
            19,
            21
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "PrimitiveStringMultiline"
      },
      "widgets_values": [
        "RAW photo,vase,lily flower,brully background"
      ]
    },
    {
      "id": 16,
      "type": "PrimitiveStringMultiline",
      "pos": [
        -5.702930879038938,
        697.419432907589
      ],
      "size": [
        365.394,
        120.13999999999999
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "STRING",
          "type": "STRING",
          "links": [
            20,
            22
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "PrimitiveStringMultiline"
      },
      "widgets_values": [
        "text, watermark, worst quality"
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        742.0110000000002,
        283.39399999999995
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 1
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 4
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 6
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 2
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            10
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        12345,
        "fixed",
        20,
        8,
        "euler",
        "normal",
        1
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1085.3795000000005,
        892.7600000000001
      ],
      "size": [
        210,
        46
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 13
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 18
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            9
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "VAEDecode"
      },
      "widgets_values": [],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 14,
      "type": "CheckpointLoaderSimple",
      "pos": [
        43.41999999999924,
        930.9400000000011
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            23
          ]
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 1,
          "links": [
            14,
            15
          ]
        },
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 2,
          "links": [
            18
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "sd_xl_refiner_1.0_0.9vae.safetensors"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 11,
      "type": "KSampler",
      "pos": [
        742.0110000000002,
        892.7600000000001
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 23
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 16
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 17
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 10
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            13
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        12345,
        "fixed",
        10,
        8,
        "euler",
        "normal",
        0.25
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        43.10000000000001,
        310.8900000000004
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            1
          ]
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 1,
          "links": [
            3,
            5
          ]
        },
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 2,
          "links": []
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "sd_xl_base_1.0_0.9vae.safetensors"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 5,
      "type": "EmptyLatentImage",
      "pos": [
        439.9365430750737,
        540.5799999999999
      ],
      "size": [
        244.81999999999994,
        106
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            2
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "EmptyLatentImage"
      },
      "widgets_values": [
        1024,
        1024,
        1
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        413.95113860092863,
        371.08248510742186
      ],
      "size": [
        270.805404474145,
        88
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        },
        {
          "name": "text",
          "type": "STRING",
          "widget": {
            "name": "text"
          },
          "link": 20
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            6
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        415.23966775646977,
        217
      ],
      "size": [
        269.51687531860387,
        88
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        },
        {
          "name": "text",
          "type": "STRING",
          "widget": {
            "name": "text"
          },
          "link": 19
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            4
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 12,
      "type": "CLIPTextEncode",
      "pos": [
        413.0448260445553,
        857.9590000000014
      ],
      "size": [
        271.71171703051834,
        88
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 14
        },
        {
          "name": "text",
          "type": "STRING",
          "widget": {
            "name": "text"
          },
          "link": 21
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            16
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 13,
      "type": "CLIPTextEncode",
      "pos": [
        413,
        1012.4284851074226
      ],
      "size": [
        271.75654307507364,
        88
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 15
        },
        {
          "name": "text",
          "type": "STRING",
          "widget": {
            "name": "text"
          },
          "link": 22
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            17
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ],
      "color": "#432",
      "bgcolor": "#653"
    }
  ],
  "links": [
    [
      1,
      4,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      2,
      5,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      4,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      6,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      9,
      8,
      0,
      9,
      0,
      "IMAGE"
    ],
    [
      10,
      3,
      0,
      11,
      3,
      "LATENT"
    ],
    [
      13,
      11,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      14,
      14,
      1,
      12,
      0,
      "CLIP"
    ],
    [
      15,
      14,
      1,
      13,
      0,
      "CLIP"
    ],
    [
      16,
      12,
      0,
      11,
      1,
      "CONDITIONING"
    ],
    [
      17,
      13,
      0,
      11,
      2,
      "CONDITIONING"
    ],
    [
      18,
      14,
      2,
      8,
      1,
      "VAE"
    ],
    [
      19,
      15,
      0,
      6,
      1,
      "STRING"
    ],
    [
      20,
      16,
      0,
      7,
      1,
      "STRING"
    ],
    [
      21,
      15,
      0,
      12,
      1,
      "STRING"
    ],
    [
      22,
      16,
      0,
      13,
      1,
      "STRING"
    ],
    [
      23,
      14,
      0,
      11,
      0,
      "MODEL"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.6830134553650711,
      "offset": [
        564.1167298218162,
        -39.22301019595235
      ]
    },
    "frontendVersion": "1.33.10",
    "reroutes": [
      {
        "id": 1,
        "pos": [
          1078.7664667739357,
          564.0676093271748
        ],
        "linkIds": [
          10
        ]
      },
      {
        "id": 2,
        "parentId": 1,
        "pos": [
          720.9764667739358,
          834.9976093271746
        ],
        "linkIds": [
          10
        ]
      }
    ],
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true,
    "workflowRendererVersion": "LG",
    "linkExtensions": [
      {
        "id": 10,
        "parentId": 2
      }
    ]
  },
  "version": 0.4
}

1. 🟪 用 SDXL base 照常进行 text2image（输出 latent）
1. 🟨 将那个 latent 连接到使用了 SDXL refiner 的 KSampler
1. 🟨 以低的 denoise（例：0.2〜0.3）进行 image2image
- 因为专注于增加细节，所以真的一点点就足够了。

形象上是活用原本 base 的画风，只让 refiner 调整细部和质感。

在采样途中切换（KSampler Advanced）

作为稍微聪明一点的做法，也有在采样途中从 base → refiner 切换的做法。使用 KSampler (Advanced) 节点。

SDXL_text2image_base-refiner_Advanced.json

{
  "id": "8b9f7796-0873-4025-be3c-0f997f67f866",
  "revision": 0,
  "last_node_id": 21,
  "last_link_id": 40,
  "nodes": [
    {
      "id": 9,
      "type": "SaveImage",
      "pos": [
        1323.7480000000007,
        892.7600000000001
      ],
      "size": [
        408.737603500472,
        456.22967321788406
      ],
      "flags": {},
      "order": 13,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 9
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "ComfyUI"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 15,
      "type": "PrimitiveStringMultiline",
      "pos": [
        -7.033930879039095,
        515.072432907588
      ],
      "size": [
        365.394,
        120.13999999999999
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "STRING",
          "type": "STRING",
          "links": [
            19,
            21
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "PrimitiveStringMultiline"
      },
      "widgets_values": [
        "RAW photo,vase,lily flower,brully background"
      ]
    },
    {
      "id": 16,
      "type": "PrimitiveStringMultiline",
      "pos": [
        -5.702930879038938,
        697.419432907589
      ],
      "size": [
        365.394,
        120.13999999999999
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "STRING",
          "type": "STRING",
          "links": [
            20,
            22
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "PrimitiveStringMultiline"
      },
      "widgets_values": [
        "text, watermark, worst quality"
      ]
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1085.3795000000005,
        892.7600000000001
      ],
      "size": [
        210,
        46
      ],
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 37
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 18
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            9
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "VAEDecode"
      },
      "widgets_values": [],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        415.23966775646977,
        217
      ],
      "size": [
        269.51687531860387,
        88
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        },
        {
          "name": "text",
          "type": "STRING",
          "widget": {
            "name": "text"
          },
          "link": 19
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            28
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        413.95113860092863,
        371.08248510742186
      ],
      "size": [
        270.805404474145,
        88
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        },
        {
          "name": "text",
          "type": "STRING",
          "widget": {
            "name": "text"
          },
          "link": 20
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            29
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 5,
      "type": "EmptyLatentImage",
      "pos": [
        439.9365430750737,
        540.5799999999999
      ],
      "size": [
        244.81999999999994,
        106
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            30
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "EmptyLatentImage"
      },
      "widgets_values": [
        1024,
        1024,
        1
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 12,
      "type": "CLIPTextEncode",
      "pos": [
        413.0448260445553,
        857.9590000000014
      ],
      "size": [
        271.71171703051834,
        88
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 14
        },
        {
          "name": "text",
          "type": "STRING",
          "widget": {
            "name": "text"
          },
          "link": 21
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            33
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 13,
      "type": "CLIPTextEncode",
      "pos": [
        413,
        1012.4284851074226
      ],
      "size": [
        271.75654307507364,
        88
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 15
        },
        {
          "name": "text",
          "type": "STRING",
          "widget": {
            "name": "text"
          },
          "link": 22
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            34
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 14,
      "type": "CheckpointLoaderSimple",
      "pos": [
        43.41999999999924,
        930.9400000000011
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            38
          ]
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 1,
          "links": [
            14,
            15
          ]
        },
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 2,
          "links": [
            18
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "sd_xl_refiner_1.0_0.9vae.safetensors"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        43.10000000000001,
        310.8900000000004
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            40
          ]
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 1,
          "links": [
            3,
            5
          ]
        },
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 2,
          "links": []
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "sd_xl_base_1.0_0.9vae.safetensors"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 18,
      "type": "KSamplerAdvanced",
      "pos": [
        742.0110000000002,
        253.50849999999977
      ],
      "size": [
        304.748046875,
        334
      ],
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 40
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 28
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 29
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 30
        },
        {
          "name": "end_at_step",
          "type": "INT",
          "widget": {
            "name": "end_at_step"
          },
          "link": 31
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            35
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "KSamplerAdvanced"
      },
      "widgets_values": [
        "enable",
        12345,
        "fixed",
        20,
        8,
        "euler",
        "normal",
        0,
        10000,
        "enable"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 21,
      "type": "KSamplerAdvanced",
      "pos": [
        742.0110000000002,
        892.7600000000001
      ],
      "size": [
        304.748046875,
        334
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 38
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 33
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 34
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 35
        },
        {
          "name": "start_at_step",
          "type": "INT",
          "widget": {
            "name": "start_at_step"
          },
          "link": 39
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            37
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "KSamplerAdvanced"
      },
      "widgets_values": [
        "disable",
        0,
        "fixed",
        20,
        8,
        "euler",
        "normal",
        0,
        10000,
        "disable"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 19,
      "type": "PrimitiveInt",
      "pos": [
        474.75654307507364,
        702.5630454545455
      ],
      "size": [
        210,
        82
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "INT",
          "type": "INT",
          "links": [
            31,
            39
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "PrimitiveInt"
      },
      "widgets_values": [
        15,
        "fixed"
      ],
      "color": "#223",
      "bgcolor": "#335"
    }
  ],
  "links": [
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      9,
      8,
      0,
      9,
      0,
      "IMAGE"
    ],
    [
      14,
      14,
      1,
      12,
      0,
      "CLIP"
    ],
    [
      15,
      14,
      1,
      13,
      0,
      "CLIP"
    ],
    [
      18,
      14,
      2,
      8,
      1,
      "VAE"
    ],
    [
      19,
      15,
      0,
      6,
      1,
      "STRING"
    ],
    [
      20,
      16,
      0,
      7,
      1,
      "STRING"
    ],
    [
      21,
      15,
      0,
      12,
      1,
      "STRING"
    ],
    [
      22,
      16,
      0,
      13,
      1,
      "STRING"
    ],
    [
      28,
      6,
      0,
      18,
      1,
      "CONDITIONING"
    ],
    [
      29,
      7,
      0,
      18,
      2,
      "CONDITIONING"
    ],
    [
      30,
      5,
      0,
      18,
      3,
      "LATENT"
    ],
    [
      31,
      19,
      0,
      18,
      4,
      "INT"
    ],
    [
      33,
      12,
      0,
      21,
      1,
      "CONDITIONING"
    ],
    [
      34,
      13,
      0,
      21,
      2,
      "CONDITIONING"
    ],
    [
      35,
      18,
      0,
      21,
      3,
      "LATENT"
    ],
    [
      37,
      21,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      38,
      14,
      0,
      21,
      0,
      "MODEL"
    ],
    [
      39,
      19,
      0,
      21,
      4,
      "INT"
    ],
    [
      40,
      4,
      0,
      18,
      0,
      "MODEL"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.6830134553650705,
      "offset": [
        497.94863087903946,
        -35.01040000000002
      ]
    },
    "frontendVersion": "1.33.10",
    "reroutes": [
      {
        "id": 1,
        "pos": [
          1052.4388634681484,
          619.3758737899849
        ],
        "linkIds": [
          35
        ]
      },
      {
        "id": 2,
        "parentId": 1,
        "pos": [
          735.2514479910659,
          833.4949797253714
        ],
        "linkIds": [
          35
        ]
      }
    ],
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true,
    "workflowRendererVersion": "LG",
    "linkExtensions": [
      {
        "id": 35,
        "parentId": 2
      }
    ]
  },
  "version": 0.4
}

🟪 直到中盘用 SDXL base 进行采样
🟨 将剩下的步数切换给 SDXL refiner 采样
🟦 在 int 节点设定切换的时机。

虽然我个人因为 image2image 更容易理解所以喜欢，但像这样，可以在 1 次采样路径中进行 base 和 refiner 的切换这点，记住也未尝不可。

虽不需要 refiner，但“refiner 式的思考方式”很重要

refiner-less 的 SDXL 模型

虽然有许许多多以 SDXL 为基础的派生模型（社区模型和商用模型），但很多模型被 调整为即使不使用 refiner 也能出充分的画质。

说得稍微强硬点，“用 refiner 进行后处理”这种设计，也是为了弥补当时 base 单体性能的妥协策略。

refiner 式的思考方式

虽说如此，“跨越多个模型来完成 1 张图像”这个思考方式本身，是至今也十分通用的想法。

虽然喜欢画风，但不怎么服从提示词的模型
相反，虽然很服从提示词，但画风不喜欢的模型

像这种“差点意思”的模型，要多少有多少。

在这样的场面，SDXL 的 refiner 式的思考方式就会派上用场。

用构图和提示词再现性优秀的模型，首先生成作为基础的图像
将那个图像，用画风喜欢的模型进行 image2image 来完稿

通过做成这种二段构成，可以组建“构图用 A 模型”“画风用 B 模型”这种，取长补短的工作流。

SDXL 中的 base / refiner，不过是其中一个具体例子。请寻找“如何组合多个模型”属于你自己的组合。

什么是 SDXL？

模型的下载

仅用 base 模型 text2image

CLIPTextEncodeSDXL

base + refiner

用 base 生成 → 用 refiner image2image

在采样途中切换（KSampler Advanced）

虽不需要 refiner，但“refiner 式的思考方式”很重要

refiner-less 的 SDXL 模型

refiner 式的思考方式

什么是 JSON 复制按钮？

这个页面有问题！

请补充讲解！

感想 / 其他

感谢！