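What is SDXL?
SDXL (more precisely, SDXL 1.0) is the official successor model from Stability AI, the company behind Stable Diffusion 1.5. (There was also a Stable Diffusion 2.1 line, but, well... its performance was, you know...)
Compared with Stable Diffusion 1.5, there are roughly two major differences.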
A two-stage base + refiner design
- Basic text2image can be done with the base model alone.
- The design then runs image2image with the refiner model as a finishing pass that tidies up detail and texture.
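A change in training resolution
- Stable Diffusion 1.5
  - Trained mainly on 512 × 512px square images
- SDXL
  - Trained around 1024 × 1024px, across a variety of aspect ratios
  - As a result, it handles high-resolution generation and portrait or landscape compositions reasonably well out of the box.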
Downloading the models
📂ComfyUI/
└── 📂models/
    └── 📂checkpoints/
        ├── sd_xl_base_1.0_0.9vae.safetensors
        └── sd_xl_refiner_1.0_0.9vae.safetensors
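Before opening ComfyUI, it can help to confirm that both files ended up in the right folder. The following is a minimal sketch, not part of ComfyUI itself: the helper name `missing_checkpoints` is made up for illustration, and the install location passed to it is an assumption you need to adjust.

```python
from pathlib import Path

# File names as listed in the directory tree above.
EXPECTED_CHECKPOINTS = (
    "sd_xl_base_1.0_0.9vae.safetensors",
    "sd_xl_refiner_1.0_0.9vae.safetensors",
)


def missing_checkpoints(comfyui_root: Path) -> list:
    """Return the expected checkpoint file names that are not present.

    Hypothetical helper: it only looks in <root>/models/checkpoints/.
    """
    checkpoint_dir = Path(comfyui_root) / "models" / "checkpoints"
    return [
        name for name in EXPECTED_CHECKPOINTS
        if not (checkpoint_dir / name).is_file()
    ]


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; change it to your own path.
    missing = missing_checkpoints(Path("ComfyUI"))
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all checkpoints found")
```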
text2image with the base model only
Let's start with simple text2image using only the base model.
Take the SD1.5 text2image workflow and swap the Checkpoint for SDXL base; that alone is enough for basic generation.

{
"id": "8b9f7796-0873-4025-be3c-0f997f67f866",
"revision": 0,
"last_node_id": 10,
"last_link_id": 9,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1209,
188
],
"size": [
210,
46
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 8
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
9
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 9,
"type": "SaveImage",
"pos": [
1451,
189
],
"size": [
408.737603500472,
456.22967321788406
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 5,
"type": "EmptyLatentImage",
"pos": [
582.1350317382813,
606.5799999999999
],
"size": [
244.81999999999994,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
2
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "EmptyLatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
863,
186
],
"size": [
315,
262
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 1
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 4
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 6
},
{
"name": "latent_image",
"type": "LATENT",
"link": 2
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
7
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
12345,
"fixed",
20,
8,
"euler",
"normal",
1
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415,
186
],
"size": [
411.95503173828126,
151.0030493164063
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
4
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"RAW photo,vase,lily flower,blurry background"
]
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
43.10000000000001,
310.8900000000004
],
"size": [
315,
98
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
1
]
},
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 1,
"links": [
3,
5
]
},
{
"name": "VAE",
"type": "VAE",
"slot_index": 2,
"links": [
8
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"sd_xl_base_1.0_0.9vae.safetensors"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
416.1970166015625,
392.37848510742185
],
"size": [
410.75801513671877,
158.82607910156253
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
6
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text, watermark, worst quality"
]
}
],
"links": [
[
1,
4,
0,
3,
0,
"MODEL"
],
[
2,
5,
0,
3,
3,
"LATENT"
],
[
3,
4,
1,
6,
0,
"CLIP"
],
[
4,
6,
0,
3,
1,
"CONDITIONING"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
6,
7,
0,
3,
2,
"CONDITIONING"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
8,
4,
2,
8,
1,
"VAE"
],
[
9,
8,
0,
9,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1,
"offset": [
57.89999999999999,
-86
]
},
"frontendVersion": "1.33.10",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true,
"workflowRendererVersion": "LG"
},
"version": 0.4
}
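The "swap the Checkpoint" step can also be done on the exported workflow JSON instead of in the UI. This is a rough sketch that assumes the workflow format shown above, where a `CheckpointLoaderSimple` node keeps the file name as the first entry of `widgets_values`. The function name `swap_checkpoint` and the SD1.5 file name in the demo are hypothetical.

```python
import copy
import json


def swap_checkpoint(workflow: dict, new_name: str) -> dict:
    """Return a copy of the workflow whose checkpoint loaders use new_name.

    Assumes the UI-exported format shown above. Every
    CheckpointLoaderSimple node is changed, so this suits workflows
    with a single loader.
    """
    patched = copy.deepcopy(workflow)
    for node in patched.get("nodes", []):
        if node.get("type") == "CheckpointLoaderSimple":
            # The checkpoint file name is the first widget value.
            node["widgets_values"][0] = new_name
    return patched


if __name__ == "__main__":
    # Tiny stand-in for an SD1.5 workflow; the file name is made up.
    sd15_workflow = {
        "nodes": [
            {
                "id": 4,
                "type": "CheckpointLoaderSimple",
                "widgets_values": ["sd15_example.safetensors"],
            }
        ]
    }
    sdxl_workflow = swap_checkpoint(
        sd15_workflow, "sd_xl_base_1.0_0.9vae.safetensors"
    )
    print(json.dumps(sdxl_workflow, indent=2))
```

The copy is deliberate: the original SD1.5 workflow stays untouched, so both versions can be kept side by side.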
- Aim for a resolution of roughly 1 megapixel (around 1024 × 1024px).
  - Examples: 1024 × 1024 / 896 × 1152 / 1152 × 896
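To find other sizes that stay close to 1 megapixel, a short script can enumerate candidates. This is only a sketch: the 64px step, the 7% tolerance, and the side limits are assumptions chosen for illustration, not values from SDXL's training setup.

```python
def sdxl_resolutions(target_pixels=1024 * 1024, step=64, tolerance=0.07,
                     min_side=512, max_side=2048):
    """List (width, height) pairs whose area is close to target_pixels.

    Both sides are multiples of `step`. `step`, `tolerance` and the
    side limits are illustrative assumptions.
    """
    sizes = []
    for width in range(min_side, max_side + 1, step):
        for height in range(min_side, max_side + 1, step):
            area = width * height
            if abs(area - target_pixels) <= tolerance * target_pixels:
                sizes.append((width, height))
    return sizes


if __name__ == "__main__":
    for width, height in sdxl_resolutions():
        print(f"{width} x {height}  ({width * height / 1e6:.2f} MP)")
```

The three sizes mentioned above (1024 × 1024, 896 × 1152, 1152 × 896) all fall inside this tolerance.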
CLIPTextEncodeSDXL
SDXL base uses a combination of two CLIP text encoders (OpenCLIP ViT-bigG and CLIP ViT-L).
ComfyUI also has a node that lets you feed a different text to each CLIP, but to say it up front: you don't need to use it.

{
"id": "8b9f7796-0873-4025-be3c-0f997f67f866",
"revision": 0,
"last_node_id": 11,
"last_link_id": 13,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1209,
188
],
"size": [
210,
46
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 8
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
9
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 9,
"type": "SaveImage",
"pos": [
1451,
189
],
"size": [
408.737603500472,
456.22967321788406
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
863.0000000000001,
186
],
"size": [
315,
262
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 1
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 11
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 13
},
{
"name": "latent_image",
"type": "LATENT",
"link": 2
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
7
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
12345,
"fixed",
20,
8,
"euler",
"normal",
1
]
},
{
"id": 5,
"type": "EmptyLatentImage",
"pos": [
567.8709090909092,
607.7089478601074
],
"size": [
244.81999999999994,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
2
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "EmptyLatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
47.38181818181817,
189.8718181818187
],
"size": [
315,
98
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
1
]
},
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 1,
"links": [
10,
12
]
},
{
"name": "VAE",
"type": "VAE",
"slot_index": 2,
"links": [
8
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"sd_xl_base_1.0_0.9vae.safetensors"
]
},
{
"id": 11,
"type": "CLIPTextEncodeSDXL",
"pos": [
412.69090909090914,
253.1908935375169
],
"size": [
400,
286
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 12
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
13
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "CLIPTextEncodeSDXL"
},
"widgets_values": [
1024,
1024,
0,
0,
1024,
1024,
"text, watermark, worst quality",
"text, watermark, worst quality"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 10,
"type": "CLIPTextEncodeSDXL",
"pos": [
412.69090909090914,
-106.71203512855799
],
"size": [
400,
286
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 10
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
11
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "CLIPTextEncodeSDXL"
},
"widgets_values": [
1024,
1024,
0,
0,
1024,
1024,
"RAW photo,vase,lily flower,blurry background",
"RAW photo,vase,lily flower,blurry background"
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
1,
4,
0,
3,
0,
"MODEL"
],
[
2,
5,
0,
3,
3,
"LATENT"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
8,
4,
2,
8,
1,
"VAE"
],
[
9,
8,
0,
9,
0,
"IMAGE"
],
[
10,
4,
1,
10,
0,
"CLIP"
],
[
11,
10,
0,
3,
1,
"CONDITIONING"
],
[
12,
4,
1,
11,
0,
"CLIP"
],
[
13,
11,
0,
3,
2,
"CONDITIONING"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.9090909090909091,
"offset": [
52.61818181818183,
206.712035128558
]
},
"frontendVersion": "1.33.10",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true,
"workflowRendererVersion": "LG"
},
"version": 0.4
}
- If you put the same prompt into both CLIPs, the result behaves almost the same as using the CLIP Text Encode node.
- Experiments have also shown that feeding the same text to both CLIPs tends to give the most stable output.
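If you do want to try the node anyway, "same prompt in both CLIPs" looks like this as data. The sketch builds the `widgets_values` list for a `CLIPTextEncodeSDXL` node in the order seen in the workflow above. The slot names in the comments (`crop_w`, `text_g`, and so on) are my reading of the node and should be treated as assumptions. The function name is hypothetical.

```python
def sdxl_text_encode_widgets(prompt, width=1024, height=1024):
    """Build widgets_values for a CLIPTextEncodeSDXL node.

    The same prompt goes to both text encoders. The order follows the
    workflow JSON above; the slot names in the comments are assumed.
    """
    return [
        width,   # width
        height,  # height
        0,       # crop_w
        0,       # crop_h
        width,   # target_width
        height,  # target_height
        prompt,  # text_g (assumed: OpenCLIP ViT-bigG side)
        prompt,  # text_l (assumed: CLIP ViT-L side)
    ]


if __name__ == "__main__":
    print(sdxl_text_encode_widgets("text, watermark, worst quality"))
```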
base + refiner
Next, let's finish the image generated by base with the refiner.
Generate with base → image2image with refiner
SDXL base and SDXL refiner use the same latent representation. That means the latent generated by base can be fed directly into the refiner-side KSampler for image2image.

{
"id": "8b9f7796-0873-4025-be3c-0f997f67f866",
"revision": 0,
"last_node_id": 17,
"last_link_id": 27,
"nodes": [
{
"id": 9,
"type": "SaveImage",
"pos": [
1323.7480000000007,
892.7600000000001
],
"size": [
408.737603500472,
456.22967321788406
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ComfyUI"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 15,
"type": "PrimitiveStringMultiline",
"pos": [
-7.033930879039095,
515.072432907588
],
"size": [
365.394,
120.13999999999999
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "STRING",
"type": "STRING",
"links": [
19,
21
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "PrimitiveStringMultiline"
},
"widgets_values": [
"RAW photo,vase,lily flower,blurry background"
]
},
{
"id": 16,
"type": "PrimitiveStringMultiline",
"pos": [
-5.702930879038938,
697.419432907589
],
"size": [
365.394,
120.13999999999999
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "STRING",
"type": "STRING",
"links": [
20,
22
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "PrimitiveStringMultiline"
},
"widgets_values": [
"text, watermark, worst quality"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
742.0110000000002,
283.39399999999995
],
"size": [
315,
262
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 1
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 4
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 6
},
{
"name": "latent_image",
"type": "LATENT",
"link": 2
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
10
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
12345,
"fixed",
20,
8,
"euler",
"normal",
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1085.3795000000005,
892.7600000000001
],
"size": [
210,
46
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 13
},
{
"name": "vae",
"type": "VAE",
"link": 18
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
9
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": [],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 14,
"type": "CheckpointLoaderSimple",
"pos": [
43.41999999999924,
930.9400000000011
],
"size": [
315,
98
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
23
]
},
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 1,
"links": [
14,
15
]
},
{
"name": "VAE",
"type": "VAE",
"slot_index": 2,
"links": [
18
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"sd_xl_refiner_1.0_0.9vae.safetensors"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 11,
"type": "KSampler",
"pos": [
742.0110000000002,
892.7600000000001
],
"size": [
315,
262
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 23
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 16
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 17
},
{
"name": "latent_image",
"type": "LATENT",
"link": 10
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
13
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
12345,
"fixed",
10,
8,
"euler",
"normal",
0.25
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
43.10000000000001,
310.8900000000004
],
"size": [
315,
98
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
1
]
},
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 1,
"links": [
3,
5
]
},
{
"name": "VAE",
"type": "VAE",
"slot_index": 2,
"links": []
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"sd_xl_base_1.0_0.9vae.safetensors"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 5,
"type": "EmptyLatentImage",
"pos": [
439.9365430750737,
540.5799999999999
],
"size": [
244.81999999999994,
106
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
2
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "EmptyLatentImage"
},
"widgets_values": [
1024,
1024,
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
413.95113860092863,
371.08248510742186
],
"size": [
270.805404474145,
88
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 20
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
6
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.23966775646977,
217
],
"size": [
269.51687531860387,
88
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 19
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
4
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 12,
"type": "CLIPTextEncode",
"pos": [
413.0448260445553,
857.9590000000014
],
"size": [
271.71171703051834,
88
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 14
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 21
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
16
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 13,
"type": "CLIPTextEncode",
"pos": [
413,
1012.4284851074226
],
"size": [
271.75654307507364,
88
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 15
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 22
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
17
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
],
"color": "#432",
"bgcolor": "#653"
}
],
"links": [
[
1,
4,
0,
3,
0,
"MODEL"
],
[
2,
5,
0,
3,
3,
"LATENT"
],
[
3,
4,
1,
6,
0,
"CLIP"
],
[
4,
6,
0,
3,
1,
"CONDITIONING"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
6,
7,
0,
3,
2,
"CONDITIONING"
],
[
9,
8,
0,
9,
0,
"IMAGE"
],
[
10,
3,
0,
11,
3,
"LATENT"
],
[
13,
11,
0,
8,
0,
"LATENT"
],
[
14,
14,
1,
12,
0,
"CLIP"
],
[
15,
14,
1,
13,
0,
"CLIP"
],
[
16,
12,
0,
11,
1,
"CONDITIONING"
],
[
17,
13,
0,
11,
2,
"CONDITIONING"
],
[
18,
14,
2,
8,
1,
"VAE"
],
[
19,
15,
0,
6,
1,
"STRING"
],
[
20,
16,
0,
7,
1,
"STRING"
],
[
21,
15,
0,
12,
1,
"STRING"
],
[
22,
16,
0,
13,
1,
"STRING"
],
[
23,
14,
0,
11,
0,
"MODEL"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.6830134553650711,
"offset": [
564.1167298218162,
-39.22301019595235
]
},
"frontendVersion": "1.33.10",
"reroutes": [
{
"id": 1,
"pos": [
1078.7664667739357,
564.0676093271748
],
"linkIds": [
10
]
},
{
"id": 2,
"parentId": 1,
"pos": [
720.9764667739358,
834.9976093271746
],
"linkIds": [
10
]
}
],
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true,
"workflowRendererVersion": "LG",
"linkExtensions": [
{
"id": 10,
"parentId": 2
}
]
},
"version": 0.4
}
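- 🟪 Run text2image as usual with SDXL base (it outputs a latent)
- 🟨 Connect that latent to a KSampler that uses SDXL refiner
- 🟨 Run image2image with a low denoise (e.g. 0.2 to 0.3)
  - The refiner is specialized in adding detail, so a small amount really is enough.
The idea is to keep the look of the original base image while letting the refiner tidy up only the fine detail and texture.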
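To get a feel for what a low denoise means in terms of steps, here is a small arithmetic sketch. It follows my understanding of how ComfyUI's KSampler treats `denoise` (build a longer schedule of about `steps / denoise` steps and run only the last `steps` of it). Treat that as an assumption rather than a description of the actual implementation. The function name is hypothetical.

```python
def img2img_step_window(steps, denoise):
    """Return (schedule_length, first_step) for a partial denoise.

    Assumed model: the sampler plans int(steps / denoise) steps in
    total and only runs the final `steps` of them.
    """
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    schedule_length = int(steps / denoise)
    first_step = schedule_length - steps
    return schedule_length, first_step


if __name__ == "__main__":
    # Refiner settings from the workflow above: 10 steps, denoise 0.25.
    length, first = img2img_step_window(10, 0.25)
    print(f"runs steps {first}..{length} of a {length}-step schedule")
```

Under this model the refiner only ever sees the low-noise tail of the schedule, which fits its role of adding detail without changing the composition.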
Switching mid-sampling (KSampler Advanced)
A somewhat smarter approach is to switch from base to refiner partway through sampling. This uses the KSampler (Advanced) node.

{
"id": "8b9f7796-0873-4025-be3c-0f997f67f866",
"revision": 0,
"last_node_id": 21,
"last_link_id": 40,
"nodes": [
{
"id": 9,
"type": "SaveImage",
"pos": [
1323.7480000000007,
892.7600000000001
],
"size": [
408.737603500472,
456.22967321788406
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ComfyUI"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 15,
"type": "PrimitiveStringMultiline",
"pos": [
-7.033930879039095,
515.072432907588
],
"size": [
365.394,
120.13999999999999
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "STRING",
"type": "STRING",
"links": [
19,
21
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "PrimitiveStringMultiline"
},
"widgets_values": [
"RAW photo,vase,lily flower,blurry background"
]
},
{
"id": 16,
"type": "PrimitiveStringMultiline",
"pos": [
-5.702930879038938,
697.419432907589
],
"size": [
365.394,
120.13999999999999
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "STRING",
"type": "STRING",
"links": [
20,
22
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "PrimitiveStringMultiline"
},
"widgets_values": [
"text, watermark, worst quality"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1085.3795000000005,
892.7600000000001
],
"size": [
210,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 37
},
{
"name": "vae",
"type": "VAE",
"link": 18
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
9
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": [],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.23966775646977,
217
],
"size": [
269.51687531860387,
88
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 19
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
28
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
413.95113860092863,
371.08248510742186
],
"size": [
270.805404474145,
88
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 20
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
29
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 5,
"type": "EmptyLatentImage",
"pos": [
439.9365430750737,
540.5799999999999
],
"size": [
244.81999999999994,
106
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
30
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "EmptyLatentImage"
},
"widgets_values": [
1024,
1024,
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 12,
"type": "CLIPTextEncode",
"pos": [
413.0448260445553,
857.9590000000014
],
"size": [
271.71171703051834,
88
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 14
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 21
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
33
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 13,
"type": "CLIPTextEncode",
"pos": [
413,
1012.4284851074226
],
"size": [
271.75654307507364,
88
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 15
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 22
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
34
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 14,
"type": "CheckpointLoaderSimple",
"pos": [
43.41999999999924,
930.9400000000011
],
"size": [
315,
98
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
38
]
},
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 1,
"links": [
14,
15
]
},
{
"name": "VAE",
"type": "VAE",
"slot_index": 2,
"links": [
18
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"sd_xl_refiner_1.0_0.9vae.safetensors"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
43.10000000000001,
310.8900000000004
],
"size": [
315,
98
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
40
]
},
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 1,
"links": [
3,
5
]
},
{
"name": "VAE",
"type": "VAE",
"slot_index": 2,
"links": []
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"sd_xl_base_1.0_0.9vae.safetensors"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 18,
"type": "KSamplerAdvanced",
"pos": [
742.0110000000002,
253.50849999999977
],
"size": [
304.748046875,
334
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 40
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 28
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 29
},
{
"name": "latent_image",
"type": "LATENT",
"link": 30
},
{
"name": "end_at_step",
"type": "INT",
"widget": {
"name": "end_at_step"
},
"link": 31
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "KSamplerAdvanced"
},
"widgets_values": [
"enable",
12345,
"fixed",
20,
8,
"euler",
"normal",
0,
10000,
"enable"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 21,
"type": "KSamplerAdvanced",
"pos": [
742.0110000000002,
892.7600000000001
],
"size": [
304.748046875,
334
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 38
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 33
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 34
},
{
"name": "latent_image",
"type": "LATENT",
"link": 35
},
{
"name": "start_at_step",
"type": "INT",
"widget": {
"name": "start_at_step"
},
"link": 39
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
37
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "KSamplerAdvanced"
},
"widgets_values": [
"disable",
0,
"fixed",
20,
8,
"euler",
"normal",
0,
10000,
"disable"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 19,
"type": "PrimitiveInt",
"pos": [
474.75654307507364,
702.5630454545455
],
"size": [
210,
82
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "INT",
"type": "INT",
"links": [
31,
39
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "PrimitiveInt"
},
"widgets_values": [
15,
"fixed"
],
"color": "#223",
"bgcolor": "#335"
}
],
"links": [
[
3,
4,
1,
6,
0,
"CLIP"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
9,
8,
0,
9,
0,
"IMAGE"
],
[
14,
14,
1,
12,
0,
"CLIP"
],
[
15,
14,
1,
13,
0,
"CLIP"
],
[
18,
14,
2,
8,
1,
"VAE"
],
[
19,
15,
0,
6,
1,
"STRING"
],
[
20,
16,
0,
7,
1,
"STRING"
],
[
21,
15,
0,
12,
1,
"STRING"
],
[
22,
16,
0,
13,
1,
"STRING"
],
[
28,
6,
0,
18,
1,
"CONDITIONING"
],
[
29,
7,
0,
18,
2,
"CONDITIONING"
],
[
30,
5,
0,
18,
3,
"LATENT"
],
[
31,
19,
0,
18,
4,
"INT"
],
[
33,
12,
0,
21,
1,
"CONDITIONING"
],
[
34,
13,
0,
21,
2,
"CONDITIONING"
],
[
35,
18,
0,
21,
3,
"LATENT"
],
[
37,
21,
0,
8,
0,
"LATENT"
],
[
38,
14,
0,
21,
0,
"MODEL"
],
[
39,
19,
0,
21,
4,
"INT"
],
[
40,
4,
0,
18,
0,
"MODEL"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.6830134553650705,
"offset": [
497.94863087903946,
-35.01040000000002
]
},
"frontendVersion": "1.33.10",
"reroutes": [
{
"id": 1,
"pos": [
1052.4388634681484,
619.3758737899849
],
"linkIds": [
35
]
},
{
"id": 2,
"parentId": 1,
"pos": [
735.2514479910659,
833.4949797253714
],
"linkIds": [
35
]
}
],
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true,
"workflowRendererVersion": "LG",
"linkExtensions": [
{
"id": 35,
"parentId": 2
}
]
},
"version": 0.4
}
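- 🟪 Sample with SDXL base up to a point partway through the schedule
- 🟨 Switch to SDXL refiner and sample the remaining steps
- 🟦 Set the switch point with the Int node.
Personally I prefer image2image because it is easier to follow, but it may be worth remembering that base and refiner can be switched within a single sampling pass like this.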
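The base → refiner handoff comes down to how the two KSampler (Advanced) nodes are configured. The sketch below writes those settings out as plain dictionaries, using the widget names from the workflow above. The function `split_sampling` is a hypothetical helper for illustration, not a ComfyUI API.

```python
def split_sampling(total_steps, switch_at):
    """Return (base, refiner) settings for two KSampler (Advanced) nodes.

    Both nodes share one schedule of total_steps; base runs steps
    0..switch_at and the refiner runs the rest.
    """
    if not 0 < switch_at < total_steps:
        raise ValueError("switch_at must be between 0 and total_steps")
    base = {
        "add_noise": "enable",                   # start from fresh noise
        "steps": total_steps,
        "start_at_step": 0,
        "end_at_step": switch_at,
        "return_with_leftover_noise": "enable",  # hand over a noisy latent
    }
    refiner = {
        "add_noise": "disable",                  # latent is already noisy
        "steps": total_steps,
        "start_at_step": switch_at,
        "end_at_step": 10000,                    # run to the end, as above
        "return_with_leftover_noise": "disable", # finish denoising
    }
    return base, refiner


if __name__ == "__main__":
    # Values from the workflow above: 20 steps, switch at step 15.
    base, refiner = split_sampling(20, 15)
    print(base)
    print(refiner)
```

The two halves have to agree on `steps` and on the switch point, which is why the workflow feeds both nodes from a single Int node.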
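You don't need the refiner, but the "refiner way of thinking" matters
Refiner-less SDXL models
There are many derivative models built on SDXL (community models and commercial models), and most of them are tuned to produce sufficient image quality without a refiner.
To put it a little bluntly, the "post-process with a refiner" design was also a compromise to make up for the performance of the base model alone at the time.
The refiner way of thinking
That said, the idea of "finishing a single image across multiple models" is still perfectly valid today.
- A model whose style you like, but which doesn't follow prompts very well
- Conversely, a model that follows prompts well, but whose style you don't like
There is no shortage of models like these that are neither here nor there.
In these situations, SDXL's refiner way of thinking comes in handy.
- First generate a base image with a model that is strong at composition and prompt adherence
- Then finish that image with image2image using a model whose style you like
With this two-stage setup, you can build a best-of-both-worlds workflow: "composition from model A", "style from model B".
base / refiner in SDXL is just one concrete example of this. Try to find your own combination for how to mix multiple models.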