What is SDXL?
SDXL (specifically SDXL 1.0) is the legitimate successor to Stable Diffusion 1.5, developed by the same company, Stability AI. (There was also a Stable Diffusion 2.1 lineage, but its performance left something to be desired.)
There are roughly two main differences from Stable Diffusion 1.5:
Two-stage configuration of base and refiner
- Basic text2image is completed with the base model alone.
- After that, the refiner model performs a "finishing" image2image pass to adjust details and texture.
Change in training resolution
- Stable Diffusion 1.5
- Trained mainly on 512 x 512px square images
- SDXL
- Trained mainly on 1024 x 1024px with various aspect ratios
- It handles high-resolution image generation and portrait/landscape compositions more naturally from the start.
Model Download
📂ComfyUI/
└── 📂models/
└── 📂checkpoints/
├── sd_xl_base_1.0_0.9vae.safetensors
└── sd_xl_refiner_1.0_0.9vae.safetensors
text2image with the base model only
First, let's do a simple text2image with the base model alone.
Basic generation works just by swapping the checkpoint in the SD1.5 text2image workflow for the SDXL base model.

{
"id": "8b9f7796-0873-4025-be3c-0f997f67f866",
"revision": 0,
"last_node_id": 10,
"last_link_id": 9,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1209,
188
],
"size": [
210,
46
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 8
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
9
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 9,
"type": "SaveImage",
"pos": [
1451,
189
],
"size": [
408.737603500472,
456.22967321788406
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 5,
"type": "EmptyLatentImage",
"pos": [
582.1350317382813,
606.5799999999999
],
"size": [
244.81999999999994,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
2
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "EmptyLatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
863,
186
],
"size": [
315,
262
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 1
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 4
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 6
},
{
"name": "latent_image",
"type": "LATENT",
"link": 2
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
7
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
12345,
"fixed",
20,
8,
"euler",
"normal",
1
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415,
186
],
"size": [
411.95503173828126,
151.0030493164063
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
4
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"RAW photo,vase,lily flower,blurry background"
]
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
43.10000000000001,
310.8900000000004
],
"size": [
315,
98
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
1
]
},
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 1,
"links": [
3,
5
]
},
{
"name": "VAE",
"type": "VAE",
"slot_index": 2,
"links": [
8
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"sd_xl_base_1.0_0.9vae.safetensors"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
416.1970166015625,
392.37848510742185
],
"size": [
410.75801513671877,
158.82607910156253
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
6
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text, watermark, worst quality"
]
}
],
"links": [
[
1,
4,
0,
3,
0,
"MODEL"
],
[
2,
5,
0,
3,
3,
"LATENT"
],
[
3,
4,
1,
6,
0,
"CLIP"
],
[
4,
6,
0,
3,
1,
"CONDITIONING"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
6,
7,
0,
3,
2,
"CONDITIONING"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
8,
4,
2,
8,
1,
"VAE"
],
[
9,
8,
0,
9,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1,
"offset": [
57.89999999999999,
-86
]
},
"frontendVersion": "1.33.10",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true,
"workflowRendererVersion": "LG"
},
"version": 0.4
}
- Set the resolution to approximately 1M pixels (around 1024 x 1024px).
- Examples: 1024 x 1024 / 896 x 1152 / 1152 x 896, etc.
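The "about 1M pixels" guideline can be checked programmatically. The sketch below is illustrative (not part of the workflow): it verifies that the listed resolutions are close to 1,048,576 pixels and that both sides are divisible by 64, which keeps the latent dimensions whole numbers.

```python
# Common SDXL-friendly resolutions from the examples above.
SDXL_RESOLUTIONS = [
    (1024, 1024),  # square
    (896, 1152),   # portrait
    (1152, 896),   # landscape
]

def is_sdxl_friendly(width: int, height: int) -> bool:
    """Roughly 1M total pixels, and both sides divisible by 64."""
    pixels = width * height
    return 0.9e6 <= pixels <= 1.1e6 and width % 64 == 0 and height % 64 == 0

for w, h in SDXL_RESOLUTIONS:
    print(w, h, is_sdxl_friendly(w, h))  # all three pass the check
```

The 0.9M-1.1M window here is an assumption for the sketch; the point is simply to stay near the training pixel count rather than hit it exactly.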
CLIPTextEncodeSDXL
SDXL base combines two CLIP models (OpenCLIP-ViT/G and CLIP-ViT/L) as its text encoders.
ComfyUI has a node that lets you feed separate text to each CLIP, but to say it up front: you usually don't need it.

{
"id": "8b9f7796-0873-4025-be3c-0f997f67f866",
"revision": 0,
"last_node_id": 11,
"last_link_id": 13,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1209,
188
],
"size": [
210,
46
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 8
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
9
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 9,
"type": "SaveImage",
"pos": [
1451,
189
],
"size": [
408.737603500472,
456.22967321788406
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
863.0000000000001,
186
],
"size": [
315,
262
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 1
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 11
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 13
},
{
"name": "latent_image",
"type": "LATENT",
"link": 2
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
7
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
12345,
"fixed",
20,
8,
"euler",
"normal",
1
]
},
{
"id": 5,
"type": "EmptyLatentImage",
"pos": [
567.8709090909092,
607.7089478601074
],
"size": [
244.81999999999994,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
2
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "EmptyLatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
47.38181818181817,
189.8718181818187
],
"size": [
315,
98
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
1
]
},
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 1,
"links": [
10,
12
]
},
{
"name": "VAE",
"type": "VAE",
"slot_index": 2,
"links": [
8
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"sd_xl_base_1.0_0.9vae.safetensors"
]
},
{
"id": 11,
"type": "CLIPTextEncodeSDXL",
"pos": [
412.69090909090914,
253.1908935375169
],
"size": [
400,
286
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 12
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
13
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "CLIPTextEncodeSDXL"
},
"widgets_values": [
1024,
1024,
0,
0,
1024,
1024,
"text, watermark, worst quality",
"text, watermark, worst quality"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 10,
"type": "CLIPTextEncodeSDXL",
"pos": [
412.69090909090914,
-106.71203512855799
],
"size": [
400,
286
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 10
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
11
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "CLIPTextEncodeSDXL"
},
"widgets_values": [
1024,
1024,
0,
0,
1024,
1024,
"RAW photo,vase,lily flower,blurry background",
"RAW photo,vase,lily flower,blurry background"
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
1,
4,
0,
3,
0,
"MODEL"
],
[
2,
5,
0,
3,
3,
"LATENT"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
8,
4,
2,
8,
1,
"VAE"
],
[
9,
8,
0,
9,
0,
"IMAGE"
],
[
10,
4,
1,
10,
0,
"CLIP"
],
[
11,
10,
0,
3,
1,
"CONDITIONING"
],
[
12,
4,
1,
11,
0,
"CLIP"
],
[
13,
11,
0,
3,
2,
"CONDITIONING"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.9090909090909091,
"offset": [
52.61818181818183,
206.712035128558
]
},
"frontendVersion": "1.33.10",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true,
"workflowRendererVersion": "LG"
},
"version": 0.4
}
- If you input the same prompt to both CLIPs, the behavior is almost identical to using the regular CLIP Text Encode node.
- Experimental results show that the output tends to be most stable when the same text is fed to both CLIPs.
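For reference, the widget values in the workflow JSON above map onto the CLIPTextEncodeSDXL node's inputs. The helper below is a sketch (the function name is made up; only the field names mirror the node) showing the typical "leave sizes at the generation resolution, same prompt in both encoders" setup:

```python
def sdxl_encode_widgets(prompt: str, width: int = 1024, height: int = 1024) -> dict:
    """Build the widget-value mapping for a CLIPTextEncodeSDXL node."""
    return {
        "width": width,             # size conditioning: original image size
        "height": height,
        "crop_w": 0,                # crop conditioning: usually left at 0
        "crop_h": 0,
        "target_width": width,      # target size: the resolution you generate at
        "target_height": height,
        "text_g": prompt,           # prompt for OpenCLIP-ViT/G
        "text_l": prompt,           # same prompt for CLIP-ViT/L (most stable)
    }

print(sdxl_encode_widgets("RAW photo,vase,lily flower,blurry background"))
```

This matches the `widgets_values` array in the workflow: 1024, 1024, 0, 0, 1024, 1024, then the two prompt strings.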
base + refiner
Next, let's finish the image generated by base with the refiner.
Generate with base → image2image with refiner
SDXL base and SDXL refiner share the same latent representation, so the latent produced by base can be fed as-is into the refiner-side KSampler for image2image.

{
"id": "8b9f7796-0873-4025-be3c-0f997f67f866",
"revision": 0,
"last_node_id": 17,
"last_link_id": 27,
"nodes": [
{
"id": 9,
"type": "SaveImage",
"pos": [
1323.7480000000007,
892.7600000000001
],
"size": [
408.737603500472,
456.22967321788406
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ComfyUI"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 15,
"type": "PrimitiveStringMultiline",
"pos": [
-7.033930879039095,
515.072432907588
],
"size": [
365.394,
120.13999999999999
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "STRING",
"type": "STRING",
"links": [
19,
21
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "PrimitiveStringMultiline"
},
"widgets_values": [
"RAW photo,vase,lily flower,blurry background"
]
},
{
"id": 16,
"type": "PrimitiveStringMultiline",
"pos": [
-5.702930879038938,
697.419432907589
],
"size": [
365.394,
120.13999999999999
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "STRING",
"type": "STRING",
"links": [
20,
22
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "PrimitiveStringMultiline"
},
"widgets_values": [
"text, watermark, worst quality"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
742.0110000000002,
283.39399999999995
],
"size": [
315,
262
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 1
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 4
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 6
},
{
"name": "latent_image",
"type": "LATENT",
"link": 2
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
10
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
12345,
"fixed",
20,
8,
"euler",
"normal",
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1085.3795000000005,
892.7600000000001
],
"size": [
210,
46
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 13
},
{
"name": "vae",
"type": "VAE",
"link": 18
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
9
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": [],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 14,
"type": "CheckpointLoaderSimple",
"pos": [
43.41999999999924,
930.9400000000011
],
"size": [
315,
98
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
23
]
},
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 1,
"links": [
14,
15
]
},
{
"name": "VAE",
"type": "VAE",
"slot_index": 2,
"links": [
18
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"sd_xl_refiner_1.0_0.9vae.safetensors"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 11,
"type": "KSampler",
"pos": [
742.0110000000002,
892.7600000000001
],
"size": [
315,
262
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 23
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 16
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 17
},
{
"name": "latent_image",
"type": "LATENT",
"link": 10
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
13
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
12345,
"fixed",
10,
8,
"euler",
"normal",
0.25
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
43.10000000000001,
310.8900000000004
],
"size": [
315,
98
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
1
]
},
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 1,
"links": [
3,
5
]
},
{
"name": "VAE",
"type": "VAE",
"slot_index": 2,
"links": []
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"sd_xl_base_1.0_0.9vae.safetensors"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 5,
"type": "EmptyLatentImage",
"pos": [
439.9365430750737,
540.5799999999999
],
"size": [
244.81999999999994,
106
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
2
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "EmptyLatentImage"
},
"widgets_values": [
1024,
1024,
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
413.95113860092863,
371.08248510742186
],
"size": [
270.805404474145,
88
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 20
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
6
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.23966775646977,
217
],
"size": [
269.51687531860387,
88
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 19
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
4
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 12,
"type": "CLIPTextEncode",
"pos": [
413.0448260445553,
857.9590000000014
],
"size": [
271.71171703051834,
88
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 14
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 21
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
16
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 13,
"type": "CLIPTextEncode",
"pos": [
413,
1012.4284851074226
],
"size": [
271.75654307507364,
88
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 15
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 22
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
17
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
],
"color": "#432",
"bgcolor": "#653"
}
],
"links": [
[
1,
4,
0,
3,
0,
"MODEL"
],
[
2,
5,
0,
3,
3,
"LATENT"
],
[
3,
4,
1,
6,
0,
"CLIP"
],
[
4,
6,
0,
3,
1,
"CONDITIONING"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
6,
7,
0,
3,
2,
"CONDITIONING"
],
[
9,
8,
0,
9,
0,
"IMAGE"
],
[
10,
3,
0,
11,
3,
"LATENT"
],
[
13,
11,
0,
8,
0,
"LATENT"
],
[
14,
14,
1,
12,
0,
"CLIP"
],
[
15,
14,
1,
13,
0,
"CLIP"
],
[
16,
12,
0,
11,
1,
"CONDITIONING"
],
[
17,
13,
0,
11,
2,
"CONDITIONING"
],
[
18,
14,
2,
8,
1,
"VAE"
],
[
19,
15,
0,
6,
1,
"STRING"
],
[
20,
16,
0,
7,
1,
"STRING"
],
[
21,
15,
0,
12,
1,
"STRING"
],
[
22,
16,
0,
13,
1,
"STRING"
],
[
23,
14,
0,
11,
0,
"MODEL"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.6830134553650711,
"offset": [
564.1167298218162,
-39.22301019595235
]
},
"frontendVersion": "1.33.10",
"reroutes": [
{
"id": 1,
"pos": [
1078.7664667739357,
564.0676093271748
],
"linkIds": [
10
]
},
{
"id": 2,
"parentId": 1,
"pos": [
720.9764667739358,
834.9976093271746
],
"linkIds": [
10
]
}
],
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true,
"workflowRendererVersion": "LG",
"linkExtensions": [
{
"id": 10,
"parentId": 2
}
]
},
"version": 0.4
}
- 🟪 text2image as usual with SDXL base (output the latent)
- 🟨 Connect that latent to a KSampler that uses the SDXL refiner
- 🟨 Run image2image with a low denoise (e.g., 0.2 to 0.3)
- Since the refiner specializes in adding detail, a little goes a long way.
The idea is that base's original style is preserved while the refiner adjusts only the details and textures.
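To see why a low denoise only touches details, it helps to know roughly how ComfyUI's KSampler interprets it (this is a behavioral sketch, not the actual API): a denoise below 1.0 builds a longer noise schedule and runs only its tail, so the input image is lightly perturbed and then cleaned up.

```python
def refiner_pass_steps(steps: int, denoise: float) -> tuple[int, int]:
    """Return (full schedule length, steps actually executed) for a denoise value.

    Roughly mirrors KSampler behavior: the sampler pretends the schedule has
    steps/denoise steps, then executes only the last `steps` of them.
    """
    total = int(steps / denoise)
    return total, steps

# The refiner KSampler in the workflow above uses steps=10, denoise=0.25:
total, run = refiner_pass_steps(steps=10, denoise=0.25)
print(total, run)  # schedule of 40 steps, only the final 10 run
```

With denoise this low, the refiner only operates in the lowest-noise region of the schedule, which is exactly where fine textures are decided.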
Switching during sampling (KSampler Advanced)
As a slightly smarter approach, you can also switch from base to refiner in the middle of sampling, using the KSampler (Advanced) node.

{
"id": "8b9f7796-0873-4025-be3c-0f997f67f866",
"revision": 0,
"last_node_id": 21,
"last_link_id": 40,
"nodes": [
{
"id": 9,
"type": "SaveImage",
"pos": [
1323.7480000000007,
892.7600000000001
],
"size": [
408.737603500472,
456.22967321788406
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ComfyUI"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 15,
"type": "PrimitiveStringMultiline",
"pos": [
-7.033930879039095,
515.072432907588
],
"size": [
365.394,
120.13999999999999
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "STRING",
"type": "STRING",
"links": [
19,
21
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "PrimitiveStringMultiline"
},
"widgets_values": [
"RAW photo,vase,lily flower,blurry background"
]
},
{
"id": 16,
"type": "PrimitiveStringMultiline",
"pos": [
-5.702930879038938,
697.419432907589
],
"size": [
365.394,
120.13999999999999
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "STRING",
"type": "STRING",
"links": [
20,
22
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "PrimitiveStringMultiline"
},
"widgets_values": [
"text, watermark, worst quality"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1085.3795000000005,
892.7600000000001
],
"size": [
210,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 37
},
{
"name": "vae",
"type": "VAE",
"link": 18
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
9
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": [],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.23966775646977,
217
],
"size": [
269.51687531860387,
88
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 19
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
28
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
413.95113860092863,
371.08248510742186
],
"size": [
270.805404474145,
88
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 20
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
29
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 5,
"type": "EmptyLatentImage",
"pos": [
439.9365430750737,
540.5799999999999
],
"size": [
244.81999999999994,
106
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
30
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "EmptyLatentImage"
},
"widgets_values": [
1024,
1024,
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 12,
"type": "CLIPTextEncode",
"pos": [
413.0448260445553,
857.9590000000014
],
"size": [
271.71171703051834,
88
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 14
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 21
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
33
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 13,
"type": "CLIPTextEncode",
"pos": [
413,
1012.4284851074226
],
"size": [
271.75654307507364,
88
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 15
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 22
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
34
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 14,
"type": "CheckpointLoaderSimple",
"pos": [
43.41999999999924,
930.9400000000011
],
"size": [
315,
98
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
38
]
},
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 1,
"links": [
14,
15
]
},
{
"name": "VAE",
"type": "VAE",
"slot_index": 2,
"links": [
18
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"sd_xl_refiner_1.0_0.9vae.safetensors"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
43.10000000000001,
310.8900000000004
],
"size": [
315,
98
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
40
]
},
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 1,
"links": [
3,
5
]
},
{
"name": "VAE",
"type": "VAE",
"slot_index": 2,
"links": []
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"sd_xl_base_1.0_0.9vae.safetensors"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 18,
"type": "KSamplerAdvanced",
"pos": [
742.0110000000002,
253.50849999999977
],
"size": [
304.748046875,
334
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 40
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 28
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 29
},
{
"name": "latent_image",
"type": "LATENT",
"link": 30
},
{
"name": "end_at_step",
"type": "INT",
"widget": {
"name": "end_at_step"
},
"link": 31
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "KSamplerAdvanced"
},
"widgets_values": [
"enable",
12345,
"fixed",
20,
8,
"euler",
"normal",
0,
10000,
"enable"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 21,
"type": "KSamplerAdvanced",
"pos": [
742.0110000000002,
892.7600000000001
],
"size": [
304.748046875,
334
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 38
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 33
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 34
},
{
"name": "latent_image",
"type": "LATENT",
"link": 35
},
{
"name": "start_at_step",
"type": "INT",
"widget": {
"name": "start_at_step"
},
"link": 39
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
37
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "KSamplerAdvanced"
},
"widgets_values": [
"disable",
0,
"fixed",
20,
8,
"euler",
"normal",
0,
10000,
"disable"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 19,
"type": "PrimitiveInt",
"pos": [
474.75654307507364,
702.5630454545455
],
"size": [
210,
82
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "INT",
"type": "INT",
"links": [
31,
39
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "PrimitiveInt"
},
"widgets_values": [
15,
"fixed"
],
"color": "#223",
"bgcolor": "#335"
}
],
"links": [
[
3,
4,
1,
6,
0,
"CLIP"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
9,
8,
0,
9,
0,
"IMAGE"
],
[
14,
14,
1,
12,
0,
"CLIP"
],
[
15,
14,
1,
13,
0,
"CLIP"
],
[
18,
14,
2,
8,
1,
"VAE"
],
[
19,
15,
0,
6,
1,
"STRING"
],
[
20,
16,
0,
7,
1,
"STRING"
],
[
21,
15,
0,
12,
1,
"STRING"
],
[
22,
16,
0,
13,
1,
"STRING"
],
[
28,
6,
0,
18,
1,
"CONDITIONING"
],
[
29,
7,
0,
18,
2,
"CONDITIONING"
],
[
30,
5,
0,
18,
3,
"LATENT"
],
[
31,
19,
0,
18,
4,
"INT"
],
[
33,
12,
0,
21,
1,
"CONDITIONING"
],
[
34,
13,
0,
21,
2,
"CONDITIONING"
],
[
35,
18,
0,
21,
3,
"LATENT"
],
[
37,
21,
0,
8,
0,
"LATENT"
],
[
38,
14,
0,
21,
0,
"MODEL"
],
[
39,
19,
0,
21,
4,
"INT"
],
[
40,
4,
0,
18,
0,
"MODEL"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.6830134553650705,
"offset": [
497.94863087903946,
-35.01040000000002
]
},
"frontendVersion": "1.33.10",
"reroutes": [
{
"id": 1,
"pos": [
1052.4388634681484,
619.3758737899849
],
"linkIds": [
35
]
},
{
"id": 2,
"parentId": 1,
"pos": [
735.2514479910659,
833.4949797253714
],
"linkIds": [
35
]
}
],
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true,
"workflowRendererVersion": "LG",
"linkExtensions": [
{
"id": 35,
"parentId": 2
}
]
},
"version": 0.4
}
- 🟪 Sample with SDXL base up to a midpoint step
- 🟨 Hand the remaining steps over to SDXL refiner
- 🟦 Set the switch timing with the Int node
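The split performed by the two KSampler (Advanced) nodes can be sketched as plain step ranges (illustrative only; the real nodes also manage leftover noise via the add_noise / return_with_leftover_noise toggles visible in the workflow):

```python
def split_steps(total_steps: int, switch_at: int) -> tuple[range, range]:
    """Return the step ranges sampled by base and refiner respectively."""
    assert 0 < switch_at < total_steps
    base = range(0, switch_at)               # base: ends at end_at_step, keeps leftover noise
    refiner = range(switch_at, total_steps)  # refiner: starts at start_at_step, add_noise off
    return base, refiner

# The workflow above uses 20 total steps with the Int node set to 15:
base, refiner = split_steps(total_steps=20, switch_at=15)
print(len(base), len(refiner))  # base runs 15 steps, refiner runs the last 5
```

Because both samplers share the same seed, step count, and scheduler settings, the refiner simply continues the same trajectory the base started.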
Personally, I prefer the image2image approach because it is easier to understand, but it is worth remembering that base and refiner can also be switched within a single sampling pass.
The refiner is not necessary, but "refiner-like thinking" is important
Refiner-less SDXL Models
There are many derivative models based on SDXL (both community and commercial), and most are tuned to produce sufficient image quality without the refiner.
To put it bluntly, the "post-process with refiner" design was partly a compromise to make up for the performance of the base model alone at the time.
Refiner-like Thinking
However, the idea of finishing one image across multiple models is still valid.
- Models whose style is good but that don't follow prompts very well
- Conversely, models that follow prompts well but whose style you don't like
There are plenty of such "not quite right" models.
In those situations, SDXL's refiner-like thinking comes in handy.
- First, generate a base image with a model that excels at composition and prompt fidelity
- Then finish that image via image2image with a model whose style you prefer
With a two-stage configuration like this, you can build a "best of both worlds" workflow: composition from model A, style from model B.
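The two-stage idea above can be sketched as a simple data flow. Everything here is a stand-in (plain functions, not a real ComfyUI API); it only shows which model touches the latent at which stage:

```python
def sample(model: str, latent: str, denoise: float) -> str:
    """Stand-in for one KSampler pass; records which model processed the latent."""
    return f"{latent}->{model}(denoise={denoise})"

def two_stage(prompt: str, model_a: str, model_b: str, denoise: float = 0.3) -> str:
    # Stage 1: full denoise with the prompt-faithful model (composition)
    latent = sample(model_a, f"noise[{prompt}]", denoise=1.0)
    # Stage 2: light image2image with the preferred-style model (finishing)
    return sample(model_b, latent, denoise=denoise)

print(two_stage("vase of lilies", "modelA", "modelB"))
```

Swapping in actual checkpoints, this is exactly the base → refiner workflow shown earlier, just with two arbitrary models of your choosing.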
base / refiner in SDXL is just one concrete example of this. Look for your own combination of models to multiply together.