What is Ultimate SD upscale?

There were two reasons why Stable Diffusion could not generate large images: it was not trained on large images, and, more simply, the computational cost.
Trying to generate a super-high-resolution image such as 4K or 8K in one pass is extremely demanding in terms of both VRAM and computation time.
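As a rough back-of-the-envelope illustration (the numbers are mine, not from the article): the SD1.x VAE shrinks each side by a factor of 8, and self-attention cost grows roughly with the square of the number of latent positions, so a 4K-class canvas is orders of magnitude heavier than the 512 px images the model was trained on.

```python
# Rough cost comparison between a 512x512 image and a 4096x4096 (4K-class) image.
# Assumption: SD1.x latents are 1/8 of the pixel resolution, and self-attention
# cost scales roughly with the square of the number of latent positions.

def latent_positions(width: int, height: int, vae_factor: int = 8) -> int:
    """Number of latent positions the U-Net has to process."""
    return (width // vae_factor) * (height // vae_factor)

small = latent_positions(512, 512)      # 64 * 64   = 4096
large = latent_positions(4096, 4096)    # 512 * 512 = 262144

print(f"latent positions: {small} -> {large}  ({large // small}x more)")
print(f"rough attention cost ratio: {(large / small) ** 2:.0f}x")
```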
This led to the idea of not generating everything in one go, but instead dividing the image into tiles and running Hires.fix on each of them. The flow is the four steps below (a minimal sketch follows the list).
- Enlarge the image
- Divide it into tiles
- Run image2image on each tile individually
- Connect the tiles back together at the end
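As a concrete picture of these four steps, here is a minimal sketch using Pillow; run_img2img_on_tile is a placeholder standing in for the actual diffusion image2image pass, not a real node or API:

```python
from PIL import Image

TILE = 512  # tile size in pixels

def run_img2img_on_tile(tile: Image.Image) -> Image.Image:
    """Placeholder: in the real workflow this is a diffusion image2image pass."""
    return tile

def tiled_upscale(src: Image.Image, scale: int = 2) -> Image.Image:
    # 1. Enlarge the image
    big = src.resize((src.width * scale, src.height * scale), Image.NEAREST)
    out = big.copy()
    # 2. Divide into tiles, 3. run image2image on each tile, 4. paste them back
    for top in range(0, big.height, TILE):
        for left in range(0, big.width, TILE):
            box = (left, top, min(left + TILE, big.width), min(top + TILE, big.height))
            tile = big.crop(box)
            out.paste(run_img2img_on_tile(tile), box[:2])
    return out
```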
The name Ultimate SD upscale is well known, but what really matters is the underlying concept of the Tile (tile division).
Custom Nodes
There is also a custom node named exactly ssitu/ComfyUI_UltimateSDUpscale, but since we want to stay close to the underlying principle this time, we will use the simpler node introduced above.
Weakness of Tile: Seams
First, let's look at the basic behavior of Tile. Here we will use the Tiled Diffusion node as an example, but any node will do as long as you understand the concept.


{
"last_node_id": 25,
"last_link_id": 35,
"nodes": [
{
"id": 24,
"type": "ImageScale",
"pos": [
470,
635
],
"size": {
"0": 315,
"1": 130
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 31
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
32
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "ImageScale"
},
"widgets_values": [
"nearest-exact",
1024,
1024,
"disabled"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 21,
"type": "VAELoader",
"pos": [
815,
730
],
"size": {
"0": 315,
"1": 58
},
"flags": {
"collapsed": true
},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
29,
33
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"vae-ft-mse-840000-ema-pruned.safetensors"
]
},
{
"id": 25,
"type": "VAEEncode",
"pos": [
810,
635
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 32
},
{
"name": "vae",
"type": "VAE",
"link": 33
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
34
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEEncode"
}
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
151,
320
],
"size": {
"0": 315,
"1": 98
},
"flags": {},
"order": 1,
"mode": 0,
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
27
],
"slot_index": 0
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
3,
5
],
"slot_index": 1
},
{
"name": "VAE",
"type": "VAE",
"links": [],
"slot_index": 2
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"📷-v1.x\\dreamshaper_8.safetensors"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1435,
190
],
"size": {
"0": 177.86740112304688,
"1": 46
},
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 29,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
15
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 13,
"type": "PreviewImage",
"pos": [
1654,
195
],
"size": {
"0": 474.6894226074219,
"1": 509.40777587890625
},
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 15
}
],
"properties": {
"Node name for S&R": "PreviewImage"
}
},
{
"id": 3,
"type": "KSampler",
"pos": [
1084,
189
],
"size": {
"0": 315,
"1": 262
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 35
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 4
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 6
},
{
"name": "latent_image",
"type": "LATENT",
"link": 34,
"slot_index": 3
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
7
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KSampler"
},
"widgets_values": [
2480,
"fixed",
20,
8,
"dpmpp_2m",
"karras",
0.6
]
},
{
"id": 22,
"type": "LoadImage",
"pos": [
130,
635
],
"size": {
"0": 312.6875305175781,
"1": 412.5834655761719
},
"flags": {},
"order": 2,
"mode": 0,
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
31
],
"shape": 3,
"slot_index": 0
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"Tiled Diffusion.png",
"image"
]
},
{
"id": 19,
"type": "TiledDiffusion",
"pos": [
631,
-35
],
"size": {
"0": 315,
"1": 154
},
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 27
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
35
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "TiledDiffusion"
},
"widgets_values": [
"MultiDiffusion",
512,
512,
0,
4
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
569.5670166015625,
390
],
"size": {
"0": 425.27801513671875,
"1": 180.6060791015625
},
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
6
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text, watermark"
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
572,
180
],
"size": {
"0": 422.84503173828125,
"1": 164.31304931640625
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
4
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"RAW photo of a dog,looking at viewer,white puppy"
]
}
],
"links": [
[
3,
4,
1,
6,
0,
"CLIP"
],
[
4,
6,
0,
3,
1,
"CONDITIONING"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
6,
7,
0,
3,
2,
"CONDITIONING"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
15,
8,
0,
13,
0,
"IMAGE"
],
[
27,
4,
0,
19,
0,
"MODEL"
],
[
29,
21,
0,
8,
1,
"VAE"
],
[
31,
22,
0,
24,
0,
"IMAGE"
],
[
32,
24,
0,
25,
0,
"IMAGE"
],
[
33,
21,
0,
25,
1,
"VAE"
],
[
34,
25,
0,
3,
3,
"LATENT"
],
[
35,
19,
0,
3,
0,
"MODEL"
]
],
"groups": [],
"config": {},
"extra": {
"0246.VERSION": [
0,
0,
4
]
},
"version": 0.4
}
- 🟨 Resize input image to 1024 × 1024 px
- 🟩 Set tile size to 512 × 512 px
With these settings, the image of the puppy at the bottom left is divided neatly into four, and each tile is processed as an independent image2image pass.
As you can see, the tile boundaries are clearly visible and the image lacks overall cohesion. This is the first weakness of Tile.
Blending seams with overlap
If the boundaries bother you, you can arrange the tiles so that they slightly overlap.
This is tile_overlap.


{
"last_node_id": 25,
"last_link_id": 35,
"nodes": [
{
"id": 24,
"type": "ImageScale",
"pos": [
470,
635
],
"size": {
"0": 315,
"1": 130
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 31
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
32
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "ImageScale"
},
"widgets_values": [
"nearest-exact",
1024,
1024,
"disabled"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 25,
"type": "VAEEncode",
"pos": [
810,
635
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 32
},
{
"name": "vae",
"type": "VAE",
"link": 33
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
34
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEEncode"
}
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
151,
320
],
"size": {
"0": 315,
"1": 98
},
"flags": {},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
27
],
"slot_index": 0
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
3,
5
],
"slot_index": 1
},
{
"name": "VAE",
"type": "VAE",
"links": [],
"slot_index": 2
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"📷-v1.x\\dreamshaper_8.safetensors"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1435,
190
],
"size": {
"0": 177.86740112304688,
"1": 46
},
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 29,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
15
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 13,
"type": "PreviewImage",
"pos": [
1654,
195
],
"size": {
"0": 474.6894226074219,
"1": 509.40777587890625
},
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 15
}
],
"properties": {
"Node name for S&R": "PreviewImage"
}
},
{
"id": 3,
"type": "KSampler",
"pos": [
1084,
189
],
"size": {
"0": 315,
"1": 262
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 35
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 4
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 6
},
{
"name": "latent_image",
"type": "LATENT",
"link": 34,
"slot_index": 3
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
7
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KSampler"
},
"widgets_values": [
2480,
"fixed",
20,
8,
"dpmpp_2m",
"karras",
0.6
]
},
{
"id": 22,
"type": "LoadImage",
"pos": [
130,
635
],
"size": {
"0": 312.6875305175781,
"1": 412.5834655761719
},
"flags": {},
"order": 1,
"mode": 0,
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
31
],
"shape": 3,
"slot_index": 0
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"Tiled Diffusion.png",
"image"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
569.5670166015625,
390
],
"size": {
"0": 425.27801513671875,
"1": 180.6060791015625
},
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
6
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text, watermark"
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
572,
180
],
"size": {
"0": 422.84503173828125,
"1": 164.31304931640625
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
4
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"RAW photo of a dog,looking at viewer,white puppy"
]
},
{
"id": 19,
"type": "TiledDiffusion",
"pos": [
631,
-35
],
"size": {
"0": 315,
"1": 154
},
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 27
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
35
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "TiledDiffusion"
},
"widgets_values": [
"MultiDiffusion",
512,
512,
256,
4
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 21,
"type": "VAELoader",
"pos": [
811,
730
],
"size": {
"0": 315,
"1": 58
},
"flags": {
"collapsed": true
},
"order": 2,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
29,
33
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"vae-ft-mse-840000-ema-pruned.safetensors"
]
}
],
"links": [
[
3,
4,
1,
6,
0,
"CLIP"
],
[
4,
6,
0,
3,
1,
"CONDITIONING"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
6,
7,
0,
3,
2,
"CONDITIONING"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
15,
8,
0,
13,
0,
"IMAGE"
],
[
27,
4,
0,
19,
0,
"MODEL"
],
[
29,
21,
0,
8,
1,
"VAE"
],
[
31,
22,
0,
24,
0,
"IMAGE"
],
[
32,
24,
0,
25,
0,
"IMAGE"
],
[
33,
21,
0,
25,
1,
"VAE"
],
[
34,
25,
0,
3,
3,
"LATENT"
],
[
35,
19,
0,
3,
0,
"MODEL"
]
],
"groups": [],
"config": {},
"extra": {
"0246.VERSION": [
0,
0,
4
]
},
"version": 0.4
}
- 🟩 Set tile_overlap to 256 px
Instead of lining the tiles up edge to edge, imagine arranging them so that they intentionally overlap by about half a tile.
The overlapped region acts like a cushion where adjacent tiles share information, so the boundary blends in as sampling progresses and the seams become much less noticeable.
However, increasing the overlap means sampling the same area multiple times, so generation time increases (the sketch below shows how the tile count grows).
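Here is a simplified sketch of how overlapping tiles are laid out and blended, assuming a MultiDiffusion-style weighted average in which every pixel is the mean of all tiles covering it (the real nodes do this in latent space at every sampler step; this only shows the geometry, with NumPy and random data standing in for tile outputs):

```python
import numpy as np

def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of tiles along one axis, stepping by tile - overlap."""
    stride = tile - overlap
    starts = list(range(0, max(length - tile, 0) + 1, stride))
    if starts[-1] + tile < length:          # make sure the far edge is covered
        starts.append(length - tile)
    return starts

def blend_tiles(canvas_hw: tuple[int, int], tile: int, overlap: int) -> np.ndarray:
    """Accumulate per-tile results and divide by coverage (weighted average)."""
    h, w = canvas_hw
    acc = np.zeros((h, w), dtype=np.float32)
    weight = np.zeros((h, w), dtype=np.float32)
    for top in tile_starts(h, tile, overlap):
        for left in tile_starts(w, tile, overlap):
            # Stand-in for one tile's image2image output
            result = np.random.rand(tile, tile).astype(np.float32)
            acc[top:top + tile, left:left + tile] += result
            weight[top:top + tile, left:left + tile] += 1.0
    return acc / weight

print(len(tile_starts(1024, 512, 0)) ** 2)    # 4 tiles  (2 x 2)
print(len(tile_starts(1024, 512, 256)) ** 2)  # 9 tiles  (3 x 3)
```

With 512 px tiles on a 1024 px image, overlap 0 gives a 2 × 2 grid (4 tiles) while overlap 256 gives a 3 × 3 grid (9 tiles), which is exactly where the extra generation time comes from.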
Another weakness of Tile: Prompt
Tile has another major weakness: because the same prompt is applied to every tile, unwanted content gets generated in unexpected places.

Let's try generating with tile_overlap = 0 and denoise = 1 in the previous workflow, with only "a dog" as the prompt.
Then, as the image shows, many dogs appear in a single picture.
Each of the top-left, top-right, bottom-left, and bottom-right tiles tries to generate its own dog, so four dogs are drawn in total. This is the second weakness of Tile.
Idea of changing prompt for each tile
In theory, you could consider writing a separate prompt for each tile.
- Top-left tile: dog right ear, right eye
- Top-right tile: dog left ear, left eye
- Bottom-left tile: dog front leg
- Bottom-right tile: dog back leg
If you did this, each tile should understand that it is only responsible for its own part of the dog.
In practice, however, this approach is rarely used.
Writing a prompt for every tile does not scale, and above all, Stable Diffusion cannot reliably interpret a prompt like "only the top-right quarter of a dog's face".
Fixing structure with ControlNet Tile
This is where ControlNet Tile comes in.
ControlNet Tile is a ControlNet that generates a new image while preserving the structure of the input image quite strongly.
It does not copy pixels as they are; rather, it behaves as if repainting textures and details while keeping the rough shapes and positions of objects.
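Conceptually, each tile's image2image pass is additionally conditioned on the matching crop of the upscaled source image, so the structure is pinned down even when the prompt alone would invent a new dog. The sketch below is purely illustrative: apply_controlnet_tile and img2img_with_conditioning are hypothetical placeholder names, not real ComfyUI or diffusers functions.

```python
from PIL import Image

def apply_controlnet_tile(hint: Image.Image, strength: float):
    """Hypothetical placeholder: build structural conditioning from a hint crop."""
    return {"hint": hint, "strength": strength}

def img2img_with_conditioning(tile: Image.Image, prompt: str, conditioning) -> Image.Image:
    """Hypothetical placeholder: one diffusion pass constrained by the conditioning."""
    return tile

def upscale_tile(big: Image.Image, box: tuple[int, int, int, int], prompt: str) -> Image.Image:
    # The control hint is the *same* crop of the upscaled source image, so each
    # tile is told "keep this structure" instead of "draw a dog from scratch".
    hint = big.crop(box)
    cond = apply_controlnet_tile(hint, strength=0.6)
    return img2img_with_conditioning(big.crop(box), prompt, cond)
```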


{
"last_node_id": 27,
"last_link_id": 43,
"nodes": [
{
"id": 21,
"type": "VAELoader",
"pos": [
925,
840
],
"size": {
"0": 315,
"1": 58
},
"flags": {
"collapsed": true
},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
29,
33
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"vae-ft-mse-840000-ema-pruned.safetensors"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
450,
445
],
"size": {
"0": 431.1927490234375,
"1": 116.54621887207031
},
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
39
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text, watermark"
]
},
{
"id": 19,
"type": "TiledDiffusion",
"pos": [
520,
20
],
"size": {
"0": 315,
"1": 154
},
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 27
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
35
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "TiledDiffusion"
},
"widgets_values": [
"MultiDiffusion",
512,
512,
0,
4
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1630,
285
],
"size": {
"0": 177.86740112304688,
"1": 46
},
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 29,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
15
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 13,
"type": "PreviewImage",
"pos": [
1835,
290
],
"size": {
"0": 474.6894226074219,
"1": 509.40777587890625
},
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 15
}
],
"properties": {
"Node name for S&R": "PreviewImage"
}
},
{
"id": 25,
"type": "VAEEncode",
"pos": [
920,
745
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 32
},
{
"name": "vae",
"type": "VAE",
"link": 33
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
34
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEEncode"
}
},
{
"id": 22,
"type": "LoadImage",
"pos": [
210,
745
],
"size": {
"0": 312.6875305175781,
"1": 412.5834655761719
},
"flags": {},
"order": 1,
"mode": 0,
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
31
],
"shape": 3,
"slot_index": 0
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"Tiled Diffusion.png",
"image"
]
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
45,
370
],
"size": {
"0": 315,
"1": 98
},
"flags": {},
"order": 2,
"mode": 0,
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
27
],
"slot_index": 0
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
3,
5
],
"slot_index": 1
},
{
"name": "VAE",
"type": "VAE",
"links": [],
"slot_index": 2
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"📷-v1.x\\dreamshaper_8.safetensors"
]
},
{
"id": 24,
"type": "ImageScale",
"pos": [
557,
747
],
"size": {
"0": 315,
"1": 130
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 31
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
32,
42
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "ImageScale"
},
"widgets_values": [
"nearest-exact",
1024,
1024,
"disabled"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 27,
"type": "ControlNetLoader",
"pos": [
543,
623
],
"size": {
"0": 331.4544982910156,
"1": 58
},
"flags": {},
"order": 3,
"mode": 0,
"outputs": [
{
"name": "CONTROL_NET",
"type": "CONTROL_NET",
"links": [
43
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "ControlNetLoader"
},
"widgets_values": [
"ControlNet-v1-1\\control_v11f1e_sd15_tile.pth"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 3,
"type": "KSampler",
"pos": [
1295,
280
],
"size": {
"0": 306.7227783203125,
"1": 265.1262512207031
},
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 35
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 40
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 41
},
{
"name": "latent_image",
"type": "LATENT",
"link": 34,
"slot_index": 3
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
7
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KSampler"
},
"widgets_values": [
2480,
"fixed",
20,
8,
"dpmpp_2m",
"karras",
1
]
},
{
"id": 26,
"type": "ControlNetApplyAdvanced",
"pos": [
940,
300
],
"size": {
"0": 315,
"1": 166
},
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 38
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 39
},
{
"name": "control_net",
"type": "CONTROL_NET",
"link": 43,
"slot_index": 2
},
{
"name": "image",
"type": "IMAGE",
"link": 42
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
40
],
"shape": 3,
"slot_index": 0
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
41
],
"shape": 3,
"slot_index": 1
}
],
"properties": {
"Node name for S&R": "ControlNetApplyAdvanced"
},
"widgets_values": [
1,
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
455,
235
],
"size": {
"0": 422.84503173828125,
"1": 164.31304931640625
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
38
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"RAW photo of a dog,looking at viewer,white puppy"
]
}
],
"links": [
[
3,
4,
1,
6,
0,
"CLIP"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
15,
8,
0,
13,
0,
"IMAGE"
],
[
27,
4,
0,
19,
0,
"MODEL"
],
[
29,
21,
0,
8,
1,
"VAE"
],
[
31,
22,
0,
24,
0,
"IMAGE"
],
[
32,
24,
0,
25,
0,
"IMAGE"
],
[
33,
21,
0,
25,
1,
"VAE"
],
[
34,
25,
0,
3,
3,
"LATENT"
],
[
35,
19,
0,
3,
0,
"MODEL"
],
[
38,
6,
0,
26,
0,
"CONDITIONING"
],
[
39,
7,
0,
26,
1,
"CONDITIONING"
],
[
40,
26,
0,
3,
1,
"CONDITIONING"
],
[
41,
26,
1,
3,
2,
"CONDITIONING"
],
[
42,
24,
0,
26,
3,
"IMAGE"
],
[
43,
27,
0,
26,
2,
"CONTROL_NET"
]
],
"groups": [],
"config": {},
"extra": {
"0246.VERSION": [
0,
0,
4
]
},
"version": 0.4
}
In this workflow, we deliberately set tile_overlap = 0 and denoise = 1, which makes the weaknesses of Tile most visible.
Even so, you can see that by going through ControlNet Tile, the image is upscaled while preserving the composition of the original to a considerable degree.
Finishing with overlap × ControlNet Tile
Combining the elements so far, a practical form of Tile upscaling comes into view.


{
"last_node_id": 30,
"last_link_id": 48,
"nodes": [
{
"id": 21,
"type": "VAELoader",
"pos": [
925,
840
],
"size": {
"0": 315,
"1": 58
},
"flags": {
"collapsed": true
},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
29,
33
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"vae-ft-mse-840000-ema-pruned.safetensors"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
450,
445
],
"size": [
431.192761039734,
116.54622192382817
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
39
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text, watermark"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1630,
285
],
"size": [
177.86739979492222,
46
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 29,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
15
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 13,
"type": "PreviewImage",
"pos": [
1835,
290
],
"size": [
474.6894287109376,
509.40776977539065
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 15
}
],
"properties": {
"Node name for S&R": "PreviewImage"
}
},
{
"id": 25,
"type": "VAEEncode",
"pos": [
920,
745
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 32
},
{
"name": "vae",
"type": "VAE",
"link": 33
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
34
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEEncode"
}
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
45,
370
],
"size": {
"0": 315,
"1": 98
},
"flags": {},
"order": 1,
"mode": 0,
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
27
],
"slot_index": 0
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
3,
5
],
"slot_index": 1
},
{
"name": "VAE",
"type": "VAE",
"links": [],
"slot_index": 2
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"📷-v1.x\\dreamshaper_8.safetensors"
]
},
{
"id": 27,
"type": "ControlNetLoader",
"pos": [
543,
623
],
"size": [
331.4544950753203,
58
],
"flags": {},
"order": 2,
"mode": 0,
"outputs": [
{
"name": "CONTROL_NET",
"type": "CONTROL_NET",
"links": [
43
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "ControlNetLoader"
},
"widgets_values": [
"ControlNet-v1-1\\control_v11f1e_sd15_tile.pth"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 22,
"type": "LoadImage",
"pos": [
210,
745
],
"size": [
312.6875305175781,
412.5834655761719
],
"flags": {},
"order": 3,
"mode": 0,
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
31
],
"shape": 3,
"slot_index": 0
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"Tiled Diffusion.png",
"image"
]
},
{
"id": 26,
"type": "ControlNetApplyAdvanced",
"pos": [
940,
300
],
"size": {
"0": 315,
"1": 166
},
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 38
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 39
},
{
"name": "control_net",
"type": "CONTROL_NET",
"link": 43,
"slot_index": 2
},
{
"name": "image",
"type": "IMAGE",
"link": 48
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
40
],
"shape": 3,
"slot_index": 0
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
41
],
"shape": 3,
"slot_index": 1
}
],
"properties": {
"Node name for S&R": "ControlNetApplyAdvanced"
},
"widgets_values": [
0.6,
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 19,
"type": "TiledDiffusion",
"pos": [
520,
20
],
"size": {
"0": 315,
"1": 154
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 27
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
35
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "TiledDiffusion"
},
"widgets_values": [
"MultiDiffusion",
512,
512,
256,
4
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 24,
"type": "ImageScale",
"pos": [
557,
747
],
"size": {
"0": 315,
"1": 130
},
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 31
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
32,
48
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "ImageScale"
},
"widgets_values": [
"nearest-exact",
1024,
1024,
"disabled"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 3,
"type": "KSampler",
"pos": [
1295,
280
],
"size": [
306.7227709960939,
265.12625854492194
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 35
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 40
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 41
},
{
"name": "latent_image",
"type": "LATENT",
"link": 34,
"slot_index": 3
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
7
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KSampler"
},
"widgets_values": [
2480,
"fixed",
20,
8,
"dpmpp_2m",
"karras",
0.6
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
455,
235
],
"size": {
"0": 422.84503173828125,
"1": 164.31304931640625
},
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
38
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"RAW photo of a dog,looking at viewer,white puppy"
]
}
],
"links": [
[
3,
4,
1,
6,
0,
"CLIP"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
15,
8,
0,
13,
0,
"IMAGE"
],
[
27,
4,
0,
19,
0,
"MODEL"
],
[
29,
21,
0,
8,
1,
"VAE"
],
[
31,
22,
0,
24,
0,
"IMAGE"
],
[
32,
24,
0,
25,
0,
"IMAGE"
],
[
33,
21,
0,
25,
1,
"VAE"
],
[
34,
25,
0,
3,
3,
"LATENT"
],
[
35,
19,
0,
3,
0,
"MODEL"
],
[
38,
6,
0,
26,
0,
"CONDITIONING"
],
[
39,
7,
0,
26,
1,
"CONDITIONING"
],
[
40,
26,
0,
3,
1,
"CONDITIONING"
],
[
41,
26,
1,
3,
2,
"CONDITIONING"
],
[
43,
27,
0,
26,
2,
"CONTROL_NET"
],
[
48,
24,
0,
26,
3,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"0246.VERSION": [
0,
0,
4
]
},
"version": 0.4
}
- 🟩 overlap 256px
- 🟦 ControlNet strength 0.6
The result is a much more natural finish.
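For reference, here are the key parameters of this final workflow (read off the JSON above), collected in one place and written as a plain Python dict purely for readability:

```python
# Settings of the final "overlap x ControlNet Tile" workflow above.
final_settings = {
    "upscale": {"method": "nearest-exact", "size": (1024, 1024)},
    "tiled_diffusion": {"method": "MultiDiffusion", "tile_size": (512, 512), "tile_overlap": 256},
    "controlnet": {"model": "control_v11f1e_sd15_tile.pth", "strength": 0.6},
    "ksampler": {"steps": 20, "cfg": 8, "sampler": "dpmpp_2m", "scheduler": "karras", "denoise": 0.6},
}
```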
Summary: The concept of Ultimate SD upscale
The essence of Ultimate SD upscale is the following three pillars.
- Tile (tile division): instead of handling a large image all at once, divide it into tiles and run image2image on each one, achieving super-resolution while keeping VRAM usage and computation time under control.
- overlap (tile overlap): arrange the tiles so that they slightly overlap and blend the boundaries during sampling, making the seams unnoticeable.
- ControlNet Tile (fixing structure): upscale each tile while strongly preserving the structure of the input image, suppressing the "dog inside a dog" problem and keeping the whole image from falling apart.
The actual Ultimate SD upscale node and its presets are essentially this concept packaged into a single node.
Incidentally, a similar idea applies to video generation, where the frames are divided instead: for example, processing a 100-frame video 20 frames at a time with an overlap of 5 frames (a quick calculation follows below). We won't go into details here, but the point of dividing the work finely to reduce computation cost is exactly the same.
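As a quick worked example of such a frame-window schedule, using the same start-offset logic as the overlap sketch earlier (the window and overlap values are just the example numbers from the paragraph above):

```python
def window_starts(total: int, window: int, overlap: int) -> list[int]:
    """Start frames of each chunk when stepping by window - overlap."""
    stride = window - overlap
    starts = list(range(0, max(total - window, 0) + 1, stride))
    if starts[-1] + window < total:   # make sure the last frames are covered
        starts.append(total - window)
    return starts

starts = window_starts(total=100, window=20, overlap=5)
print(starts)        # [0, 15, 30, 45, 60, 75, 80]
print(len(starts))   # 7 chunks instead of one 100-frame pass
```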
