What is Ultimate SD upscale?

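Stable Diffusion struggles to generate large images partly because it was not trained on large images, but there is also a simpler reason: computational cost.
Generating a single ultra-high-resolution image such as 4K or 8K in one pass is very demanding in both VRAM and compute time.
This led to the idea of not producing the image in one go, but splitting it into pieces and applying Hires.fix to each piece separately.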
1. Enlarge the image
2. Split it into tiles
3. Run image2image on each tile individually
4. Finally, join the tiles back together
Ultimate SD upscale is the well-known name, but what really matters is the idea of Tile (splitting into tiles).
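The four steps listed above can be sketched in plain Python. This is only a toy illustration on a small grayscale grid: `upscale_nearest`, `split_tiles`, `tiled_process` and `brighten` are hypothetical names, and `brighten` merely stands in for the per-tile image2image pass that a diffusion model would perform.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values


def upscale_nearest(img: Image, factor: int) -> Image:
    """Step 1: enlarge by repeating each pixel `factor` times in both directions."""
    return [[px for px in row for _ in range(factor)]
            for row in img for _ in range(factor)]


def split_tiles(img: Image, tile: int) -> Iterator[Tuple[int, int, Image]]:
    """Step 2: yield (top, left, tile_pixels) for a non-overlapping grid."""
    height, width = len(img), len(img[0])
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            yield top, left, [row[left:left + tile] for row in img[top:top + tile]]


def tiled_process(img: Image, tile: int, refine: Callable[[Image], Image]) -> Image:
    """Steps 3 and 4: refine each tile on its own, then paste it back in place."""
    out = [row[:] for row in img]
    for top, left, pixels in split_tiles(img, tile):
        refined = refine(pixels)  # stand-in for a per-tile image2image pass
        for dy, row in enumerate(refined):
            out[top + dy][left:left + len(row)] = row
    return out


def brighten(tile_pixels: Image) -> Image:
    """Hypothetical placeholder for the diffusion model: adds 10 to every pixel."""
    return [[min(255, px + 10) for px in row] for row in tile_pixels]


small = [[0, 50], [100, 150]]
large = upscale_nearest(small, 2)            # 4 x 4 image
result = tiled_process(large, 2, brighten)   # processed as four 2 x 2 tiles
for row in result:
    print(row)
```

Only one tile has to be held by the model at a time, which is where the VRAM saving comes from.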
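Custom nodes
There is also ssitu/ComfyUI_UltimateSDUpscale, a node that implements Ultimate SD upscale itself, but since the goal here is to trace the underlying principle, we use the simpler node above.
Tile's weakness: seams
First, let's check the basic behavior of Tile. The explanation here uses the Tiled Diffusion node as an example, but once you grasp the idea, any node will do.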


{
"last_node_id": 25,
"last_link_id": 35,
"nodes": [
{
"id": 24,
"type": "ImageScale",
"pos": [
470,
635
],
"size": {
"0": 315,
"1": 130
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 31
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
32
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "ImageScale"
},
"widgets_values": [
"nearest-exact",
1024,
1024,
"disabled"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 21,
"type": "VAELoader",
"pos": [
815,
730
],
"size": {
"0": 315,
"1": 58
},
"flags": {
"collapsed": true
},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
29,
33
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"vae-ft-mse-840000-ema-pruned.safetensors"
]
},
{
"id": 25,
"type": "VAEEncode",
"pos": [
810,
635
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 32
},
{
"name": "vae",
"type": "VAE",
"link": 33
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
34
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEEncode"
}
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
151,
320
],
"size": {
"0": 315,
"1": 98
},
"flags": {},
"order": 1,
"mode": 0,
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
27
],
"slot_index": 0
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
3,
5
],
"slot_index": 1
},
{
"name": "VAE",
"type": "VAE",
"links": [],
"slot_index": 2
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"📷-v1.x\\dreamshaper_8.safetensors"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1435,
190
],
"size": {
"0": 177.86740112304688,
"1": 46
},
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 29,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
15
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 13,
"type": "PreviewImage",
"pos": [
1654,
195
],
"size": {
"0": 474.6894226074219,
"1": 509.40777587890625
},
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 15
}
],
"properties": {
"Node name for S&R": "PreviewImage"
}
},
{
"id": 3,
"type": "KSampler",
"pos": [
1084,
189
],
"size": {
"0": 315,
"1": 262
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 35
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 4
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 6
},
{
"name": "latent_image",
"type": "LATENT",
"link": 34,
"slot_index": 3
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
7
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KSampler"
},
"widgets_values": [
2480,
"fixed",
20,
8,
"dpmpp_2m",
"karras",
0.6
]
},
{
"id": 22,
"type": "LoadImage",
"pos": [
130,
635
],
"size": {
"0": 312.6875305175781,
"1": 412.5834655761719
},
"flags": {},
"order": 2,
"mode": 0,
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
31
],
"shape": 3,
"slot_index": 0
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"Tiled Diffusion.png",
"image"
]
},
{
"id": 19,
"type": "TiledDiffusion",
"pos": [
631,
-35
],
"size": {
"0": 315,
"1": 154
},
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 27
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
35
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "TiledDiffusion"
},
"widgets_values": [
"MultiDiffusion",
512,
512,
0,
4
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
569.5670166015625,
390
],
"size": {
"0": 425.27801513671875,
"1": 180.6060791015625
},
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
6
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text, watermark"
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
572,
180
],
"size": {
"0": 422.84503173828125,
"1": 164.31304931640625
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
4
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"RAW photo of a dog,looking at viewer,white puppy"
]
}
],
"links": [
[
3,
4,
1,
6,
0,
"CLIP"
],
[
4,
6,
0,
3,
1,
"CONDITIONING"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
6,
7,
0,
3,
2,
"CONDITIONING"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
15,
8,
0,
13,
0,
"IMAGE"
],
[
27,
4,
0,
19,
0,
"MODEL"
],
[
29,
21,
0,
8,
1,
"VAE"
],
[
31,
22,
0,
24,
0,
"IMAGE"
],
[
32,
24,
0,
25,
0,
"IMAGE"
],
[
33,
21,
0,
25,
1,
"VAE"
],
[
34,
25,
0,
3,
3,
"LATENT"
],
[
35,
19,
0,
3,
0,
"MODEL"
]
],
"groups": [],
"config": {},
"extra": {
"0246.VERSION": [
0,
0,
4
]
},
"version": 0.4
}
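- 🟨 Resize the input image to 1024 × 1024 px
- 🟩 Set the tile size to 512 × 512 px
With these settings, the puppy image at the lower left is split cleanly into 4, and each tile is processed as an independent image2image pass.
As you can see, the boundaries between tiles are clearly visible, and the image as a whole lacks cohesion.
This is Tile's first weakness.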
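To make the 4-way split concrete, here is a small sketch of the arithmetic for a 1024 × 1024 px image and 512 × 512 px tiles. The helper names `grid_tiles` and `seam_positions` are hypothetical and are not taken from the Tiled Diffusion node; the sketch only shows where the tile rectangles and their shared edges fall.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def grid_tiles(width: int, height: int, tile: int) -> List[Box]:
    """Rectangles of a non-overlapping tile grid, clipped to the image."""
    return [(left, top, min(left + tile, width), min(top + tile, height))
            for top in range(0, height, tile)
            for left in range(0, width, tile)]


def seam_positions(length: int, tile: int) -> List[int]:
    """Coordinates along one axis where two neighbouring tiles meet."""
    return list(range(tile, length, tile))


tiles = grid_tiles(1024, 1024, 512)
print(len(tiles))                  # 4
print(seam_positions(1024, 512))   # [512]
```

Each rectangle is denoised without seeing its neighbours, so nothing forces the pixels on either side of x = 512 or y = 512 to agree. That is why the seams show up exactly there.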
Blending the seams with overlap
If the seams bother you, the idea is simply to place the tiles so that they overlap a little.
This is tile_overlap.


{
"last_node_id": 25,
"last_link_id": 35,
"nodes": [
{
"id": 24,
"type": "ImageScale",
"pos": [
470,
635
],
"size": {
"0": 315,
"1": 130
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 31
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
32
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "ImageScale"
},
"widgets_values": [
"nearest-exact",
1024,
1024,
"disabled"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 25,
"type": "VAEEncode",
"pos": [
810,
635
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 32
},
{
"name": "vae",
"type": "VAE",
"link": 33
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
34
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEEncode"
}
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
151,
320
],
"size": {
"0": 315,
"1": 98
},
"flags": {},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
27
],
"slot_index": 0
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
3,
5
],
"slot_index": 1
},
{
"name": "VAE",
"type": "VAE",
"links": [],
"slot_index": 2
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"📷-v1.x\\dreamshaper_8.safetensors"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1435,
190
],
"size": {
"0": 177.86740112304688,
"1": 46
},
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 29,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
15
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 13,
"type": "PreviewImage",
"pos": [
1654,
195
],
"size": {
"0": 474.6894226074219,
"1": 509.40777587890625
},
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 15
}
],
"properties": {
"Node name for S&R": "PreviewImage"
}
},
{
"id": 3,
"type": "KSampler",
"pos": [
1084,
189
],
"size": {
"0": 315,
"1": 262
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 35
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 4
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 6
},
{
"name": "latent_image",
"type": "LATENT",
"link": 34,
"slot_index": 3
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
7
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KSampler"
},
"widgets_values": [
2480,
"fixed",
20,
8,
"dpmpp_2m",
"karras",
0.6
]
},
{
"id": 22,
"type": "LoadImage",
"pos": [
130,
635
],
"size": {
"0": 312.6875305175781,
"1": 412.5834655761719
},
"flags": {},
"order": 1,
"mode": 0,
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
31
],
"shape": 3,
"slot_index": 0
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"Tiled Diffusion.png",
"image"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
569.5670166015625,
390
],
"size": {
"0": 425.27801513671875,
"1": 180.6060791015625
},
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
6
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text, watermark"
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
572,
180
],
"size": {
"0": 422.84503173828125,
"1": 164.31304931640625
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
4
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"RAW photo of a dog,looking at viewer,white puppy"
]
},
{
"id": 19,
"type": "TiledDiffusion",
"pos": [
631,
-35
],
"size": {
"0": 315,
"1": 154
},
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 27
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
35
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "TiledDiffusion"
},
"widgets_values": [
"MultiDiffusion",
512,
512,
256,
4
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 21,
"type": "VAELoader",
"pos": [
811,
730
],
"size": {
"0": 315,
"1": 58
},
"flags": {
"collapsed": true
},
"order": 2,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
29,
33
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"vae-ft-mse-840000-ema-pruned.safetensors"
]
}
],
"links": [
[
3,
4,
1,
6,
0,
"CLIP"
],
[
4,
6,
0,
3,
1,
"CONDITIONING"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
6,
7,
0,
3,
2,
"CONDITIONING"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
15,
8,
0,
13,
0,
"IMAGE"
],
[
27,
4,
0,
19,
0,
"MODEL"
],
[
29,
21,
0,
8,
1,
"VAE"
],
[
31,
22,
0,
24,
0,
"IMAGE"
],
[
32,
24,
0,
25,
0,
"IMAGE"
],
[
33,
21,
0,
25,
1,
"VAE"
],
[
34,
25,
0,
3,
3,
"LATENT"
],
[
35,
19,
0,
3,
0,
"MODEL"
]
],
"groups": [],
"config": {},
"extra": {
"0246.VERSION": [
0,
0,
4
]
},
"version": 0.4
}
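- 🟩 Set tile_overlap to 256px
- Rather than laying the tiles out neatly edge to edge, picture them being deliberately overlapped by about half.
The overlapping region works like a buffer in which neighboring tiles share information, so the boundaries blend as sampling proceeds and the seams between tiles become inconspicuous.
However, the more you increase overlap, the more often the same region is sampled, so generation takes longer.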
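The effect of overlap, and its cost, can be illustrated in one dimension. The sketch below is not the Tiled Diffusion node's code: it assumes the simplest possible blending rule, a plain average of every tile output that covers a pixel, and `tile_starts`, `blend_tiles` and `tile_mean` are hypothetical names. `tile_mean` is a deliberately crude stand-in for the model, chosen so that independent tiles disagree at their borders.

```python
from typing import Callable, List, Sequence


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles along one axis when neighbours share `overlap` px."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile sits flush with the far edge
    return starts


def blend_tiles(signal: Sequence[float], tile: int, overlap: int,
                predict: Callable[[Sequence[float]], List[float]]) -> List[float]:
    """Run `predict` on each tile and average the results where tiles overlap."""
    total = [0.0] * len(signal)
    count = [0] * len(signal)
    for start in tile_starts(len(signal), tile, overlap):
        patch = predict(signal[start:start + tile])
        for i, value in enumerate(patch):
            total[start + i] += value
            count[start + i] += 1
    return [t / c for t, c in zip(total, count)]


def tile_mean(patch: Sequence[float]) -> List[float]:
    """Toy 'model': every tile outputs its own mean, ignoring its neighbours."""
    mean = sum(patch) / len(patch)
    return [mean] * len(patch)


ramp = [0, 1, 2, 3, 4, 5, 6, 7]
hard = blend_tiles(ramp, tile=4, overlap=0, predict=tile_mean)
soft = blend_tiles(ramp, tile=4, overlap=2, predict=tile_mean)
print(hard)  # one abrupt jump at the tile border
print(soft)  # smaller steps, because overlapping outputs are averaged

# Cost for the article's settings: tiles per axis, squared for a 2D image.
tiles_without_overlap = len(tile_starts(1024, 512, 0)) ** 2
tiles_with_overlap = len(tile_starts(1024, 512, 256)) ** 2
print(tiles_without_overlap, tiles_with_overlap)
```

Under these assumptions, a 1024 × 1024 px image with 512 px tiles goes from 4 tiles to 9 when the overlap is raised from 0 to 256 px, which is the extra sampling time mentioned above.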
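Tile's other weakness: prompts
Tile has one more major weakness.
Because the same prompt is used for every tile, unwanted things get generated in unexpected places.

In the previous workflow, set tile_overlap = 0 / denoise = 1,
write only "a dog" in the prompt, and try generating.
The result, as in the image, is several dogs in a single picture.
Because each of the upper-left, upper-right, lower-left, and lower-right tiles tries to generate one dog, the result as a whole is a picture of four dogs. This is Tile's second weakness.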
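The idea of changing the prompt per tile
In theory, one could write a separate prompt for each tile.
- Upper-left tile: dog's right ear, right eye
- Upper-right tile: dog's left ear, left eye
- Lower-left tile: dog's front legs
- Lower-right tile: dog's hind legs
This way, each tile should understand that it is only responsible for its own part, such as the ears.
In practice, however, this is almost never used.
Writing as many prompts as there are tiles is impractical, and more importantly, Stable Diffusion cannot understand a prompt like "the upper-right quarter of a dog's face" and draw just that part.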
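Fixing the structure with ControlNet Tile
This is where ControlNet Tile comes in.
ControlNet Tile is a ControlNet that generates a new image while strongly preserving the structure of the input image.
It does not copy pixels as they are, but it keeps the overall shapes and the positional relationships of objects, and behaves as if repainting textures and details.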


{
"last_node_id": 27,
"last_link_id": 43,
"nodes": [
{
"id": 21,
"type": "VAELoader",
"pos": [
925,
840
],
"size": {
"0": 315,
"1": 58
},
"flags": {
"collapsed": true
},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
29,
33
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"vae-ft-mse-840000-ema-pruned.safetensors"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
450,
445
],
"size": {
"0": 431.1927490234375,
"1": 116.54621887207031
},
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
39
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text, watermark"
]
},
{
"id": 19,
"type": "TiledDiffusion",
"pos": [
520,
20
],
"size": {
"0": 315,
"1": 154
},
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 27
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
35
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "TiledDiffusion"
},
"widgets_values": [
"MultiDiffusion",
512,
512,
0,
4
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1630,
285
],
"size": {
"0": 177.86740112304688,
"1": 46
},
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 29,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
15
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 13,
"type": "PreviewImage",
"pos": [
1835,
290
],
"size": {
"0": 474.6894226074219,
"1": 509.40777587890625
},
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 15
}
],
"properties": {
"Node name for S&R": "PreviewImage"
}
},
{
"id": 25,
"type": "VAEEncode",
"pos": [
920,
745
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 32
},
{
"name": "vae",
"type": "VAE",
"link": 33
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
34
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEEncode"
}
},
{
"id": 22,
"type": "LoadImage",
"pos": [
210,
745
],
"size": {
"0": 312.6875305175781,
"1": 412.5834655761719
},
"flags": {},
"order": 1,
"mode": 0,
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
31
],
"shape": 3,
"slot_index": 0
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"Tiled Diffusion.png",
"image"
]
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
45,
370
],
"size": {
"0": 315,
"1": 98
},
"flags": {},
"order": 2,
"mode": 0,
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
27
],
"slot_index": 0
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
3,
5
],
"slot_index": 1
},
{
"name": "VAE",
"type": "VAE",
"links": [],
"slot_index": 2
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"📷-v1.x\\dreamshaper_8.safetensors"
]
},
{
"id": 24,
"type": "ImageScale",
"pos": [
557,
747
],
"size": {
"0": 315,
"1": 130
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 31
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
32,
42
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "ImageScale"
},
"widgets_values": [
"nearest-exact",
1024,
1024,
"disabled"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 27,
"type": "ControlNetLoader",
"pos": [
543,
623
],
"size": {
"0": 331.4544982910156,
"1": 58
},
"flags": {},
"order": 3,
"mode": 0,
"outputs": [
{
"name": "CONTROL_NET",
"type": "CONTROL_NET",
"links": [
43
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "ControlNetLoader"
},
"widgets_values": [
"ControlNet-v1-1\\control_v11f1e_sd15_tile.pth"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 3,
"type": "KSampler",
"pos": [
1295,
280
],
"size": {
"0": 306.7227783203125,
"1": 265.1262512207031
},
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 35
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 40
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 41
},
{
"name": "latent_image",
"type": "LATENT",
"link": 34,
"slot_index": 3
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
7
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KSampler"
},
"widgets_values": [
2480,
"fixed",
20,
8,
"dpmpp_2m",
"karras",
1
]
},
{
"id": 26,
"type": "ControlNetApplyAdvanced",
"pos": [
940,
300
],
"size": {
"0": 315,
"1": 166
},
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 38
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 39
},
{
"name": "control_net",
"type": "CONTROL_NET",
"link": 43,
"slot_index": 2
},
{
"name": "image",
"type": "IMAGE",
"link": 42
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
40
],
"shape": 3,
"slot_index": 0
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
41
],
"shape": 3,
"slot_index": 1
}
],
"properties": {
"Node name for S&R": "ControlNetApplyAdvanced"
},
"widgets_values": [
1,
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
455,
235
],
"size": {
"0": 422.84503173828125,
"1": 164.31304931640625
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
38
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"RAW photo of a dog,looking at viewer,white puppy"
]
}
],
"links": [
[
3,
4,
1,
6,
0,
"CLIP"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
15,
8,
0,
13,
0,
"IMAGE"
],
[
27,
4,
0,
19,
0,
"MODEL"
],
[
29,
21,
0,
8,
1,
"VAE"
],
[
31,
22,
0,
24,
0,
"IMAGE"
],
[
32,
24,
0,
25,
0,
"IMAGE"
],
[
33,
21,
0,
25,
1,
"VAE"
],
[
34,
25,
0,
3,
3,
"LATENT"
],
[
35,
19,
0,
3,
0,
"MODEL"
],
[
38,
6,
0,
26,
0,
"CONDITIONING"
],
[
39,
7,
0,
26,
1,
"CONDITIONING"
],
[
40,
26,
0,
3,
1,
"CONDITIONING"
],
[
41,
26,
1,
3,
2,
"CONDITIONING"
],
[
42,
24,
0,
26,
3,
"IMAGE"
],
[
43,
27,
0,
26,
2,
"CONTROL_NET"
]
],
"groups": [],
"config": {},
"extra": {
"0246.VERSION": [
0,
0,
4
]
},
"version": 0.4
}
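This workflow deliberately uses tile_overlap = 0 and denoise = 1, the settings under which Tile's weaknesses show up most easily.
Even so, you can see that ControlNet Tile kept the composition of the original image largely intact during upscaling.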
Finishing with overlap × ControlNet Tile
Combining the elements so far gives the shape of a practical Tile upscale.


{
"last_node_id": 30,
"last_link_id": 48,
"nodes": [
{
"id": 21,
"type": "VAELoader",
"pos": [
925,
840
],
"size": {
"0": 315,
"1": 58
},
"flags": {
"collapsed": true
},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
29,
33
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"vae-ft-mse-840000-ema-pruned.safetensors"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
450,
445
],
"size": [
431.192761039734,
116.54622192382817
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
39
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text, watermark"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1630,
285
],
"size": [
177.86739979492222,
46
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 29,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
15
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 13,
"type": "PreviewImage",
"pos": [
1835,
290
],
"size": [
474.6894287109376,
509.40776977539065
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 15
}
],
"properties": {
"Node name for S&R": "PreviewImage"
}
},
{
"id": 25,
"type": "VAEEncode",
"pos": [
920,
745
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 32
},
{
"name": "vae",
"type": "VAE",
"link": 33
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
34
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEEncode"
}
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
45,
370
],
"size": {
"0": 315,
"1": 98
},
"flags": {},
"order": 1,
"mode": 0,
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
27
],
"slot_index": 0
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
3,
5
],
"slot_index": 1
},
{
"name": "VAE",
"type": "VAE",
"links": [],
"slot_index": 2
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"📷-v1.x\\dreamshaper_8.safetensors"
]
},
{
"id": 27,
"type": "ControlNetLoader",
"pos": [
543,
623
],
"size": [
331.4544950753203,
58
],
"flags": {},
"order": 2,
"mode": 0,
"outputs": [
{
"name": "CONTROL_NET",
"type": "CONTROL_NET",
"links": [
43
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "ControlNetLoader"
},
"widgets_values": [
"ControlNet-v1-1\\control_v11f1e_sd15_tile.pth"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 22,
"type": "LoadImage",
"pos": [
210,
745
],
"size": [
312.6875305175781,
412.5834655761719
],
"flags": {},
"order": 3,
"mode": 0,
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
31
],
"shape": 3,
"slot_index": 0
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"Tiled Diffusion.png",
"image"
]
},
{
"id": 26,
"type": "ControlNetApplyAdvanced",
"pos": [
940,
300
],
"size": {
"0": 315,
"1": 166
},
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 38
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 39
},
{
"name": "control_net",
"type": "CONTROL_NET",
"link": 43,
"slot_index": 2
},
{
"name": "image",
"type": "IMAGE",
"link": 48
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
40
],
"shape": 3,
"slot_index": 0
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
41
],
"shape": 3,
"slot_index": 1
}
],
"properties": {
"Node name for S&R": "ControlNetApplyAdvanced"
},
"widgets_values": [
0.6,
0,
1
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 19,
"type": "TiledDiffusion",
"pos": [
520,
20
],
"size": {
"0": 315,
"1": 154
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 27
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
35
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "TiledDiffusion"
},
"widgets_values": [
"MultiDiffusion",
512,
512,
256,
4
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 24,
"type": "ImageScale",
"pos": [
557,
747
],
"size": {
"0": 315,
"1": 130
},
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 31
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
32,
48
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "ImageScale"
},
"widgets_values": [
"nearest-exact",
1024,
1024,
"disabled"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 3,
"type": "KSampler",
"pos": [
1295,
280
],
"size": [
306.7227709960939,
265.12625854492194
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 35
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 40
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 41
},
{
"name": "latent_image",
"type": "LATENT",
"link": 34,
"slot_index": 3
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
7
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KSampler"
},
"widgets_values": [
2480,
"fixed",
20,
8,
"dpmpp_2m",
"karras",
0.6
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
455,
235
],
"size": {
"0": 422.84503173828125,
"1": 164.31304931640625
},
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
38
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"RAW photo of a dog,looking at viewer,white puppy"
]
}
],
"links": [
[
3,
4,
1,
6,
0,
"CLIP"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
15,
8,
0,
13,
0,
"IMAGE"
],
[
27,
4,
0,
19,
0,
"MODEL"
],
[
29,
21,
0,
8,
1,
"VAE"
],
[
31,
22,
0,
24,
0,
"IMAGE"
],
[
32,
24,
0,
25,
0,
"IMAGE"
],
[
33,
21,
0,
25,
1,
"VAE"
],
[
34,
25,
0,
3,
3,
"LATENT"
],
[
35,
19,
0,
3,
0,
"MODEL"
],
[
38,
6,
0,
26,
0,
"CONDITIONING"
],
[
39,
7,
0,
26,
1,
"CONDITIONING"
],
[
40,
26,
0,
3,
1,
"CONDITIONING"
],
[
41,
26,
1,
3,
2,
"CONDITIONING"
],
[
43,
27,
0,
26,
2,
"CONTROL_NET"
],
[
48,
24,
0,
26,
3,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"0246.VERSION": [
0,
0,
4
]
},
"version": 0.4
}
- 🟩 overlap 256px
- 🟦 ControlNet strength 0.6
The result is quite natural.
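The workflows in this article differ in only a few widget values, so it can help to read them out of the JSON instead of scanning it by eye. The sketch below assumes the widget order shown in `WIDGET_NAMES`, which is inferred from the values in the workflows above and may differ between node versions. Apart from tile_overlap and denoise, the key names are labels chosen for illustration, and `summarize` is a hypothetical helper.

```python
import json
from typing import Dict

# Assumed widget order per node type (an inference, verify against your nodes).
WIDGET_NAMES = {
    "TiledDiffusion": ["method", "tile_width", "tile_height",
                       "tile_overlap", "tile_batch_size"],
    "ControlNetApplyAdvanced": ["strength", "start_percent", "end_percent"],
    "KSampler": ["seed", "control_after_generate", "steps", "cfg",
                 "sampler_name", "scheduler", "denoise"],
}


def summarize(workflow_json: str) -> Dict[str, dict]:
    """Map each known node type to a {widget_name: value} dictionary."""
    workflow = json.loads(workflow_json)
    summary = {}
    for node in workflow.get("nodes", []):
        names = WIDGET_NAMES.get(node.get("type"))
        if names:
            summary[node["type"]] = dict(zip(names, node.get("widgets_values", [])))
    return summary


# Trimmed-down copy of the final workflow: only the fields the helper reads.
sample = json.dumps({"nodes": [
    {"type": "TiledDiffusion", "widgets_values": ["MultiDiffusion", 512, 512, 256, 4]},
    {"type": "ControlNetApplyAdvanced", "widgets_values": [0.6, 0, 1]},
    {"type": "KSampler",
     "widgets_values": [2480, "fixed", 20, 8, "dpmpp_2m", "karras", 0.6]},
]})

settings = summarize(sample)
print(settings["TiledDiffusion"]["tile_overlap"])       # 256
print(settings["ControlNetApplyAdvanced"]["strength"])  # 0.6
print(settings["KSampler"]["denoise"])                  # 0.6
```

Running the same helper on the earlier workflows makes the differences easy to compare: overlap 0 versus 256, denoise 1 versus 0.6, and ControlNet strength 1 versus 0.6.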
Summary: the idea behind Ultimate SD upscale
The essence of Ultimate SD upscale comes down to the following three pillars.
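1. Tile (splitting into tiles): instead of processing a large image as is, split it into tiles and run image2image on each, aiming for higher resolution while keeping the VRAM and compute load down.
2. overlap (overlapping tiles): place the tiles so that they overlap slightly, so that the boundaries blend during sampling and the seams become inconspicuous.
3. ControlNet Tile (fixing the structure): upscale tile by tile while strongly preserving the structure of the input image, which suppresses both the "dog within a dog" problem and the problem of the whole image falling apart.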
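Actual Ultimate SD upscale-style nodes and presets merely package this idea into a single node.
Incidentally, the same thinking can be applied to video generation, where you split frames instead. For example, a 100-frame video is divided into chunks of 20 frames each, with an overlap of 5 frames. The details are not covered here, but the point of splitting finely to reduce computational cost is exactly the same.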
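The frame-splitting arithmetic for that video example can be sketched as follows. `frame_chunks` is a hypothetical helper, and letting the final chunk come out shorter than the others is an assumption made here for simplicity; a real pipeline might instead shift the last window back so that it is full length.

```python
from typing import List, Tuple


def frame_chunks(total: int, size: int, overlap: int) -> List[Tuple[int, int]]:
    """Split `total` frames into [start, end) windows sharing `overlap` frames."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    if total <= 0:
        return []
    stride = size - overlap
    chunks = []
    start = 0
    while True:
        end = min(start + size, total)
        chunks.append((start, end))
        if end == total:  # the last window reached the end of the video
            break
        start += stride
    return chunks


chunks = frame_chunks(total=100, size=20, overlap=5)
for start, end in chunks:
    print(f"frames {start}..{end - 1}")
```

With these numbers each window starts 15 frames after the previous one, so neighbouring windows share 5 frames that can be blended in the same way as overlapping tiles.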
