Ultimate SD upscale

什么是 Ultimate SD upscale？

作为在 Stable Diffusion 无法生成大图像的理由，有未以大图像进行学习的理由，但作为另一个单纯的原因，有计算成本的问题。

想原样生成 1 张 4K 或 8K 这样的超高分辨率的图像的话，在 VRAM 和计算时间方面相当严峻。

因此产生了不是一口气制作，而是分割图像分别进行 Hires.fix 的想法。

1. 扩大图像
1. 分割为瓦片状
3.各瓦片个别地 image2image
1. 最后连接瓦片

名字上 Ultimate SD upscale 很出名，但真正重要的是 Tile（瓦片分割） 这个想法。

自定义节点

shiimizu/ComfyUI-TiledDiffusion

虽然也有名为 ssitu/ComfyUI_UltimateSDUpscale 的正是 Ultimate SD upscale 的节点，但这次想追寻原理，所以使用上面的单纯的节点。

Tile 的弱点 : 边界线

首先，确认一下 Tile 的基本举动。这里以 Tiled Diffusion 的节点为例说明，但只要掌握了想法什么节点都无所谓。

Tiled_Diffusion_overlap0.json

{
  "last_node_id": 25,
  "last_link_id": 35,
  "nodes": [
    {
      "id": 24,
      "type": "ImageScale",
      "pos": [
        470,
        635
      ],
      "size": {
        "0": 315,
        "1": 130
      },
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 31
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            32
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "ImageScale"
      },
      "widgets_values": [
        "nearest-exact",
        1024,
        1024,
        "disabled"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 21,
      "type": "VAELoader",
      "pos": [
        815,
        730
      ],
      "size": {
        "0": 315,
        "1": 58
      },
      "flags": {
        "collapsed": true
      },
      "order": 0,
      "mode": 0,
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            29,
            33
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "vae-ft-mse-840000-ema-pruned.safetensors"
      ]
    },
    {
      "id": 25,
      "type": "VAEEncode",
      "pos": [
        810,
        635
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 32
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 33
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            34
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEEncode"
      }
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        151,
        320
      ],
      "size": {
        "0": 315,
        "1": 98
      },
      "flags": {},
      "order": 1,
      "mode": 0,
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            27
          ],
          "slot_index": 0
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [
            3,
            5
          ],
          "slot_index": 1
        },
        {
          "name": "VAE",
          "type": "VAE",
          "links": [],
          "slot_index": 2
        }
      ],
      "properties": {
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "📷-v1.x\\dreamshaper_8.safetensors"
      ]
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1435,
        190
      ],
      "size": {
        "0": 177.86740112304688,
        "1": 46
      },
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 7
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 29,
          "slot_index": 1
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            15
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      }
    },
    {
      "id": 13,
      "type": "PreviewImage",
      "pos": [
        1654,
        195
      ],
      "size": {
        "0": 474.6894226074219,
        "1": 509.40777587890625
      },
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 15
        }
      ],
      "properties": {
        "Node name for S&R": "PreviewImage"
      }
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        1084,
        189
      ],
      "size": {
        "0": 315,
        "1": 262
      },
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 35
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 4
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 6
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 34,
          "slot_index": 3
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            7
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        2480,
        "fixed",
        20,
        8,
        "dpmpp_2m",
        "karras",
        0.6
      ]
    },
    {
      "id": 22,
      "type": "LoadImage",
      "pos": [
        130,
        635
      ],
      "size": {
        "0": 312.6875305175781,
        "1": 412.5834655761719
      },
      "flags": {},
      "order": 2,
      "mode": 0,
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            31
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "Tiled Diffusion.png",
        "image"
      ]
    },
    {
      "id": 19,
      "type": "TiledDiffusion",
      "pos": [
        631,
        -35
      ],
      "size": {
        "0": 315,
        "1": 154
      },
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 27
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            35
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "TiledDiffusion"
      },
      "widgets_values": [
        "MultiDiffusion",
        512,
        512,
        0,
        4
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        569.5670166015625,
        390
      ],
      "size": {
        "0": 425.27801513671875,
        "1": 180.6060791015625
      },
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            6
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "text, watermark"
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        572,
        180
      ],
      "size": {
        "0": 422.84503173828125,
        "1": 164.31304931640625
      },
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            4
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "RAW photo of a dog,looking at viewer,white puppy"
      ]
    }
  ],
  "links": [
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      4,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      6,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      7,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      15,
      8,
      0,
      13,
      0,
      "IMAGE"
    ],
    [
      27,
      4,
      0,
      19,
      0,
      "MODEL"
    ],
    [
      29,
      21,
      0,
      8,
      1,
      "VAE"
    ],
    [
      31,
      22,
      0,
      24,
      0,
      "IMAGE"
    ],
    [
      32,
      24,
      0,
      25,
      0,
      "IMAGE"
    ],
    [
      33,
      21,
      0,
      25,
      1,
      "VAE"
    ],
    [
      34,
      25,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      35,
      19,
      0,
      3,
      0,
      "MODEL"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "0246.VERSION": [
      0,
      0,
      4
    ]
  },
  "version": 0.4
}

🟨 将输入图像调整尺寸为 1024 × 1024 px
🟩 将瓦片尺寸设定为 512 × 512 px

这个设定的话，左下的小狗的图像被漂亮地 4 分割，各瓦片作为独立的 image2image 被处理。

如所见那样，瓦片的边界线被清楚地看见，作为画面整体的统一感很弱。
这就是 Tile 的 第 1 个弱点。

用 overlap 融合边界线

如果介意边界线的话，只要将瓦片稍微重叠配置就好，有这样的构思。这就是 tile_overlap。

Tiled_Diffusion_overlap256.json

{
  "last_node_id": 25,
  "last_link_id": 35,
  "nodes": [
    {
      "id": 24,
      "type": "ImageScale",
      "pos": [
        470,
        635
      ],
      "size": {
        "0": 315,
        "1": 130
      },
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 31
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            32
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "ImageScale"
      },
      "widgets_values": [
        "nearest-exact",
        1024,
        1024,
        "disabled"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 25,
      "type": "VAEEncode",
      "pos": [
        810,
        635
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 32
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 33
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            34
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEEncode"
      }
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        151,
        320
      ],
      "size": {
        "0": 315,
        "1": 98
      },
      "flags": {},
      "order": 0,
      "mode": 0,
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            27
          ],
          "slot_index": 0
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [
            3,
            5
          ],
          "slot_index": 1
        },
        {
          "name": "VAE",
          "type": "VAE",
          "links": [],
          "slot_index": 2
        }
      ],
      "properties": {
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "📷-v1.x\\dreamshaper_8.safetensors"
      ]
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1435,
        190
      ],
      "size": {
        "0": 177.86740112304688,
        "1": 46
      },
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 7
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 29,
          "slot_index": 1
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            15
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      }
    },
    {
      "id": 13,
      "type": "PreviewImage",
      "pos": [
        1654,
        195
      ],
      "size": {
        "0": 474.6894226074219,
        "1": 509.40777587890625
      },
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 15
        }
      ],
      "properties": {
        "Node name for S&R": "PreviewImage"
      }
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        1084,
        189
      ],
      "size": {
        "0": 315,
        "1": 262
      },
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 35
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 4
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 6
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 34,
          "slot_index": 3
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            7
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        2480,
        "fixed",
        20,
        8,
        "dpmpp_2m",
        "karras",
        0.6
      ]
    },
    {
      "id": 22,
      "type": "LoadImage",
      "pos": [
        130,
        635
      ],
      "size": {
        "0": 312.6875305175781,
        "1": 412.5834655761719
      },
      "flags": {},
      "order": 1,
      "mode": 0,
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            31
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "Tiled Diffusion.png",
        "image"
      ]
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        569.5670166015625,
        390
      ],
      "size": {
        "0": 425.27801513671875,
        "1": 180.6060791015625
      },
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            6
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "text, watermark"
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        572,
        180
      ],
      "size": {
        "0": 422.84503173828125,
        "1": 164.31304931640625
      },
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            4
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "RAW photo of a dog,looking at viewer,white puppy"
      ]
    },
    {
      "id": 19,
      "type": "TiledDiffusion",
      "pos": [
        631,
        -35
      ],
      "size": {
        "0": 315,
        "1": 154
      },
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 27
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            35
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "TiledDiffusion"
      },
      "widgets_values": [
        "MultiDiffusion",
        512,
        512,
        256,
        4
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 21,
      "type": "VAELoader",
      "pos": [
        811,
        730
      ],
      "size": {
        "0": 315,
        "1": 58
      },
      "flags": {
        "collapsed": true
      },
      "order": 2,
      "mode": 0,
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            29,
            33
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "vae-ft-mse-840000-ema-pruned.safetensors"
      ]
    }
  ],
  "links": [
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      4,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      6,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      7,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      15,
      8,
      0,
      13,
      0,
      "IMAGE"
    ],
    [
      27,
      4,
      0,
      19,
      0,
      "MODEL"
    ],
    [
      29,
      21,
      0,
      8,
      1,
      "VAE"
    ],
    [
      31,
      22,
      0,
      24,
      0,
      "IMAGE"
    ],
    [
      32,
      24,
      0,
      25,
      0,
      "IMAGE"
    ],
    [
      33,
      21,
      0,
      25,
      1,
      "VAE"
    ],
    [
      34,
      25,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      35,
      19,
      0,
      3,
      0,
      "MODEL"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "0246.VERSION": [
      0,
      0,
      4
    ]
  },
  "version": 0.4
}

🟩 将 tile_overlap 设为 256px
不是漂亮地排列瓦片，而是 故意重叠一半左右 排列的印象。

重叠的部分，因为像相邻的瓦片彼此共享信息的缓冲那样工作，所以在推进采样的期间边界融合，瓦片的接缝变得不显眼。

但是，越增加 overlap，因为越会对同一领域多次采样，所以生成花费的时间会增加。

Tile 的另一个弱点：提示词

Tile 还有另一个，大的弱点。
因为 在所有的瓦片使用相同的提示词，所以在没想到的地方生成了多余的东西。

在刚才的工作流中，像 tile_overlap = 0 / denoise = 1 这样设定，并在提示词只写 一只狗 试着生成吧。
于是就像图像那样，在一个图像中出现好几只狗。

因为试图在左上、右上、左下、右下的各瓦片生成一只狗，所以作为整体变成了画四只狗呢。这就是 Tile 的 第 2 个弱点。

每瓦片改变提示词的方案

光说理论的话，可以考虑 每瓦片写分别的提示词 的方法。

左上瓦片：狗的右耳, 右眼
右上瓦片：狗的左耳, 左眼
左下瓦片：狗的前足
右下瓦片：狗的后足

这样的话，哪个瓦片都应该理解“自己只要担当耳朵就好”。

但是，实际上几乎不被使用。

写瓦片数量份的提示词不现实，而且比什么都重要的是 Stable Diffusion 无法理解并分别画出“狗的脸的右上 4 分 of 1”这样的提示词。

用 ControlNet Tile 固定结构

在这里登场的是 ControlNet Tile。

ControlNet Tile 是 相当强地保持输入图像的结构 生成新图像的 ControlNet。

虽然不是原样复制像素，但保持 大体的形状、对象的位置关系 原样，进行重涂纹理和细节那样的举动。

TiledDiffusion_ControlNet_Tile.json

{
  "last_node_id": 27,
  "last_link_id": 43,
  "nodes": [
    {
      "id": 21,
      "type": "VAELoader",
      "pos": [
        925,
        840
      ],
      "size": {
        "0": 315,
        "1": 58
      },
      "flags": {
        "collapsed": true
      },
      "order": 0,
      "mode": 0,
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            29,
            33
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "vae-ft-mse-840000-ema-pruned.safetensors"
      ]
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        450,
        445
      ],
      "size": {
        "0": 431.1927490234375,
        "1": 116.54621887207031
      },
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            39
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "text, watermark"
      ]
    },
    {
      "id": 19,
      "type": "TiledDiffusion",
      "pos": [
        520,
        20
      ],
      "size": {
        "0": 315,
        "1": 154
      },
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 27
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            35
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "TiledDiffusion"
      },
      "widgets_values": [
        "MultiDiffusion",
        512,
        512,
        0,
        4
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1630,
        285
      ],
      "size": {
        "0": 177.86740112304688,
        "1": 46
      },
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 7
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 29,
          "slot_index": 1
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            15
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      }
    },
    {
      "id": 13,
      "type": "PreviewImage",
      "pos": [
        1835,
        290
      ],
      "size": {
        "0": 474.6894226074219,
        "1": 509.40777587890625
      },
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 15
        }
      ],
      "properties": {
        "Node name for S&R": "PreviewImage"
      }
    },
    {
      "id": 25,
      "type": "VAEEncode",
      "pos": [
        920,
        745
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 32
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 33
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            34
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEEncode"
      }
    },
    {
      "id": 22,
      "type": "LoadImage",
      "pos": [
        210,
        745
      ],
      "size": {
        "0": 312.6875305175781,
        "1": 412.5834655761719
      },
      "flags": {},
      "order": 1,
      "mode": 0,
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            31
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "Tiled Diffusion.png",
        "image"
      ]
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        45,
        370
      ],
      "size": {
        "0": 315,
        "1": 98
      },
      "flags": {},
      "order": 2,
      "mode": 0,
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            27
          ],
          "slot_index": 0
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [
            3,
            5
          ],
          "slot_index": 1
        },
        {
          "name": "VAE",
          "type": "VAE",
          "links": [],
          "slot_index": 2
        }
      ],
      "properties": {
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "📷-v1.x\\dreamshaper_8.safetensors"
      ]
    },
    {
      "id": 24,
      "type": "ImageScale",
      "pos": [
        557,
        747
      ],
      "size": {
        "0": 315,
        "1": 130
      },
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 31
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            32,
            42
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "ImageScale"
      },
      "widgets_values": [
        "nearest-exact",
        1024,
        1024,
        "disabled"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 27,
      "type": "ControlNetLoader",
      "pos": [
        543,
        623
      ],
      "size": {
        "0": 331.4544982910156,
        "1": 58
      },
      "flags": {},
      "order": 3,
      "mode": 0,
      "outputs": [
        {
          "name": "CONTROL_NET",
          "type": "CONTROL_NET",
          "links": [
            43
          ],
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "ControlNetLoader"
      },
      "widgets_values": [
        "ControlNet-v1-1\\control_v11f1e_sd15_tile.pth"
      ],
      "color": "#2a363b",
      "bgcolor": "#3f5159"
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        1295,
        280
      ],
      "size": {
        "0": 306.7227783203125,
        "1": 265.1262512207031
      },
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 35
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 40
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 41
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 34,
          "slot_index": 3
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            7
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        2480,
        "fixed",
        20,
        8,
        "dpmpp_2m",
        "karras",
        1
      ]
    },
    {
      "id": 26,
      "type": "ControlNetApplyAdvanced",
      "pos": [
        940,
        300
      ],
      "size": {
        "0": 315,
        "1": 166
      },
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 38
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 39
        },
        {
          "name": "control_net",
          "type": "CONTROL_NET",
          "link": 43,
          "slot_index": 2
        },
        {
          "name": "image",
          "type": "IMAGE",
          "link": 42
        }
      ],
      "outputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "links": [
            40
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "links": [
            41
          ],
          "shape": 3,
          "slot_index": 1
        }
      ],
      "properties": {
        "Node name for S&R": "ControlNetApplyAdvanced"
      },
      "widgets_values": [
        1,
        0,
        1
      ],
      "color": "#2a363b",
      "bgcolor": "#3f5159"
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        455,
        235
      ],
      "size": {
        "0": 422.84503173828125,
        "1": 164.31304931640625
      },
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            38
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "RAW photo of a dog,looking at viewer,white puppy"
      ]
    }
  ],
  "links": [
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      7,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      15,
      8,
      0,
      13,
      0,
      "IMAGE"
    ],
    [
      27,
      4,
      0,
      19,
      0,
      "MODEL"
    ],
    [
      29,
      21,
      0,
      8,
      1,
      "VAE"
    ],
    [
      31,
      22,
      0,
      24,
      0,
      "IMAGE"
    ],
    [
      32,
      24,
      0,
      25,
      0,
      "IMAGE"
    ],
    [
      33,
      21,
      0,
      25,
      1,
      "VAE"
    ],
    [
      34,
      25,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      35,
      19,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      38,
      6,
      0,
      26,
      0,
      "CONDITIONING"
    ],
    [
      39,
      7,
      0,
      26,
      1,
      "CONDITIONING"
    ],
    [
      40,
      26,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      41,
      26,
      1,
      3,
      2,
      "CONDITIONING"
    ],
    [
      42,
      24,
      0,
      26,
      3,
      "IMAGE"
    ],
    [
      43,
      27,
      0,
      26,
      2,
      "CONTROL_NET"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "0246.VERSION": [
      0,
      0,
      4
    ]
  },
  "version": 0.4
}

在这个工作流中，敢于设为 tile_overlap = 0、denoise = 1 这种，最容易出现 Tile 的弱点的设定。

即便如此，应该能明白通过通过 ControlNet Tile 保持了相当程度原图像的构图 进行了放大。

用 overlap × ControlNet Tile 完成

组合到此为止的要素的话，就能看见实用的 Tile 放大的形状。

Tiled_Diffusion_overlap_ContolNet_Tile.json

{
  "last_node_id": 30,
  "last_link_id": 48,
  "nodes": [
    {
      "id": 21,
      "type": "VAELoader",
      "pos": [
        925,
        840
      ],
      "size": {
        "0": 315,
        "1": 58
      },
      "flags": {
        "collapsed": true
      },
      "order": 0,
      "mode": 0,
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            29,
            33
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "vae-ft-mse-840000-ema-pruned.safetensors"
      ]
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        450,
        445
      ],
      "size": [
        431.192761039734,
        116.54622192382817
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            39
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "text, watermark"
      ]
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1630,
        285
      ],
      "size": [
        177.86739979492222,
        46
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 7
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 29,
          "slot_index": 1
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            15
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      }
    },
    {
      "id": 13,
      "type": "PreviewImage",
      "pos": [
        1835,
        290
      ],
      "size": [
        474.6894287109376,
        509.40776977539065
      ],
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 15
        }
      ],
      "properties": {
        "Node name for S&R": "PreviewImage"
      }
    },
    {
      "id": 25,
      "type": "VAEEncode",
      "pos": [
        920,
        745
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 32
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 33
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            34
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEEncode"
      }
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        45,
        370
      ],
      "size": {
        "0": 315,
        "1": 98
      },
      "flags": {},
      "order": 1,
      "mode": 0,
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            27
          ],
          "slot_index": 0
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [
            3,
            5
          ],
          "slot_index": 1
        },
        {
          "name": "VAE",
          "type": "VAE",
          "links": [],
          "slot_index": 2
        }
      ],
      "properties": {
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "📷-v1.x\\dreamshaper_8.safetensors"
      ]
    },
    {
      "id": 27,
      "type": "ControlNetLoader",
      "pos": [
        543,
        623
      ],
      "size": [
        331.4544950753203,
        58
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "outputs": [
        {
          "name": "CONTROL_NET",
          "type": "CONTROL_NET",
          "links": [
            43
          ],
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "ControlNetLoader"
      },
      "widgets_values": [
        "ControlNet-v1-1\\control_v11f1e_sd15_tile.pth"
      ],
      "color": "#2a363b",
      "bgcolor": "#3f5159"
    },
    {
      "id": 22,
      "type": "LoadImage",
      "pos": [
        210,
        745
      ],
      "size": [
        312.6875305175781,
        412.5834655761719
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            31
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "Tiled Diffusion.png",
        "image"
      ]
    },
    {
      "id": 26,
      "type": "ControlNetApplyAdvanced",
      "pos": [
        940,
        300
      ],
      "size": {
        "0": 315,
        "1": 166
      },
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 38
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 39
        },
        {
          "name": "control_net",
          "type": "CONTROL_NET",
          "link": 43,
          "slot_index": 2
        },
        {
          "name": "image",
          "type": "IMAGE",
          "link": 48
        }
      ],
      "outputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "links": [
            40
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "links": [
            41
          ],
          "shape": 3,
          "slot_index": 1
        }
      ],
      "properties": {
        "Node name for S&R": "ControlNetApplyAdvanced"
      },
      "widgets_values": [
        0.6,
        0,
        1
      ],
      "color": "#2a363b",
      "bgcolor": "#3f5159"
    },
    {
      "id": 19,
      "type": "TiledDiffusion",
      "pos": [
        520,
        20
      ],
      "size": {
        "0": 315,
        "1": 154
      },
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 27
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            35
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "TiledDiffusion"
      },
      "widgets_values": [
        "MultiDiffusion",
        512,
        512,
        256,
        4
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 24,
      "type": "ImageScale",
      "pos": [
        557,
        747
      ],
      "size": {
        "0": 315,
        "1": 130
      },
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 31
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            32,
            48
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "ImageScale"
      },
      "widgets_values": [
        "nearest-exact",
        1024,
        1024,
        "disabled"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        1295,
        280
      ],
      "size": [
        306.7227709960939,
        265.12625854492194
      ],
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 35
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 40
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 41
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 34,
          "slot_index": 3
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            7
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        2480,
        "fixed",
        20,
        8,
        "dpmpp_2m",
        "karras",
        0.6
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        455,
        235
      ],
      "size": {
        "0": 422.84503173828125,
        "1": 164.31304931640625
      },
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            38
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "RAW photo of a dog,looking at viewer,white puppy"
      ]
    }
  ],
  "links": [
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      7,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      15,
      8,
      0,
      13,
      0,
      "IMAGE"
    ],
    [
      27,
      4,
      0,
      19,
      0,
      "MODEL"
    ],
    [
      29,
      21,
      0,
      8,
      1,
      "VAE"
    ],
    [
      31,
      22,
      0,
      24,
      0,
      "IMAGE"
    ],
    [
      32,
      24,
      0,
      25,
      0,
      "IMAGE"
    ],
    [
      33,
      21,
      0,
      25,
      1,
      "VAE"
    ],
    [
      34,
      25,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      35,
      19,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      38,
      6,
      0,
      26,
      0,
      "CONDITIONING"
    ],
    [
      39,
      7,
      0,
      26,
      1,
      "CONDITIONING"
    ],
    [
      40,
      26,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      41,
      26,
      1,
      3,
      2,
      "CONDITIONING"
    ],
    [
      43,
      27,
      0,
      26,
      2,
      "CONTROL_NET"
    ],
    [
      48,
      24,
      0,
      26,
      3,
      "IMAGE"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "0246.VERSION": [
      0,
      0,
      4
    ]
  },
  "version": 0.4
}

🟩 overlap 256px
🟦 Controlnet strength 0.6

变成了相当自然的完成呢。

总结：Ultimate SD upscale 的想法

Ultimate SD upscale 的本质，是以下三根柱子。

1. Tile（瓦片分割） 不是原样处理大图像，而是分割为瓦片进行 image2image，一边抑制 VRAM 和计算时间的负荷一边谋求超分辨率。
1. overlap（瓦片的重叠） 稍微重叠瓦片配置，通过在采样的过程融合边界，让接缝不显眼。
1. ControlNet Tile（结构的固定） 通过强力保持输入图像的结构进行瓦片放大，抑制“狗中的狗”问题，和整体变得零散的问题。

实际的 Ultimate SD upscale 系节点和预设，只不过是将这个想法打包到一个节点而已。

顺便一提，同样的思考也可以应用到视频生成，这次变成分割帧。将 100 帧的视频分为各 20 帧，overlap 5 帧份这样的感觉呢。详情这里不处理，但为了降低计算成本细致地分割这点是完全一样的。

Ultimate SD upscale

什么是 Ultimate SD upscale？

自定义节点

Tile 的弱点 : 边界线

用 overlap 融合边界线

Tile 的另一个弱点：提示词

每瓦片改变提示词的方案

用 ControlNet Tile 固定结构

用 overlap × ControlNet Tile 完成

总结：Ultimate SD upscale 的想法

什么是 JSON 复制按钮？

这个页面有问题！

请补充讲解！

感想 / 其他

感谢！

Ultimate SD upscale

什么是 Ultimate SD upscale？

自定义节点

Tile 的弱点 : 边界线

用 overlap 融合边界线

Tile 的另一个弱点：提示词

每瓦片改变提示词的方案

用 ControlNet Tile 固定结构

用 overlap × ControlNet Tile 完成

总结：Ultimate SD upscale 的想法

相关工作流