Ultimate SD upscale

Ultimate SD upscaleとは？

Stable Diffusionで大きな画像を生成できない理由として、大きな画像で学習されていないという理由がありましたが、もう一つシンプルな原因として、計算コストの問題がありました。

4K や 8K といった超高解像度の画像を、1枚そのまま生成しようとすると VRAM と計算時間の面でかなり厳しい。

そこで一気に作るのではなく、画像を分割してそれぞれをHires.fixしようという考え方が生まれました。

1. 画像を拡大する
1. タイル状に分割する
1. 各タイルを個別に image2image
1. 最後にタイルをつなぎ合わせる

名前としては Ultimate SD upscale が有名ですが、本当に大事なのは Tile（タイル分割） という考え方です。

カスタムノード

shiimizu/ComfyUI-TiledDiffusion

ssitu/ComfyUI_UltimateSDUpscaleという名のまさにUltimate SD upscaleのノードもあるのですが、今回は原理を追っていきたいので、上の単純なノードを使います。

Tileの弱点 : 境界線

まずは、Tile の基本的な挙動を見ておきます。ここでは Tiled Diffusion のノードを例に説明しますが、考え方さえ押さえればノードは何でもかまいません。

Tiled_Diffusion_overlap0.json

{
  "last_node_id": 25,
  "last_link_id": 35,
  "nodes": [
    {
      "id": 24,
      "type": "ImageScale",
      "pos": [
        470,
        635
      ],
      "size": {
        "0": 315,
        "1": 130
      },
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 31
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            32
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "ImageScale"
      },
      "widgets_values": [
        "nearest-exact",
        1024,
        1024,
        "disabled"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 21,
      "type": "VAELoader",
      "pos": [
        815,
        730
      ],
      "size": {
        "0": 315,
        "1": 58
      },
      "flags": {
        "collapsed": true
      },
      "order": 0,
      "mode": 0,
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            29,
            33
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "vae-ft-mse-840000-ema-pruned.safetensors"
      ]
    },
    {
      "id": 25,
      "type": "VAEEncode",
      "pos": [
        810,
        635
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 32
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 33
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            34
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEEncode"
      }
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        151,
        320
      ],
      "size": {
        "0": 315,
        "1": 98
      },
      "flags": {},
      "order": 1,
      "mode": 0,
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            27
          ],
          "slot_index": 0
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [
            3,
            5
          ],
          "slot_index": 1
        },
        {
          "name": "VAE",
          "type": "VAE",
          "links": [],
          "slot_index": 2
        }
      ],
      "properties": {
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "📷-v1.x\\dreamshaper_8.safetensors"
      ]
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1435,
        190
      ],
      "size": {
        "0": 177.86740112304688,
        "1": 46
      },
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 7
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 29,
          "slot_index": 1
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            15
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      }
    },
    {
      "id": 13,
      "type": "PreviewImage",
      "pos": [
        1654,
        195
      ],
      "size": {
        "0": 474.6894226074219,
        "1": 509.40777587890625
      },
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 15
        }
      ],
      "properties": {
        "Node name for S&R": "PreviewImage"
      }
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        1084,
        189
      ],
      "size": {
        "0": 315,
        "1": 262
      },
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 35
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 4
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 6
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 34,
          "slot_index": 3
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            7
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        2480,
        "fixed",
        20,
        8,
        "dpmpp_2m",
        "karras",
        0.6
      ]
    },
    {
      "id": 22,
      "type": "LoadImage",
      "pos": [
        130,
        635
      ],
      "size": {
        "0": 312.6875305175781,
        "1": 412.5834655761719
      },
      "flags": {},
      "order": 2,
      "mode": 0,
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            31
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "Tiled Diffusion.png",
        "image"
      ]
    },
    {
      "id": 19,
      "type": "TiledDiffusion",
      "pos": [
        631,
        -35
      ],
      "size": {
        "0": 315,
        "1": 154
      },
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 27
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            35
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "TiledDiffusion"
      },
      "widgets_values": [
        "MultiDiffusion",
        512,
        512,
        0,
        4
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        569.5670166015625,
        390
      ],
      "size": {
        "0": 425.27801513671875,
        "1": 180.6060791015625
      },
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            6
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "text, watermark"
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        572,
        180
      ],
      "size": {
        "0": 422.84503173828125,
        "1": 164.31304931640625
      },
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            4
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "RAW photo of a dog,looking at viewer,white puppy"
      ]
    }
  ],
  "links": [
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      4,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      6,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      7,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      15,
      8,
      0,
      13,
      0,
      "IMAGE"
    ],
    [
      27,
      4,
      0,
      19,
      0,
      "MODEL"
    ],
    [
      29,
      21,
      0,
      8,
      1,
      "VAE"
    ],
    [
      31,
      22,
      0,
      24,
      0,
      "IMAGE"
    ],
    [
      32,
      24,
      0,
      25,
      0,
      "IMAGE"
    ],
    [
      33,
      21,
      0,
      25,
      1,
      "VAE"
    ],
    [
      34,
      25,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      35,
      19,
      0,
      3,
      0,
      "MODEL"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "0246.VERSION": [
      0,
      0,
      4
    ]
  },
  "version": 0.4
}

🟨 入力画像を 1024 × 1024 px にリサイズ
🟩 タイルサイズを 512 × 512 px に設定

この設定だと、左下の子犬の画像はきれいに 4 分割され、それぞれのタイルが独立した image2image として処理されます。

見て分かるように、タイルの境界線がはっきり見えてしまい画面全体としてのまとまりが弱いです。
これが Tile の 1つ目の弱点 です。

overlapで境界線をなじませる

境界線が気になるなら、タイルを少し重ねて配置すればよい、という発想があります。これが tile_overlap です。

Tiled_Diffusion_overlap256.json

{
  "last_node_id": 25,
  "last_link_id": 35,
  "nodes": [
    {
      "id": 24,
      "type": "ImageScale",
      "pos": [
        470,
        635
      ],
      "size": {
        "0": 315,
        "1": 130
      },
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 31
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            32
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "ImageScale"
      },
      "widgets_values": [
        "nearest-exact",
        1024,
        1024,
        "disabled"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 25,
      "type": "VAEEncode",
      "pos": [
        810,
        635
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 32
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 33
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            34
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEEncode"
      }
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        151,
        320
      ],
      "size": {
        "0": 315,
        "1": 98
      },
      "flags": {},
      "order": 0,
      "mode": 0,
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            27
          ],
          "slot_index": 0
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [
            3,
            5
          ],
          "slot_index": 1
        },
        {
          "name": "VAE",
          "type": "VAE",
          "links": [],
          "slot_index": 2
        }
      ],
      "properties": {
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "📷-v1.x\\dreamshaper_8.safetensors"
      ]
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1435,
        190
      ],
      "size": {
        "0": 177.86740112304688,
        "1": 46
      },
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 7
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 29,
          "slot_index": 1
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            15
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      }
    },
    {
      "id": 13,
      "type": "PreviewImage",
      "pos": [
        1654,
        195
      ],
      "size": {
        "0": 474.6894226074219,
        "1": 509.40777587890625
      },
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 15
        }
      ],
      "properties": {
        "Node name for S&R": "PreviewImage"
      }
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        1084,
        189
      ],
      "size": {
        "0": 315,
        "1": 262
      },
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 35
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 4
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 6
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 34,
          "slot_index": 3
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            7
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        2480,
        "fixed",
        20,
        8,
        "dpmpp_2m",
        "karras",
        0.6
      ]
    },
    {
      "id": 22,
      "type": "LoadImage",
      "pos": [
        130,
        635
      ],
      "size": {
        "0": 312.6875305175781,
        "1": 412.5834655761719
      },
      "flags": {},
      "order": 1,
      "mode": 0,
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            31
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "Tiled Diffusion.png",
        "image"
      ]
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        569.5670166015625,
        390
      ],
      "size": {
        "0": 425.27801513671875,
        "1": 180.6060791015625
      },
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            6
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "text, watermark"
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        572,
        180
      ],
      "size": {
        "0": 422.84503173828125,
        "1": 164.31304931640625
      },
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            4
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "RAW photo of a dog,looking at viewer,white puppy"
      ]
    },
    {
      "id": 19,
      "type": "TiledDiffusion",
      "pos": [
        631,
        -35
      ],
      "size": {
        "0": 315,
        "1": 154
      },
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 27
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            35
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "TiledDiffusion"
      },
      "widgets_values": [
        "MultiDiffusion",
        512,
        512,
        256,
        4
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 21,
      "type": "VAELoader",
      "pos": [
        811,
        730
      ],
      "size": {
        "0": 315,
        "1": 58
      },
      "flags": {
        "collapsed": true
      },
      "order": 2,
      "mode": 0,
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            29,
            33
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "vae-ft-mse-840000-ema-pruned.safetensors"
      ]
    }
  ],
  "links": [
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      4,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      6,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      7,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      15,
      8,
      0,
      13,
      0,
      "IMAGE"
    ],
    [
      27,
      4,
      0,
      19,
      0,
      "MODEL"
    ],
    [
      29,
      21,
      0,
      8,
      1,
      "VAE"
    ],
    [
      31,
      22,
      0,
      24,
      0,
      "IMAGE"
    ],
    [
      32,
      24,
      0,
      25,
      0,
      "IMAGE"
    ],
    [
      33,
      21,
      0,
      25,
      1,
      "VAE"
    ],
    [
      34,
      25,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      35,
      19,
      0,
      3,
      0,
      "MODEL"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "0246.VERSION": [
      0,
      0,
      4
    ]
  },
  "version": 0.4
}

🟩 tile_overlap を 256px に
タイルをきれいに並べるのではなく、わざと半分くらい重ねて 並べるイメージです。

重なった部分は、隣り合うタイル同士が情報を共有するクッションのように働くため、サンプリングを進めるうちに境界がなじみ、タイルの継ぎ目が目立ちにくくなります。

ただし、overlap を増やすほど、同じ領域を何度もサンプリングすることになるため、生成にかかる時間は増えます。

Tileのもうひとつの弱点：プロンプト

Tile にはもうひとつ、大きな弱点があります。
すべてのタイルで同じプロンプトを使う ため、思ってもいない場所に余計なものが生成されてしまうのです。

先程のworkflowで、tile_overlap = 0 / denoise = 1 のように設定し、プロンプトに 一匹の犬 とだけ書いて生成してみましょう。
すると画像のように、一つの画像の中に何匹もの犬が現れてしまいます。

左上、右上、左下、右下それぞれのタイルで一匹の犬を生成しようとするため、全体としては四匹の犬が描かれることになるんですね。これが Tile の 2つ目の弱点 です。

タイルごとにプロンプトを変える案

理屈だけで言えば、タイルごとに別々のプロンプトを書く 方法が考えられます。

左上タイル：犬の右耳, 右目
右上タイル：犬の左耳, 左目
左下タイル：犬の前足
右下タイル：犬の後ろ足

こうすれば、どのタイルも「自分は耳だけを担当すればいい」と理解してくれるはずです。

しかし、実際にはほとんど使われません。

タイルの数だけプロンプトを書くのは現実的ではありませんし、なにより Stable Diffusion は「犬の顔の右上 4 分の 1だけ」といったプロンプトを理解して描き分けることができません。

ControlNet Tileで構造を固定する

ここで出てくるのが ControlNet Tile です。

ControlNet Tile は、入力画像の 構造をかなり強く保持したまま 新しい画像を生成する ControlNet です。

画素をそのままコピーするわけではありませんが、大まかな形、オブジェクトの位置関係 を保ったまま、テクスチャやディテールを塗り直すような挙動をします。

TiledDiffusion_ControlNet_Tile.json

{
  "last_node_id": 27,
  "last_link_id": 43,
  "nodes": [
    {
      "id": 21,
      "type": "VAELoader",
      "pos": [
        925,
        840
      ],
      "size": {
        "0": 315,
        "1": 58
      },
      "flags": {
        "collapsed": true
      },
      "order": 0,
      "mode": 0,
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            29,
            33
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "vae-ft-mse-840000-ema-pruned.safetensors"
      ]
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        450,
        445
      ],
      "size": {
        "0": 431.1927490234375,
        "1": 116.54621887207031
      },
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            39
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "text, watermark"
      ]
    },
    {
      "id": 19,
      "type": "TiledDiffusion",
      "pos": [
        520,
        20
      ],
      "size": {
        "0": 315,
        "1": 154
      },
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 27
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            35
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "TiledDiffusion"
      },
      "widgets_values": [
        "MultiDiffusion",
        512,
        512,
        0,
        4
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1630,
        285
      ],
      "size": {
        "0": 177.86740112304688,
        "1": 46
      },
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 7
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 29,
          "slot_index": 1
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            15
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      }
    },
    {
      "id": 13,
      "type": "PreviewImage",
      "pos": [
        1835,
        290
      ],
      "size": {
        "0": 474.6894226074219,
        "1": 509.40777587890625
      },
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 15
        }
      ],
      "properties": {
        "Node name for S&R": "PreviewImage"
      }
    },
    {
      "id": 25,
      "type": "VAEEncode",
      "pos": [
        920,
        745
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 32
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 33
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            34
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEEncode"
      }
    },
    {
      "id": 22,
      "type": "LoadImage",
      "pos": [
        210,
        745
      ],
      "size": {
        "0": 312.6875305175781,
        "1": 412.5834655761719
      },
      "flags": {},
      "order": 1,
      "mode": 0,
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            31
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "Tiled Diffusion.png",
        "image"
      ]
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        45,
        370
      ],
      "size": {
        "0": 315,
        "1": 98
      },
      "flags": {},
      "order": 2,
      "mode": 0,
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            27
          ],
          "slot_index": 0
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [
            3,
            5
          ],
          "slot_index": 1
        },
        {
          "name": "VAE",
          "type": "VAE",
          "links": [],
          "slot_index": 2
        }
      ],
      "properties": {
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "📷-v1.x\\dreamshaper_8.safetensors"
      ]
    },
    {
      "id": 24,
      "type": "ImageScale",
      "pos": [
        557,
        747
      ],
      "size": {
        "0": 315,
        "1": 130
      },
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 31
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            32,
            42
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "ImageScale"
      },
      "widgets_values": [
        "nearest-exact",
        1024,
        1024,
        "disabled"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 27,
      "type": "ControlNetLoader",
      "pos": [
        543,
        623
      ],
      "size": {
        "0": 331.4544982910156,
        "1": 58
      },
      "flags": {},
      "order": 3,
      "mode": 0,
      "outputs": [
        {
          "name": "CONTROL_NET",
          "type": "CONTROL_NET",
          "links": [
            43
          ],
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "ControlNetLoader"
      },
      "widgets_values": [
        "ControlNet-v1-1\\control_v11f1e_sd15_tile.pth"
      ],
      "color": "#2a363b",
      "bgcolor": "#3f5159"
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        1295,
        280
      ],
      "size": {
        "0": 306.7227783203125,
        "1": 265.1262512207031
      },
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 35
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 40
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 41
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 34,
          "slot_index": 3
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            7
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        2480,
        "fixed",
        20,
        8,
        "dpmpp_2m",
        "karras",
        1
      ]
    },
    {
      "id": 26,
      "type": "ControlNetApplyAdvanced",
      "pos": [
        940,
        300
      ],
      "size": {
        "0": 315,
        "1": 166
      },
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 38
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 39
        },
        {
          "name": "control_net",
          "type": "CONTROL_NET",
          "link": 43,
          "slot_index": 2
        },
        {
          "name": "image",
          "type": "IMAGE",
          "link": 42
        }
      ],
      "outputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "links": [
            40
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "links": [
            41
          ],
          "shape": 3,
          "slot_index": 1
        }
      ],
      "properties": {
        "Node name for S&R": "ControlNetApplyAdvanced"
      },
      "widgets_values": [
        1,
        0,
        1
      ],
      "color": "#2a363b",
      "bgcolor": "#3f5159"
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        455,
        235
      ],
      "size": {
        "0": 422.84503173828125,
        "1": 164.31304931640625
      },
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            38
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "RAW photo of a dog,looking at viewer,white puppy"
      ]
    }
  ],
  "links": [
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      7,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      15,
      8,
      0,
      13,
      0,
      "IMAGE"
    ],
    [
      27,
      4,
      0,
      19,
      0,
      "MODEL"
    ],
    [
      29,
      21,
      0,
      8,
      1,
      "VAE"
    ],
    [
      31,
      22,
      0,
      24,
      0,
      "IMAGE"
    ],
    [
      32,
      24,
      0,
      25,
      0,
      "IMAGE"
    ],
    [
      33,
      21,
      0,
      25,
      1,
      "VAE"
    ],
    [
      34,
      25,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      35,
      19,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      38,
      6,
      0,
      26,
      0,
      "CONDITIONING"
    ],
    [
      39,
      7,
      0,
      26,
      1,
      "CONDITIONING"
    ],
    [
      40,
      26,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      41,
      26,
      1,
      3,
      2,
      "CONDITIONING"
    ],
    [
      42,
      24,
      0,
      26,
      3,
      "IMAGE"
    ],
    [
      43,
      27,
      0,
      26,
      2,
      "CONTROL_NET"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "0246.VERSION": [
      0,
      0,
      4
    ]
  },
  "version": 0.4
}

このworkflowでは、あえてtile_overlap = 0、denoise = 1という、もっとも Tile の弱点が出やすい設定にしています。

それでも、ControlNet Tile を通すことで 元画像の構図をかなりの程度保ったまま アップスケールできているのが分かるはずです。

overlap × ControlNet Tileで仕上げる

ここまでの要素を組み合わせると、実用的な Tile アップスケールの形が見えてきます。

Tiled_Diffusion_overlap_ContolNet_Tile.json

{
  "last_node_id": 30,
  "last_link_id": 48,
  "nodes": [
    {
      "id": 21,
      "type": "VAELoader",
      "pos": [
        925,
        840
      ],
      "size": {
        "0": 315,
        "1": 58
      },
      "flags": {
        "collapsed": true
      },
      "order": 0,
      "mode": 0,
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            29,
            33
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "vae-ft-mse-840000-ema-pruned.safetensors"
      ]
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        450,
        445
      ],
      "size": [
        431.192761039734,
        116.54622192382817
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            39
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "text, watermark"
      ]
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1630,
        285
      ],
      "size": [
        177.86739979492222,
        46
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 7
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 29,
          "slot_index": 1
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            15
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      }
    },
    {
      "id": 13,
      "type": "PreviewImage",
      "pos": [
        1835,
        290
      ],
      "size": [
        474.6894287109376,
        509.40776977539065
      ],
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 15
        }
      ],
      "properties": {
        "Node name for S&R": "PreviewImage"
      }
    },
    {
      "id": 25,
      "type": "VAEEncode",
      "pos": [
        920,
        745
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 32
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 33
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            34
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEEncode"
      }
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        45,
        370
      ],
      "size": {
        "0": 315,
        "1": 98
      },
      "flags": {},
      "order": 1,
      "mode": 0,
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            27
          ],
          "slot_index": 0
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [
            3,
            5
          ],
          "slot_index": 1
        },
        {
          "name": "VAE",
          "type": "VAE",
          "links": [],
          "slot_index": 2
        }
      ],
      "properties": {
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "📷-v1.x\\dreamshaper_8.safetensors"
      ]
    },
    {
      "id": 27,
      "type": "ControlNetLoader",
      "pos": [
        543,
        623
      ],
      "size": [
        331.4544950753203,
        58
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "outputs": [
        {
          "name": "CONTROL_NET",
          "type": "CONTROL_NET",
          "links": [
            43
          ],
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "ControlNetLoader"
      },
      "widgets_values": [
        "ControlNet-v1-1\\control_v11f1e_sd15_tile.pth"
      ],
      "color": "#2a363b",
      "bgcolor": "#3f5159"
    },
    {
      "id": 22,
      "type": "LoadImage",
      "pos": [
        210,
        745
      ],
      "size": [
        312.6875305175781,
        412.5834655761719
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            31
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "Tiled Diffusion.png",
        "image"
      ]
    },
    {
      "id": 26,
      "type": "ControlNetApplyAdvanced",
      "pos": [
        940,
        300
      ],
      "size": {
        "0": 315,
        "1": 166
      },
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 38
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 39
        },
        {
          "name": "control_net",
          "type": "CONTROL_NET",
          "link": 43,
          "slot_index": 2
        },
        {
          "name": "image",
          "type": "IMAGE",
          "link": 48
        }
      ],
      "outputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "links": [
            40
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "links": [
            41
          ],
          "shape": 3,
          "slot_index": 1
        }
      ],
      "properties": {
        "Node name for S&R": "ControlNetApplyAdvanced"
      },
      "widgets_values": [
        0.6,
        0,
        1
      ],
      "color": "#2a363b",
      "bgcolor": "#3f5159"
    },
    {
      "id": 19,
      "type": "TiledDiffusion",
      "pos": [
        520,
        20
      ],
      "size": {
        "0": 315,
        "1": 154
      },
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 27
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            35
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "TiledDiffusion"
      },
      "widgets_values": [
        "MultiDiffusion",
        512,
        512,
        256,
        4
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 24,
      "type": "ImageScale",
      "pos": [
        557,
        747
      ],
      "size": {
        "0": 315,
        "1": 130
      },
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 31
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            32,
            48
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "ImageScale"
      },
      "widgets_values": [
        "nearest-exact",
        1024,
        1024,
        "disabled"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        1295,
        280
      ],
      "size": [
        306.7227709960939,
        265.12625854492194
      ],
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 35
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 40
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 41
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 34,
          "slot_index": 3
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            7
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        2480,
        "fixed",
        20,
        8,
        "dpmpp_2m",
        "karras",
        0.6
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        455,
        235
      ],
      "size": {
        "0": 422.84503173828125,
        "1": 164.31304931640625
      },
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            38
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "RAW photo of a dog,looking at viewer,white puppy"
      ]
    }
  ],
  "links": [
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      7,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      15,
      8,
      0,
      13,
      0,
      "IMAGE"
    ],
    [
      27,
      4,
      0,
      19,
      0,
      "MODEL"
    ],
    [
      29,
      21,
      0,
      8,
      1,
      "VAE"
    ],
    [
      31,
      22,
      0,
      24,
      0,
      "IMAGE"
    ],
    [
      32,
      24,
      0,
      25,
      0,
      "IMAGE"
    ],
    [
      33,
      21,
      0,
      25,
      1,
      "VAE"
    ],
    [
      34,
      25,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      35,
      19,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      38,
      6,
      0,
      26,
      0,
      "CONDITIONING"
    ],
    [
      39,
      7,
      0,
      26,
      1,
      "CONDITIONING"
    ],
    [
      40,
      26,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      41,
      26,
      1,
      3,
      2,
      "CONDITIONING"
    ],
    [
      43,
      27,
      0,
      26,
      2,
      "CONTROL_NET"
    ],
    [
      48,
      24,
      0,
      26,
      3,
      "IMAGE"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "0246.VERSION": [
      0,
      0,
      4
    ]
  },
  "version": 0.4
}

🟩 overlap 256px
🟦 Controlnet strength 0.6

大分自然な仕上がりになりましたね。

まとめ：Ultimate SD upscaleの考え方

Ultimate SD upscale の本質は、次の三本柱です。

1. Tile（タイル分割） 大きな画像をそのまま扱うのではなく、タイルに分割して image2image することで、 VRAM と計算時間の負荷を抑えながら超解像を狙う。
1. overlap（タイルの重なり） タイル同士を少し重ねて配置し、サンプリングの過程で境界をなじませることで、継ぎ目を目立たせないようにする。
1. ControlNet Tile（構造の固定） 入力画像の構造を強く保持したままタイルアップスケールすることで、「犬の中の犬」問題や、全体がバラバラになる問題を抑える。

実際の Ultimate SD upscale 系ノードやプリセットは、この考え方をひとつのノードにパッケージしたものにすぎません。

ちなみにですが、動画生成にも同じような考えが応用できて、今度はフレームを分割するようになります。 100フレームの動画を20フレームずつにして、5フレーム分overlapするといった感じですね。詳しくはここでは扱いませんが、計算コストを下げるために細かく分割する点は全く同じです。

Ultimate SD upscale

Ultimate SD upscaleとは？

カスタムノード

Tileの弱点 : 境界線

overlapで境界線をなじませる

Tileのもうひとつの弱点：プロンプト

タイルごとにプロンプトを変える案

ControlNet Tileで構造を固定する

overlap × ControlNet Tileで仕上げる

まとめ：Ultimate SD upscaleの考え方

jsonコピーボタンとは？

修正・誤字報告

記事リクエスト

感想・その他

ありがとうございます

Ultimate SD upscale

Ultimate SD upscaleとは？

カスタムノード

Tileの弱点 : 境界線

overlapで境界線をなじませる

Tileのもうひとつの弱点：プロンプト

タイルごとにプロンプトを変える案

ControlNet Tileで構造を固定する

overlap × ControlNet Tileで仕上げる

まとめ：Ultimate SD upscaleの考え方

関連Workflow