Upscaling and Image Restoration

Upscaling is the task of turning a low-resolution image into a larger one. If all you want is a bigger image, though, even something like PowerPoint can do that.

However, simply enlarging a low-quality, artifact-ridden image 2x or 4x only gives you a "bigger artifact-ridden image". No information is added.
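Here is a minimal plain-Python sketch of that point. The function names are only for illustration. Nearest-neighbor enlargement just repeats existing pixels, so the 4x image contains exactly the same set of values as the original and can be shrunk back to it without loss. Bilinear or bicubic resizing blends neighboring pixels instead of copying them, but those values still come entirely from the existing pixels, so no real detail appears either.

```python
def upscale_nearest(img, factor):
    """Enlarge a 2D grayscale image (list of rows) by an integer factor.

    Each source pixel is repeated factor x factor times, so no new values
    (and therefore no new detail) are created.
    """
    out = []
    for row in img:
        wide_row = [px for px in row for _ in range(factor)]
        out.extend([list(wide_row) for _ in range(factor)])
    return out


def downscale_by_sampling(img, factor):
    """Keep every factor-th pixel in both directions, undoing upscale_nearest."""
    return [row[::factor] for row in img[::factor]]


small = [
    [10, 200],
    [90, 30],
]
big = upscale_nearest(small, 4)

print(len(big), "x", len(big[0]))                 # 8 x 8
print(sorted({px for row in big for px in row}))  # [10, 30, 90, 200]
print(downscale_by_sampling(big, 4) == small)     # True
```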

For that reason, "upscaling" here refers to techniques that do two things together: enlarging the image, and plausibly filling in lost detail to restore image quality.

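Some models focus even more specifically on image restoration. Removing scratches from old photos and automatically colorizing black-and-white photos can also be treated as forms of image restoration.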

Let's look at the methods and models in use, limited to a few representative ones.

GAN / conventional upscaling

This approach upscales with GANs or conventional super-resolution models.
This family predates Stable Diffusion and is still sometimes used as a lightweight option.

ESRGAN.json

{
  "id": "856e71ca-93c8-443a-a6c2-c2d179f2bd60",
  "revision": 0,
  "last_node_id": 6,
  "last_link_id": 4,
  "nodes": [
    {
      "id": 3,
      "type": "LoadImage",
      "pos": [
        557.2993863646276,
        332.50277383751643
      ],
      "size": [
        249.53462357954527,
        493.0909090909091
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            1
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.75",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "5905871d320b72c5dd9db3ab44d81854-png.jpg",
        "image"
      ]
    },
    {
      "id": 2,
      "type": "ImageUpscaleWithModel",
      "pos": [
        843.6630227282637,
        313.41186474660685
      ],
      "size": [
        246.5274857954545,
        46
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "upscale_model",
          "type": "UPSCALE_MODEL",
          "link": 2
        },
        {
          "name": "image",
          "type": "IMAGE",
          "link": 1
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            4
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.75",
        "Node name for S&R": "ImageUpscaleWithModel"
      },
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 6,
      "type": "SaveImage",
      "pos": [
        1121.8448409100806,
        315.2300465647888
      ],
      "size": [
        304.5454545454545,
        506.3636363636364
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 4
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.75"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    },
    {
      "id": 4,
      "type": "UpscaleModelLoader",
      "pos": [
        596.8340099441729,
        213.4118647466074
      ],
      "size": [
        210,
        58
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "UPSCALE_MODEL",
          "type": "UPSCALE_MODEL",
          "links": [
            2
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.75",
        "Node name for S&R": "UpscaleModelLoader"
      },
      "widgets_values": [
        "ESRGAN\\ESRGAN_4x.pth"
      ],
      "color": "#232",
      "bgcolor": "#353"
    }
  ],
  "links": [
    [
      1,
      3,
      0,
      2,
      1,
      "IMAGE"
    ],
    [
      2,
      4,
      0,
      2,
      0,
      "UPSCALE_MODEL"
    ],
    [
      4,
      2,
      0,
      6,
      0,
      "IMAGE"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.9229599817706415,
      "offset": [
        -361.95397406280983,
        -53.820982057971335
      ]
    },
    "frontendVersion": "1.34.2",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}
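In the workflow JSON above, each entry in the "links" array has the form [link_id, from_node, from_slot, to_node, to_slot, type]. For example, [1, 3, 0, 2, 1, "IMAGE"] connects output 0 of LoadImage (node 3) to input 1 ("image") of ImageUpscaleWithModel (node 2). The sketch below uses only Python's standard library. It reads the links from a trimmed copy of the workflow and derives an order in which the nodes could run, using Kahn's topological sort. It only illustrates what the links encode. It is not how ComfyUI itself schedules execution.

```python
import json
from collections import defaultdict, deque

# Trimmed copy of the ESRGAN workflow above: only node ids/types and the
# "links" array are kept, because that is all this sketch needs.
WORKFLOW_JSON = """
{
  "nodes": [
    {"id": 3, "type": "LoadImage"},
    {"id": 2, "type": "ImageUpscaleWithModel"},
    {"id": 6, "type": "SaveImage"},
    {"id": 4, "type": "UpscaleModelLoader"}
  ],
  "links": [
    [1, 3, 0, 2, 1, "IMAGE"],
    [2, 4, 0, 2, 0, "UPSCALE_MODEL"],
    [4, 2, 0, 6, 0, "IMAGE"]
  ]
}
"""


def execution_order(workflow):
    """Return node types in a dependency-respecting order (Kahn's algorithm).

    Each link is read as [link_id, from_node, from_slot, to_node, to_slot, type].
    """
    types = {n["id"]: n["type"] for n in workflow["nodes"]}
    indegree = {node_id: 0 for node_id in types}
    children = defaultdict(list)
    for _link_id, src, _src_slot, dst, _dst_slot, _dtype in workflow["links"]:
        children[src].append(dst)
        indegree[dst] += 1

    # Start from nodes without inputs; sorting ids keeps the result deterministic.
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(types[node])
        for child in sorted(children[node]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order


wf = json.loads(WORKFLOW_JSON)
for link_id, src, src_slot, dst, dst_slot, dtype in wf["links"]:
    print(f"link {link_id}: node {src}[{src_slot}] -> node {dst}[{dst_slot}] ({dtype})")
print(execution_order(wf))
# ['LoadImage', 'UpscaleModelLoader', 'ImageUpscaleWithModel', 'SaveImage']
```

The resulting order matches the "order" fields stored on the nodes in the file: LoadImage 0, UpscaleModelLoader 1, ImageUpscaleWithModel 2, SaveImage 3.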

ESRGAN
Real-ESRGAN
SwinIR
HYPIR
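Models like ESRGAN_4x.pth have a fixed scale factor built in (the "4x" in the file name), so the output size follows directly from the input size. The sketch below shows that arithmetic. It assumes that ComfyUI's IMAGE data is a float32 tensor shaped [batch, height, width, channels]. Under that assumption, a 1024x1024 image comes to 12 MiB, which agrees with the "12.00MB" readout shown by the ImageResizeKJv2 node in the Qwen workflow later on this page. The helper names are illustrative.

```python
def upscaled_size(width, height, scale):
    """Output size of a fixed-scale model such as a "4x" ESRGAN checkpoint."""
    return width * scale, height * scale


def image_tensor_mib(width, height, batch=1, channels=3, bytes_per_value=4):
    """Memory of a float32 [batch, height, width, channels] image tensor, in MiB."""
    return batch * height * width * channels * bytes_per_value / 2**20


def post_resize_factor(target_scale, model_scale=4):
    """Resize factor to apply after a fixed-scale model to reach target_scale overall."""
    return target_scale / model_scale


w, h = upscaled_size(1024, 1024, 4)
print(w, h)                          # 4096 4096
print(image_tensor_mib(1024, 1024))  # 12.0
print(image_tensor_mib(w, h))        # 192.0
print(post_resize_factor(2))         # 0.5: a 4x model followed by a 0.5x resize gives 2x overall
```

Because a 4x model always outputs 4x, getting a 2x result means adding a regular resize step afterwards, as the last line shows.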

Face restoration models (face-specific)

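These models specialize in faces. They restore faces that are blurry, distorted, or low-resolution.
For example, ReActor is a well-known face-swap tool, but it can only generate faces at low resolution, so these models are applied afterwards as a post-processing step.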
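To see why a low-resolution swap needs a restoration pass, consider how far the generated face has to be stretched when it is pasted back into the image. The sketch below makes two simplifying assumptions. First, it assumes the swap model outputs 128x128 faces; this is an illustrative value, not a verified spec. Second, it shows a simple linear blend between the original and the restored face, one common way to soften a restoration that looks too strong. The function names and the `visibility` parameter are hypothetical.

```python
def face_stretch_factor(face_box_px, model_face_px=128):
    """How many times a generated face must be enlarged to fill a face region
    of face_box_px pixels in the target image.

    model_face_px=128 is an assumed output size for the face-swap model.
    """
    return face_box_px / model_face_px


def blend(original, restored, visibility):
    """Linear blend of original and restored face pixels.

    visibility=1.0 keeps only the restored face; 0.0 keeps only the original.
    """
    if not 0.0 <= visibility <= 1.0:
        raise ValueError("visibility must be in [0, 1]")
    return [visibility * r + (1.0 - visibility) * o for o, r in zip(original, restored)]


print(face_stretch_factor(512))          # 4.0: each generated pixel spans about 4x4 output pixels
print(blend([100, 100], [200, 0], 0.5))  # [150.0, 50.0]
```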

GFPGAN
CodeFormer

Diffusion-based upscaling and restoration

These methods use diffusion models such as Stable Diffusion to upscale and restore an image by redrawing it.

image2image
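- A feature that generates a new image using an existing image as its base. Keeping the denoise amount low lets you use it for "restoration" without changing the composition or content much.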
Ultimate SD upscale
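- Plain image2image is limited by the resolution the model can handle and by the image size your PC's hardware can generate.
- Ultimate SD upscale works around this by splitting the image into tiles, running image2image on each tile, and stitching the tiles back together, so it can handle larger images.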
SUPIR
- An SDXL-based model specialized for upscaling and image restoration. It aims to recover a natural-looking high-resolution image from a low-quality input.
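The tiling idea behind Ultimate SD upscale can be sketched as computing overlapping tile rectangles. This is a simplified illustration, not the extension's actual code. The default tile size, the overlap value, and the even-spacing strategy are assumptions chosen for clarity. Overlap matters because each tile is processed on its own, so neighboring tiles need shared pixels to hide the seams. Real tools also blend or mask those seams, a step omitted here.

```python
import math


def tile_starts(length, tile, overlap):
    """Start offsets along one axis so tiles of size `tile` cover [0, length)
    with at least `overlap` pixels shared between neighbors."""
    if length <= tile:
        return [0]
    # Fewest tiles that still leave at least `overlap` pixels between neighbors.
    n = math.ceil((length - overlap) / (tile - overlap))
    span = length - tile  # start offset of the last tile, flush with the edge
    # Spread tiles evenly; integer division keeps every step <= tile - overlap.
    return [i * span // (n - 1) for i in range(n)]


def tile_boxes(width, height, tile=1024, overlap=64):
    """Return overlapping tiles as (left, top, right, bottom) boxes."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


boxes = tile_boxes(3000, 2000)
print(len(boxes))  # 12
print(boxes[0])    # (0, 0, 1024, 1024)
print(boxes[-1])   # (1976, 976, 3000, 2000)
```

Each box would be run through image2image separately and then pasted back at its position.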

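Diffusion-based upscaling is, in a sense, redrawing the image.
As a result, it tends to go beyond mere restoration and "overdo it".
That can be a valid creative choice in its own right, but it is sometimes called "Enhance" to distinguish it from upscaling that preserves the original image as much as possible.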
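One way to see why diffusion upscaling can "overdo it" is to measure how much of the input survives once noise is added at the start of image2image. The minimal sketch below rests on two assumptions. It uses a Stable Diffusion 1.x-style "scaled_linear" schedule (1000 steps, beta from 0.00085 to 0.012), and it approximates the starting timestep as denoise × 1000. The real mapping from the denoise value to a noise level depends on the sampler and scheduler. Higher denoise leaves less of the original signal, which gives the model more freedom to redraw.

```python
import math

# Assumed SD 1.x-style "scaled_linear" schedule; other models use other schedules.
T = 1000
BETA_START, BETA_END = 0.00085, 0.012

betas = [
    (math.sqrt(BETA_START) + i * (math.sqrt(BETA_END) - math.sqrt(BETA_START)) / (T - 1)) ** 2
    for i in range(T)
]

# alpha_bar[t] = product of (1 - beta) up to step t.
alpha_bars = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bars.append(running)


def signal_fraction(denoise):
    """Coefficient on the original latent in x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise,
    assuming image2image starts at roughly t = denoise * T (SDEdit-style)."""
    t = max(0, min(T - 1, round(denoise * T) - 1))
    return math.sqrt(alpha_bars[t])


for d in (0.2, 0.35, 0.5, 0.75, 1.0):
    print(f"denoise={d:.2f}  original signal kept ~ {signal_fraction(d):.3f}")
```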

Restoration via instruction-based image editing

Some recent instruction-based image editing models can perform processing close to upscaling and restoration in one go, from a text instruction alone.

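Instead of preparing a dedicated model for each task, you can give an instruction such as "clean up this photo", "reduce the noise", or "colorize this black-and-white photo", and the model handles all of it at once.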

Qwen-Image-Edit-2509.json

{
  "id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
  "revision": 0,
  "last_node_id": 123,
  "last_link_id": 319,
  "nodes": [
    {
      "id": 54,
      "type": "ModelSamplingAuraFlow",
      "pos": [
        634.9767456054688,
        -1.8326886892318726
      ],
      "size": [
        230.33058166503906,
        58
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 282
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            123
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.49",
        "Node name for S&R": "ModelSamplingAuraFlow"
      },
      "widgets_values": [
        3.1000000000000005
      ]
    },
    {
      "id": 63,
      "type": "VAEEncode",
      "pos": [
        714.6403198242188,
        673.7313842773438
      ],
      "size": [
        140,
        46
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 239
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 115
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            112
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.51",
        "Node name for S&R": "VAEEncode"
      },
      "widgets_values": []
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1293.939697265625,
        143.6978759765625
      ],
      "size": [
        157.56002807617188,
        46
      ],
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 35
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 76
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            254
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "VAEDecode"
      },
      "widgets_values": []
    },
    {
      "id": 112,
      "type": "CLIPLoader",
      "pos": [
        75.53079223632812,
        277.016357421875
      ],
      "size": [
        270,
        106
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [
            290,
            291
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.51",
        "Node name for S&R": "CLIPLoader"
      },
      "widgets_values": [
        "qwen_2.5_vl_7b_fp8_scaled.safetensors",
        "qwen_image",
        "default"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 39,
      "type": "VAELoader",
      "pos": [
        107.53079223632812,
        446.7167663574219
      ],
      "size": [
        238,
        58
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 0,
          "links": [
            76,
            115,
            292,
            293
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "qwen_image_vae.safetensors"
      ],
      "color": "#322",
      "bgcolor": "#533"
    },
    {
      "id": 114,
      "type": "TextEncodeQwenImageEditPlus",
      "pos": [
        454.6401672363281,
        419.63690185546875
      ],
      "size": [
        400,
        200
      ],
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 291
        },
        {
          "name": "vae",
          "shape": 7,
          "type": "VAE",
          "link": 293
        },
        {
          "name": "image1",
          "shape": 7,
          "type": "IMAGE",
          "link": 295
        },
        {
          "name": "image2",
          "shape": 7,
          "type": "IMAGE",
          "link": null
        },
        {
          "name": "image3",
          "shape": 7,
          "type": "IMAGE",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            315
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.59",
        "Node name for S&R": "TextEncodeQwenImageEditPlus"
      },
      "widgets_values": [
        ""
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 111,
      "type": "UNETLoader",
      "pos": [
        330.1968994140625,
        -1.8326886892318726
      ],
      "size": [
        276.62274169921875,
        82
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            282
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.51",
        "Node name for S&R": "UNETLoader"
      },
      "widgets_values": [
        "Qwen-Image\\qwen_image_edit_2509_fp8_e4m3fn.safetensors",
        "fp8_e4m3fn"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        933.5941772460938,
        143.6978759765625
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 123
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 314
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 315
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 112
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            35
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        1234,
        "fixed",
        20,
        2.5,
        "res_multistep",
        "simple",
        1
      ]
    },
    {
      "id": 82,
      "type": "ImageScaleToTotalPixels",
      "pos": [
        -224.63221740722656,
        668.4074096679688
      ],
      "size": [
        270,
        82
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 275
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            244
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.51",
        "Node name for S&R": "ImageScaleToTotalPixels"
      },
      "widgets_values": [
        "nearest-exact",
        1
      ]
    },
    {
      "id": 83,
      "type": "ImageResizeKJv2",
      "pos": [
        75.53079223632812,
        668.4074096679688
      ],
      "size": [
        270,
        336
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 244
        },
        {
          "name": "mask",
          "shape": 7,
          "type": "MASK",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            239,
            294,
            295
          ]
        },
        {
          "name": "width",
          "type": "INT",
          "links": null
        },
        {
          "name": "height",
          "type": "INT",
          "links": null
        },
        {
          "name": "mask",
          "type": "MASK",
          "links": []
        }
      ],
      "properties": {
        "cnr_id": "comfyui-kjnodes",
        "ver": "e2ce0843d1183aea86ce6a1617426f492dcdc802",
        "Node name for S&R": "ImageResizeKJv2"
      },
      "widgets_values": [
        0,
        0,
        "nearest-exact",
        "crop",
        "0, 0, 0",
        "center",
        8,
        "cpu",
        "<tr><td>Output: </td><td><b>1</b> x <b>1024</b> x <b>1024 | 12.00MB</b></td></tr>"
      ]
    },
    {
      "id": 55,
      "type": "MarkdownNote",
      "pos": [
        -84.94583892822266,
        -171.1671905517578
      ],
      "size": [
        386.9856262207031,
        251.33447265625
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "## models\n- [qwen_image_edit_2509_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image-Edit_ComfyUI/blob/main/split_files/diffusion_models/qwen_image_edit_2509_fp8_e4m3fn.safetensors)\n- [qwen_2.5_vl_7b_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/blob/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors)\n- [qwen_image_vae.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main/split_files/vae)\n\n\n```\n📂ComfyUI/\n└──📂models/\n    ├── 📂diffusion_models/\n    │   └── qwen_image_edit_2509_fp8_e4m3fn.safetensors\n    ├── 📂text_encoders/\n    │   └── qwen_2.5_vl_7b_fp8_scaled.safetensors\n    └── 📂vae/\n        └── qwen_image_vae.safetensors\n\n```"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 113,
      "type": "TextEncodeQwenImageEditPlus",
      "pos": [
        454.6401672363281,
        163.63690185546875
      ],
      "size": [
        400,
        200
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 290
        },
        {
          "name": "vae",
          "shape": 7,
          "type": "VAE",
          "link": 292
        },
        {
          "name": "image1",
          "shape": 7,
          "type": "IMAGE",
          "link": 294
        },
        {
          "name": "image2",
          "shape": 7,
          "type": "IMAGE",
          "link": null
        },
        {
          "name": "image3",
          "shape": 7,
          "type": "IMAGE",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            314
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.59",
        "Node name for S&R": "TextEncodeQwenImageEditPlus"
      },
      "widgets_values": [
        "Make this image look clear and in focus. Reduce blur, enhance edges and textures, and keep the original colors and overall look."
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 99,
      "type": "LoadImage",
      "pos": [
        -787.9675541015623,
        668.4074096679688
      ],
      "size": [
        525.4008842507812,
        636.6210345562502
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            275
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.51",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "pasted/image (40).png",
        "image"
      ]
    },
    {
      "id": 97,
      "type": "SaveImage",
      "pos": [
        1495.48046875,
        143.6978759765625
      ],
      "size": [
        606.8076645485153,
        669.4791159073438
      ],
      "flags": {},
      "order": 13,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 254
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.51"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    }
  ],
  "links": [
    [
      35,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      76,
      39,
      0,
      8,
      1,
      "VAE"
    ],
    [
      112,
      63,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      115,
      39,
      0,
      63,
      1,
      "VAE"
    ],
    [
      123,
      54,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      239,
      83,
      0,
      63,
      0,
      "IMAGE"
    ],
    [
      244,
      82,
      0,
      83,
      0,
      "IMAGE"
    ],
    [
      254,
      8,
      0,
      97,
      0,
      "IMAGE"
    ],
    [
      275,
      99,
      0,
      82,
      0,
      "IMAGE"
    ],
    [
      282,
      111,
      0,
      54,
      0,
      "MODEL"
    ],
    [
      290,
      112,
      0,
      113,
      0,
      "CLIP"
    ],
    [
      291,
      112,
      0,
      114,
      0,
      "CLIP"
    ],
    [
      292,
      39,
      0,
      113,
      1,
      "VAE"
    ],
    [
      293,
      39,
      0,
      114,
      1,
      "VAE"
    ],
    [
      294,
      83,
      0,
      113,
      2,
      "IMAGE"
    ],
    [
      295,
      83,
      0,
      114,
      2,
      "IMAGE"
    ],
    [
      314,
      113,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      315,
      114,
      0,
      3,
      2,
      "CONDITIONING"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.6830134553650712,
      "offset": [
        309.96283560156246,
        -29.313273468242187
      ]
    },
    "frontendVersion": "1.34.2",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}
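In this workflow, the instruction lives in node 113: its CONDITIONING output goes to the KSampler's positive input through link 314. Node 114 has an empty prompt and feeds the negative input. Because the instruction is just text in the JSON, switching between tasks such as sharpening, denoising, or colorizing only means editing that string. The minimal sketch below does this with Python's json module. The helper names and file names are illustrative assumptions. It also assumes that the prompt is the first entry of the node's "widgets_values", as it is in the file above.

```python
import json


def load_workflow(path):
    """Read a UI-format ComfyUI workflow JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_workflow(workflow, path):
    """Write the workflow back out, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(workflow, f, ensure_ascii=False, indent=2)


def set_instruction(workflow, node_id, text):
    """Overwrite the prompt (first widget value) of a TextEncodeQwenImageEditPlus node."""
    for node in workflow["nodes"]:
        if node["id"] == node_id:
            if node["type"] != "TextEncodeQwenImageEditPlus":
                raise ValueError(f"node {node_id} is {node['type']}, not TextEncodeQwenImageEditPlus")
            node["widgets_values"][0] = text
            return workflow
    raise KeyError(f"node {node_id} not found")


# Demo on a trimmed stand-in for node 113 so the sketch runs without the file.
demo = {
    "nodes": [
        {
            "id": 113,
            "type": "TextEncodeQwenImageEditPlus",
            "widgets_values": ["Make this image look clear and in focus."],
        }
    ]
}
NEW_TEXT = "Colorize this black-and-white photo with natural colors. Keep the composition unchanged."
set_instruction(demo, 113, NEW_TEXT)
print(demo["nodes"][0]["widgets_values"][0])

# With the real file (example paths):
# wf = load_workflow("Qwen-Image-Edit-2509.json")
# set_instruction(wf, 113, NEW_TEXT)
# save_workflow(wf, "Qwen-Image-Edit-2509-colorize.json")
```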

This is covered in more detail on the "Instruction-based image editing" page.

Video upscaling and video restoration

Applying an image upscaler one frame at a time does, in principle, let you upscale a video as well.

However, each frame is processed independently, so nothing keeps the results consistent over time, and flicker can remain.

Dedicated video upscaling and restoration models also use information from neighboring frames, aiming to improve quality while suppressing flicker.
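The difference can be illustrated with a toy simulation. This is not how SeedVR2 or FlashVSR work internally. A static scene is "upscaled" per frame by adding freshly invented random detail to each frame, and flicker is measured as the average absolute change between consecutive frames. A crude stand-in for temporal consistency then reuses most of the previous output frame through an exponential moving average. All function names and parameter values are hypothetical. A plain moving average like this would smear moving content, which is exactly why real video models learn how to use neighboring frames instead.

```python
import random


def mean_abs_frame_diff(frames):
    """Average absolute change between consecutive frames: a crude flicker measure."""
    diffs = [
        sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        for prev, cur in zip(frames, frames[1:])
    ]
    return sum(diffs) / len(diffs)


def per_frame_upscale(frames, rng, detail=8.0):
    """Toy stand-in for an image upscaler run on each frame independently:
    every frame gets its own freshly invented detail (random offsets)."""
    return [[px + rng.uniform(-detail, detail) for px in frame] for frame in frames]


def temporally_smoothed(frames, carry=0.8):
    """Toy stand-in for temporal consistency: each output frame keeps most of
    the previous output frame (exponential moving average)."""
    out = [list(frames[0])]
    for frame in frames[1:]:
        out.append([carry * p + (1 - carry) * c for p, c in zip(out[-1], frame)])
    return out


rng = random.Random(0)
static_scene = [[100.0] * 64 for _ in range(30)]  # 30 identical frames, 64 pixels each

independent = per_frame_upscale(static_scene, rng)
consistent = temporally_smoothed(independent)

print(f"source frames:       {mean_abs_frame_diff(static_scene):.2f}")
print(f"independent frames:  {mean_abs_frame_diff(independent):.2f}")
print(f"temporally smoothed: {mean_abs_frame_diff(consistent):.2f}")
```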

Representative examples include the following:

SeedVR2
FlashVSR

These models also work fine for upscaling still images; in fact, they are popular for that purpose as well.
