ControlNet

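What is ControlNet?

At its core, generative AI learns the correspondence between two things. In text2image the model is taught the "noise ↔ image" relationship, but the same can be done with inputs other than noise.

Learn line art ↔ image pairs → automatic coloring of line art
Learn stick figure ↔ image pairs → image generation with a specified pose
Learn depth map ↔ image pairs → image generation from depth information

ControlNet is one of the techniques that make this possible.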

SD1.5 × ControlNet Scribble

There are countless kinds of ControlNet, so let's start with "scribble".
The scribble model is a ControlNet that generates an image from a rough doodle.

Downloading the ControlNet model

control_v11p_sd15_scribble_fp16.safetensors

📂ComfyUI/
  └── 📂models/
      └── 📂controlnet/
          └── control_v11p_sd15_scribble_fp16.safetensors
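
If you want to confirm that the file ended up in the right folder, a small check along these lines may help. This is only a sketch: `controlnet_model_path` is a hypothetical helper (not part of ComfyUI), and the `"ComfyUI"` root directory is an assumption about where your installation lives.

```python
from pathlib import Path

MODEL_NAME = "control_v11p_sd15_scribble_fp16.safetensors"


def controlnet_model_path(comfy_root: str, model_name: str = MODEL_NAME) -> Path:
    """Build the expected location: <comfy_root>/models/controlnet/<model_name>."""
    return Path(comfy_root) / "models" / "controlnet" / model_name


# "ComfyUI" is an assumed install directory; replace it with your own path.
path = controlnet_model_path("ComfyUI")
print(path, "found" if path.is_file() else "missing")
```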

Workflow

SD1.5_ControlNet_scribble.json

{
  "id": "ff9a9120-9e06-4d07-93fd-048b505d0534",
  "revision": 0,
  "last_node_id": 23,
  "last_link_id": 35,
  "nodes": [
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        481.0621643066406,
        427.9620666503906
      ],
      "size": [
        419.9831237792969,
        100.87960815429688
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            22
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.49",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "text, watermark, low quality"
      ]
    },
    {
      "id": 17,
      "type": "VAELoader",
      "pos": [
        1301.9801940917969,
        183.17780701188025
      ],
      "size": [
        280.8620910644531,
        58
      ],
      "flags": {
        "collapsed": false
      },
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            16
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.49",
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "vae-ft-mse-840000-ema-pruned.safetensors"
      ]
    },
    {
      "id": 13,
      "type": "ControlNetLoader",
      "pos": [
        586.0452880859375,
        606.119140625
      ],
      "size": [
        315,
        58
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CONTROL_NET",
          "type": "CONTROL_NET",
          "links": [
            25
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.49",
        "Node name for S&R": "ControlNetLoader"
      },
      "widgets_values": [
        "control_v11p_sd15_scribble_fp16.safetensors"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 14,
      "type": "LoadImage",
      "pos": [
        506.46689675070996,
        739.2381924715907
      ],
      "size": [
        312.5415954589844,
        402.5836486816406
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            33,
            34
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.49",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "fd112e311d4e0503fbb4df2044fc9325.png",
        "image"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        481.0621643066406,
        227.43450927734375
      ],
      "size": [
        419.9831237792969,
        140.84524536132812
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            21
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.49",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "high quality,high detailed,RAW Photograph of a cat"
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        1267.84228515625,
        299.4739990234375
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 1
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 23
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 24
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 2
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            7
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.49",
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        11111,
        "fixed",
        20,
        8,
        "euler",
        "normal",
        1
      ]
    },
    {
      "id": 20,
      "type": "GetImageSize",
      "pos": [
        843.8379185975353,
        740.3050594888713
      ],
      "size": [
        140,
        124
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 33
        }
      ],
      "outputs": [
        {
          "name": "width",
          "type": "INT",
          "links": [
            28
          ]
        },
        {
          "name": "height",
          "type": "INT",
          "links": [
            29
          ]
        },
        {
          "name": "batch_size",
          "type": "INT",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.49",
        "Node name for S&R": "GetImageSize"
      },
      "widgets_values": [
        "width: 512, height: 512\n batch size: 1"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 5,
      "type": "EmptyLatentImage",
      "pos": [
        1005.7074254334752,
        716.0095069019711
      ],
      "size": [
        210,
        106
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "width",
          "type": "INT",
          "widget": {
            "name": "width"
          },
          "link": 28
        },
        {
          "name": "height",
          "type": "INT",
          "widget": {
            "name": "height"
          },
          "link": 29
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            2
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.49",
        "Node name for S&R": "EmptyLatentImage"
      },
      "widgets_values": [
        512,
        512,
        1
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        105.53061575140833,
        331.77475253018486
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            1
          ]
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 1,
          "links": [
            3,
            5
          ]
        },
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 2,
          "links": []
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.49",
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "v1-5-pruned-emaonly-fp16.safetensors"
      ]
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1615.2298583984375,
        299.4739990234375
      ],
      "size": [
        172.8817596435547,
        46
      ],
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 7
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 16
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            35
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.49",
        "Node name for S&R": "VAEDecode"
      },
      "widgets_values": []
    },
    {
      "id": 23,
      "type": "SaveImage",
      "pos": [
        1820.499267578125,
        299.4739990234375
      ],
      "size": [
        460.22799999999984,
        432.01099999999985
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 35
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    },
    {
      "id": 21,
      "type": "ControlNetApplyAdvanced",
      "pos": [
        948.958740234375,
        320.1973571777344
      ],
      "size": [
        270,
        186
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 21
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 22
        },
        {
          "name": "control_net",
          "type": "CONTROL_NET",
          "link": 25
        },
        {
          "name": "image",
          "type": "IMAGE",
          "link": 34
        },
        {
          "name": "vae",
          "shape": 7,
          "type": "VAE",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "links": [
            23
          ]
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "links": [
            24
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.49",
        "Node name for S&R": "ControlNetApplyAdvanced"
      },
      "widgets_values": [
        0.8,
        0,
        0.4
      ],
      "color": "#232",
      "bgcolor": "#353"
    }
  ],
  "links": [
    [
      1,
      4,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      2,
      5,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      7,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      16,
      17,
      0,
      8,
      1,
      "VAE"
    ],
    [
      21,
      6,
      0,
      21,
      0,
      "CONDITIONING"
    ],
    [
      22,
      7,
      0,
      21,
      1,
      "CONDITIONING"
    ],
    [
      23,
      21,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      24,
      21,
      1,
      3,
      2,
      "CONDITIONING"
    ],
    [
      25,
      13,
      0,
      21,
      2,
      "CONTROL_NET"
    ],
    [
      28,
      20,
      0,
      5,
      0,
      "INT"
    ],
    [
      29,
      20,
      1,
      5,
      1,
      "INT"
    ],
    [
      33,
      14,
      0,
      20,
      0,
      "IMAGE"
    ],
    [
      34,
      14,
      0,
      21,
      3,
      "IMAGE"
    ],
    [
      35,
      8,
      0,
      23,
      0,
      "IMAGE"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.6830134553650705,
      "offset": [
        -5.530615751408334,
        -83.17780701188025
      ]
    },
    "frontendVersion": "1.34.6",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}
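
The wiring in the JSON above is easier to follow when the `links` table is printed as plain "source -> target" lines. The sketch below assumes each link entry has the layout `[link_id, source_node, source_slot, target_node, target_slot, type]`, which is inferred from the workflow above rather than taken from a specification. `describe_links` is a hypothetical helper, and `fragment` is a cut-down copy of the workflow with only three nodes.

```python
def describe_links(workflow: dict) -> list[str]:
    """Return one 'source -> target' line per entry in workflow['links']."""
    node_types = {node["id"]: node["type"] for node in workflow["nodes"]}
    described = []
    # Assumed layout: [link_id, src_node, src_slot, dst_node, dst_slot, type]
    for _link_id, src, src_slot, dst, dst_slot, data_type in workflow["links"]:
        described.append(
            f"{node_types[src]}[{src_slot}] -> {node_types[dst]}[{dst_slot}] ({data_type})"
        )
    return described


# Fragment copied from the workflow above: the two loaders that feed node 21.
fragment = {
    "nodes": [
        {"id": 13, "type": "ControlNetLoader"},
        {"id": 14, "type": "LoadImage"},
        {"id": 21, "type": "ControlNetApplyAdvanced"},
    ],
    "links": [
        [25, 13, 0, 21, 2, "CONTROL_NET"],
        [34, 14, 0, 21, 3, "IMAGE"],
    ],
}

for line in describe_links(fragment):
    print(line)
```

To inspect the full graph, load the whole file with `json.load` and pass the result to the same function.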

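🟩 Feed the ControlNet model and the scribble image into the Apply ControlNet node.
🟨 No error is raised if the ControlNet image and the generated image differ in size, but it is best to keep them the same.

The scribble model is tuned for white lines drawn on a black background.
Black lines on a white background often give poor results, so take care.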
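
If your doodle is black on white, inverting it before use is one way around this. The sketch below works on an 8-bit grayscale image given as a plain list of rows, so it needs no image library. `to_white_on_black` is a hypothetical helper, and its rule for detecting a light background (mean brightness above 127) is an assumption, not something the scribble model defines.

```python
def to_white_on_black(pixels: list[list[int]]) -> list[list[int]]:
    """Invert an 8-bit grayscale image if its background looks light.

    Heuristic (an assumption): a mostly bright image has a white background.
    """
    total = sum(sum(row) for row in pixels)
    count = sum(len(row) for row in pixels)
    if count == 0 or total / count <= 127:
        # Already dark (or empty): return an unchanged copy.
        return [row[:] for row in pixels]
    return [[255 - value for value in row] for row in pixels]


# A 3x3 "drawing": one black dot on a white background.
sketch = [
    [255, 255, 255],
    [255, 0, 255],
    [255, 255, 255],
]
print(to_white_on_black(sketch))
```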

Sample images

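Balancing ControlNet control

A diffusion model produces its best quality when it generates without constraints.
But a completely unconstrained model is of no practical use, so we steer it with conditioning such as text or ControlNet.
Too much control lowers quality, and the same holds for text prompts and LoRA.

So how should control and quality be balanced?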

start_percent / end_percent

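Sampling settles the rough shape in the early steps and paints in the details in the later ones.

Many ControlNets (pose, depth, scribble and so on) are shape-determining controls.
In other words, it is often enough for ControlNet to act only during the early steps.

The Apply ControlNet node lets you specify the interval over which ControlNet takes effect.

start_percent: the point at which it starts to take effect
end_percent: the point at which it stops taking effect

The lower end_percent is, the more freedom the model regains in the later steps, which can raise quality while keeping the shape.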
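
To get a feel for the interval, the sketch below lists which sampling steps fall inside it. It assumes the percentages map linearly onto step indices, which is a simplification: the actual node may map them onto the noise schedule instead, so the real cut-off can land on a different step. `active_steps` is a hypothetical helper. The values 20, 0 and 0.4 are the step count and start/end settings from the workflow above.

```python
def active_steps(total_steps: int, start_percent: float, end_percent: float) -> list[int]:
    """Steps (0-indexed) whose progress i / total_steps lies in [start, end)."""
    return [
        i
        for i in range(total_steps)
        if start_percent <= i / total_steps < end_percent
    ]


# Settings from the workflow above: 20 steps, start_percent 0, end_percent 0.4.
steps = active_steps(20, 0.0, 0.4)
print(f"ControlNet active on {len(steps)} of 20 steps: {steps}")
```

Under this linear assumption ControlNet shapes only the first 8 steps, and the remaining 12 are left to the model.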

Combine strength with start_percent / end_percent to find a balance that neither over-constrains the image nor lets it fall apart.
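
When comparing several settings, it can be handy to patch the three values in the workflow JSON from a script instead of editing the node by hand. The sketch below assumes the `widgets_values` of a `ControlNetApplyAdvanced` node are ordered `[strength, start_percent, end_percent]`, which matches the `[0.8, 0, 0.4]` entry in the workflow above but is not confirmed by any specification here. `set_controlnet_balance` is a hypothetical helper.

```python
import copy


def set_controlnet_balance(
    workflow: dict, strength: float, start_percent: float, end_percent: float
) -> dict:
    """Return a copy with every ControlNetApplyAdvanced node's widgets updated."""
    if not 0.0 <= start_percent <= end_percent <= 1.0:
        raise ValueError("expected 0 <= start_percent <= end_percent <= 1")
    patched = copy.deepcopy(workflow)
    for node in patched["nodes"]:
        if node["type"] == "ControlNetApplyAdvanced":
            # Assumed order: [strength, start_percent, end_percent]
            node["widgets_values"] = [strength, start_percent, end_percent]
    return patched


# Cut-down copy of node 21 from the workflow above.
original = {
    "nodes": [
        {"id": 21, "type": "ControlNetApplyAdvanced", "widgets_values": [0.8, 0, 0.4]}
    ]
}
looser = set_controlnet_balance(original, 0.6, 0.0, 0.3)
print(looser["nodes"][0]["widgets_values"])
```

The function returns a copy, so the original workflow stays available as a baseline to compare against.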

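Main types of ControlNet

There are as many "concepts" that can be paired with an image as there are stars.
Only representative ones are introduced here.

Model downloads

List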

Canny

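Keeps the outlines of a photo or image while redrawing it in a different style.

Lineart

Similar to Canny, but geared more toward illustration.
Used for coloring line art and similar tasks.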

Depth

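Uses a depth map (front-to-back information) to generate while keeping the depth and composition of the source image.
Suited to cases where you do not want to lose the three-dimensional feel of buildings, landscapes and the like.

Normal

Uses a normal map to control lighting and the sense of volume.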

Pose

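Generates a person or character in the same pose from "stick figure" pose information extracted with OpenPose or similar tools.

Inpaint

A model for redrawing only part of an image.
Only the area specified with a mask is redrawn so that it blends in naturally (removing unwanted objects, swapping small props, and so on).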

QR Code Monster

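Creates images that can still be scanned as QR codes.
It is not limited to QR codes: any black-and-white pattern image can serve as the base and be transformed into whatever picture you like.

Tile

Produces a clean image from a heavily blurred or low-resolution one.
It can be used on its own, but in practice it is more often combined with an upscaling method such as Ultimate SD Upscale.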

ControlNet Union

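This belongs to the Flux era and later, but "ControlNet Union" is a single model that bundles basic ControlNets such as Scribble, Pose and Depth.

It is enough to think of it as a model that automatically recognizes the features of the input image (pose, lines, depth and so on) and tries to reproduce the behavior of the closest ControlNet in a unified way.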
