What is ControlNet?

At its core, generative AI learns a "correspondence between two things". In text2image, the model learns the relationship "noise ↔ image", but the same can be done with pairings other than noise.

  • Learn line drawing ↔ image pairs → automatic coloring of line drawings
  • Learn stick figure ↔ image pairs → image generation with a specified pose
  • Learn depth map ↔ image pairs → image generation from depth information

ControlNet is one of the technologies that makes this possible.


SD1.5 × ControlNet Scribble

There are countless ControlNet models, but let's start by trying "scribble". The scribble model is a ControlNet that generates images from rough doodles.

Download ControlNet Model

Workflow

SD1.5_ControlNet_scribble.json
  • 🟩 Feed the ControlNet model and the scribble image into the Apply ControlNet node.
  • 🟨 A size mismatch between the ControlNet image and the generated image is not an error, but you should keep them the same size.

The scribble model is optimized for white lines drawn on a black background. Note that black lines drawn on a white background often produce poor results.
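If your doodle is black lines on a white background, inverting it first usually fixes this. A minimal sketch using Pillow (file names are placeholders):

    # Turn a black-on-white doodle into the white-on-black input the
    # scribble model expects. File names are placeholders.
    from PIL import Image, ImageOps

    drawing = Image.open("my_doodle.png").convert("RGB")  # ImageOps.invert needs RGB or L mode
    scribble = ImageOps.invert(drawing)                   # black-on-white -> white-on-black
    scribble.save("scribble_input.png")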

  • Sample Image

Balance of ControlNet Control

Diffusion models inherently produce their highest quality when generating without constraints. But completely free generation is not useful, so we steer it with conditioning such as text prompts and ControlNet. If the control is too strong, quality drops; this is equally true of text prompts and LoRA.

So, how should we balance control and quality?

start_percent / end_percent

During sampling, the rough shape is decided in the early steps, and the details are drawn in the later ones.

Many ControlNets (pose / depth / scribble, etc.) are shape-determining controls. This suggests that applying ControlNet only in the early steps is often enough.

In Apply ControlNet, you can specify the interval during which ControlNet is active.

  • start_percent: when the control starts working, as a fraction of the sampling steps
  • end_percent: when the control stops working

Lowering end_percent returns freedom to the model in the second half, so quality can improve while the shape is preserved.

Combine strength with start_percent / end_percent to find a balance that is neither too constrained nor too broken.
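For reference, the same knobs exist outside ComfyUI too. Below is a minimal sketch with the diffusers library, assuming the commonly used SD1.5 and scribble checkpoints: controlnet_conditioning_scale plays the role of strength, and control_guidance_start / control_guidance_end correspond to start_percent / end_percent.

    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    # Load the scribble ControlNet and attach it to an SD1.5 pipeline.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/control_v11p_sd15_scribble", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")

    image = pipe(
        "a cozy cabin in a snowy forest",
        image=load_image("scribble_input.png"),
        controlnet_conditioning_scale=0.8,  # strength: how strongly the control applies
        control_guidance_start=0.0,         # start_percent: active from the first step
        control_guidance_end=0.6,           # end_percent: release the control at 60% of the steps
    ).images[0]
    image.save("result.png")

With control_guidance_end=0.6, the scribble pins down the composition early on, and the model is then free to refine details in the remaining 40% of the steps.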


Main ControlNet Types

There are countless "concepts" that can be paired with images. Here we introduce only the representative ones.

Download Models

List

Canny

  • Redraws a photo or image in a different style while keeping its outlines, extracted as a Canny edge map.
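The edge map itself is easy to produce. A sketch using OpenCV; the thresholds and file names are illustrative:

    import cv2
    import numpy as np
    from PIL import Image

    img = np.array(Image.open("photo.png").convert("RGB"))
    edges = cv2.Canny(img, 100, 200)        # low / high hysteresis thresholds
    edges = np.stack([edges] * 3, axis=-1)  # ControlNet expects a 3-channel image
    Image.fromarray(edges).save("canny_map.png")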

Lineart

  • Similar to Canny, but geared toward illustrations.
  • Used for coloring line drawings, etc.

Depth

  • Uses a depth map (near/far information) to generate while preserving the depth and composition of the original image.
  • Suitable when you do not want to break the three-dimensional feel of buildings or landscapes.
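A common way to obtain the depth map is a monocular depth estimator. A sketch using the transformers depth-estimation pipeline (the library picks a default model; file names are placeholders):

    from PIL import Image
    from transformers import pipeline

    depth_estimator = pipeline("depth-estimation")
    result = depth_estimator(Image.open("room.png"))
    result["depth"].save("depth_map.png")  # a PIL image; feed it to the depth ControlNet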

Normal

  • Uses a normal map to control how light falls on surfaces and their three-dimensional appearance.

Pose

  • Generates people/characters in the same pose from stick-figure pose information extracted by OpenPose or similar tools.
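The stick figure can be extracted automatically. A sketch using the controlnet_aux package, which wraps the OpenPose annotator; the annotator repo and file names are the commonly used ones, so treat them as assumptions:

    from PIL import Image
    from controlnet_aux import OpenposeDetector

    openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
    pose_map = openpose(Image.open("reference_photo.png"))  # returns the stick-figure image
    pose_map.save("pose_map.png")  # feed this to the pose ControlNet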

Inpaint

  • A model for redrawing only part of an image.
  • It naturally redraws only the region specified by the mask (erasing unwanted objects, replacing small items, etc.).

QR Code Monster

  • Creates images that can still be read as QR codes.
  • It is not limited to QR codes: it can also blend any black-and-white pattern image into an image of your choosing.

Tile

  • Restores a clean image from a heavily blurred or low-resolution input.
  • It can be used alone, but in practice it is often combined with super-resolution upscaling tools such as Ultimate SD Upscale.

ControlNet Union

This applies to Flux and later models: "ControlNet Union" packs the basic ControlNets, such as Scribble, Pose, and Depth, into a single model.

You can think of it as a model that recognizes the features of the input image (pose, lines, depth, etc.) and collectively reproduces the behavior of the ControlNet that best matches them.
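As a rough illustration of the single-model idea, here is a hedged sketch using the Flux ControlNet classes in diffusers; the model id and the meaning of control_mode follow the InstantX Union model card and should be treated as assumptions:

    import torch
    from diffusers import FluxControlNetModel, FluxControlNetPipeline
    from diffusers.utils import load_image

    # One Union checkpoint replaces the separate canny / depth / pose models.
    controlnet = FluxControlNetModel.from_pretrained(
        "InstantX/FLUX.1-dev-Controlnet-Union", torch_dtype=torch.bfloat16
    )
    pipe = FluxControlNetPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", controlnet=controlnet, torch_dtype=torch.bfloat16
    ).to("cuda")

    image = pipe(
        "a portrait photo",
        control_image=load_image("pose_map.png"),
        control_mode=4,  # selects the control type (pose here); indices per the model card
        controlnet_conditioning_scale=0.7,
    ).images[0]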