What are depth maps and normal maps?
Depth map
- An image in which each pixel stores its distance from the camera.
- Conventionally, nearer is whiter and farther is darker.
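The "nearer is whiter" convention amounts to normalizing the depth values and inverting them. A minimal sketch (the function name and the toy 2x2 depth values are made up for illustration):

```python
import numpy as np

def depth_to_grayscale(depth: np.ndarray) -> np.ndarray:
    """Map a depth array (smaller value = closer) to an 8-bit image
    where near pixels are white and far pixels are black."""
    d_min, d_max = depth.min(), depth.max()
    # Normalize to [0, 1], then invert so that near -> 1.0 (white).
    norm = (depth - d_min) / max(d_max - d_min, 1e-8)
    return ((1.0 - norm) * 255).astype(np.uint8)

# A toy 2x2 depth map: top-left is nearest, bottom-right is farthest.
depth = np.array([[1.0, 2.0], [3.0, 5.0]])
img = depth_to_grayscale(depth)  # top-left 255 (white), bottom-right 0 (black)
```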
Normal map
- An image that encodes each pixel's surface orientation (its normal vector) as RGB.
- Because it tells you which way each surface faces, it is used for relighting and 3D-style reshaping.
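The usual RGB encoding maps each normal component from [-1, 1] into [0, 255]. A minimal sketch (function name is made up; with integer truncation the camera-facing normal lands on 127 rather than the often-quoted 128):

```python
import numpy as np

def encode_normals(normals: np.ndarray) -> np.ndarray:
    """Encode unit normals of shape (H, W, 3) in [-1, 1] as RGB in [0, 255].
    X -> R, Y -> G, Z -> B; a surface facing the camera, normal (0, 0, 1),
    becomes the characteristic bluish ~(128, 128, 255)."""
    return np.clip((normals + 1.0) * 0.5 * 255.0, 0, 255).astype(np.uint8)

# A 1x1 normal map for a flat surface facing the camera.
flat = np.array([[[0.0, 0.0, 1.0]]])
rgb = encode_normals(flat)  # (127, 127, 255)
```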
Monocular depth estimation
- The task of inferring a depth map from a single RGB image.
- Truly accurate depth requires additional sensors such as LiDAR or a stereo camera; monocular depth estimation instead attempts to recover pseudo-depth from a single photo alone.
- Depth and normals carry closely related information, so many models estimate both at once.
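One way to see how closely depth and normals are related: normals can be approximated directly from a depth map by taking its gradients. A crude sketch that treats pixel spacing as 1 and ignores camera intrinsics (the function name is made up for illustration):

```python
import numpy as np

def normals_from_depth(depth: np.ndarray) -> np.ndarray:
    """Estimate per-pixel surface normals from a depth map via finite
    differences. For a surface z = f(x, y), the (unnormalized) normal
    is proportional to (-df/dx, -df/dy, 1)."""
    dz_dy, dz_dx = np.gradient(depth)
    n = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
    return n / np.linalg.norm(n, axis=2, keepdims=True)

# A perfectly flat depth plane yields normals pointing straight at the camera.
normals = normals_from_depth(np.full((4, 4), 2.0))  # z-component is 1 everywhere
```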
Representative monocular depth models
MiDaS / ZoeDepth (the pre-diffusion staples)
Before diffusion models became widespread, MiDaS and ZoeDepth were the standard choices for monocular depth estimation.
MiDaS_Depth-Normal_Map.json
{
"id": "7dc3def5-a895-4b0c-b417-14463917dad2",
"revision": 0,
"last_node_id": 5,
"last_link_id": 4,
"nodes": [
{
"id": 4,
"type": "PreviewImage",
"pos": [
1247.6461167320394,
490.11986803330626
],
"size": [
327.82870022539464,
258
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 3
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 2,
"type": "MiDaS-NormalMapPreprocessor",
"pos": [
1005.425383378875,
806.6054384302507
],
"size": [
210,
106
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 2
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
4
]
}
],
"properties": {
"cnr_id": "comfyui_controlnet_aux",
"ver": "12f35647f0d510e03b45a47fb420fe1245a575df",
"Node name for S&R": "MiDaS-NormalMapPreprocessor"
},
"widgets_values": [
6.283185307179586,
0.1,
512
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 3,
"type": "LoadImage",
"pos": [
558.9991789425821,
627.1175485569306
],
"size": [
355.980078125,
350.29999999999995
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
1,
2
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"bridge.jpg",
"image"
]
},
{
"id": 5,
"type": "PreviewImage",
"pos": [
1247.067690643409,
806.6054384302507
],
"size": [
334.59053343350865,
262.5289256198346
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 4
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 1,
"type": "MiDaS-DepthMapPreprocessor",
"pos": [
1005.425383378875,
490.11986803330626
],
"size": [
210,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
3
]
}
],
"properties": {
"cnr_id": "comfyui_controlnet_aux",
"ver": "12f35647f0d510e03b45a47fb420fe1245a575df",
"Node name for S&R": "MiDaS-DepthMapPreprocessor"
},
"widgets_values": [
6.283185307179586,
0.1,
512
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
1,
3,
0,
1,
0,
"IMAGE"
],
[
2,
3,
0,
2,
0,
"IMAGE"
],
[
3,
1,
0,
4,
0,
"IMAGE"
],
[
4,
2,
0,
5,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.9090909090909091,
"offset": [
-458.99917894258215,
-390.11986803330626
]
},
"frontendVersion": "1.34.2",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
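In the workflow JSON above, each entry of the `links` array has the shape `[link_id, source_node_id, source_slot, target_node_id, target_slot, type]`. A small sketch of recovering the graph edges from a trimmed-down copy of that workflow (node ids and link entries are taken from the JSON above):

```python
# Trimmed copy of the workflow above: one LoadImage feeding both MiDaS
# preprocessors, each of which feeds its own PreviewImage.
workflow = {
    "nodes": [
        {"id": 3, "type": "LoadImage"},
        {"id": 1, "type": "MiDaS-DepthMapPreprocessor"},
        {"id": 2, "type": "MiDaS-NormalMapPreprocessor"},
        {"id": 4, "type": "PreviewImage"},
        {"id": 5, "type": "PreviewImage"},
    ],
    # [link_id, source_node, source_slot, target_node, target_slot, type]
    "links": [
        [1, 3, 0, 1, 0, "IMAGE"],
        [2, 3, 0, 2, 0, "IMAGE"],
        [3, 1, 0, 4, 0, "IMAGE"],
        [4, 2, 0, 5, 0, "IMAGE"],
    ],
}

# Resolve node ids to node types and list the edges of the graph.
names = {n["id"]: n["type"] for n in workflow["nodes"]}
edges = [(names[src], names[dst])
         for _, src, _, dst, _, _ in workflow["links"]]
```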
MiDaS
- Trained to estimate relative depth even from "in-the-wild" images whose camera parameters all differ.
- Widely used when it is enough to know which things are relatively in front of or behind others.
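Because MiDaS predicts only relative (affine-invariant) depth, comparing its output against metric depth requires first solving for an unknown scale and shift. A minimal least-squares sketch (function name and toy values are made up):

```python
import numpy as np

def align_scale_shift(pred: np.ndarray, gt: np.ndarray):
    """Fit gt ~= s * pred + t by least squares -- the usual way an
    affine-invariant prediction is aligned to metric ground truth."""
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, gt.ravel(), rcond=None)
    return s, t

# A relative prediction that is off by scale 2 and shift 0.5
# is recovered exactly.
pred = np.array([[0.0, 1.0], [2.0, 3.0]])
gt = 2.0 * pred + 0.5
s, t = align_scale_shift(pred, gt)  # s ~ 2.0, t ~ 0.5
```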
ZoeDepth
There is little reason to use this in new workflows, but it still turns up in older ones, so it is worth remembering the name.
The Depth Anything family
The current mainstream is the family of depth-estimation foundation models: Depth Anything / Depth Anything V2 / V3.
{
"id": "7dc3def5-a895-4b0c-b417-14463917dad2",
"revision": 0,
"last_node_id": 7,
"last_link_id": 9,
"nodes": [
{
"id": 3,
"type": "LoadImage",
"pos": [
558.9991789425821,
627.1175485569306
],
"size": [
355.980078125,
350.29999999999995
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
8
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"bridge.jpg",
"image"
]
},
{
"id": 4,
"type": "PreviewImage",
"pos": [
1222.0552076411293,
627.1175485569306
],
"size": [
469.0687002253949,
355.46000000000004
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 7,
"type": "DepthAnythingV2Preprocessor",
"pos": [
946.7014370612255,
627.1175485569306
],
"size": [
243.6315905862604,
82
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 8
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
9
]
}
],
"properties": {
"cnr_id": "comfyui_controlnet_aux",
"ver": "12f35647f0d510e03b45a47fb420fe1245a575df",
"Node name for S&R": "DepthAnythingV2Preprocessor"
},
"widgets_values": [
"depth_anything_v2_vitl.pth",
512
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
8,
3,
0,
7,
0,
"IMAGE"
],
[
9,
7,
0,
4,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.2100000000000009,
"offset": [
-458.99917894258215,
-527.1175485569306
]
},
"frontendVersion": "1.34.2",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
When you make depth maps in ComfyUI, it is most often as preprocessing for ControlNet, and for that purpose simply starting with this workflow is fine.
Depth and normal estimation derived from diffusion models
Once diffusion models became widespread, research also emerged in the direction of repurposing the world knowledge inside a generative model for other tasks.
At the risk of being misunderstood, you could describe it as "converting the image into a depth-map art style".
Marigold
Marigold is a model fine-tuned from Stable Diffusion 2 specifically for the depth-estimation task.
At the time, the idea of using an image-generation model for anything other than image generation was almost unheard of, so it attracted a lot of attention.
However, it costs roughly as much compute as generating a single image, which makes it rather heavy for a mere preprocessing step.
Lotus
Lotus is a dense-prediction model that reuses the diffusion-model architecture but, instead of predicting noise, directly outputs the depth or normals themselves.
LBM (Latent Bridge Matching)
LBM is a one-step image-to-image framework built on Stable Diffusion XL, and it has derivative models for depth estimation and normal estimation.