What are depth maps and normal maps?
Depth map
- An image in which each pixel stores its distance from the camera.
- Conventionally, nearer is whiter and farther is darker.
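The "nearer is whiter" convention amounts to normalizing the depth values and inverting them. A minimal sketch (the function name and the toy 2x2 depth values are made up for illustration):

```python
import numpy as np

def depth_to_grayscale(depth: np.ndarray) -> np.ndarray:
    """Map a depth array (smaller value = closer) to an 8-bit image
    where near pixels are white and far pixels are black."""
    d_min, d_max = depth.min(), depth.max()
    # Normalize to [0, 1], then invert so that near -> 1.0 (white).
    norm = (depth - d_min) / max(d_max - d_min, 1e-8)
    return ((1.0 - norm) * 255).astype(np.uint8)

# A toy 2x2 depth map: top-left is nearest, bottom-right is farthest.
depth = np.array([[1.0, 2.0], [3.0, 5.0]])
img = depth_to_grayscale(depth)  # top-left 255 (white), bottom-right 0 (black)
```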
Normal map
- An image that encodes each pixel's surface orientation (its normal vector) as RGB.
- Because it tells you which way each surface faces, it is used for relighting and 3D-style reshaping.
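The usual RGB encoding maps each normal component from [-1, 1] into [0, 255]. A minimal sketch (function name is made up; with integer truncation the camera-facing normal lands on 127 rather than the often-quoted 128):

```python
import numpy as np

def encode_normals(normals: np.ndarray) -> np.ndarray:
    """Encode unit normals of shape (H, W, 3) in [-1, 1] as RGB in [0, 255].
    X -> R, Y -> G, Z -> B; a surface facing the camera, normal (0, 0, 1),
    becomes the characteristic bluish ~(128, 128, 255)."""
    return np.clip((normals + 1.0) * 0.5 * 255.0, 0, 255).astype(np.uint8)

# A 1x1 normal map for a flat surface facing the camera.
flat = np.array([[[0.0, 0.0, 1.0]]])
rgb = encode_normals(flat)  # (127, 127, 255)
```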
Monocular depth estimation
- The task of inferring a depth map from a single RGB image.
- Truly accurate depth requires additional sensors such as LiDAR or a stereo camera; monocular depth estimation instead attempts to recover pseudo-depth from a single photo alone.
- Depth and normals carry closely related information, so many models estimate both at once.
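One way to see how closely depth and normals are related: normals can be approximated directly from a depth map by taking its gradients. A crude sketch that treats pixel spacing as 1 and ignores camera intrinsics (the function name is made up for illustration):

```python
import numpy as np

def normals_from_depth(depth: np.ndarray) -> np.ndarray:
    """Estimate per-pixel surface normals from a depth map via finite
    differences. For a surface z = f(x, y), the (unnormalized) normal
    is proportional to (-df/dx, -df/dy, 1)."""
    dz_dy, dz_dx = np.gradient(depth)
    n = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
    return n / np.linalg.norm(n, axis=2, keepdims=True)

# A perfectly flat depth plane yields normals pointing straight at the camera.
normals = normals_from_depth(np.full((4, 4), 2.0))  # z-component is 1 everywhere
```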
Representative monocular depth models
MiDaS / ZoeDepth (the pre-diffusion staples)
Before diffusion models became widespread, MiDaS and ZoeDepth were the standard choices for monocular depth estimation.
MiDaS_Depth-Normal_Map.json
{
"id": "7dc3def5-a895-4b0c-b417-14463917dad2",
"revision": 0,
"last_node_id": 5,
"last_link_id": 4,
"nodes": [
{
"id": 4,
"type": "PreviewImage",
"pos": [
1247.6461167320394,
490.11986803330626
],
"size": [
327.82870022539464,
258
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 3
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 2,
"type": "MiDaS-NormalMapPreprocessor",
"pos": [
1005.425383378875,
806.6054384302507
],
"size": [
210,
106
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 2
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
4
]
}
],
"properties": {
"cnr_id": "comfyui_controlnet_aux",
"ver": "12f35647f0d510e03b45a47fb420fe1245a575df",
"Node name for S&R": "MiDaS-NormalMapPreprocessor"
},
"widgets_values": [
6.283185307179586,
0.1,
512
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 3,
"type": "LoadImage",
"pos": [
558.9991789425821,
627.1175485569306
],
"size": [
355.980078125,
350.29999999999995
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
1,
2
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"bridge.jpg",
"image"
]
},
{
"id": 5,
"type": "PreviewImage",
"pos": [
1247.067690643409,
806.6054384302507
],
"size": [
334.59053343350865,
262.5289256198346
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 4
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 1,
"type": "MiDaS-DepthMapPreprocessor",
"pos": [
1005.425383378875,
490.11986803330626
],
"size": [
210,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
3
]
}
],
"properties": {
"cnr_id": "comfyui_controlnet_aux",
"ver": "12f35647f0d510e03b45a47fb420fe1245a575df",
"Node name for S&R": "MiDaS-DepthMapPreprocessor"
},
"widgets_values": [
6.283185307179586,
0.1,
512
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
1,
3,
0,
1,
0,
"IMAGE"
],
[
2,
3,
0,
2,
0,
"IMAGE"
],
[
3,
1,
0,
4,
0,
"IMAGE"
],
[
4,
2,
0,
5,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.9090909090909091,
"offset": [
-458.99917894258215,
-390.11986803330626
]
},
"frontendVersion": "1.34.2",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
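In the workflow JSON above, each entry of the `links` array has the shape `[link_id, source_node_id, source_slot, target_node_id, target_slot, type]`. A small sketch of recovering the graph edges from a trimmed-down copy of that workflow (node ids and link entries are taken from the JSON above):

```python
# Trimmed copy of the workflow above: one LoadImage feeding both MiDaS
# preprocessors, each of which feeds its own PreviewImage.
workflow = {
    "nodes": [
        {"id": 3, "type": "LoadImage"},
        {"id": 1, "type": "MiDaS-DepthMapPreprocessor"},
        {"id": 2, "type": "MiDaS-NormalMapPreprocessor"},
        {"id": 4, "type": "PreviewImage"},
        {"id": 5, "type": "PreviewImage"},
    ],
    # [link_id, source_node, source_slot, target_node, target_slot, type]
    "links": [
        [1, 3, 0, 1, 0, "IMAGE"],
        [2, 3, 0, 2, 0, "IMAGE"],
        [3, 1, 0, 4, 0, "IMAGE"],
        [4, 2, 0, 5, 0, "IMAGE"],
    ],
}

# Resolve node ids to node types and list the edges of the graph.
names = {n["id"]: n["type"] for n in workflow["nodes"]}
edges = [(names[src], names[dst])
         for _, src, _, dst, _, _ in workflow["links"]]
```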
MiDaS
- Trained to estimate relative depth even from "in-the-wild" images whose camera parameters all differ.
- Widely used when it is enough to know which things are relatively in front of or behind others.
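Because MiDaS predicts only relative (affine-invariant) depth, comparing its output against metric depth requires first solving for an unknown scale and shift. A minimal least-squares sketch (function name and toy values are made up):

```python
import numpy as np

def align_scale_shift(pred: np.ndarray, gt: np.ndarray):
    """Fit gt ~= s * pred + t by least squares -- the usual way an
    affine-invariant prediction is aligned to metric ground truth."""
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, gt.ravel(), rcond=None)
    return s, t

# A relative prediction that is off by scale 2 and shift 0.5
# is recovered exactly.
pred = np.array([[0.0, 1.0], [2.0, 3.0]])
gt = 2.0 * pred + 0.5
s, t = align_scale_shift(pred, gt)  # s ~ 2.0, t ~ 0.5
```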
ZoeDepth
There is little reason to use this in new workflows, but it still turns up in older ones, so it is worth remembering the name.
The Depth Anything family
The current mainstream is the family of depth-estimation foundation models: Depth Anything / Depth Anything V2 / V3.
{
"id": "7dc3def5-a895-4b0c-b417-14463917dad2",
"revision": 0,
"last_node_id": 7,
"last_link_id": 9,
"nodes": [
{
"id": 3,
"type": "LoadImage",
"pos": [
558.9991789425821,
627.1175485569306
],
"size": [
355.980078125,
350.29999999999995
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
8
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"bridge.jpg",
"image"
]
},
{
"id": 4,
"type": "PreviewImage",
"pos": [
1222.0552076411293,
627.1175485569306
],
"size": [
469.0687002253949,
355.46000000000004
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 7,
"type": "DepthAnythingV2Preprocessor",
"pos": [
946.7014370612255,
627.1175485569306
],
"size": [
243.6315905862604,
82
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 8
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
9
]
}
],
"properties": {
"cnr_id": "comfyui_controlnet_aux",
"ver": "12f35647f0d510e03b45a47fb420fe1245a575df",
"Node name for S&R": "DepthAnythingV2Preprocessor"
},
"widgets_values": [
"depth_anything_v2_vitl.pth",
512
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
8,
3,
0,
7,
0,
"IMAGE"
],
[
9,
7,
0,
4,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.2100000000000009,
"offset": [
-458.99917894258215,
-527.1175485569306
]
},
"frontendVersion": "1.34.2",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
When you make depth maps in ComfyUI, it is most often as preprocessing for ControlNet, and for that purpose simply starting with this workflow is fine.
Depth and normal estimation derived from diffusion models
Once diffusion models became widespread, research also emerged in the direction of repurposing the world knowledge inside a generative model for other tasks.
At the risk of being misunderstood, you could describe it as "converting the image into a depth-map art style".
Marigold
Marigold is a model fine-tuned from Stable Diffusion 2 specifically for the depth-estimation task.
At the time, the idea of using an image-generation model for anything other than image generation was almost unheard of, so it attracted a lot of attention.
However, it costs roughly as much compute as generating a single image, which makes it rather heavy for a mere preprocessing step.
Lotus
Lotus is a dense-prediction model that reuses the diffusion-model architecture but, instead of predicting noise, directly outputs the depth or normals themselves.
LBM (Latent Bridge Matching)
LBM is a one-step image-to-image framework built on Stable Diffusion XL, and it has derivative models for depth estimation and normal estimation.