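Generating Masks with AI
Tasks such as inpainting constantly need masks, but drawing or preparing a mask image by hand every time is tedious. More importantly, it cannot be automated.
So let's use various AI models to generate masks automatically.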
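- Object Detection
  - Detects objects in an image as bounding boxes, guided by text or other instructions.
- Matting
  - Separates foreground from background with a gradient mask (alpha matte). In ComfyUI the result is often converted to a binary mask.
- Segmentation
  - Extracts the shape of an object as a black-and-white (binary) mask.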
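To make the three output types concrete, here is a minimal sketch in plain Python. It assumes a mask is a grid of values in [0, 1]. The helper names `bbox_to_mask` and `binarize` are made up for this illustration and are not part of any ComfyUI node.

```python
from typing import List, Tuple

Mask = List[List[float]]  # rows of values in [0.0, 1.0]


def bbox_to_mask(width: int, height: int, bbox: Tuple[int, int, int, int]) -> Mask:
    """Fill the rectangle (x1, y1, x2, y2) with 1.0; x2 and y2 are exclusive."""
    x1, y1, x2, y2 = bbox
    return [
        [1.0 if (x1 <= x < x2 and y1 <= y < y2) else 0.0 for x in range(width)]
        for y in range(height)
    ]


def binarize(matte: Mask, threshold: float = 0.5) -> Mask:
    """Turn a soft alpha matte into a hard black-and-white mask."""
    return [[1.0 if value >= threshold else 0.0 for value in row] for row in matte]


# Object detection yields a box, which only gives a rectangular mask.
box_mask = bbox_to_mask(4, 3, (1, 0, 3, 2))

# Matting yields soft values; thresholding them gives a binary mask.
soft_matte = [[0.0, 0.2, 0.8, 1.0]]
hard_mask = binarize(soft_matte)
```

A box covers everything inside the rectangle, a matte keeps partial transparency, and a binary mask keeps only the shape.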
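Required Custom Nodes
Many techniques implement these features, and there are just as many custom nodes for them, but the following are enough to get started.
- 1038lab/ComfyUI-RMBG
  - Covers a wide range of techniques from matting to segmentation, and is easy to use.
- ltdrdata/ComfyUI-Impact-Pack
- ltdrdata/ComfyUI-Impact-Subpack
  - Built for Detailer work, so using them purely for mask generation is a little unusual.
- kijai/ComfyUI-Florence2
  - Runs Florence2, an MLLM.
- kijai/ComfyUI-segment-anything-2
  - Runs the SAM 2 segmentation model, usually paired with Florence2.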
Object Detection

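As the name suggests, it locates specific objects in an image and outputs a rectangular region called a BBOX.
Several techniques exist, each with its own trade-offs in accuracy, generality, and speed.
YOLO Family
An extremely fast detection technique designed for real-time object detection.
In practice, a separate model is trained for each type of object to detect (one for faces, one for hands, and so on). If no model exists for your target you have to train one yourself, so it is a poor fit when you need to detect many kinds of objects.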

{
"id": "ffcc6c64-e535-4685-ab04-be903b4cdf3c",
"revision": 0,
"last_node_id": 7,
"last_link_id": 5,
"nodes": [
{
"id": 3,
"type": "UltralyticsDetectorProvider",
"pos": [
-131.74129771892854,
275.10463657117793
],
"size": [
225.47324988344883,
100.20074983277442
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "BBOX_DETECTOR",
"type": "BBOX_DETECTOR",
"links": [
2
]
},
{
"name": "SEGM_DETECTOR",
"type": "SEGM_DETECTOR",
"links": null
}
],
"properties": {
"cnr_id": "comfyui-impact-subpack",
"ver": "1.3.5",
"Node name for S&R": "UltralyticsDetectorProvider"
},
"widgets_values": [
"segm/person_yolov8m-seg.pt"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 2,
"type": "LoadImage",
"pos": [
-192.01296976493634,
433.54398787774375
],
"size": [
288.15658006702404,
326
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
1
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.71",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"1f421a11eb7f46ffcf970787036c5cc1.jpg",
"image"
]
},
{
"id": 5,
"type": "SegsToCombinedMask",
"pos": [
424.4134665014664,
275.10463657117793
],
"size": [
211.851171875,
26
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "segs",
"type": "SEGS",
"link": 3
}
],
"outputs": [
{
"name": "MASK",
"type": "MASK",
"links": [
4
]
}
],
"properties": {
"cnr_id": "comfyui-impact-pack",
"ver": "61bd8397a18e7e7668e6a24e95168967768c2bed",
"Node name for S&R": "SegsToCombinedMask"
},
"color": "#232",
"bgcolor": "#353"
},
{
"id": 6,
"type": "MaskPreview",
"pos": [
679.5682861699395,
275.10463657117793
],
"size": [
294.93629499045346,
258
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "mask",
"type": "MASK",
"link": 4
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.71",
"Node name for S&R": "MaskPreview"
},
"widgets_values": []
},
{
"id": 7,
"type": "SEGSPreview",
"pos": [
424.5080547233428,
380.8224702427784
],
"size": [
210,
314
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "segs",
"type": "SEGS",
"link": 5
},
{
"name": "fallback_image_opt",
"shape": 7,
"type": "IMAGE",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"shape": 6,
"type": "IMAGE",
"links": null
}
],
"properties": {
"cnr_id": "comfyui-impact-pack",
"ver": "61bd8397a18e7e7668e6a24e95168967768c2bed",
"Node name for S&R": "SEGSPreview"
},
"widgets_values": [
true,
0.2
]
},
{
"id": 1,
"type": "ImpactSimpleDetectorSEGS",
"pos": [
137.03559995799336,
275.10463657117793
],
"size": [
244.07421875,
310
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "bbox_detector",
"type": "BBOX_DETECTOR",
"link": 2
},
{
"name": "image",
"type": "IMAGE",
"link": 1
},
{
"name": "sam_model_opt",
"shape": 7,
"type": "SAM_MODEL",
"link": null
},
{
"name": "segm_detector_opt",
"shape": 7,
"type": "SEGM_DETECTOR",
"link": null
}
],
"outputs": [
{
"name": "SEGS",
"type": "SEGS",
"links": [
3,
5
]
}
],
"properties": {
"cnr_id": "comfyui-impact-pack",
"ver": "61bd8397a18e7e7668e6a24e95168967768c2bed",
"Node name for S&R": "ImpactSimpleDetectorSEGS"
},
"widgets_values": [
0.5,
0,
3,
10,
0.5,
0,
0,
0.7,
0
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
1,
2,
0,
1,
1,
"IMAGE"
],
[
2,
3,
0,
1,
0,
"BBOX_DETECTOR"
],
[
3,
1,
0,
5,
0,
"SEGS"
],
[
4,
5,
0,
6,
0,
"MASK"
],
[
5,
1,
0,
7,
0,
"SEGS"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.0152559799477097,
"offset": [
292.0129697649363,
-175.10463657117793
]
},
"frontendVersion": "1.33.8",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
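Best suited to cases that need fast processing with a fixed target, such as face detection.
- Getting models:
  - ComfyUI Manager → Install Models → search for YOLO. You will find YOLO models for many targets besides faces.
  - No links are given here, but searching for Adetailer on Civitai also turns up NSFW-focused models.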
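The workflow JSON in this section can also be read programmatically to check how the nodes are wired. The sketch below uses a trimmed copy of the YOLO workflow (node ids, types, and four of its links only). It assumes each entry in `links` is laid out as `[link_id, source_node, source_slot, target_node, target_slot, type]`, which matches the connections in the workflow. The helper name `describe_links` is made up for this example.

```python
import json

# Trimmed copy of the YOLO workflow: only ids, types, and links are kept.
WORKFLOW_JSON = """
{
  "nodes": [
    {"id": 3, "type": "UltralyticsDetectorProvider"},
    {"id": 2, "type": "LoadImage"},
    {"id": 1, "type": "ImpactSimpleDetectorSEGS"},
    {"id": 5, "type": "SegsToCombinedMask"},
    {"id": 6, "type": "MaskPreview"}
  ],
  "links": [
    [1, 2, 0, 1, 1, "IMAGE"],
    [2, 3, 0, 1, 0, "BBOX_DETECTOR"],
    [3, 1, 0, 5, 0, "SEGS"],
    [4, 5, 0, 6, 0, "MASK"]
  ]
}
"""


def describe_links(workflow: dict) -> list:
    """Return one readable line per link: source[slot] -> target[slot] (type)."""
    node_types = {node["id"]: node["type"] for node in workflow["nodes"]}
    described = []
    for _link_id, src, src_slot, dst, dst_slot, data_type in workflow["links"]:
        described.append(
            f"{node_types[src]}[{src_slot}] -> {node_types[dst]}[{dst_slot}] ({data_type})"
        )
    return described


edges = describe_links(json.loads(WORKFLOW_JSON))
for edge in edges:
    print(edge)
```

Read this way, the chain is: the detector provider and the loaded image feed the detector node, whose SEGS output is merged into a single mask and previewed.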
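Grounding DINO
Detects objects specified by text and outputs BBOXes.
Unlike YOLO, it accepts arbitrary text such as "white dog" or "red car", which makes it convenient, and it can detect multiple objects at once.
No node runs Grounding DINO on its own, so a workflow that combines it with segmentation is introduced below.
Florence-2
Florence-2 is a vision-language model that can understand an image much as it would a piece of text.
It can produce captions and many other outputs, and object detection is one of its tasks.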

{
"id": "57b8cf9b-11ed-420b-be41-187510d36325",
"revision": 0,
"last_node_id": 9,
"last_link_id": 9,
"nodes": [
{
"id": 4,
"type": "PreviewImage",
"pos": [
500.84779414328955,
53.49562866388473
],
"size": [
357.987809336234,
366.9149013951313
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 6
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.68",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 7,
"type": "DownloadAndLoadFlorence2Model",
"pos": [
-199.95852064582468,
506.0635940169577
],
"size": [
258.6021484375,
130
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [
{
"name": "lora",
"shape": 7,
"type": "PEFTLORA",
"link": null
}
],
"outputs": [
{
"name": "florence2_model",
"type": "FL2MODEL",
"links": [
7
]
}
],
"properties": {
"cnr_id": "comfyui-florence2",
"ver": "00b63382966a444a9fefacb65b8deb188d12a458",
"Node name for S&R": "DownloadAndLoadFlorence2Model"
},
"widgets_values": [
"microsoft/Florence-2-base-ft",
"fp16",
"sdpa",
true
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 9,
"type": "MaskPreview",
"pos": [
504.15530090191146,
487.1967803209515
],
"size": [
356.4644286534351,
363.80642544479423
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "mask",
"type": "MASK",
"link": 9
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.71",
"Node name for S&R": "MaskPreview"
},
"widgets_values": []
},
{
"id": 6,
"type": "Florence2Run",
"pos": [
95.85142311428962,
53.49562866388473
],
"size": [
366.62910569436383,
364
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 4
},
{
"name": "florence2_model",
"type": "FL2MODEL",
"link": 7
}
],
"outputs": [
{
"name": "image",
"type": "IMAGE",
"links": [
6
]
},
{
"name": "mask",
"type": "MASK",
"links": [
9
]
},
{
"name": "caption",
"type": "STRING",
"links": null
},
{
"name": "data",
"type": "JSON",
"links": null
}
],
"properties": {
"cnr_id": "comfyui-florence2",
"ver": "00b63382966a444a9fefacb65b8deb188d12a458",
"Node name for S&R": "Florence2Run"
},
"widgets_values": [
"Potted plant",
"caption_to_phrase_grounding",
true,
false,
1024,
3,
true,
"",
1234,
"fixed"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 5,
"type": "LoadImage",
"pos": [
-232.51584222034649,
53.49562866388473
],
"size": [
290,
390
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
4
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.68",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"ComfyUI_05189_.png",
"image"
]
}
],
"links": [
[
4,
5,
0,
6,
0,
"IMAGE"
],
[
6,
6,
0,
4,
0,
"IMAGE"
],
[
7,
7,
0,
6,
1,
"FL2MODEL"
],
[
9,
6,
1,
9,
0,
"MASK"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.1167815779424781,
"offset": [
332.5158422203465,
46.50437133611527
]
},
"frontendVersion": "1.33.8",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
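- Model: the differences seem small, but try several. Models are downloaded automatically.
- Prompt: describe the object to detect.
- task: caption_to_phrase_grounding
- output_mask_select: when several objects are detected, selects which outputs to use (leave empty to output all).
Suited to cases where you want to specify the target with a complex description or draw on an LLM's understanding, at the cost of speed.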
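The idea behind output_mask_select can be sketched as follows. This is an illustration only: it assumes the selector is a comma-separated list of detection indices, and the real node may parse the value differently. The function `select_detections` and the sample boxes are hypothetical.

```python
def select_detections(detections, selector):
    """Pick detections by index; an empty selector keeps all of them.

    detections: list of (label, bbox) tuples
    selector:   "" or comma-separated indices such as "0" or "0, 2" (assumed format)
    """
    if not selector.strip():
        return list(detections)
    indices = [int(part) for part in selector.split(",") if part.strip()]
    # Ignore indices that do not correspond to a detection.
    return [detections[i] for i in indices if 0 <= i < len(detections)]


# Hypothetical result for the prompt "Potted plant" with two hits.
detections = [
    ("potted plant", (10, 20, 110, 220)),
    ("potted plant", (300, 40, 380, 200)),
]

everything = select_detections(detections, "")
second_only = select_detections(detections, "1")
```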
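Matting
This is essentially what sits behind services and features offered as "background removal".
You cannot specify the target, and what counts as "background" is left to the AI to decide. It therefore suits cases where you simply want to remove the background, or where the boundary between foreground and background is clear.
BiRefNet
Probably the most widely used model. Both its speed and its quality are hard to fault, so it is a sensible default.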

{
"id": "57b8cf9b-11ed-420b-be41-187510d36325",
"revision": 0,
"last_node_id": 5,
"last_link_id": 3,
"nodes": [
{
"id": 5,
"type": "LoadImage",
"pos": [
-232.51584222034649,
53.49562866388473
],
"size": [
283.4437144886363,
493.72727272727275
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
3
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.68",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"viewfilename=ComfyUI_temp_gzdac_00001_.png",
"image"
]
},
{
"id": 4,
"type": "PreviewImage",
"pos": [
500.8477941432896,
53.49562866388473
],
"size": [
352.3299825744998,
503.21998838299993
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 2
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.68",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 3,
"type": "BiRefNetRMBG",
"pos": [
105.88783320578972,
53.49562866388473
],
"size": [
340,
254
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 3
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
2
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
},
{
"name": "MASK_IMAGE",
"type": "IMAGE",
"links": null
}
],
"properties": {
"cnr_id": "comfyui-rmbg",
"ver": "2.9.3",
"Node name for S&R": "BiRefNetRMBG"
},
"widgets_values": [
"BiRefNet-general",
0,
0,
false,
false,
"Color",
"#00ff00"
],
"color": "#222e40",
"bgcolor": "#364254"
}
],
"links": [
[
2,
3,
0,
4,
0,
"IMAGE"
],
[
3,
5,
0,
3,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.8390545288824014,
"offset": [
492.21940782589115,
157.34341313697843
]
},
"frontendVersion": "1.33.8",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
- Setting Background to Alpha outputs a transparent image with an alpha channel.
  - Note: the output is then RGBA, so using it in image2image and similar steps may cause errors (see Masks and Alpha Channels).
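The RGBA caveat is easier to see with a small example. The sketch below assumes an image is a grid of float tuples in [0, 1] and splits an RGBA image into a 3-channel image plus a separate mask, which is the shape most downstream steps expect. `split_rgba` is a made-up helper, not a ComfyUI function.

```python
def split_rgba(image):
    """Split rows of (r, g, b, a) pixels into an RGB image and an alpha mask."""
    rgb = [[pixel[:3] for pixel in row] for row in image]
    mask = [[pixel[3] for pixel in row] for row in image]
    return rgb, mask


# One opaque red pixel and one mostly transparent green pixel.
rgba_image = [[(1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 0.25)]]
rgb_image, alpha_mask = split_rgba(rgba_image)
```

A step that expects three channels per pixel can take `rgb_image`, while `alpha_mask` carries the transparency as an ordinary mask.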
Several derived models target specific uses, such as ToonOut, which is strong on anime images. Try a few.
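Segmentation
SAM (Segment Anything Model)
Currently the best-known segmentation model.
It has a strong grasp of object shapes: point at a car in a photo with a click or a box, and it accurately finds the outline and turns it into a mask.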

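This feature segments the target you indicate by clicking points, but in typical use it is combined with object detection.
- Right-click an image node → Open in SAM Detector
- Left-click the object you want to extract (right-click areas you want to exclude)
- Click Detect to generate the mask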
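Behind the clicks, a point prompt is just a list of coordinates with a label per point. The sketch below assumes the convention used by the original SAM predictor interface, where label 1 marks a point to include and 0 a point to exclude. The click records and the helper `build_point_prompt` are hypothetical.

```python
def build_point_prompt(clicks):
    """Convert (x, y, button) clicks into point coordinates and labels.

    Left clicks become label 1 (include), right clicks label 0 (exclude).
    """
    coords = [(x, y) for x, y, _button in clicks]
    labels = [1 if button == "left" else 0 for _x, _y, button in clicks]
    return coords, labels


# Two left clicks on the object and one right click on an area to exclude.
clicks = [(120, 80, "left"), (40, 200, "right"), (130, 95, "left")]
point_coords, point_labels = build_point_prompt(clicks)
```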
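SAM is still under active development; the versions so far are the original SAM, SAM 2, SAM 2.1, and SAM 3.
The latest, SAM 3, accepts text prompts in addition to points and BBOXes. It is covered again below, but frankly, SAM 3 alone is enough for AI mask generation on still images.
Clothing and Body Part Segmentation
Segments specific parts such as "upper body", "skirt", "face", and "hair".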

{
"id": "207761f3-951e-495d-82e6-ba18f812bf62",
"revision": 0,
"last_node_id": 6,
"last_link_id": 4,
"nodes": [
{
"id": 4,
"type": "LoadImage",
"pos": [
-196.19169533724752,
147.27211328602687
],
"size": [
300.2159903749374,
523.434865885697
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
1
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.71",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"ComfyUI_temp_jgbjo_00009_.png",
"image"
]
},
{
"id": 5,
"type": "PreviewImage",
"pos": [
554.1983967152759,
147.27211328602687
],
"size": [
279.6810290221624,
519.4029697754617
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 3
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.71",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 1,
"type": "ClothesSegment",
"pos": [
159.1113458764829,
147.27211328602687
],
"size": [
340,
662
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
3
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
},
{
"name": "MASK_IMAGE",
"type": "IMAGE",
"links": null
}
],
"properties": {
"cnr_id": "comfyui-rmbg",
"ver": "2.9.4",
"Node name for S&R": "ClothesSegment"
},
"widgets_values": [
false,
false,
false,
false,
true,
false,
false,
false,
false,
false,
false,
false,
false,
true,
false,
false,
false,
false,
512,
0,
0,
false,
"Color",
"#00ff00"
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
1,
4,
0,
1,
0,
"IMAGE"
],
[
3,
1,
0,
5,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.6934334949441355,
"offset": [
552.8853816068156,
29.159152850417545
]
},
"frontendVersion": "1.33.8",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
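- Select the categories you want to segment.
This used to be common for tasks such as outfit swapping, but object detection + segmentation is now probably more versatile and gives better results.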
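Conceptually, selecting categories means turning a per-pixel class map into a single mask. The sketch below assumes the model produces a grid of class ids. The id table `CLASS_IDS` and the helper `classes_to_mask` are invented for illustration and do not reflect the actual label set of any model.

```python
# Hypothetical class ids; a real model defines its own table.
CLASS_IDS = {"background": 0, "hair": 1, "face": 2, "upper_clothes": 3, "skirt": 4}


def classes_to_mask(label_map, selected):
    """Return a binary mask that is 1.0 wherever the pixel's class is selected."""
    wanted = {CLASS_IDS[name] for name in selected}
    return [[1.0 if value in wanted else 0.0 for value in row] for row in label_map]


# Tiny label map: hair on top, face below it, upper clothes at the bottom.
label_map = [
    [0, 1, 1, 0],
    [0, 2, 2, 0],
    [3, 3, 3, 3],
]
head_mask = classes_to_mask(label_map, ["hair", "face"])
```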
Combining Techniques
Combining object detection, segmentation, and matting produces more accurate masks.
YOLO × SAM

{
"id": "ffcc6c64-e535-4685-ab04-be903b4cdf3c",
"revision": 0,
"last_node_id": 8,
"last_link_id": 6,
"nodes": [
{
"id": 3,
"type": "UltralyticsDetectorProvider",
"pos": [
-131.74129771892854,
275.10463657117793
],
"size": [
225.47324988344883,
100.20074983277442
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "BBOX_DETECTOR",
"type": "BBOX_DETECTOR",
"links": [
2
]
},
{
"name": "SEGM_DETECTOR",
"type": "SEGM_DETECTOR",
"links": null
}
],
"properties": {
"cnr_id": "comfyui-impact-subpack",
"ver": "1.3.5",
"Node name for S&R": "UltralyticsDetectorProvider"
},
"widgets_values": [
"segm/person_yolov8m-seg.pt"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 5,
"type": "SegsToCombinedMask",
"pos": [
424.4134665014664,
275.10463657117793
],
"size": [
211.851171875,
26
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "segs",
"type": "SEGS",
"link": 3
}
],
"outputs": [
{
"name": "MASK",
"type": "MASK",
"links": [
4
]
}
],
"properties": {
"cnr_id": "comfyui-impact-pack",
"ver": "61bd8397a18e7e7668e6a24e95168967768c2bed",
"Node name for S&R": "SegsToCombinedMask"
},
"color": "#232",
"bgcolor": "#353"
},
{
"id": 6,
"type": "MaskPreview",
"pos": [
679.5682861699395,
275.10463657117793
],
"size": [
294.93629499045346,
258
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "mask",
"type": "MASK",
"link": 4
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.71",
"Node name for S&R": "MaskPreview"
},
"widgets_values": []
},
{
"id": 7,
"type": "SEGSPreview",
"pos": [
424.5080547233428,
380.8224702427784
],
"size": [
210,
314
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "segs",
"type": "SEGS",
"link": 5
},
{
"name": "fallback_image_opt",
"shape": 7,
"type": "IMAGE",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"shape": 6,
"type": "IMAGE",
"links": null
}
],
"properties": {
"cnr_id": "comfyui-impact-pack",
"ver": "61bd8397a18e7e7668e6a24e95168967768c2bed",
"Node name for S&R": "SEGSPreview"
},
"widgets_values": [
true,
0.2
]
},
{
"id": 8,
"type": "SAMLoader",
"pos": [
-116.2680478354797,
435.37734731069196
],
"size": [
210,
82
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "SAM_MODEL",
"type": "SAM_MODEL",
"links": [
6
]
}
],
"properties": {
"cnr_id": "comfyui-impact-pack",
"ver": "61bd8397a18e7e7668e6a24e95168967768c2bed",
"Node name for S&R": "SAMLoader"
},
"widgets_values": [
"sam_vit_b_01ec64.pth",
"AUTO"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 2,
"type": "LoadImage",
"pos": [
-199.16827143603965,
581.4934848883244
],
"size": [
288.15658006702404,
326
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
1
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.71",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"1f421a11eb7f46ffcf970787036c5cc1.jpg",
"image"
]
},
{
"id": 1,
"type": "ImpactSimpleDetectorSEGS",
"pos": [
137.03559995799336,
275.10463657117793
],
"size": [
244.07421875,
310
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "bbox_detector",
"type": "BBOX_DETECTOR",
"link": 2
},
{
"name": "image",
"type": "IMAGE",
"link": 1
},
{
"name": "sam_model_opt",
"shape": 7,
"type": "SAM_MODEL",
"link": 6
},
{
"name": "segm_detector_opt",
"shape": 7,
"type": "SEGM_DETECTOR",
"link": null
}
],
"outputs": [
{
"name": "SEGS",
"type": "SEGS",
"links": [
3,
5
]
}
],
"properties": {
"cnr_id": "comfyui-impact-pack",
"ver": "61bd8397a18e7e7668e6a24e95168967768c2bed",
"Node name for S&R": "ImpactSimpleDetectorSEGS"
},
"widgets_values": [
0.5,
0,
3,
10,
0.5,
0,
0,
0.7,
0
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
1,
2,
0,
1,
1,
"IMAGE"
],
[
2,
3,
0,
1,
0,
"BBOX_DETECTOR"
],
[
3,
1,
0,
5,
0,
"SEGS"
],
[
4,
5,
0,
6,
0,
"MASK"
],
[
5,
1,
0,
7,
0,
"SEGS"
],
[
6,
8,
0,
1,
2,
"SAM_MODEL"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.839054528882405,
"offset": [
431.4600310048111,
-114.3219362287694
]
},
"frontendVersion": "1.33.8",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
A combination of fast face detection (YOLO) and the original SAM.
Grounding DINO × SAM

{
"id": "45213769-31e7-40a4-9027-26c67d437c51",
"revision": 0,
"last_node_id": 6,
"last_link_id": 4,
"nodes": [
{
"id": 4,
"type": "LoadImage",
"pos": [
-84.57715485740746,
436.65995789100543
],
"size": [
306.56906795083313,
543.6425774433825
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
1
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.71",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pexels-photo-14705585.jpg",
"image"
]
},
{
"id": 2,
"type": "SegmentV2",
"pos": [
270.53229781565096,
436.65995789100543
],
"size": [
340,
332
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
3
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
},
{
"name": "MASK_IMAGE",
"type": "IMAGE",
"links": null
}
],
"properties": {
"cnr_id": "comfyui-rmbg",
"ver": "2.9.4",
"Node name for S&R": "SegmentV2"
},
"widgets_values": [
"horse",
"sam_hq_vit_h (2.57GB)",
"GroundingDINO_SwinT_OGC (694MB)",
0.35,
0,
0,
false,
"Color",
"#00ff00"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 5,
"type": "PreviewImage",
"pos": [
659.0726825378763,
436.65995789100543
],
"size": [
332.83609638042526,
541.6899599010097
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 3
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.71",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
}
],
"links": [
[
1,
4,
0,
2,
0,
"IMAGE"
],
[
3,
2,
0,
5,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.7627768444385543,
"offset": [
184.57715485740746,
-336.65995789100543
]
},
"frontendVersion": "1.33.8",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
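A combination of Grounding DINO and HQ-SAM, an improved version of SAM.
It lets you specify the target with text and still produces high-precision masks, making it one of the most commonly used combinations.
Florence2 × SAM2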

{
"id": "b13968f1-cfe5-4646-9f22-ac07831aae2b",
"revision": 0,
"last_node_id": 33,
"last_link_id": 41,
"nodes": [
{
"id": 27,
"type": "DownloadAndLoadFlorence2Model",
"pos": [
797.5498046875,
435.3081359863281
],
"size": [
270,
130
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [
{
"name": "lora",
"shape": 7,
"type": "PEFTLORA",
"link": null
}
],
"outputs": [
{
"name": "florence2_model",
"type": "FL2MODEL",
"links": [
28
]
}
],
"properties": {
"cnr_id": "comfyui-florence2",
"ver": "de485b65b3e1b9b887ab494afa236dff4bef9a7e",
"Node name for S&R": "DownloadAndLoadFlorence2Model"
},
"widgets_values": [
"microsoft/Florence-2-base",
"fp16",
"sdpa",
true
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 30,
"type": "Florence2toCoordinates",
"pos": [
1548.1920166015625,
275.46484375
],
"size": [
270,
102
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "data",
"type": "JSON",
"link": 36
}
],
"outputs": [
{
"name": "center_coordinates",
"type": "STRING",
"links": null
},
{
"name": "bboxes",
"type": "BBOX",
"links": [
37
]
}
],
"properties": {
"cnr_id": "ComfyUI-segment-anything-2",
"ver": "c59676b008a76237002926f684d0ca3a9b29ac54",
"Node name for S&R": "Florence2toCoordinates"
},
"widgets_values": [
"0",
false
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 16,
"type": "LoadImage",
"pos": [
797.5498046875,
-13.30300235748291
],
"size": [
270,
392.65997314453125
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
26,
34,
41
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"Clipboard - 2025-05-13 21.27.11.png",
"image"
]
},
{
"id": 29,
"type": "InvertMask",
"pos": [
2183.08349609375,
215.1739044189453
],
"size": [
140,
26
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "mask",
"type": "MASK",
"link": 38
}
],
"outputs": [
{
"name": "MASK",
"type": "MASK",
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "InvertMask"
},
"widgets_values": []
},
{
"id": 23,
"type": "PreviewImage",
"pos": [
2585.65771484375,
-6.269532203674316
],
"size": [
374.6875305175781,
390.1878356933594
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 32
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 32,
"type": "Sam2Segmentation",
"pos": [
1870.6756591796875,
216.38262939453125
],
"size": [
272.087890625,
182
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "sam2_model",
"type": "SAM2MODEL",
"link": 40
},
{
"name": "image",
"type": "IMAGE",
"link": 41
},
{
"name": "coordinates_positive",
"shape": 7,
"type": "STRING",
"link": null
},
{
"name": "coordinates_negative",
"shape": 7,
"type": "STRING",
"link": null
},
{
"name": "bboxes",
"shape": 7,
"type": "BBOX",
"link": 37
},
{
"name": "mask",
"shape": 7,
"type": "MASK",
"link": null
}
],
"outputs": [
{
"name": "mask",
"type": "MASK",
"links": [
38
]
}
],
"properties": {
"cnr_id": "ComfyUI-segment-anything-2",
"ver": "c59676b008a76237002926f684d0ca3a9b29ac54",
"Node name for S&R": "Sam2Segmentation"
},
"widgets_values": [
true,
false
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 28,
"type": "JoinImageWithAlpha",
"pos": [
2368.4716796875,
-6.269532203674316
],
"size": [
176.86484375,
46
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 34
},
{
"name": "alpha",
"type": "MASK",
"link": 35
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
32
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "JoinImageWithAlpha"
},
"widgets_values": []
},
{
"id": 33,
"type": "DownloadAndLoadSAM2Model",
"pos": [
1548.1920166015625,
82.7560043334961
],
"size": [
270,
130
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "sam2_model",
"type": "SAM2MODEL",
"links": [
40
]
}
],
"properties": {
"cnr_id": "ComfyUI-segment-anything-2",
"ver": "c59676b008a76237002926f684d0ca3a9b29ac54",
"Node name for S&R": "DownloadAndLoadSAM2Model"
},
"widgets_values": [
"sam2.1_hiera_base_plus.safetensors",
"single_image",
"cuda",
"fp16"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 25,
"type": "Florence2Run",
"pos": [
1107.8709716796875,
74.4581298828125
],
"size": [
400,
364
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 26
},
{
"name": "florence2_model",
"type": "FL2MODEL",
"link": 28
}
],
"outputs": [
{
"name": "image",
"type": "IMAGE",
"links": []
},
{
"name": "mask",
"type": "MASK",
"links": []
},
{
"name": "caption",
"type": "STRING",
"links": null
},
{
"name": "data",
"type": "JSON",
"links": [
36
]
}
],
"properties": {
"cnr_id": "comfyui-florence2",
"ver": "de485b65b3e1b9b887ab494afa236dff4bef9a7e",
"Node name for S&R": "Florence2Run"
},
"widgets_values": [
"goldfish",
"caption_to_phrase_grounding",
true,
false,
1024,
3,
true,
"",
1234,
"fixed"
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
26,
16,
0,
25,
0,
"IMAGE"
],
[
28,
27,
0,
25,
1,
"FL2MODEL"
],
[
32,
28,
0,
23,
0,
"IMAGE"
],
[
34,
16,
0,
28,
0,
"IMAGE"
],
[
35,
29,
0,
28,
1,
"MASK"
],
[
36,
25,
3,
30,
0,
"JSON"
],
[
37,
30,
1,
32,
4,
"BBOX"
],
[
38,
32,
0,
29,
0,
"MASK"
],
[
40,
33,
0,
32,
0,
"SAM2MODEL"
],
[
41,
16,
0,
32,
1,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.620921323059155,
"offset": [
-697.5498046875,
113.30300235748291
]
},
"reroutes": [
{
"id": 1,
"pos": [
1829.7442626953125,
3.2779242992401123
],
"linkIds": [
34,
41
]
}
],
"linkExtensions": [
{
"id": 34,
"parentId": 1
},
{
"id": 41,
"parentId": 1
}
],
"frontendVersion": "1.33.8",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
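A combination of Florence2 and SAM 2.1.
For easily recognized targets such as people or animals, any of these approaches works. When you need to specify complex conditions such as "a man wearing sunglasses" or "a cat lying under a tree", an LLM-based model like this one pays off.
🔥SAM 3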

{
"id": "45213769-31e7-40a4-9027-26c67d437c51",
"revision": 0,
"last_node_id": 11,
"last_link_id": 11,
"nodes": [
{
"id": 6,
"type": "PreviewImage",
"pos": [
410.5883107288138,
420.92796486120585
],
"size": [
597.0143975156826,
437.7992150216443
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 4
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.71",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 4,
"type": "LoadImage",
"pos": [
-513.4050648613645,
420.92796486120585
],
"size": [
507.5333607299855,
441.38462274968504
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
11
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.71",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pasted/image (34).png",
"image"
]
},
{
"id": 3,
"type": "SAM3Segment",
"pos": [
32.358303298717374,
420.92796486120585
],
"size": [
340,
332
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 11
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
4
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
},
{
"name": "MASK_IMAGE",
"type": "IMAGE",
"links": null
}
],
"properties": {
"cnr_id": "comfyui-rmbg",
"ver": "2.9.4",
"Node name for S&R": "SAM3Segment"
},
"widgets_values": [
"a woman wearing an apron",
"sam3",
"Auto",
0.5,
0,
0,
false,
"Color",
"#00ff00"
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
4,
3,
0,
6,
0,
"IMAGE"
],
[
11,
4,
0,
3,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.0152559799477263,
"offset": [
613.4050648613645,
-320.92796486120585
]
},
"frontendVersion": "1.33.8",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
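The latest version of SAM. It supports text prompts and performs object detection and segmentation in a single pass.
Accuracy, quality, and speed are all excellent, so start with this one (´ε` )
For more complex tasks, custom nodes such as Ltamann/ComfyUI-TBG-SAM3 are also worth trying.
SAM 3 × BiRefNet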

{
"id": "5231bbde-3d9e-483d-9963-63165fedc646",
"revision": 0,
"last_node_id": 12,
"last_link_id": 18,
"nodes": [
{
"id": 2,
"type": "PreviewImage",
"pos": [
1836.5379900055684,
293.7408968602474
],
"size": [
554.9600255276209,
422.8923553539689
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 17
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.71",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 1,
"type": "LoadImage",
"pos": [
477.2842309638515,
293.7408968602474
],
"size": [
526.1926943110356,
491.5335516952887
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
18
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.71",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pasted/image (35).png",
"image"
]
},
{
"id": 11,
"type": "BiRefNetRMBG",
"pos": [
1445.5176350953413,
293.7408968602474
],
"size": [
340,
254
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 16
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
17
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
},
{
"name": "MASK_IMAGE",
"type": "IMAGE",
"links": null
}
],
"properties": {
"cnr_id": "comfyui-rmbg",
"ver": "2.9.4",
"Node name for S&R": "BiRefNetRMBG"
},
"widgets_values": [
"BiRefNet-general",
0,
0,
false,
false,
"Alpha",
"#222222"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 5,
"type": "PreviewImage",
"pos": [
1448.15746204173,
611.2211523676546
],
"size": [
332.392016078781,
258
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 4
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.71",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 4,
"type": "SAM3Segment",
"pos": [
1054.497280185114,
293.7408968602474
],
"size": [
340,
332
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 18
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
4,
16
]
},
{
"name": "MASK",
"type": "MASK",
"links": []
},
{
"name": "MASK_IMAGE",
"type": "IMAGE",
"links": null
}
],
"properties": {
"cnr_id": "comfyui-rmbg",
"ver": "2.9.4",
"Node name for S&R": "SAM3Segment"
},
"widgets_values": [
"the woman on the right",
"sam3",
"Auto",
0.5,
0,
7,
false,
"Color",
"#00ff00"
],
"color": "#432",
"bgcolor": "#653"
}
],
"links": [
[
4,
4,
0,
5,
0,
"IMAGE"
],
[
16,
4,
0,
11,
0,
"IMAGE"
],
[
17,
11,
0,
2,
0,
"IMAGE"
],
[
18,
1,
0,
4,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.8390545288824087,
"offset": [
-377.2842309638515,
-193.7408968602474
]
},
"frontendVersion": "1.33.8",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
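Segmentation is meant to tell objects apart; it is not designed for fine cutouts.
Matting, by contrast, can handle fine detail such as hair and semi-transparent materials such as glass.
Combining the two lets each cover the other's weaknesses.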
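At the mask level, the combination can be pictured like this: the segmentation mask decides which object is kept, and the matte supplies the soft edge values. The sketch below is a conceptual illustration, not what the nodes do internally. It assumes both masks are grids of values in [0, 1], and that the segmentation mask is grown slightly so fine edges are not cut off. `dilate` and `combine_masks` are made-up helpers.

```python
def dilate(mask, radius):
    """Grow a mask by taking the maximum value in a square neighbourhood."""
    height, width = len(mask), len(mask[0])
    grown = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            grown[y][x] = max(
                mask[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            )
    return grown


def combine_masks(segmentation, matte, radius=1):
    """Keep the matte's soft values only inside the grown segmentation region."""
    region = dilate(segmentation, radius)
    return [
        [r * m for r, m in zip(region_row, matte_row)]
        for region_row, matte_row in zip(region, matte)
    ]


# Segmentation picks one object; the matte also responds to other foreground.
segmentation = [[0.0, 0.0, 1.0, 0.0, 0.0]]
matte = [[0.9, 0.5, 1.0, 0.4, 0.8]]
final_alpha = combine_masks(segmentation, matte, radius=1)
```

The soft values next to the selected object survive, while matte responses far from it are dropped.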