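AI-Based Mask Generation

Masks come up constantly in tasks such as inpainting, but drawing one by hand or preparing a mask image every time is tedious. Above all, it cannot be automated.

So let's use various AI models to generate masks automatically.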

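Object Detection
- Detects objects in an image according to text or other instructions and reports each one as a bounding box.
Matting
- Separates the foreground from the background with a mask that has gradations (an alpha matte). In ComfyUI the result is often a binary mask.
Segmentation
- Extracts the shape of an object as a black-and-white mask (a binary mask).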
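To make the difference between an alpha matte and a binary mask concrete, here is a minimal sketch in plain Python. The 0.5 threshold and the tiny sample matte are assumptions chosen for illustration; real nodes may binarize differently or not at all.

```python
def binarize(matte, threshold=0.5):
    """Turn a soft alpha matte (values in 0.0-1.0) into a binary mask (0 or 1)."""
    return [[1 if value >= threshold else 0 for value in row] for row in matte]


# A tiny 2x4 alpha matte: 1.0 is fully foreground, 0.0 is fully background,
# and the values in between describe soft edges such as hair.
alpha_matte = [
    [0.0, 0.2, 0.8, 1.0],
    [0.0, 0.5, 0.9, 1.0],
]

binary_mask = binarize(alpha_matte)
for row in binary_mask:
    print(row)
```

The matte keeps the soft transition at the edges, while the binary mask forces every pixel to be either inside or outside.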

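Required custom nodes

There are many techniques for these tasks and correspondingly many custom nodes, but the following are enough to get started.

1038lab/ComfyUI-RMBG
- Supports many techniques, from matting to segmentation, and is easy to use.
ltdrdata/ComfyUI-Impact-Pack
ltdrdata/ComfyUI-Impact-Subpack
- Built for a task called Detailer, so it is somewhat quirky when used purely for mask generation.
kijai/ComfyUI-Florence2
- Runs Florence-2, a multimodal large language model (MLLM).
kijai/ComfyUI-segment-anything-2
- Runs the SAM 2 segmentation model; used together with Florence-2.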

Object Detection

As the name suggests, object detection locates specific objects in an image and outputs a rectangular region called a BBOX (bounding box).
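A BBOX is just four coordinates, so turning it into a rectangular mask is straightforward. The sketch below assumes the box is given as (x1, y1, x2, y2) in pixels with an exclusive right and bottom edge; the function name and that convention are illustrative, not taken from any particular node.

```python
def bbox_to_mask(width, height, bbox):
    """Return a height x width mask with 1 inside the box and 0 elsewhere.

    bbox is (x1, y1, x2, y2) in pixels; x2 and y2 are treated as exclusive
    (an assumption made for this sketch).
    """
    x1, y1, x2, y2 = bbox
    # Clamp the box to the image so out-of-range boxes do not break anything.
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [
        [1 if (x1 <= x < x2 and y1 <= y < y2) else 0 for x in range(width)]
        for y in range(height)
    ]


mask = bbox_to_mask(6, 4, (1, 1, 4, 3))
for row in mask:
    print("".join(str(v) for v in row))
```

A mask like this covers the whole rectangle, not the object's outline, which is why detection is usually paired with segmentation.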

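Many techniques exist, each with its own trade-offs in accuracy, versatility, and speed.

YOLO family

An extremely fast detection technique designed for real-time object detection.

Typically a separate model is trained for each kind of object to detect (a face-only model, a hand-only model, and so on). If no model exists for your target, you have to train one yourself, and the approach is a poor fit when you want to detect several kinds of objects at once.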

Simple_Detector_(SEGS)-YOLO_face.json

{
  "id": "ffcc6c64-e535-4685-ab04-be903b4cdf3c",
  "revision": 0,
  "last_node_id": 7,
  "last_link_id": 5,
  "nodes": [
    {
      "id": 3,
      "type": "UltralyticsDetectorProvider",
      "pos": [
        -131.74129771892854,
        275.10463657117793
      ],
      "size": [
        225.47324988344883,
        100.20074983277442
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "BBOX_DETECTOR",
          "type": "BBOX_DETECTOR",
          "links": [
            2
          ]
        },
        {
          "name": "SEGM_DETECTOR",
          "type": "SEGM_DETECTOR",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfyui-impact-subpack",
        "ver": "1.3.5",
        "Node name for S&R": "UltralyticsDetectorProvider"
      },
      "widgets_values": [
        "segm/person_yolov8m-seg.pt"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 2,
      "type": "LoadImage",
      "pos": [
        -192.01296976493634,
        433.54398787774375
      ],
      "size": [
        288.15658006702404,
        326
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            1
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.71",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "1f421a11eb7f46ffcf970787036c5cc1.jpg",
        "image"
      ]
    },
    {
      "id": 5,
      "type": "SegsToCombinedMask",
      "pos": [
        424.4134665014664,
        275.10463657117793
      ],
      "size": [
        211.851171875,
        26
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "segs",
          "type": "SEGS",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "MASK",
          "type": "MASK",
          "links": [
            4
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfyui-impact-pack",
        "ver": "61bd8397a18e7e7668e6a24e95168967768c2bed",
        "Node name for S&R": "SegsToCombinedMask"
      },
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 6,
      "type": "MaskPreview",
      "pos": [
        679.5682861699395,
        275.10463657117793
      ],
      "size": [
        294.93629499045346,
        258
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "mask",
          "type": "MASK",
          "link": 4
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.71",
        "Node name for S&R": "MaskPreview"
      },
      "widgets_values": []
    },
    {
      "id": 7,
      "type": "SEGSPreview",
      "pos": [
        424.5080547233428,
        380.8224702427784
      ],
      "size": [
        210,
        314
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "segs",
          "type": "SEGS",
          "link": 5
        },
        {
          "name": "fallback_image_opt",
          "shape": 7,
          "type": "IMAGE",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "shape": 6,
          "type": "IMAGE",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfyui-impact-pack",
        "ver": "61bd8397a18e7e7668e6a24e95168967768c2bed",
        "Node name for S&R": "SEGSPreview"
      },
      "widgets_values": [
        true,
        0.2
      ]
    },
    {
      "id": 1,
      "type": "ImpactSimpleDetectorSEGS",
      "pos": [
        137.03559995799336,
        275.10463657117793
      ],
      "size": [
        244.07421875,
        310
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "bbox_detector",
          "type": "BBOX_DETECTOR",
          "link": 2
        },
        {
          "name": "image",
          "type": "IMAGE",
          "link": 1
        },
        {
          "name": "sam_model_opt",
          "shape": 7,
          "type": "SAM_MODEL",
          "link": null
        },
        {
          "name": "segm_detector_opt",
          "shape": 7,
          "type": "SEGM_DETECTOR",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "SEGS",
          "type": "SEGS",
          "links": [
            3,
            5
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfyui-impact-pack",
        "ver": "61bd8397a18e7e7668e6a24e95168967768c2bed",
        "Node name for S&R": "ImpactSimpleDetectorSEGS"
      },
      "widgets_values": [
        0.5,
        0,
        3,
        10,
        0.5,
        0,
        0,
        0.7,
        0
      ],
      "color": "#232",
      "bgcolor": "#353"
    }
  ],
  "links": [
    [
      1,
      2,
      0,
      1,
      1,
      "IMAGE"
    ],
    [
      2,
      3,
      0,
      1,
      0,
      "BBOX_DETECTOR"
    ],
    [
      3,
      1,
      0,
      5,
      0,
      "SEGS"
    ],
    [
      4,
      5,
      0,
      6,
      0,
      "MASK"
    ],
    [
      5,
      1,
      0,
      7,
      0,
      "SEGS"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 1.0152559799477097,
      "offset": [
        292.0129697649363,
        -175.10463657117793
      ]
    },
    "frontendVersion": "1.33.8",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

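Best suited to cases that need fast processing and have a fixed target, such as face detection.

Getting models: in ComfyUI Manager, open Install Models and search for YOLO to find various YOLO models beyond faces.
No links here, but searching Civitai for Adetailer also turns up models specialized for NSFW content.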
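When reading a workflow JSON like the one above, the links array is the quickest way to see how nodes are wired. The sketch below assumes each link entry is ordered as link id, source node id, source output slot, target node id, target input slot, data type; that ordering is inferred from the sample (it agrees with the link numbers in each node's inputs), and the excerpt keeps only node ids and types.

```python
import json

# Trimmed excerpt of the YOLO workflow: node ids/types and the links array only.
WORKFLOW_JSON = """
{
  "nodes": [
    {"id": 3, "type": "UltralyticsDetectorProvider"},
    {"id": 2, "type": "LoadImage"},
    {"id": 5, "type": "SegsToCombinedMask"},
    {"id": 6, "type": "MaskPreview"},
    {"id": 7, "type": "SEGSPreview"},
    {"id": 1, "type": "ImpactSimpleDetectorSEGS"}
  ],
  "links": [
    [1, 2, 0, 1, 1, "IMAGE"],
    [2, 3, 0, 1, 0, "BBOX_DETECTOR"],
    [3, 1, 0, 5, 0, "SEGS"],
    [4, 5, 0, 6, 0, "MASK"],
    [5, 1, 0, 7, 0, "SEGS"]
  ]
}
"""


def describe_links(workflow):
    """Render every link as 'SourceType[out slot] -> TargetType[in slot] (type)'."""
    node_types = {node["id"]: node["type"] for node in workflow["nodes"]}
    lines = []
    # Assumed field order: link id, source node, source slot, target node, target slot, type.
    for _link_id, src, src_slot, dst, dst_slot, data_type in workflow["links"]:
        lines.append(
            f"{node_types[src]}[{src_slot}] -> {node_types[dst]}[{dst_slot}] ({data_type})"
        )
    return lines


workflow = json.loads(WORKFLOW_JSON)
connections = describe_links(workflow)
for line in connections:
    print(line)
```

Read this way, the workflow is a short chain: the detector provider and the image feed the simple detector, whose SEGS output is converted to a mask and previewed.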

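Grounding DINO

Detects the objects named in a text prompt and outputs BBOXes.

Unlike YOLO, the target can be specified with arbitrary text such as "white dog" or "red car", which makes it convenient, and several objects can be detected at once.

Because there is no node that runs Grounding DINO by itself, a workflow that combines it with segmentation is shown later in this article.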

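Florence-2

Florence-2 is a vision-language model that can understand an image and describe it in text.

It can handle many tasks, such as caption generation, and object detection is one of them.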

Florence2Run.json

{
  "id": "57b8cf9b-11ed-420b-be41-187510d36325",
  "revision": 0,
  "last_node_id": 9,
  "last_link_id": 9,
  "nodes": [
    {
      "id": 4,
      "type": "PreviewImage",
      "pos": [
        500.84779414328955,
        53.49562866388473
      ],
      "size": [
        357.987809336234,
        366.9149013951313
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 6
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.68",
        "Node name for S&R": "PreviewImage"
      },
      "widgets_values": []
    },
    {
      "id": 7,
      "type": "DownloadAndLoadFlorence2Model",
      "pos": [
        -199.95852064582468,
        506.0635940169577
      ],
      "size": [
        258.6021484375,
        130
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [
        {
          "name": "lora",
          "shape": 7,
          "type": "PEFTLORA",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "florence2_model",
          "type": "FL2MODEL",
          "links": [
            7
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfyui-florence2",
        "ver": "00b63382966a444a9fefacb65b8deb188d12a458",
        "Node name for S&R": "DownloadAndLoadFlorence2Model"
      },
      "widgets_values": [
        "microsoft/Florence-2-base-ft",
        "fp16",
        "sdpa",
        true
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 9,
      "type": "MaskPreview",
      "pos": [
        504.15530090191146,
        487.1967803209515
      ],
      "size": [
        356.4644286534351,
        363.80642544479423
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "mask",
          "type": "MASK",
          "link": 9
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.71",
        "Node name for S&R": "MaskPreview"
      },
      "widgets_values": []
    },
    {
      "id": 6,
      "type": "Florence2Run",
      "pos": [
        95.85142311428962,
        53.49562866388473
      ],
      "size": [
        366.62910569436383,
        364
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 4
        },
        {
          "name": "florence2_model",
          "type": "FL2MODEL",
          "link": 7
        }
      ],
      "outputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "links": [
            6
          ]
        },
        {
          "name": "mask",
          "type": "MASK",
          "links": [
            9
          ]
        },
        {
          "name": "caption",
          "type": "STRING",
          "links": null
        },
        {
          "name": "data",
          "type": "JSON",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfyui-florence2",
        "ver": "00b63382966a444a9fefacb65b8deb188d12a458",
        "Node name for S&R": "Florence2Run"
      },
      "widgets_values": [
        "Potted plant",
        "caption_to_phrase_grounding",
        true,
        false,
        1024,
        3,
        true,
        "",
        1234,
        "fixed"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 5,
      "type": "LoadImage",
      "pos": [
        -232.51584222034649,
        53.49562866388473
      ],
      "size": [
        290,
        390
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            4
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.68",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "ComfyUI_05189_.png",
        "image"
      ]
    }
  ],
  "links": [
    [
      4,
      5,
      0,
      6,
      0,
      "IMAGE"
    ],
    [
      6,
      6,
      0,
      4,
      0,
      "IMAGE"
    ],
    [
      7,
      7,
      0,
      6,
      1,
      "FL2MODEL"
    ],
    [
      9,
      6,
      1,
      9,
      0,
      "MASK"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 1.1167815779424781,
      "offset": [
        332.5158422203465,
        46.50437133611527
      ]
    },
    "frontendVersion": "1.33.8",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

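Model: I have not noticed much difference between the variants, so try several. Models are downloaded automatically.
Prompt: describes the object to detect.
task: caption_to_phrase_grounding
output_mask_select: when several objects are detected, selects which outputs to use (if left blank, all of them are output).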
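The idea behind output_mask_select can be sketched as picking boxes out of a detection result by index. Everything here is an assumption for illustration: the result dictionary is a simplified, hypothetical shape with bboxes and labels lists, and the comma-separated index syntax is a guess at how such a field might be parsed, not the node's confirmed behavior.

```python
def select_boxes(result, selection=""):
    """Pick bounding boxes by index; a blank selection returns all of them.

    result:    {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]} (hypothetical shape)
    selection: comma-separated indices such as "0,2" (assumed syntax)
    """
    boxes = result["bboxes"]
    if not selection.strip():
        return list(boxes)
    wanted = [int(token) for token in selection.split(",") if token.strip()]
    # Ignore indices that do not correspond to a detected box.
    return [boxes[i] for i in wanted if 0 <= i < len(boxes)]


detections = {
    "bboxes": [[10, 20, 110, 220], [150, 40, 260, 200], [300, 60, 380, 180]],
    "labels": ["Potted plant", "Potted plant", "Potted plant"],
}

print(select_boxes(detections))         # blank: every box
print(select_boxes(detections, "1"))    # only the second box
```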

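Best suited to cases where you want to specify the target with a complex sentence or draw on the comprehension of an LLM, although it is slow.

Matting

Services and features offered under the name "background removal" are essentially doing the same thing.

You cannot specify an object, and what counts as "background" is left to the AI, so use it when you simply want the background removed or when the boundary between foreground and background is clear.

BiRefNet

Probably the most widely used model. Its speed and quality are both good, so it is a sensible default.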

BiRefNet_Remove_Background_(RMBG).json

{
  "id": "57b8cf9b-11ed-420b-be41-187510d36325",
  "revision": 0,
  "last_node_id": 5,
  "last_link_id": 3,
  "nodes": [
    {
      "id": 5,
      "type": "LoadImage",
      "pos": [
        -232.51584222034649,
        53.49562866388473
      ],
      "size": [
        283.4437144886363,
        493.72727272727275
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            3
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.68",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "viewfilename=ComfyUI_temp_gzdac_00001_.png",
        "image"
      ]
    },
    {
      "id": 4,
      "type": "PreviewImage",
      "pos": [
        500.8477941432896,
        53.49562866388473
      ],
      "size": [
        352.3299825744998,
        503.21998838299993
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 2
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.68",
        "Node name for S&R": "PreviewImage"
      },
      "widgets_values": []
    },
    {
      "id": 3,
      "type": "BiRefNetRMBG",
      "pos": [
        105.88783320578972,
        53.49562866388473
      ],
      "size": [
        340,
        254
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            2
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        },
        {
          "name": "MASK_IMAGE",
          "type": "IMAGE",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfyui-rmbg",
        "ver": "2.9.3",
        "Node name for S&R": "BiRefNetRMBG"
      },
      "widgets_values": [
        "BiRefNet-general",
        0,
        0,
        false,
        false,
        "Color",
        "#00ff00"
      ],
      "color": "#222e40",
      "bgcolor": "#364254"
    }
  ],
  "links": [
    [
      2,
      3,
      0,
      4,
      0,
      "IMAGE"
    ],
    [
      3,
      5,
      0,
      3,
      0,
      "IMAGE"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.8390545288824014,
      "offset": [
        492.21940782589115,
        157.34341313697843
      ]
    },
    "frontendVersion": "1.33.8",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

Setting Background to Alpha outputs a transparent image with an alpha channel attached.
Note: the output in this case is RGBA, so using it in image2image and similar workflows may cause an error (see Masks and Alpha Channels).
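The RGBA issue comes down to channel count: a stage that expects three channels receives four. One common way around it is to composite the image over a solid background first. The sketch below shows that blend for a single 8-bit pixel in plain Python; the function name and the white default background are assumptions, and a real workflow would do this with image nodes or an array library.

```python
def flatten_rgba(pixel, background=(255, 255, 255)):
    """Composite one 8-bit RGBA pixel over an opaque background, returning RGB."""
    r, g, b, a = pixel
    # Integer alpha blend with rounding: out = (fg * a + bg * (255 - a)) / 255
    return tuple(
        (fg * a + bg * (255 - a) + 127) // 255
        for fg, bg in zip((r, g, b), background)
    )


print(flatten_rgba((255, 0, 0, 255)))  # opaque red stays red
print(flatten_rgba((255, 0, 0, 0)))    # fully transparent becomes the background
print(flatten_rgba((0, 0, 0, 128)))    # half-transparent black becomes mid gray
```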

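There are several derivative models for different purposes, such as ToonOut, which is good at anime images. Try a few.

Segmentation

SAM (Segment Anything Model)

Currently the best-known segmentation model.

It knows the shapes of things well: indicate an object in a photo, such as a car, with points or a box, and it finds the outline accurately and turns it into a mask.

The SAM Detector segments whatever object you mark with points, as in the steps below, but in practice SAM is usually combined with object detection.

1. Right-click an image node → Open in SAM Detector
2. Left-click the object you want to extract (right-click areas you want to exclude)
3. Press Detect to generate the mask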
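The left-click and right-click steps above correspond to the positive and negative point prompts that SAM-style predictors accept: a list of coordinates plus one label per point, where 1 marks foreground and 0 marks background. The sketch below only builds those two lists from hypothetical click events; it does not call SAM, and how the SAM Detector encodes clicks internally is an assumption.

```python
def clicks_to_prompts(clicks):
    """Convert (x, y, button) clicks into point coordinates and labels.

    Left clicks become positive points (label 1, "include this"),
    right clicks become negative points (label 0, "exclude this").
    """
    point_coords = [[x, y] for x, y, _button in clicks]
    point_labels = [1 if button == "left" else 0 for _x, _y, button in clicks]
    return point_coords, point_labels


# Hypothetical clicks: two on the object, one on an area to exclude.
clicks = [(320, 240, "left"), (350, 260, "left"), (100, 80, "right")]
coords, labels = clicks_to_prompts(clicks)
print(coords)
print(labels)
```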

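SAM is still under active development; the releases so far are the original SAM, SAM 2, SAM 2.1, and SAM 3.

The latest version, SAM 3, accepts text prompts in addition to point and BBOX prompts. It is covered again below, but frankly, SAM 3 alone is enough for AI mask generation on still images.

Clothing and body-part segmentation

Segments specific parts such as "upper body", "skirt", "face", and "hair".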

Clothing_Segmentation_(RMBG).json

{
  "id": "207761f3-951e-495d-82e6-ba18f812bf62",
  "revision": 0,
  "last_node_id": 6,
  "last_link_id": 4,
  "nodes": [
    {
      "id": 4,
      "type": "LoadImage",
      "pos": [
        -196.19169533724752,
        147.27211328602687
      ],
      "size": [
        300.2159903749374,
        523.434865885697
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            1
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.71",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "ComfyUI_temp_jgbjo_00009_.png",
        "image"
      ]
    },
    {
      "id": 5,
      "type": "PreviewImage",
      "pos": [
        554.1983967152759,
        147.27211328602687
      ],
      "size": [
        279.6810290221624,
        519.4029697754617
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 3
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.71",
        "Node name for S&R": "PreviewImage"
      },
      "widgets_values": []
    },
    {
      "id": 1,
      "type": "ClothesSegment",
      "pos": [
        159.1113458764829,
        147.27211328602687
      ],
      "size": [
        340,
        662
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 1
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            3
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        },
        {
          "name": "MASK_IMAGE",
          "type": "IMAGE",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfyui-rmbg",
        "ver": "2.9.4",
        "Node name for S&R": "ClothesSegment"
      },
      "widgets_values": [
        false,
        false,
        false,
        false,
        true,
        false,
        false,
        false,
        false,
        false,
        false,
        false,
        false,
        true,
        false,
        false,
        false,
        false,
        512,
        0,
        0,
        false,
        "Color",
        "#00ff00"
      ],
      "color": "#232",
      "bgcolor": "#353"
    }
  ],
  "links": [
    [
      1,
      4,
      0,
      1,
      0,
      "IMAGE"
    ],
    [
      3,
      1,
      0,
      5,
      0,
      "IMAGE"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.6934334949441355,
      "offset": [
        552.8853816068156,
        29.159152850417545
      ]
    },
    "frontendVersion": "1.33.8",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

Select the categories you want to segment.

I used to use this often for outfit-swapping tasks, but object detection plus segmentation is now probably more versatile and performs better.
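Category-based segmentation can be pictured as a label map in which every pixel holds a class id, with the selected categories merged into one binary mask. The sketch below illustrates that idea only; the class ids and names are hypothetical and do not reflect the actual model behind the ClothesSegment node.

```python
# Hypothetical class ids for illustration only.
CATEGORIES = {0: "background", 1: "hair", 2: "face", 3: "upper-clothes", 4: "skirt"}


def categories_to_mask(label_map, selected):
    """Return a binary mask that is 1 wherever the pixel's class is selected."""
    selected_ids = {cid for cid, name in CATEGORIES.items() if name in selected}
    return [[1 if value in selected_ids else 0 for value in row] for row in label_map]


# A tiny 3x4 label map: each number is the class id of that pixel.
label_map = [
    [0, 1, 1, 0],
    [0, 2, 2, 0],
    [3, 3, 4, 4],
]

clothes_mask = categories_to_mask(label_map, {"upper-clothes", "skirt"})
for row in clothes_mask:
    print(row)
```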

Combining techniques

Combining object detection, segmentation, and matting makes more precise mask generation possible.
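One simple way to picture such a combination: let detection decide where to look and let a mask-producing model decide the shape, then keep only the mask pixels that fall inside the detected box. This is a minimal sketch in plain Python under that assumption (box as x1, y1, x2, y2 with exclusive right and bottom edges); the workflows in this article feed the box to the segmentation model directly instead of cropping afterwards.

```python
def restrict_to_bbox(mask, bbox):
    """Zero out every mask value that lies outside the bounding box."""
    x1, y1, x2, y2 = bbox
    return [
        [value if (x1 <= x < x2 and y1 <= y < y2) else 0.0 for x, value in enumerate(row)]
        for y, row in enumerate(mask)
    ]


# A soft foreground matte that picked up two separate objects.
matte = [
    [0.9, 0.8, 0.0, 0.7],
    [1.0, 0.9, 0.0, 0.8],
]

# Keep only the object that the detector found in the left half.
combined = restrict_to_bbox(matte, (0, 0, 2, 2))
for row in combined:
    print(row)
```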

YOLO × SAM

YOLO_face-SAM.json

{
  "id": "ffcc6c64-e535-4685-ab04-be903b4cdf3c",
  "revision": 0,
  "last_node_id": 8,
  "last_link_id": 6,
  "nodes": [
    {
      "id": 3,
      "type": "UltralyticsDetectorProvider",
      "pos": [
        -131.74129771892854,
        275.10463657117793
      ],
      "size": [
        225.47324988344883,
        100.20074983277442
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "BBOX_DETECTOR",
          "type": "BBOX_DETECTOR",
          "links": [
            2
          ]
        },
        {
          "name": "SEGM_DETECTOR",
          "type": "SEGM_DETECTOR",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfyui-impact-subpack",
        "ver": "1.3.5",
        "Node name for S&R": "UltralyticsDetectorProvider"
      },
      "widgets_values": [
        "segm/person_yolov8m-seg.pt"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 5,
      "type": "SegsToCombinedMask",
      "pos": [
        424.4134665014664,
        275.10463657117793
      ],
      "size": [
        211.851171875,
        26
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "segs",
          "type": "SEGS",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "MASK",
          "type": "MASK",
          "links": [
            4
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfyui-impact-pack",
        "ver": "61bd8397a18e7e7668e6a24e95168967768c2bed",
        "Node name for S&R": "SegsToCombinedMask"
      },
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 6,
      "type": "MaskPreview",
      "pos": [
        679.5682861699395,
        275.10463657117793
      ],
      "size": [
        294.93629499045346,
        258
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "mask",
          "type": "MASK",
          "link": 4
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.71",
        "Node name for S&R": "MaskPreview"
      },
      "widgets_values": []
    },
    {
      "id": 7,
      "type": "SEGSPreview",
      "pos": [
        424.5080547233428,
        380.8224702427784
      ],
      "size": [
        210,
        314
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "segs",
          "type": "SEGS",
          "link": 5
        },
        {
          "name": "fallback_image_opt",
          "shape": 7,
          "type": "IMAGE",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "shape": 6,
          "type": "IMAGE",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfyui-impact-pack",
        "ver": "61bd8397a18e7e7668e6a24e95168967768c2bed",
        "Node name for S&R": "SEGSPreview"
      },
      "widgets_values": [
        true,
        0.2
      ]
    },
    {
      "id": 8,
      "type": "SAMLoader",
      "pos": [
        -116.2680478354797,
        435.37734731069196
      ],
      "size": [
        210,
        82
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "SAM_MODEL",
          "type": "SAM_MODEL",
          "links": [
            6
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfyui-impact-pack",
        "ver": "61bd8397a18e7e7668e6a24e95168967768c2bed",
        "Node name for S&R": "SAMLoader"
      },
      "widgets_values": [
        "sam_vit_b_01ec64.pth",
        "AUTO"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 2,
      "type": "LoadImage",
      "pos": [
        -199.16827143603965,
        581.4934848883244
      ],
      "size": [
        288.15658006702404,
        326
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            1
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.71",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "1f421a11eb7f46ffcf970787036c5cc1.jpg",
        "image"
      ]
    },
    {
      "id": 1,
      "type": "ImpactSimpleDetectorSEGS",
      "pos": [
        137.03559995799336,
        275.10463657117793
      ],
      "size": [
        244.07421875,
        310
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "bbox_detector",
          "type": "BBOX_DETECTOR",
          "link": 2
        },
        {
          "name": "image",
          "type": "IMAGE",
          "link": 1
        },
        {
          "name": "sam_model_opt",
          "shape": 7,
          "type": "SAM_MODEL",
          "link": 6
        },
        {
          "name": "segm_detector_opt",
          "shape": 7,
          "type": "SEGM_DETECTOR",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "SEGS",
          "type": "SEGS",
          "links": [
            3,
            5
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfyui-impact-pack",
        "ver": "61bd8397a18e7e7668e6a24e95168967768c2bed",
        "Node name for S&R": "ImpactSimpleDetectorSEGS"
      },
      "widgets_values": [
        0.5,
        0,
        3,
        10,
        0.5,
        0,
        0,
        0.7,
        0
      ],
      "color": "#232",
      "bgcolor": "#353"
    }
  ],
  "links": [
    [
      1,
      2,
      0,
      1,
      1,
      "IMAGE"
    ],
    [
      2,
      3,
      0,
      1,
      0,
      "BBOX_DETECTOR"
    ],
    [
      3,
      1,
      0,
      5,
      0,
      "SEGS"
    ],
    [
      4,
      5,
      0,
      6,
      0,
      "MASK"
    ],
    [
      5,
      1,
      0,
      7,
      0,
      "SEGS"
    ],
    [
      6,
      8,
      0,
      1,
      2,
      "SAM_MODEL"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.839054528882405,
      "offset": [
        431.4600310048111,
        -114.3219362287694
      ]
    },
    "frontendVersion": "1.33.8",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

A combination of fast face detection (YOLO) and the original SAM.

Grounding DINO × SAM

Grounding_DINO_HQ-SAM.json

{
  "id": "45213769-31e7-40a4-9027-26c67d437c51",
  "revision": 0,
  "last_node_id": 6,
  "last_link_id": 4,
  "nodes": [
    {
      "id": 4,
      "type": "LoadImage",
      "pos": [
        -84.57715485740746,
        436.65995789100543
      ],
      "size": [
        306.56906795083313,
        543.6425774433825
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            1
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.71",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "pexels-photo-14705585.jpg",
        "image"
      ]
    },
    {
      "id": 2,
      "type": "SegmentV2",
      "pos": [
        270.53229781565096,
        436.65995789100543
      ],
      "size": [
        340,
        332
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 1
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            3
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        },
        {
          "name": "MASK_IMAGE",
          "type": "IMAGE",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfyui-rmbg",
        "ver": "2.9.4",
        "Node name for S&R": "SegmentV2"
      },
      "widgets_values": [
        "horse",
        "sam_hq_vit_h (2.57GB)",
        "GroundingDINO_SwinT_OGC (694MB)",
        0.35,
        0,
        0,
        false,
        "Color",
        "#00ff00"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 5,
      "type": "PreviewImage",
      "pos": [
        659.0726825378763,
        436.65995789100543
      ],
      "size": [
        332.83609638042526,
        541.6899599010097
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 3
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.71",
        "Node name for S&R": "PreviewImage"
      },
      "widgets_values": []
    }
  ],
  "links": [
    [
      1,
      4,
      0,
      2,
      0,
      "IMAGE"
    ],
    [
      3,
      2,
      0,
      5,
      0,
      "IMAGE"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.7627768444385543,
      "offset": [
        184.57715485740746,
        -336.65995789100543
      ]
    },
    "frontendVersion": "1.33.8",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

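A combination of Grounding DINO and HQ-SAM, an improved version of SAM.

It lets you specify the target with text while producing a high-precision mask, making it one of the most widely used combinations.

Florence-2 × SAM 2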

Florence2_SAM2.1.json

{
  "id": "b13968f1-cfe5-4646-9f22-ac07831aae2b",
  "revision": 0,
  "last_node_id": 33,
  "last_link_id": 41,
  "nodes": [
    {
      "id": 27,
      "type": "DownloadAndLoadFlorence2Model",
      "pos": [
        797.5498046875,
        435.3081359863281
      ],
      "size": [
        270,
        130
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [
        {
          "name": "lora",
          "shape": 7,
          "type": "PEFTLORA",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "florence2_model",
          "type": "FL2MODEL",
          "links": [
            28
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfyui-florence2",
        "ver": "de485b65b3e1b9b887ab494afa236dff4bef9a7e",
        "Node name for S&R": "DownloadAndLoadFlorence2Model"
      },
      "widgets_values": [
        "microsoft/Florence-2-base",
        "fp16",
        "sdpa",
        true
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 30,
      "type": "Florence2toCoordinates",
      "pos": [
        1548.1920166015625,
        275.46484375
      ],
      "size": [
        270,
        102
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "data",
          "type": "JSON",
          "link": 36
        }
      ],
      "outputs": [
        {
          "name": "center_coordinates",
          "type": "STRING",
          "links": null
        },
        {
          "name": "bboxes",
          "type": "BBOX",
          "links": [
            37
          ]
        }
      ],
      "properties": {
        "cnr_id": "ComfyUI-segment-anything-2",
        "ver": "c59676b008a76237002926f684d0ca3a9b29ac54",
        "Node name for S&R": "Florence2toCoordinates"
      },
      "widgets_values": [
        "0",
        false
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 16,
      "type": "LoadImage",
      "pos": [
        797.5498046875,
        -13.30300235748291
      ],
      "size": [
        270,
        392.65997314453125
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            26,
            34,
            41
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.39",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "Clipboard - 2025-05-13 21.27.11.png",
        "image"
      ]
    },
    {
      "id": 29,
      "type": "InvertMask",
      "pos": [
        2183.08349609375,
        215.1739044189453
      ],
      "size": [
        140,
        26
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "mask",
          "type": "MASK",
          "link": 38
        }
      ],
      "outputs": [
        {
          "name": "MASK",
          "type": "MASK",
          "links": [
            35
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.39",
        "Node name for S&R": "InvertMask"
      },
      "widgets_values": []
    },
    {
      "id": 23,
      "type": "PreviewImage",
      "pos": [
        2585.65771484375,
        -6.269532203674316
      ],
      "size": [
        374.6875305175781,
        390.1878356933594
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 32
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.39",
        "Node name for S&R": "PreviewImage"
      },
      "widgets_values": []
    },
    {
      "id": 32,
      "type": "Sam2Segmentation",
      "pos": [
        1870.6756591796875,
        216.38262939453125
      ],
      "size": [
        272.087890625,
        182
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "sam2_model",
          "type": "SAM2MODEL",
          "link": 40
        },
        {
          "name": "image",
          "type": "IMAGE",
          "link": 41
        },
        {
          "name": "coordinates_positive",
          "shape": 7,
          "type": "STRING",
          "link": null
        },
        {
          "name": "coordinates_negative",
          "shape": 7,
          "type": "STRING",
          "link": null
        },
        {
          "name": "bboxes",
          "shape": 7,
          "type": "BBOX",
          "link": 37
        },
        {
          "name": "mask",
          "shape": 7,
          "type": "MASK",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "mask",
          "type": "MASK",
          "links": [
            38
          ]
        }
      ],
      "properties": {
        "cnr_id": "ComfyUI-segment-anything-2",
        "ver": "c59676b008a76237002926f684d0ca3a9b29ac54",
        "Node name for S&R": "Sam2Segmentation"
      },
      "widgets_values": [
        true,
        false
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 28,
      "type": "JoinImageWithAlpha",
      "pos": [
        2368.4716796875,
        -6.269532203674316
      ],
      "size": [
        176.86484375,
        46
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 34
        },
        {
          "name": "alpha",
          "type": "MASK",
          "link": 35
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            32
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.39",
        "Node name for S&R": "JoinImageWithAlpha"
      },
      "widgets_values": []
    },
    {
      "id": 33,
      "type": "DownloadAndLoadSAM2Model",
      "pos": [
        1548.1920166015625,
        82.7560043334961
      ],
      "size": [
        270,
        130
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "sam2_model",
          "type": "SAM2MODEL",
          "links": [
            40
          ]
        }
      ],
      "properties": {
        "cnr_id": "ComfyUI-segment-anything-2",
        "ver": "c59676b008a76237002926f684d0ca3a9b29ac54",
        "Node name for S&R": "DownloadAndLoadSAM2Model"
      },
      "widgets_values": [
        "sam2.1_hiera_base_plus.safetensors",
        "single_image",
        "cuda",
        "fp16"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 25,
      "type": "Florence2Run",
      "pos": [
        1107.8709716796875,
        74.4581298828125
      ],
      "size": [
        400,
        364
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 26
        },
        {
          "name": "florence2_model",
          "type": "FL2MODEL",
          "link": 28
        }
      ],
      "outputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "links": []
        },
        {
          "name": "mask",
          "type": "MASK",
          "links": []
        },
        {
          "name": "caption",
          "type": "STRING",
          "links": null
        },
        {
          "name": "data",
          "type": "JSON",
          "links": [
            36
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfyui-florence2",
        "ver": "de485b65b3e1b9b887ab494afa236dff4bef9a7e",
        "Node name for S&R": "Florence2Run"
      },
      "widgets_values": [
        "goldfish",
        "caption_to_phrase_grounding",
        true,
        false,
        1024,
        3,
        true,
        "",
        1234,
        "fixed"
      ],
      "color": "#232",
      "bgcolor": "#353"
    }
  ],
  "links": [
    [
      26,
      16,
      0,
      25,
      0,
      "IMAGE"
    ],
    [
      28,
      27,
      0,
      25,
      1,
      "FL2MODEL"
    ],
    [
      32,
      28,
      0,
      23,
      0,
      "IMAGE"
    ],
    [
      34,
      16,
      0,
      28,
      0,
      "IMAGE"
    ],
    [
      35,
      29,
      0,
      28,
      1,
      "MASK"
    ],
    [
      36,
      25,
      3,
      30,
      0,
      "JSON"
    ],
    [
      37,
      30,
      1,
      32,
      4,
      "BBOX"
    ],
    [
      38,
      32,
      0,
      29,
      0,
      "MASK"
    ],
    [
      40,
      33,
      0,
      32,
      0,
      "SAM2MODEL"
    ],
    [
      41,
      16,
      0,
      32,
      1,
      "IMAGE"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.620921323059155,
      "offset": [
        -697.5498046875,
        113.30300235748291
      ]
    },
    "reroutes": [
      {
        "id": 1,
        "pos": [
          1829.7442626953125,
          3.2779242992401123
        ],
        "linkIds": [
          34,
          41
        ]
      }
    ],
    "linkExtensions": [
      {
        "id": 34,
        "parentId": 1
      },
      {
        "id": 41,
        "parentId": 1
      }
    ],
    "frontendVersion": "1.33.8",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

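This workflow combines Florence2 with SAM2.1.

For obvious targets such as people or animals, any of these approaches works. When you need to describe the target with a more complex condition, such as "a man wearing sunglasses" or "a cat lying under a tree", an LLM-based model like this one is where it pays off.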
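
The data path through this workflow can be read straight out of the JSON. The sketch below copies only the node ids, node types, and links from Florence2_SAM2.1.json and follows the links downstream from `Florence2Run`. The helper name `downstream` is hypothetical, and the link layout in the comment is inferred from the workflow file itself rather than from a specification.

```python
from collections import deque

# Node ids/types and links copied from Florence2_SAM2.1.json (other fields omitted).
WORKFLOW = {
    "nodes": [
        {"id": 27, "type": "DownloadAndLoadFlorence2Model"},
        {"id": 30, "type": "Florence2toCoordinates"},
        {"id": 16, "type": "LoadImage"},
        {"id": 29, "type": "InvertMask"},
        {"id": 23, "type": "PreviewImage"},
        {"id": 32, "type": "Sam2Segmentation"},
        {"id": 28, "type": "JoinImageWithAlpha"},
        {"id": 33, "type": "DownloadAndLoadSAM2Model"},
        {"id": 25, "type": "Florence2Run"},
    ],
    "links": [
        [26, 16, 0, 25, 0, "IMAGE"],
        [28, 27, 0, 25, 1, "FL2MODEL"],
        [32, 28, 0, 23, 0, "IMAGE"],
        [34, 16, 0, 28, 0, "IMAGE"],
        [35, 29, 0, 28, 1, "MASK"],
        [36, 25, 3, 30, 0, "JSON"],
        [37, 30, 1, 32, 4, "BBOX"],
        [38, 32, 0, 29, 0, "MASK"],
        [40, 33, 0, 32, 0, "SAM2MODEL"],
        [41, 16, 0, 32, 1, "IMAGE"],
    ],
}


def downstream(workflow, start_type):
    """Return the node types reachable from start_type, in breadth-first order."""
    types = {node["id"]: node["type"] for node in workflow["nodes"]}
    # Each link reads as [link_id, from_node, from_slot, to_node, to_slot, data_type].
    targets = {}
    for _link_id, src, _src_slot, dst, _dst_slot, _kind in workflow["links"]:
        targets.setdefault(src, []).append(dst)

    start = next(i for i, t in types.items() if t == start_type)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node_id = queue.popleft()
        order.append(types[node_id])
        for nxt in targets.get(node_id, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


print(" -> ".join(downstream(WORKFLOW, "Florence2Run")))
```

The printed chain shows the division of labor: Florence2 turns the text prompt into boxes, SAM2.1 turns the boxes into a mask, and the remaining nodes only apply that mask to the image for preview. To run it on a full exported workflow, load the file with `json.load` and pass the resulting dict instead of `WORKFLOW`.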

🔥SAM 3

SAM3_Segmentation_(RMBG).json

{
  "id": "45213769-31e7-40a4-9027-26c67d437c51",
  "revision": 0,
  "last_node_id": 11,
  "last_link_id": 11,
  "nodes": [
    {
      "id": 6,
      "type": "PreviewImage",
      "pos": [
        410.5883107288138,
        420.92796486120585
      ],
      "size": [
        597.0143975156826,
        437.7992150216443
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 4
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.71",
        "Node name for S&R": "PreviewImage"
      },
      "widgets_values": []
    },
    {
      "id": 4,
      "type": "LoadImage",
      "pos": [
        -513.4050648613645,
        420.92796486120585
      ],
      "size": [
        507.5333607299855,
        441.38462274968504
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            11
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.71",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "pasted/image (34).png",
        "image"
      ]
    },
    {
      "id": 3,
      "type": "SAM3Segment",
      "pos": [
        32.358303298717374,
        420.92796486120585
      ],
      "size": [
        340,
        332
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 11
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            4
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        },
        {
          "name": "MASK_IMAGE",
          "type": "IMAGE",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfyui-rmbg",
        "ver": "2.9.4",
        "Node name for S&R": "SAM3Segment"
      },
      "widgets_values": [
        "a woman wearing an apron",
        "sam3",
        "Auto",
        0.5,
        0,
        0,
        false,
        "Color",
        "#00ff00"
      ],
      "color": "#232",
      "bgcolor": "#353"
    }
  ],
  "links": [
    [
      4,
      3,
      0,
      6,
      0,
      "IMAGE"
    ],
    [
      11,
      4,
      0,
      3,
      0,
      "IMAGE"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 1.0152559799477263,
      "offset": [
        613.4050648613645,
        -320.92796486120585
      ]
    },
    "frontendVersion": "1.33.8",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

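SAM 3 is the latest version of SAM. It also accepts text prompts, so it can run object detection and segmentation in a single step.

Its accuracy, capability, and speed are all excellent, so make this your default choice (´ε｀ )

If you want to do something more complex, also try custom nodes such as Ltamann/ComfyUI-TBG-SAM3.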

SAM 3 × BiRefNet

SAM3_BiRefNet.json

{
  "id": "5231bbde-3d9e-483d-9963-63165fedc646",
  "revision": 0,
  "last_node_id": 12,
  "last_link_id": 18,
  "nodes": [
    {
      "id": 2,
      "type": "PreviewImage",
      "pos": [
        1836.5379900055684,
        293.7408968602474
      ],
      "size": [
        554.9600255276209,
        422.8923553539689
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 17
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.71",
        "Node name for S&R": "PreviewImage"
      },
      "widgets_values": []
    },
    {
      "id": 1,
      "type": "LoadImage",
      "pos": [
        477.2842309638515,
        293.7408968602474
      ],
      "size": [
        526.1926943110356,
        491.5335516952887
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            18
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.71",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "pasted/image (35).png",
        "image"
      ]
    },
    {
      "id": 11,
      "type": "BiRefNetRMBG",
      "pos": [
        1445.5176350953413,
        293.7408968602474
      ],
      "size": [
        340,
        254
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 16
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            17
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        },
        {
          "name": "MASK_IMAGE",
          "type": "IMAGE",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfyui-rmbg",
        "ver": "2.9.4",
        "Node name for S&R": "BiRefNetRMBG"
      },
      "widgets_values": [
        "BiRefNet-general",
        0,
        0,
        false,
        false,
        "Alpha",
        "#222222"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 5,
      "type": "PreviewImage",
      "pos": [
        1448.15746204173,
        611.2211523676546
      ],
      "size": [
        332.392016078781,
        258
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 4
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.71",
        "Node name for S&R": "PreviewImage"
      },
      "widgets_values": []
    },
    {
      "id": 4,
      "type": "SAM3Segment",
      "pos": [
        1054.497280185114,
        293.7408968602474
      ],
      "size": [
        340,
        332
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 18
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            4,
            16
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": []
        },
        {
          "name": "MASK_IMAGE",
          "type": "IMAGE",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfyui-rmbg",
        "ver": "2.9.4",
        "Node name for S&R": "SAM3Segment"
      },
      "widgets_values": [
        "the woman on the right",
        "sam3",
        "Auto",
        0.5,
        0,
        7,
        false,
        "Color",
        "#00ff00"
      ],
      "color": "#432",
      "bgcolor": "#653"
    }
  ],
  "links": [
    [
      4,
      4,
      0,
      5,
      0,
      "IMAGE"
    ],
    [
      16,
      4,
      0,
      11,
      0,
      "IMAGE"
    ],
    [
      17,
      11,
      0,
      2,
      0,
      "IMAGE"
    ],
    [
      18,
      1,
      0,
      4,
      0,
      "IMAGE"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.8390545288824087,
      "offset": [
        -377.2842309638515,
        -193.7408968602474
      ]
    },
    "frontendVersion": "1.33.8",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

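Segmentation is fundamentally about telling objects apart; it is not meant for fine-grained cutouts.

Matting, by contrast, can handle fine details such as hair and semi-transparent materials such as glass.

Combining the two gives you the strengths of both: segmentation picks out the object you want, and matting refines its edges.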
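
The workflow above gets this effect by chaining the two nodes. As a rough numerical picture of the same idea, the sketch below multiplies a binary segmentation mask with a soft alpha matte, element by element. This is only an illustration under that assumption, not what the nodes do internally, and the name `combine` is hypothetical.

```python
def combine(seg_mask, alpha_matte):
    """Element-wise product: keep the matte's alpha only inside the segmentation mask."""
    return [
        [alpha * keep for alpha, keep in zip(alpha_row, mask_row)]
        for alpha_row, mask_row in zip(alpha_matte, seg_mask)
    ]


# One image row with two foreground objects; the matte has soft edges (0.4 and 0.5).
alpha_matte = [[0.0, 0.9, 0.4, 0.0, 0.5, 1.0]]
# Binary segmentation mask that selects only the object on the right.
seg_mask = [[0, 0, 0, 0, 1, 1]]

result = combine(seg_mask, alpha_matte)
print(result)
```

The object on the left is dropped entirely, while the soft edge value of the object on the right survives. In practice the segmentation mask may need to be expanded slightly first so that its hard boundary does not cut into the matte's soft edge.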
