Segmentation covers several different tasks, but in the context of ComfyUI it means creating a mask that follows the shape of an object specified by points, a BBOX, a mask, or text.
Difference from Matting
It may seem similar to matting, but the two differ in important ways.
Matting
Creates a mask that separates the image into just two classes: "foreground" and "background"
Segmentation
The purpose is to cut out (create a mask for) an arbitrary object
So, is matting a subset of segmentation? Not at all; their roles are fundamentally different.
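Segmentation is closer to a classification task: for each pixel, it decides whether that pixel belongs to the target object.
Matting can produce high-quality cutouts with partial transparency, such as wispy hair edges.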
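To make the difference concrete, here is a minimal pure-Python sketch on toy data (not ComfyUI code). A segmentation mask gives each pixel 0 or 1, while a matte gives each pixel an opacity between 0 and 1. Both are applied with the standard alpha-compositing formula, out = alpha * fg + (1 - alpha) * bg. Only the matte produces blended edge pixels.

```python
# Toy data (hypothetical): one row of 4 RGB pixels, a red object composited onto blue.
Pixel = tuple[float, float, float]

def composite(fg: list[Pixel], bg: list[Pixel], alpha: list[float]) -> list[Pixel]:
    """Blend fg over bg per pixel: out = alpha * fg + (1 - alpha) * bg."""
    return [
        tuple(a * f + (1 - a) * b for f, b in zip(fp, bp))
        for fp, bp, a in zip(fg, bg, alpha)
    ]

fg = [(255.0, 0.0, 0.0)] * 4  # object colour
bg = [(0.0, 0.0, 255.0)] * 4  # new background colour

# Segmentation: each pixel is classified as "object" (1) or "not object" (0).
seg_mask = [1.0, 1.0, 0.0, 0.0]

# Matting: each pixel gets an opacity in [0, 1], e.g. a semi-transparent hair edge.
alpha_matte = [1.0, 0.5, 0.2, 0.0]

hard_cut = composite(fg, bg, seg_mask)     # edge jumps straight from red to blue
soft_cut = composite(fg, bg, alpha_matte)  # edge pixels mix red and blue

print(hard_cut)
print(soft_cut)
```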
When people say "segmentation" in ComfyUI, they almost always mean a model from the SAM (Segment Anything) family.
I said the purpose is to cut out (create a mask for) an arbitrary object, but to do that, the AI has to understand, to some extent, "the shape of that object."
For example, suppose there is a fruit basket on a desk and you want to cut out an apple. If the model does not know what an "apple" looks like, it cannot tell where the apple ends. SAM made this possible.
Main Models
SAM
The first model, released by Meta.
Click a point, and it returns a mask of the object at that location.
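In the `segment_anything` library, a click is passed as an (x, y) pixel coordinate plus a label: 1 marks a foreground point and 0 marks a background point. A single click is ambiguous (a wheel, or the whole car?), so SAM can return several candidate masks, each with a predicted quality score. The sketch below only post-processes made-up candidates. No model is loaded. `pick_mask` is a hypothetical helper, and its "agrees with every click" check is an illustrative heuristic added here, not part of SAM. SAM itself uses the clicks as model input.

```python
# Toy post-processing of SAM-style output; no model is loaded or called here.
Mask = list[list[int]]  # indexed mask[y][x]; 1 = object pixel, 0 = not object

def pick_mask(candidates: list[Mask], scores: list[float],
              points: list[tuple[int, int]], labels: list[int]) -> Mask:
    """Return the highest-scoring candidate that agrees with every click:
    foreground points (label 1) inside the mask, background points (label 0) outside."""
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    for i in order:
        if all(candidates[i][y][x] == label for (x, y), label in zip(points, labels)):
            return candidates[i]
    raise ValueError("no candidate mask agrees with the clicks")

# Made-up candidates for one ambiguous click: a small part, the whole object, too much.
part = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
whole = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
too_big = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 0]]
scores = [0.71, 0.93, 0.64]  # made-up predicted quality scores

# One foreground click at x=1, y=1.
best = pick_mask([part, whole, too_big], scores, [(1, 1)], [1])
for row in best:
    print(row)
```

Adding a background click at (2, 1) rules out `whole` and `too_big`, which leaves `part`. Extra positive and negative clicks are how you steer SAM toward the region you actually want.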
A derivative model based on SAM with improved mask quality.
SAM 2 / 2.1
Adds video support: it can track the same object across the frames of a video and output a mask for each frame.
SAM 3
Adds the ability to specify the target with text.
Previously, the target had to be specified by points or a BBOX, so creating masks automatically required combining SAM with an object detection model.
Use Cases in ComfyUI
It is used in all kinds of situations, from cutouts to inpainting.
Segment Anything Playground also has many examples, such as blurring faces or turning the background black and white. (Most of these can be reproduced in ComfyUI.)
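Effects like "black-and-white background" come down to using the mask as a per-pixel switch. Here is a minimal pure-Python sketch on a toy image. It assumes the mask (1 = subject) has already been produced by SAM. The BT.601 luma weights are one common grayscale formula and are not necessarily what any particular ComfyUI node uses.

```python
Pixel = tuple[int, int, int]

def to_gray(p: Pixel) -> Pixel:
    """Convert RGB to gray using ITU-R BT.601 luma weights."""
    r, g, b = p
    y = round(0.299 * r + 0.587 * g + 0.114 * b)
    return (y, y, y)

def grayscale_background(image: list[list[Pixel]], mask: list[list[int]]) -> list[list[Pixel]]:
    """Keep colour where mask == 1 (the segmented subject); desaturate everything else."""
    return [
        [px if m == 1 else to_gray(px) for px, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]

# Toy 1x3 image: a red subject pixel, then blue and white background pixels.
image = [[(200, 30, 30), (0, 0, 255), (255, 255, 255)]]
mask = [[1, 0, 0]]  # assumed to come from SAM

print(grayscale_background(image, mask))
```

Replacing `to_gray` with a blur, and passing a face mask instead of a subject mask, gives the face-blurring effect. The mask only decides where an effect is applied.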
Up to SAM 2.1, the target could not be specified by text, so SAM was often combined with an object detection model such as Grounding DINO or Florence-2.
SAM 3 does accept text, but models that outperform it at object detection itself will likely keep appearing, so this detector + SAM combination is worth remembering.
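The detector + SAM combination is a simple pipeline:

1. A text-prompted detector turns the text into BBOXes.
2. Each confident BBOX is passed to SAM as a prompt.
3. The resulting masks are merged.

In the sketch below, `fake_detector` and `fake_sam_from_box` are hypothetical stand-ins with hard-coded outputs. The SAM stub just fills the box, whereas the real model returns the object's outline inside it. The 0.5 score threshold is an arbitrary example value.

```python
from dataclasses import dataclass

Mask = list[list[int]]  # indexed mask[y][x]; 1 = object pixel

@dataclass
class Detection:
    label: str
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive
    score: float

def fake_detector(text: str) -> list[Detection]:
    """Hypothetical stand-in for a text-prompted detector (Grounding DINO, Florence-2, ...)."""
    found = [
        Detection("apple", (0, 0, 2, 2), 0.91),
        Detection("apple", (3, 1, 5, 3), 0.47),
        Detection("banana", (2, 0, 4, 1), 0.85),
    ]
    return [d for d in found if d.label == text]

def fake_sam_from_box(box: tuple[int, int, int, int], height: int, width: int) -> Mask:
    """Hypothetical stand-in for SAM with a BBOX prompt.

    A real model returns the object's outline inside the box; this stub just fills the box.
    """
    x0, y0, x1, y1 = box
    return [[1 if x0 <= x < x1 and y0 <= y < y1 else 0 for x in range(width)]
            for y in range(height)]

def text_to_mask(text: str, height: int, width: int, threshold: float = 0.5) -> Mask:
    """Detect boxes for the text, turn each confident box into a mask, and OR the masks."""
    combined = [[0] * width for _ in range(height)]
    for det in fake_detector(text):
        if det.score < threshold:
            continue  # drop low-confidence detections
        mask = fake_sam_from_box(det.box, height, width)
        combined = [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(combined, mask)]
    return combined

for row in text_to_mask("apple", height=3, width=5):
    print(row)
```

In ComfyUI, the same flow is wired up from nodes rather than functions, but the data passed between the stages is the same: text, then boxes, then masks.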
Supplement: Segmentation Before SAM
In textbook terms, segmentation is divided into the following categories.