What is SAM 3 / 3.1?

SAM 3 is a new model in Meta's Segment Anything Model series.

Earlier SAM models could recognize object shapes, but to cut out a specific object you had to specify its location with a bounding box (BBOX) or point coordinates.

With SAM 3, you can specify the target with a text prompt, as you would with a vision-language model (VLM). Segmentation can then be done with SAM alone.

SAM 3.1 is an update to SAM 3 that improves tracking of multiple objects in video.


Model Download

📂ComfyUI/
└── 📂models/
    └── 📂checkpoints/
        └── sam3.1_multiplex_fp16.safetensors
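To confirm that the checkpoint is in the right folder, you can run a short check with Python's `pathlib`. This is a minimal sketch. `COMFYUI_ROOT` is a hypothetical path that you should point at your own ComfyUI install, and the helper names are illustrative only.

```python
from pathlib import Path

# Hypothetical install location: change this to wherever your ComfyUI folder lives.
COMFYUI_ROOT = Path("ComfyUI")
CHECKPOINT_NAME = "sam3.1_multiplex_fp16.safetensors"


def expected_checkpoint_path(root: Path) -> Path:
    """Return where the checkpoint should sit: <root>/models/checkpoints/<file>."""
    return root / "models" / "checkpoints" / CHECKPOINT_NAME


def checkpoint_is_installed(root: Path) -> bool:
    """True if the checkpoint file exists at the expected location."""
    return expected_checkpoint_path(root).is_file()


if __name__ == "__main__":
    path = expected_checkpoint_path(COMFYUI_ROOT)
    status = "found" if checkpoint_is_installed(COMFYUI_ROOT) else "missing"
    print(f"{path}: {status}")
```

If the script reports `missing`, check that the file sits directly in `checkpoints/` and not in a subfolder.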

Workflow

Still Image

SAM3.1.json
  • Connect the image, mask, and target information (text prompt, BBOX, or coordinates) to the SAM3 Detect node.
  • Its behavior is a little tricky: if multiple objects match the prompt, writing just car detects only the most likely one.
    • To segment up to the N-th match, write car:N.
    • To detect every visible match, a large value such as car:99 also works.
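The `label:N` convention can be illustrated with a small parser. This is a hypothetical sketch, not the SAM3 Detect node's actual code. Following the behavior described above, it assumes that a bare label means "keep 1". It also assumes detections are ranked by score when only the top N are kept. `parse_detect_prompt` and `keep_top_n` are illustrative names.

```python
def parse_detect_prompt(prompt: str) -> tuple[str, int]:
    """Split a prompt like "car:3" into ("car", 3).

    A bare label such as "car" is treated as a count of 1, mirroring the
    "most likely object only" behavior. Hypothetical helper for illustration.
    """
    label, sep, count = prompt.rpartition(":")
    if not sep:
        # No colon: the whole string is the label; keep only the top match.
        return prompt.strip(), 1
    count = count.strip()
    if not count.isdecimal() or int(count) < 1:
        raise ValueError(f"expected a positive integer after ':', got {count!r}")
    return label.strip(), int(count)


def keep_top_n(scored: list[tuple[str, float]], n: int) -> list[tuple[str, float]]:
    """Keep up to n detections, highest score first (illustrative ranking)."""
    return sorted(scored, key=lambda d: d[1], reverse=True)[:n]
```

Read this way, `car:99` just sets a cap higher than the number of cars a typical image contains, so in practice it keeps every match.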

Video

SAM3.1_video.json
  • Use the SAM3 Video Track node.
  • Pass the output to the SAM3 Track to Mask node to use it as a mask.
  • The SAM3 Track Preview node takes an image and track_data, then colors the masked area so it is easier to see.
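The sketch below tints masked pixels to show the kind of coloring a preview like this performs. It is a minimal pure-Python illustration, not the SAM3 Track Preview node's implementation. It assumes an image stored as rows of (R, G, B) tuples and a boolean mask of the same size. The function name, default color and alpha are hypothetical.

```python
Pixel = tuple[int, int, int]


def tint_masked(image: list[list[Pixel]], mask: list[list[bool]],
                color: Pixel = (255, 0, 0), alpha: float = 0.5) -> list[list[Pixel]]:
    """Blend `color` into every pixel where mask is True; leave others unchanged."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    out = []
    for img_row, mask_row in zip(image, mask):
        new_row = []
        for (r, g, b), masked in zip(img_row, mask_row):
            if masked:
                # Linear blend: (1 - alpha) * original + alpha * tint color.
                r = round((1 - alpha) * r + alpha * color[0])
                g = round((1 - alpha) * g + alpha * color[1])
                b = round((1 - alpha) * b + alpha * color[2])
            new_row.append((r, g, b))
        out.append(new_row)
    return out
```

With several tracked objects, you would typically give each object its own color so the tracks stay distinguishable.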