Florence-2

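What is Florence-2?

Florence-2 is a general-purpose VLM (Vision-Language Model) that looks at an image and handles several tasks with a single model, including caption generation, object detection, segmentation, and OCR.

This page focuses on the four tasks used most often in ComfyUI: caption generation, object detection (coordinate extraction), OCR, and Q&A about an image.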

Custom nodes

kijai/ComfyUI-Florence2
- The model is downloaded automatically the first time the node runs.

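Florence2Run node

Florence2Run is the main node that has Florence-2 run a task on the input image. Switching the task setting selects between features such as caption generation, object detection, and OCR.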
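
As a rough illustration of how the task setting is stored, the following Python sketch reads the task and text input of every Florence2Run node from a workflow dictionary. It assumes the widget order seen in the workflow files on this page (text at index 0, task at index 1); the function name florence2_settings and the sample data are hypothetical and not part of ComfyUI or the custom node.

```python
from typing import Any


def florence2_settings(workflow: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (task, text) for every Florence2Run node in a workflow dict."""
    settings = []
    for node in workflow.get("nodes", []):
        if node.get("type") != "Florence2Run":
            continue
        widgets = node.get("widgets_values", [])
        # Assumption: index 0 is the text input and index 1 is the task,
        # matching the widget order in the workflows shown on this page.
        text, task = widgets[0], widgets[1]
        settings.append((task, text))
    return settings


# Hypothetical, trimmed-down workflow used only for illustration.
example = {
    "nodes": [
        {"type": "LoadImage", "widgets_values": ["input.png", "image"]},
        {"type": "Florence2Run",
         "widgets_values": ["fox", "caption_to_phrase_grounding"]},
    ]
}

print(florence2_settings(example))
```

Running this on one of the workflow files (loaded with json.load) should list which task each Florence2Run node is set to, which can help when comparing the examples that follow.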

caption, detailed caption

Generates a natural-language description of the image.

Florence2-detailed_caption.json

{
  "id": "063054af-873b-492c-a642-b59c68b22c0b",
  "revision": 0,
  "last_node_id": 12,
  "last_link_id": 13,
  "nodes": [
    {
      "id": 4,
      "type": "DownloadAndLoadFlorence2Model",
      "pos": [
        349.41423462195155,
        229.87996065705917
      ],
      "size": [
        286.86661124741727,
        130
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [
        {
          "name": "lora",
          "shape": 7,
          "type": "PEFTLORA",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "florence2_model",
          "type": "FL2MODEL",
          "links": [
            3
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfyui-florence2",
        "ver": "00b63382966a444a9fefacb65b8deb188d12a458",
        "Node name for S&R": "DownloadAndLoadFlorence2Model"
      },
      "widgets_values": [
        "microsoft/Florence-2-base-ft",
        "fp16",
        "sdpa",
        true
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 1,
      "type": "Florence2Run",
      "pos": [
        674.4302630294422,
        423.43518886551453
      ],
      "size": [
        313.6363636363636,
        364
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 1
        },
        {
          "name": "florence2_model",
          "type": "FL2MODEL",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "links": []
        },
        {
          "name": "mask",
          "type": "MASK",
          "links": []
        },
        {
          "name": "caption",
          "type": "STRING",
          "links": [
            13
          ]
        },
        {
          "name": "data",
          "type": "JSON",
          "links": []
        }
      ],
      "properties": {
        "cnr_id": "comfyui-florence2",
        "ver": "00b63382966a444a9fefacb65b8deb188d12a458",
        "Node name for S&R": "Florence2Run"
      },
      "widgets_values": [
        "",
        "detailed_caption",
        true,
        false,
        1024,
        3,
        true,
        "",
        1234,
        "fixed"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 2,
      "type": "LoadImage",
      "pos": [
        248.54931487603312,
        423.43518886551453
      ],
      "size": [
        390.44371448863615,
        395.81818181818187
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            1
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "pasted/image (74).png",
        "image"
      ]
    },
    {
      "id": 12,
      "type": "PreviewAny",
      "pos": [
        1025.4266881474668,
        427.6300114135301
      ],
      "size": [
        297.27272727272725,
        182.36363636363637
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "source",
          "type": "*",
          "link": 13
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "PreviewAny"
      },
      "widgets_values": [
        null,
        null,
        false
      ]
    }
  ],
  "links": [
    [
      1,
      2,
      0,
      1,
      0,
      "IMAGE"
    ],
    [
      3,
      4,
      0,
      1,
      1,
      "FL2MODEL"
    ],
    [
      13,
      1,
      2,
      12,
      0,
      "*"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 1,
      "offset": [
        222.45068512396688,
        -43.87996065705917
      ]
    },
    "frontendVersion": "1.34.6",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

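caption
- Briefly describes the overall content of the image.
detailed caption
- Describes the composition and appearance in somewhat more detail.

However, if all you need is a caption to use as a prompt, a dedicated captioning model such as JoyCaption produces far more flexible, higher-quality results.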

caption_to_phrase_grounding

For each phrase in the text you specify, outputs the position of the matching object as a rectangle (bounding box).

Florence2-caption_to_phrase_grounding.json

{
  "id": "063054af-873b-492c-a642-b59c68b22c0b",
  "revision": 0,
  "last_node_id": 11,
  "last_link_id": 12,
  "nodes": [
    {
      "id": 4,
      "type": "DownloadAndLoadFlorence2Model",
      "pos": [
        349.41423462195155,
        229.87996065705917
      ],
      "size": [
        286.86661124741727,
        130
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [
        {
          "name": "lora",
          "shape": 7,
          "type": "PEFTLORA",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "florence2_model",
          "type": "FL2MODEL",
          "links": [
            3
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfyui-florence2",
        "ver": "00b63382966a444a9fefacb65b8deb188d12a458",
        "Node name for S&R": "DownloadAndLoadFlorence2Model"
      },
      "widgets_values": [
        "microsoft/Florence-2-base-ft",
        "fp16",
        "sdpa",
        true
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 1,
      "type": "Florence2Run",
      "pos": [
        674.4302630294422,
        423.43518886551453
      ],
      "size": [
        313.6363636363636,
        364
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 1
        },
        {
          "name": "florence2_model",
          "type": "FL2MODEL",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "links": [
            2
          ]
        },
        {
          "name": "mask",
          "type": "MASK",
          "links": []
        },
        {
          "name": "caption",
          "type": "STRING",
          "links": []
        },
        {
          "name": "data",
          "type": "JSON",
          "links": [
            7
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfyui-florence2",
        "ver": "00b63382966a444a9fefacb65b8deb188d12a458",
        "Node name for S&R": "Florence2Run"
      },
      "widgets_values": [
        "fox",
        "caption_to_phrase_grounding",
        true,
        false,
        1024,
        3,
        true,
        "",
        1234,
        "fixed"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 3,
      "type": "PreviewImage",
      "pos": [
        1023.5038603305788,
        423.43518886551453
      ],
      "size": [
        419.6727272727271,
        391.9818181818181
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 2
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "PreviewImage"
      },
      "widgets_values": []
    },
    {
      "id": 10,
      "type": "DownloadAndLoadSAM2Model",
      "pos": [
        1031.2774982383762,
        876.8182919589856
      ],
      "size": [
        210,
        130
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "sam2_model",
          "type": "SAM2MODEL",
          "links": [
            10
          ]
        }
      ],
      "properties": {
        "cnr_id": "ComfyUI-segment-anything-2",
        "ver": "0c35fff5f382803e2310103357b5e985f5437f32",
        "Node name for S&R": "DownloadAndLoadSAM2Model"
      },
      "widgets_values": [
        "sam2.1_hiera_base_plus.safetensors",
        "single_image",
        "cuda",
        "fp16"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 2,
      "type": "LoadImage",
      "pos": [
        248.54931487603312,
        423.43518886551453
      ],
      "size": [
        390.44371448863615,
        395.81818181818187
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            1,
            11
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "pasted/image (73).png",
        "image"
      ]
    },
    {
      "id": 11,
      "type": "MaskPreview",
      "pos": [
        1535.0502255111053,
        980.9273828680758
      ],
      "size": [
        374.29999999999995,
        323
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "mask",
          "type": "MASK",
          "link": 12
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "MaskPreview"
      },
      "widgets_values": []
    },
    {
      "id": 8,
      "type": "Florence2toCoordinates",
      "pos": [
        1030.8481877951024,
        1066.5042611550825
      ],
      "size": [
        210,
        102
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "data",
          "type": "JSON",
          "link": 7
        }
      ],
      "outputs": [
        {
          "name": "center_coordinates",
          "type": "STRING",
          "links": [
            8
          ]
        },
        {
          "name": "bboxes",
          "type": "BBOX",
          "links": [
            9
          ]
        }
      ],
      "properties": {
        "cnr_id": "ComfyUI-segment-anything-2",
        "ver": "0c35fff5f382803e2310103357b5e985f5437f32",
        "Node name for S&R": "Florence2toCoordinates"
      },
      "widgets_values": [
        "0",
        false
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 9,
      "type": "Sam2Segmentation",
      "pos": [
        1281.994151431467,
        982.5618884278075
      ],
      "size": [
        212.087890625,
        182
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "sam2_model",
          "type": "SAM2MODEL",
          "link": 10
        },
        {
          "name": "image",
          "type": "IMAGE",
          "link": 11
        },
        {
          "name": "coordinates_positive",
          "shape": 7,
          "type": "STRING",
          "link": 8
        },
        {
          "name": "coordinates_negative",
          "shape": 7,
          "type": "STRING",
          "link": null
        },
        {
          "name": "bboxes",
          "shape": 7,
          "type": "BBOX",
          "link": 9
        },
        {
          "name": "mask",
          "shape": 7,
          "type": "MASK",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "mask",
          "type": "MASK",
          "links": [
            12
          ]
        }
      ],
      "properties": {
        "cnr_id": "ComfyUI-segment-anything-2",
        "ver": "0c35fff5f382803e2310103357b5e985f5437f32",
        "Node name for S&R": "Sam2Segmentation"
      },
      "widgets_values": [
        false,
        false
      ],
      "color": "#323",
      "bgcolor": "#535"
    }
  ],
  "links": [
    [
      1,
      2,
      0,
      1,
      0,
      "IMAGE"
    ],
    [
      2,
      1,
      0,
      3,
      0,
      "IMAGE"
    ],
    [
      3,
      4,
      0,
      1,
      1,
      "FL2MODEL"
    ],
    [
      7,
      1,
      3,
      8,
      0,
      "JSON"
    ],
    [
      8,
      8,
      0,
      9,
      2,
      "STRING"
    ],
    [
      9,
      8,
      1,
      9,
      4,
      "BBOX"
    ],
    [
      10,
      10,
      0,
      9,
      0,
      "SAM2MODEL"
    ],
    [
      11,
      2,
      0,
      9,
      1,
      "IMAGE"
    ],
    [
      12,
      9,
      0,
      11,
      0,
      "MASK"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.8264462809917358,
      "offset": [
        -56.58931487603314,
        -89.94996065705918
      ]
    },
    "frontendVersion": "1.34.6",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

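Its distinguishing feature is that it can locate somewhat more complex phrases such as "left tree" or "red car".
🟨 By extracting the coordinates with the Florence2 Coordinates node and combining them with a segmentation model such as SAM2, you can create a mask for one specific object only.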
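
To make the coordinate hand-off more concrete, here is a minimal Python sketch that turns bounding boxes into center points, which is the kind of information a point-prompted segmenter needs. It assumes boxes are given as [x1, y1, x2, y2] in pixels alongside a list of labels, and that points are serialized as a JSON list of x/y objects; both the data layout and the helper name bbox_centers are assumptions for illustration, not the confirmed internals of the Florence2 Coordinates node.

```python
import json


def bbox_centers(bboxes: list[list[float]]) -> str:
    """Convert [x1, y1, x2, y2] boxes into a JSON string of center points."""
    centers = []
    for x1, y1, x2, y2 in bboxes:
        # The center is the midpoint of each axis, truncated to whole pixels.
        centers.append({"x": int((x1 + x2) / 2), "y": int((y1 + y2) / 2)})
    return json.dumps(centers)


# Hypothetical grounding result for the phrase "fox" (values are made up).
grounding = {"bboxes": [[100.0, 50.0, 300.0, 250.0]], "labels": ["fox"]}

print(bbox_centers(grounding["bboxes"]))
```

In the workflow above, both the center points and the boxes are passed to Sam2Segmentation, so the segmenter receives a point inside the object as well as its rough extent.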

ocr

Reads the text in the image and outputs it as text.

Florence2-ocr.json

{
  "id": "063054af-873b-492c-a642-b59c68b22c0b",
  "revision": 0,
  "last_node_id": 12,
  "last_link_id": 13,
  "nodes": [
    {
      "id": 4,
      "type": "DownloadAndLoadFlorence2Model",
      "pos": [
        349.41423462195155,
        229.87996065705917
      ],
      "size": [
        286.86661124741727,
        130
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [
        {
          "name": "lora",
          "shape": 7,
          "type": "PEFTLORA",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "florence2_model",
          "type": "FL2MODEL",
          "links": [
            3
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfyui-florence2",
        "ver": "00b63382966a444a9fefacb65b8deb188d12a458",
        "Node name for S&R": "DownloadAndLoadFlorence2Model"
      },
      "widgets_values": [
        "microsoft/Florence-2-base-ft",
        "fp16",
        "sdpa",
        true
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 12,
      "type": "PreviewAny",
      "pos": [
        1025.4266881474668,
        427.6300114135301
      ],
      "size": [
        297.27272727272725,
        182.36363636363637
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "source",
          "type": "*",
          "link": 13
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "PreviewAny"
      },
      "widgets_values": [
        null,
        null,
        null
      ]
    },
    {
      "id": 2,
      "type": "LoadImage",
      "pos": [
        248.54931487603312,
        423.43518886551453
      ],
      "size": [
        390.44371448863615,
        395.81818181818187
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            1
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "pasted/image (75).png",
        "image"
      ]
    },
    {
      "id": 1,
      "type": "Florence2Run",
      "pos": [
        674.4302630294422,
        423.43518886551453
      ],
      "size": [
        313.6363636363636,
        364
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 1
        },
        {
          "name": "florence2_model",
          "type": "FL2MODEL",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "links": []
        },
        {
          "name": "mask",
          "type": "MASK",
          "links": []
        },
        {
          "name": "caption",
          "type": "STRING",
          "links": [
            13
          ]
        },
        {
          "name": "data",
          "type": "JSON",
          "links": []
        }
      ],
      "properties": {
        "cnr_id": "comfyui-florence2",
        "ver": "00b63382966a444a9fefacb65b8deb188d12a458",
        "Node name for S&R": "Florence2Run"
      },
      "widgets_values": [
        "",
        "ocr",
        true,
        false,
        1024,
        3,
        true,
        "",
        1234,
        "fixed"
      ],
      "color": "#232",
      "bgcolor": "#353"
    }
  ],
  "links": [
    [
      1,
      2,
      0,
      1,
      0,
      "IMAGE"
    ],
    [
      3,
      4,
      0,
      1,
      1,
      "FL2MODEL"
    ],
    [
      13,
      1,
      2,
      12,
      0,
      "*"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 1.2100000000000006,
      "offset": [
        -148.54931487603312,
        -129.87996065705917
      ]
    },
    "frontendVersion": "1.34.6",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

docvqa

A task that answers questions about the image.

Florence2-docvqa.json

{
  "id": "063054af-873b-492c-a642-b59c68b22c0b",
  "revision": 0,
  "last_node_id": 12,
  "last_link_id": 13,
  "nodes": [
    {
      "id": 4,
      "type": "DownloadAndLoadFlorence2Model",
      "pos": [
        349.41423462195155,
        229.87996065705917
      ],
      "size": [
        286.86661124741727,
        130
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [
        {
          "name": "lora",
          "shape": 7,
          "type": "PEFTLORA",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "florence2_model",
          "type": "FL2MODEL",
          "links": [
            3
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfyui-florence2",
        "ver": "00b63382966a444a9fefacb65b8deb188d12a458",
        "Node name for S&R": "DownloadAndLoadFlorence2Model"
      },
      "widgets_values": [
        "microsoft/Florence-2-base-ft",
        "fp16",
        "sdpa",
        true
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 12,
      "type": "PreviewAny",
      "pos": [
        1025.4266881474668,
        427.6300114135301
      ],
      "size": [
        297.27272727272725,
        182.36363636363637
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "source",
          "type": "*",
          "link": 13
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "PreviewAny"
      },
      "widgets_values": [
        null,
        null,
        null
      ]
    },
    {
      "id": 2,
      "type": "LoadImage",
      "pos": [
        248.54931487603312,
        423.43518886551453
      ],
      "size": [
        390.44371448863615,
        395.81818181818187
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            1
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "pasted/image (76).png",
        "image"
      ]
    },
    {
      "id": 1,
      "type": "Florence2Run",
      "pos": [
        674.4302630294422,
        423.43518886551453
      ],
      "size": [
        313.6363636363636,
        364
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 1
        },
        {
          "name": "florence2_model",
          "type": "FL2MODEL",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "links": []
        },
        {
          "name": "mask",
          "type": "MASK",
          "links": []
        },
        {
          "name": "caption",
          "type": "STRING",
          "links": [
            13
          ]
        },
        {
          "name": "data",
          "type": "JSON",
          "links": []
        }
      ],
      "properties": {
        "cnr_id": "comfyui-florence2",
        "ver": "00b63382966a444a9fefacb65b8deb188d12a458",
        "Node name for S&R": "Florence2Run"
      },
      "widgets_values": [
        "How many eggs are on the ramen?",
        "docvqa",
        true,
        false,
        1024,
        3,
        true,
        "",
        1234,
        "fixed"
      ],
      "color": "#232",
      "bgcolor": "#353"
    }
  ],
  "links": [
    [
      1,
      2,
      0,
      1,
      0,
      "IMAGE"
    ],
    [
      3,
      4,
      0,
      1,
      1,
      "FL2MODEL"
    ],
    [
      13,
      1,
      2,
      12,
      0,
      "*"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 1.2100000000000006,
      "offset": [
        -148.54931487603312,
        -129.87996065705917
      ]
    },
    "frontendVersion": "1.34.6",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

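You can ask questions such as "Where is X in this image?" or "What is the value in this table?" and receive the answer as text.
The usage feels similar to giving ChatGPT an image and asking a question about it.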
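
If you want to try several questions against the same image, one option is to edit the workflow data programmatically instead of retyping the text. The sketch below returns a copy of a workflow dictionary with the question and task set on each Florence2Run node. It assumes the same widget order as the workflows on this page (text at index 0, task at index 1); the helper name with_question and the sample data are hypothetical.

```python
import copy
from typing import Any


def with_question(workflow: dict[str, Any], question: str) -> dict[str, Any]:
    """Return a copy of the workflow whose Florence2Run nodes ask `question`."""
    updated = copy.deepcopy(workflow)  # leave the caller's workflow untouched
    for node in updated.get("nodes", []):
        if node.get("type") == "Florence2Run":
            # Assumption: index 0 is the text input, index 1 is the task.
            node["widgets_values"][0] = question
            node["widgets_values"][1] = "docvqa"
    return updated


# Hypothetical, trimmed-down workflow used only for illustration.
base = {"nodes": [{"type": "Florence2Run", "widgets_values": ["", "ocr"]}]}
asked = with_question(base, "How many eggs are on the ramen?")

print(asked["nodes"][0]["widgets_values"])
```

Working on a deep copy keeps the original workflow reusable, so the same base file can be turned into several question variants.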
