What is an LLM / MLLM?
An LLM is, very roughly speaking, an AI that reads text and responds with text — like ChatGPT.
An MLLM is an LLM that can also accept images and other inputs. As the name suggests, it's a "Multimodal" LLM.
What do you use it for in ComfyUI?
In ComfyUI, LLMs are used less for conversation and more as a "behind-the-scenes" helper: preparing the baton that gets passed to the image generation model.
- Prompt expansion & translation
  - Turning a rough human instruction into a detailed English prompt that the AI can understand
- Tag generation & image captioning
  - Show it an image and have it output tags or a description
  - Useful for training captions, or as a prompt for re-generation
- Object detection & segmentation
  - Some MLLMs can handle these more specialized tasks as well
  - MLLM-based object detection is especially handy because you can specify targets in natural language
3 ways to use LLMs in ComfyUI
ComfyUI is an engine specialized for image generation, so there's no built-in feature to run LLMs — the underlying mechanisms are completely different.
That means you'll generally use custom nodes or external integrations.
Self-contained within ComfyUI
Download a model file and run it on your own PC, just like image generation models.
Lightweight models specialized for specific tasks — like caption generation or object detection — are the main use case here.

Representative supported models
- JoyCaption
- Florence-2
- Qwen3 VL
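To make the "specialized tasks" concrete, here is a rough sketch of driving Florence-2 directly with Hugging Face `transformers`, outside ComfyUI. The task-prompt tokens follow the public model card; the custom nodes wrap roughly this flow, but treat the specific model ID and parameters here as illustrative assumptions, not what any particular node actually runs.

```python
# Florence-2 selects its task with special prompt tokens, not free-form text.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<MORE_DETAILED_CAPTION>",
    "object_detection": "<OD>",
    # Grounding locates whatever phrase you append, in natural language.
    "grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
}

def build_prompt(task: str, phrase: str = "") -> str:
    """Compose the full prompt; grounding tasks append a natural-language phrase."""
    return TASK_PROMPTS[task] + phrase

def run_florence(image_path: str, task: str, phrase: str = "",
                 model_id: str = "microsoft/Florence-2-base"):
    """Run one Florence-2 task on one image (CPU, for simplicity)."""
    # Heavy imports stay inside the function so the helpers above
    # remain importable even without transformers installed.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompt(task, phrase), images=image,
                       return_tensors="pt")
    ids = model.generate(input_ids=inputs["input_ids"],
                         pixel_values=inputs["pixel_values"],
                         max_new_tokens=512)
    raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
    # Returns a dict, e.g. {"<OD>": {"bboxes": [...], "labels": [...]}}
    return processor.post_process_generation(
        raw, task=TASK_PROMPTS[task], image_size=(image.width, image.height))
```

Something like `run_florence("photo.png", "grounding", "a red umbrella")` would return boxes for that phrase, which is the natural-language object detection mentioned earlier.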
External LLM server integration
Delegate LLM inference to a dedicated engine like Ollama or LM Studio, and call it from ComfyUI via API.
Running on the same PC still means competing for VRAM, but the key advantage is keeping the inference environment separate from ComfyUI.
- No pollution of ComfyUI's dependencies, making maintenance easier
- Run it on a separate PC and connect over the network to eliminate VRAM contention entirely
→ External LLM Server Integration
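As a sketch of what that API call looks like, here is a minimal prompt-expansion client for Ollama's default endpoint on `localhost:11434`. The endpoint and request fields follow Ollama's REST API; the model name and system prompt are placeholders, and in practice a custom node handles this for you.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

SYSTEM = ("Expand the user's rough idea into a detailed English "
          "image-generation prompt. Reply with the prompt only.")

def build_payload(rough_idea: str, model: str = "llama3") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,          # placeholder: any model pulled into Ollama
        "system": SYSTEM,
        "prompt": rough_idea,
        "stream": False,         # one complete JSON reply instead of chunks
    }

def expand_prompt(rough_idea: str, model: str = "llama3") -> str:
    """POST the request and return the expanded prompt text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(rough_idea, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Point `OLLAMA_URL` at another machine's address and you get the separate-PC setup from the list above, with zero VRAM contention on the generation box.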
Official paid API nodes
ComfyUI's official nodes for calling closed-source services like ChatGPT or Gemini through their APIs.

Bluntly put, these are far smarter and faster than local models.
- Zero load on your PC. You can run image generation while prompts are being refined in the background, with no impact on generation speed
- That said, pay-as-you-go billing applies, and NSFW content will be blocked by guardrails, so keep that in mind
Side note: Are you already using one?
Recent image generation models (like Qwen-Image and Z-Image) embed MLLMs such as Qwen or Gemma as their text encoder — the component that understands your prompt.
They use it to interpret the text prompt and reference images for generation and editing, and in a sense that's all it's used for, which makes it a rather lavish arrangement. It would be interesting if that embedded MLLM could someday be used directly for other purposes…