What is prompt generation / editing?
Back when the prompt was virtually the only parameter you could meaningfully touch, prompt engineering and the word "incantation" were everywhere (those were the days).
Compared with today's natural-language prompts, a prompt for Stable Diffusion 1.5 really was an incantation: a string of tags. The model's comprehension was poor too, so you had to refine prompts by trial and error while watching the actual output.
Writing these by hand every time is tedious, though, and the skill inevitably turns into artisanal craft. "Prompt generation / editing", as this page calls it, is the attempt to offload that work to an LLM.
Prompt generation in the Stable Diffusion era
Models of the Stable Diffusion / SDXL generation could not interpret natural language well, so the basic style was to string comma-separated tags together.
masterpiece, (best quality:1.05), 1girl, blue hair, …
There are tricks, such as grouping words with related meanings or matching the phrasing of the captions the model was trained on, but hand-assembling this machine-oriented style every time is a chore.
That is where dedicated models came in: models that convert a roughly written prompt into a Stable Diffusion-style tag sequence.
Representative examples

- A lightweight model that generates Danbooru tag sequences: feed it rough tags or a description and it converts them into a dense tag sequence suited to Stable Diffusion.
- Qwen 1.8B Stable Diffusion Prompt: a small Qwen-family model focused on generating prompts for SD (e.g. converting Japanese into English tag sequences).

Neither is designed for human readability; both are tools focused on emitting prompts in a form that SD1.5 / SDXL can easily digest.
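To a first approximation, the shape of that transformation can be imitated mechanically. The sketch below is a hand-rolled illustration (not any of the models above): quality tags go first, duplicates are dropped, and selected tags get the (tag:weight) emphasis syntax that SD1.5 / SDXL front ends understand.

```python
def to_sd_tags(rough, emphasize=None):
    """Convert a rough comma-separated description into an SD-style tag string.

    A hand-rolled illustration of the normalization these converter models
    perform: quality tags first, duplicates dropped, and selected tags wrapped
    in the (tag:weight) emphasis syntax.
    """
    quality = ["masterpiece", "best quality"]
    emphasize = emphasize or {}
    seen, out = set(), []
    for tag in quality + [t.strip().lower() for t in rough.split(",")]:
        if not tag or tag in seen:  # skip empties and duplicates
            continue
        seen.add(tag)
        weight = emphasize.get(tag)
        out.append("({}:{})".format(tag, weight) if weight else tag)
    return ", ".join(out)

print(to_sd_tags("1girl, Blue Hair, 1girl", {"best quality": 1.05}))
# masterpiece, (best quality:1.05), 1girl, blue hair
```

The real converter models do far more than this, of course — they hallucinate plausible extra tags from context — but the output format is exactly this kind of dense tag string.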
Recent models and their prompts
DiT-based models like FLUX, as well as recent image-editing models, use LLM-based text encoders such as T5 or Qwen.
Thanks to that, natural-language comprehension has improved dramatically over the Stable Diffusion era, and "incantation"-style prompt tricks are mostly unnecessary now.
On the other hand, it is not as if anything dashed off will reliably produce good results.
It is the same when directing people: you could say the job of a good director is to state the following concisely.
- Quantitative information: distance, angle of view, focal length, time of day, number of shots, and so on
- Explicit choices for each element: background, composition, lighting, style, facial expression, and so on
That said, writing all of this by hand every time is tedious, so use an LLM such as ChatGPT. Even rough requests like "flesh out this Japanese prompt for FLUX.2", "add composition, lighting, and camera information", or "reshape this prompt for Qwen-Image" are enough to raise a prompt's density considerably.
Some image-generation models ship with a dedicated LLM, but it does not improve things that much. Ultimately, the performance of the image-generation model itself matters most.
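As a concrete sketch of that kind of delegation, the following assembles a single-turn refinement request. The instruction wording is my own placeholder, and the commented-out call assumes the openai package, an API key, and a placeholder model name:

```python
def build_refine_messages(rough_prompt, target_model):
    """Assemble a single-turn chat request asking an LLM to densify a prompt."""
    system = (
        "You rewrite rough image-generation prompts into a single detailed prompt. "
        "Add composition, lighting, and camera information suited to "
        + target_model + ". "
        "Respond in English with the refined prompt only, no commentary."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": rough_prompt},
    ]

# Actual call (requires `pip install openai` and OPENAI_API_KEY):
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(
#     model="gpt-4o-mini",  # placeholder model name
#     messages=build_refine_messages("a cat walking through a back alley at dusk", "FLUX.2"),
# )
# print(reply.choices[0].message.content)
```

Keeping the request single-turn, as here, matters: you want a fresh rewrite of each rough prompt, not a conversation that drifts.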
Using it in ComfyUI
There are LLMs that can run locally inside ComfyUI, but also consider calling Gemini or ChatGPT through API nodes.

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 60,
"last_link_id": 105,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1252.432861328125,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
492,
394.392333984375
],
"size": [
418.3189392089844,
107.08506774902344
],
"flags": {
"collapsed": true
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
250.6552734375,
-167.9522705078125
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_turbo_bf16.safetensors",
"fp8_e4m3fn"
]
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
586.9390258789062,
-167.9522705078125
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
100
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
492,
175
],
"size": [
330.26959228515625,
142.00363159179688
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 102
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
120.78603616968121,
342.5854112036154
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
]
},
{
"id": 53,
"type": "EmptySD3LatentImage",
"pos": [
597.2695922851562,
482.05751390379885
],
"size": [
237,
106
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
98
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1442.0747874475098,
188.22962825237536
],
"size": [
510.21224258223606,
595.4940064248622
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 59,
"type": "PreviewAny",
"pos": [
492,
1.5167060232018699
],
"size": [
330,
111
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "source",
"type": "*",
"link": 104
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "PreviewAny"
},
"widgets_values": []
},
{
"id": 57,
"type": "GeminiNode",
"pos": [
131.26602226763393,
0.08407710682253366
],
"size": [
273,
266
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "images",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "video",
"shape": 7,
"type": "VIDEO",
"link": null
},
{
"name": "files",
"shape": 7,
"type": "GEMINI_INPUT_FILES",
"link": null
},
{
"name": "prompt",
"type": "STRING",
"widget": {
"name": "prompt"
},
"link": 105
}
],
"outputs": [
{
"name": "STRING",
"type": "STRING",
"links": [
102,
104
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "GeminiNode"
},
"widgets_values": [
"",
"gemini-3-pro-preview",
12345,
"fixed",
"Status: Completed\nPrice: $0.0113\nTime elapsed: 10s"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-136.07276600955444,
-300.4671673650518
],
"size": [
349.13103718118725,
214.5148968572393
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_turbo_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── z_image_turbo_bf16.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
]
},
{
"id": 60,
"type": "StringConcatenate",
"pos": [
-181.55781163713942,
-8.244166137499546
],
"size": [
283.8399999999999,
276.23
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "STRING",
"type": "STRING",
"links": [
105
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "StringConcatenate"
},
"widgets_values": [
"You are a prompt refiner for image generation models (e.g. Stable Diffusion, FLUX, Qwen-Image, etc.).\n\nThe user will give you a short, rough prompt describing an image. Your job is to rewrite it into a single, detailed prompt that is easy for an image generation model to follow.\n\nInteraction rules:\n- This is strictly single-turn. For each input, you read the rough prompt once and respond once.\n- Do NOT ask the user questions.\n- Do NOT rely on or refer to any previous conversation history.\n\nGoals:\n- Keep the same core subject, theme, and intent as the original prompt.\n- Do NOT change the meaning or add new story elements; only clarify and enrich what is already implied.\n- Make implicit visual details explicit: subject appearance, pose, composition, environment, lighting, mood, and style.\n- Focus only on what should be visible in a single still image.\n\nWhen expanding the prompt:\n- Prefer concrete, visual, testable details over emotional or metaphorical language.\n- Describe:\n - Who or what is in the image (age, gender expression, clothing, notable features, materials, etc.).\n - Pose and action of the main subject.\n - Camera and composition (shot type, angle, distance, framing, depth of field).\n - Environment and background (indoor/outdoor, location type, props, weather, time of day).\n - Lighting (soft/hard, key direction, contrast, highlights, reflections, etc.).\n - Color palette and overall mood, if implied.\n - Rendering style (photograph, watercolor, anime illustration, 3D render, flat graphic, etc.), based on the user’s words.\n- If the prompt clearly suggests a photograph, add subtle camera details (for example: lens focal length, aperture, high-resolution, realistic textures) but keep them plausible and not overly technical.\n- If the prompt clearly suggests illustration or anime style, describe line quality, shading style, and level of detail instead of camera specs.\n- Do not invent extra characters, locations, or objects that are not 
suggested in the original prompt.\n\nLanguage rules:\n- Always respond in English, regardless of the input language.\n- Use one concise paragraph or 1–3 sentences, not a long list.\n- Avoid overly poetic or flowery language; keep it functional and descriptive.\n- Do NOT mention “prompt”, “model”, “negative prompt”, “system prompt”, or give any meta commentary.\n\nOutput format:\n- Output ONLY the refined image-generation prompt as plain text.\n- Do NOT add explanations, headings, bullet points, quotes, or any extra filler.\n",
"万華鏡の中で撮影したかのようなファッションショー",
"---"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
999.1927782010846,
509.5303495842456
],
"size": [
210,
58
],
"flags": {
"collapsed": false
},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
898.7548217773438,
188.1918182373047
],
"size": [
315,
262
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 100
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 98
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
55555,
"fixed",
8,
1,
"euler",
"simple",
1
]
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
98,
53,
0,
3,
3,
"LATENT"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
100,
54,
0,
3,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
],
[
102,
57,
0,
6,
1,
"STRING"
],
[
104,
57,
0,
59,
0,
"*"
],
[
105,
60,
0,
57,
4,
"STRING"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.3310000000000004,
"offset": [
66.47718451467517,
14.701606057764138
]
},
"frontendVersion": "1.34.2",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
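A workflow like the one above can also be driven without the GUI through ComfyUI's HTTP API: you POST the API-format export of the workflow (from "Export (API)") to a running instance. Note that the graph JSON shown above is the editor format, which the /prompt endpoint does not accept, so treat the following as a sketch of the mechanism. It assumes a default local server at 127.0.0.1:8188 and that the StringConcatenate node's second input is named string_b (both are assumptions about your setup):

```python
import json
import urllib.request

def set_rough_prompt(workflow, node_id, text):
    """Set the user's rough prompt on a node in an API-format ComfyUI workflow.

    API-format workflows are dicts keyed by node id, each entry carrying an
    "inputs" mapping. Which input name to patch depends on the node type;
    "string_b" (the second string of StringConcatenate) is assumed here.
    """
    workflow[node_id]["inputs"]["string_b"] = text
    return workflow

def queue_prompt(workflow, host="127.0.0.1:8188"):
    """POST the workflow to a locally running ComfyUI instance."""
    req = urllib.request.Request(
        "http://{}/prompt".format(host),
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

With this, a small script can loop over a list of rough prompts, patch node 60 each time, and queue a batch of Gemini-refined generations unattended.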
I'm the sort of person who wants to stick to local models myself, but honestly, keeping an LLM of usable quality running locally tends to demand more of your PC than running an image-generation model does.
Fortunately, LLM API usage is remarkably cheap. The $5 of credit I bought ages ago still isn't used up (´・ω・`)