Flux.1 Kontext

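What is Flux.1 Kontext?

Flux.1 Kontext is an instruction-based image editing model built on Flux.1.

It is almost certainly the model that lit the fuse for the current boom in AI image editing, a field now led by nano banana.

Like Flux.1, it comes in three variants (here pro, max, and dev), and only dev can be run locally.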

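What is instruction-based image editing?

A model that takes an image and a text instruction and edits the image according to that instruction is what this site calls an instruction-based image editing model.

For example, suppose you want to turn the hair of a woman in a photo red.
Until now, you would mask the hair, add ControlNet Canny so that the hairstyle does not change, and then inpaint with a prompt such as "photo of a woman with red hair".

With instruction-based image editing it is simple. Hand the image to the model and tell it "make the woman's hair red", the way a producer would brief a designer.

Changing an expression, removing an unwanted object, converting the art style.

All of it can be done with a single model and a prompt.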

Downloading the models

Kontext uses basically the same set of files as regular Flux.1.

diffusion_models
- flux1-dev-kontext_fp8_scaled.safetensors
clip / T5 / VAE
gguf (optional)
- flux1-kontext-dev.gguf

📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── flux1-dev-kontext_fp8_scaled.safetensors
    ├── 📂clip/
    │   ├── clip_l.safetensors
    │   └── t5xxl_fp8_e4m3fn_scaled.safetensors
    ├── 📂vae/
    │   └── ae.safetensors
    └── 📂unet/
        └── flux1-kontext-dev.gguf      ← only when using gguf
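
As a quick sanity check, the layout above can be verified with a few lines of Python. This is only a sketch: the function name find_missing_models and the "ComfyUI" install path are assumptions made for illustration, and the file list is simply copied from the tree above.

```python
from pathlib import Path

# Relative paths taken from the folder layout above.
REQUIRED_FILES = [
    "models/diffusion_models/flux1-dev-kontext_fp8_scaled.safetensors",
    "models/clip/clip_l.safetensors",
    "models/clip/t5xxl_fp8_e4m3fn_scaled.safetensors",
    "models/vae/ae.safetensors",
]
# Only needed when using the gguf variant.
OPTIONAL_FILES = ["models/unet/flux1-kontext-dev.gguf"]


def find_missing_models(comfy_root, include_optional=False):
    """Return the relative paths that do not exist under comfy_root."""
    root = Path(comfy_root)
    wanted = REQUIRED_FILES + (OPTIONAL_FILES if include_optional else [])
    return [rel for rel in wanted if not (root / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; change it to your own path.
    for rel in find_missing_models("ComfyUI"):
        print("missing:", rel)
```

An empty result means every required file is where the loaders will look for it.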

Workflow (basic)

The Kontext workflow itself is simple: a regular Flux.1 workflow with only a ReferenceLatent node added.

Flux.1-Kontext.json

{
  "id": "18404b37-92b0-4d11-a39c-ae941838eb83",
  "revision": 0,
  "last_node_id": 77,
  "last_link_id": 131,
  "nodes": [
    {
      "id": 51,
      "type": "ReferenceLatent",
      "pos": [
        883.7505187988281,
        190
      ],
      "size": [
        204.134765625,
        46
      ],
      "flags": {
        "collapsed": false
      },
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "conditioning",
          "type": "CONDITIONING",
          "link": 74
        },
        {
          "name": "latent",
          "shape": 7,
          "type": "LATENT",
          "link": 76
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            114
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.41",
        "Node name for S&R": "ReferenceLatent"
      },
      "widgets_values": [],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 68,
      "type": "FluxGuidance",
      "pos": [
        1115.2528076171875,
        190
      ],
      "size": [
        211.3223114013672,
        58
      ],
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "conditioning",
          "type": "CONDITIONING",
          "link": 114
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            115
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.41",
        "Node name for S&R": "FluxGuidance"
      },
      "widgets_values": [
        3.5
      ]
    },
    {
      "id": 69,
      "type": "DualCLIPLoader",
      "pos": [
        174.92930603027344,
        261.95574951171875
      ],
      "size": [
        270,
        130
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [
            117,
            118
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.41",
        "Node name for S&R": "DualCLIPLoader"
      },
      "widgets_values": [
        "clip_l.safetensors",
        "t5xxl_fp8_e4m3fn.safetensors",
        "flux",
        "default"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 33,
      "type": "CLIPTextEncode",
      "pos": [
        517.7193603515625,
        378
      ],
      "size": [
        336.888427734375,
        103.97698974609375
      ],
      "flags": {
        "collapsed": true
      },
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 118
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            99
          ]
        }
      ],
      "title": "CLIP Text Encode (Negative Prompt)",
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.39",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 52,
      "type": "VAEEncode",
      "pos": [
        719.3842163085938,
        468.98004150390625
      ],
      "size": [
        140,
        46
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 130
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 77
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            76,
            116
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.41",
        "Node name for S&R": "VAEEncode"
      },
      "widgets_values": [],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 43,
      "type": "VAELoader",
      "pos": [
        462.1297302246094,
        613.5346069335938
      ],
      "size": [
        234.05543518066406,
        58
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            62,
            77
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.39",
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "ae.safetensors"
      ],
      "color": "#322",
      "bgcolor": "#533"
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        516.5379638671875,
        190
      ],
      "size": [
        339.84503173828125,
        123.01304626464844
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 117
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            74
          ]
        }
      ],
      "title": "CLIP Text Encode (Positive Prompt)",
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.39",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "Change camera angle to a high-angle shot, looking down at the subject from above while keeping the subject's position, scale, and pose identical. Preserve the lighting and overall style."
      ]
    },
    {
      "id": 67,
      "type": "MarkdownNote",
      "pos": [
        76.41261291503906,
        -64.39566040039062
      ],
      "size": [
        368.5166931152344,
        248.5858612060547
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "## models\n\n- [flux1-dev-kontext_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/flux1-kontext-dev_ComfyUI/tree/main/split_files/diffusion_models)\n- [clip_l.safetensors](https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/clip_l.safetensors)\n- [t5xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/t5xxl_fp8_e4m3fn_scaled.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/Omnigen2_ComfyUI_repackaged/tree/main/split_files/vae)\n\n```\n📂ComfyUI/\n└── 📂models/\n    ├── 📂clip/\n    │   ├── clip_l.safetensors\n    │   └── t5xxl_fp8_e4m3fn.safetensors\n    ├── 📂diffusion_models/\n    │   └── flux1-dev-kontext_fp8_scaled.safetensors\n    └── 📂vae/\n         └── ae.safetensors\n```"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 53,
      "type": "LoadImage",
      "pos": [
        153.1424102783203,
        468.98004150390625
      ],
      "size": [
        277.51690673828125,
        455.66180419921875
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            129
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.41",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "pexels-photo-28266413.jpg",
        "image"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 76,
      "type": "FluxKontextImageScale",
      "pos": [
        481.14453125,
        468.98004150390625
      ],
      "size": [
        194.9458984375,
        26
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 129
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            130
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.43",
        "Node name for S&R": "FluxKontextImageScale"
      },
      "widgets_values": [],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 31,
      "type": "KSampler",
      "pos": [
        1355.8184814453125,
        194.12423706054688
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 128
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 115
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 99
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 116
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            52
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.39",
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        1234,
        "fixed",
        20,
        1,
        "euler",
        "normal",
        1
      ]
    },
    {
      "id": 74,
      "type": "UNETLoader",
      "pos": [
        1056.5750732421875,
        38.088253021240234
      ],
      "size": [
        270,
        82
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            128
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.43",
        "Node name for S&R": "UNETLoader"
      },
      "widgets_values": [
        "Flux.1\\flux1-dev-kontext_fp8_scaled.safetensors",
        "fp8_e4m3fn"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1700.061767578125,
        194.12423706054688
      ],
      "size": [
        165.4577742454253,
        46
      ],
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 52
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 62
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            131
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.39",
        "Node name for S&R": "VAEDecode"
      },
      "widgets_values": []
    },
    {
      "id": 77,
      "type": "SaveImage",
      "pos": [
        1896.4101908313817,
        194.12423706054688
      ],
      "size": [
        413,
        603.9366300000002
      ],
      "flags": {},
      "order": 13,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 131
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    }
  ],
  "links": [
    [
      52,
      31,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      62,
      43,
      0,
      8,
      1,
      "VAE"
    ],
    [
      74,
      6,
      0,
      51,
      0,
      "CONDITIONING"
    ],
    [
      76,
      52,
      0,
      51,
      1,
      "LATENT"
    ],
    [
      77,
      43,
      0,
      52,
      1,
      "VAE"
    ],
    [
      99,
      33,
      0,
      31,
      2,
      "CONDITIONING"
    ],
    [
      114,
      51,
      0,
      68,
      0,
      "CONDITIONING"
    ],
    [
      115,
      68,
      0,
      31,
      1,
      "CONDITIONING"
    ],
    [
      116,
      52,
      0,
      31,
      3,
      "LATENT"
    ],
    [
      117,
      69,
      0,
      6,
      0,
      "CLIP"
    ],
    [
      118,
      69,
      0,
      33,
      0,
      "CLIP"
    ],
    [
      128,
      74,
      0,
      31,
      0,
      "MODEL"
    ],
    [
      129,
      53,
      0,
      76,
      0,
      "IMAGE"
    ],
    [
      130,
      76,
      0,
      52,
      0,
      "IMAGE"
    ],
    [
      131,
      8,
      0,
      77,
      0,
      "IMAGE"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.9090909090909091,
      "offset": [
        23.587387084960938,
        164.39566040039062
      ]
    },
    "frontendVersion": "1.35.0",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}
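
As a rough sketch, the wiring in the JSON above can be inspected by walking the links array backwards from the ReferenceLatent node (id 51). The helper name upstream_chain and the trimmed SAMPLE dict are made up for this illustration; the link layout [link_id, from_node, from_slot, to_node, to_slot, type] is inferred from the file itself.

```python
def upstream_chain(workflow, node_id):
    """Return the types of all nodes feeding node_id, nearest first (breadth-first)."""
    types = {node["id"]: node["type"] for node in workflow["nodes"]}
    # Each link is [link_id, from_node, from_slot, to_node, to_slot, type].
    sources = {}
    for _link_id, src, _src_slot, dst, _dst_slot, _type in workflow["links"]:
        sources.setdefault(dst, []).append(src)

    seen, order, queue = set(), [], [node_id]
    while queue:
        current = queue.pop(0)
        for src in sources.get(current, []):
            if src not in seen:
                seen.add(src)
                order.append(types[src])
                queue.append(src)
    return order


# Trimmed copy of the workflow above: only the nodes and links leading into ReferenceLatent.
SAMPLE = {
    "nodes": [
        {"id": 51, "type": "ReferenceLatent"},
        {"id": 6, "type": "CLIPTextEncode"},
        {"id": 52, "type": "VAEEncode"},
        {"id": 76, "type": "FluxKontextImageScale"},
        {"id": 53, "type": "LoadImage"},
        {"id": 43, "type": "VAELoader"},
        {"id": 69, "type": "DualCLIPLoader"},
    ],
    "links": [
        [74, 6, 0, 51, 0, "CONDITIONING"],
        [76, 52, 0, 51, 1, "LATENT"],
        [77, 43, 0, 52, 1, "VAE"],
        [117, 69, 0, 6, 0, "CLIP"],
        [129, 53, 0, 76, 0, "IMAGE"],
        [130, 76, 0, 52, 0, "IMAGE"],
    ],
}

if __name__ == "__main__":
    print(upstream_chain(SAMPLE, 51))
```

The same function should work on the full file once it is loaded with json.load, since it only reads the nodes and links keys.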

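🟪 Load flux1-dev-kontext_fp8_scaled.safetensors.
🟩 The FluxKontextImageScale node resizes the input image to a resolution suited to Kontext.
- Flux has a set of recommended resolutions; the node automatically picks the one whose aspect ratio is closest to the input image.
🟩 Convert the resized image to a latent and connect it to ReferenceLatent.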
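
The "closest aspect ratio" selection described above can be sketched as follows. The candidate list is illustrative and assumed for this example, not the node's actual table, and pick_resolution is a hypothetical name; only the selection rule is the point.

```python
# Illustrative (width, height) candidates; assumed for this sketch, not the node's actual table.
CANDIDATE_RESOLUTIONS = [
    (672, 1568),
    (832, 1248),
    (1024, 1024),
    (1248, 832),
    (1568, 672),
]


def pick_resolution(width, height, candidates=CANDIDATE_RESOLUTIONS):
    """Return the candidate whose aspect ratio is closest to width / height."""
    aspect = width / height
    return min(candidates, key=lambda wh: abs(aspect - wh[0] / wh[1]))


if __name__ == "__main__":
    print(pick_resolution(1920, 1080))  # a 16:9 landscape input
    print(pick_resolution(1000, 1000))  # a square input
```

Because only the aspect ratio is compared, a very large and a very small image with the same proportions end up at the same target size.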

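Writing prompts

Basically, follow the official prompting guide.

FLUX.1 Kontext Prompting Guide

That said, there is no special syntax. Writing what you want in plain English, in the form "change X to Y", generally works.

If parts you did not want to change are altered as well (for example, you only wanted a different hairstyle but the background changed too), state explicitly what must stay the same, as below.

e.g. Keep the person's pose, position, and size the same.

Even so, as a matter of model capability, it often does not follow instructions very closely.
Do not expect too much of it yet.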
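
If you build prompts in a script, the pattern above (an instruction followed by an explicit "keep ... the same" sentence) can be assembled with a small helper. This is a minimal sketch; build_edit_prompt and its parameters are hypothetical names, not part of ComfyUI or the official guide.

```python
def build_edit_prompt(instruction, keep=(), subject="the person's"):
    """Join an edit instruction with an optional 'Keep ... the same.' sentence."""
    prompt = instruction.strip().rstrip(".") + "."
    items = [item.strip() for item in keep if item.strip()]
    if items:
        if len(items) == 1:
            listed = items[0]
        elif len(items) == 2:
            listed = f"{items[0]} and {items[1]}"
        else:
            listed = ", ".join(items[:-1]) + f", and {items[-1]}"
        prompt += f" Keep {subject} {listed} the same."
    return prompt


if __name__ == "__main__":
    print(build_edit_prompt(
        "Change the hair to a messy blonde bob",
        keep=["pose", "position", "size"],
    ))
```

The resulting string goes into the positive CLIP Text Encode node as usual.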

What it can do

Image editing

Change the hair to a messy blonde bob.

Style conversion

This character is made out of Lego blocks.

Object removal

Remove the woman

Text replacement

Replace [OPEN] with [FLUX]

Subject transfer

A photo of a girl who received a stuffed elephant as a Christmas present.

Specifying position with a guide

Add a sailing ship to the box position.

Refining a rough collage

This edit blends a manually assembled collage into a natural-looking image.

Transform the flat duck sticker into a realistic plush duck toy with the same blue hat and place it in the woman’s arms so she is naturally hugging it. Also turn the outlined pendant lamp into a realistic lamp, removing the white sticker edges and matching the scene’s lighting, color, and perspective.

There is also a LoRA built on this capability.

Place it Flux Kontext LoRA
