What is Subject transfer?
Its formal name is the task of "Subject-Driven Image Generation."
A Subject is not limited to people: characters, plush toys, a specific dog, mascots, figurines, and so on; in general, "that thing that appears in this image." Subject transfer is the set of techniques for generating new images that contain the same Subject as a reference image.
Techniques that transfer identity (a person's face and personal features) are technically a form of Subject transfer, but they are treated as a special case, and many methods specialize in ID transfer, so they are covered separately.
LoRA
Needless to say, this is the method of training a model to draw something it otherwise cannot.
Since its debut, nothing has surpassed it in flexibility and stability.
Its biggest drawback: it requires training, which is no small effort.
image2prompt
The most naive approach is to generate a description from the image and run text2image with that description.
You may wonder whether such a primitive method can work, but in principle it can: all it takes is an MLLM that can describe the reference image perfectly, and an image generation model that can reproduce that description perfectly.

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 59,
"last_link_id": 104,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1252.432861328125,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
492,
394.392333984375
],
"size": [
418.3189392089844,
107.08506774902344
],
"flags": {
"collapsed": true
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
250.6552734375,
-167.9522705078125
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_turbo_bf16.safetensors",
"fp8_e4m3fn"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
898.7548217773438,
188.1918182373047
],
"size": [
315,
262
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 100
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 98
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
12345,
"fixed",
8,
1,
"euler",
"simple",
1
]
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
586.9390258789062,
-167.9522705078125
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
100
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
492,
175
],
"size": [
330.26959228515625,
142.00363159179688
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 102
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 39,
"type": "VAELoader",
"pos": [
898.7548217773438,
510.4016418457031
],
"size": [
315,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
]
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
120.78603616968121,
342.5854112036154
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
]
},
{
"id": 58,
"type": "LoadImage",
"pos": [
-226.4552737849208,
-0.14719505696391977
],
"size": [
298.080078125,
431
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
103
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"ComfyUI_temp_mohpt_00009_.png",
"image"
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-136.07276600955444,
-300.4671673650518
],
"size": [
349.13103718118725,
214.5148968572393
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_turbo_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── z_image_turbo_bf16.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
]
},
{
"id": 53,
"type": "EmptySD3LatentImage",
"pos": [
597.2695922851562,
482.05751390379885
],
"size": [
237,
106
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
98
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1442.0747874475098,
188.22962825237536
],
"size": [
510.21224258223606,
595.4940064248622
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 57,
"type": "GeminiNode",
"pos": [
131.26602226763393,
0.08407710682253366
],
"size": [
273,
266
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "images",
"shape": 7,
"type": "IMAGE",
"link": 103
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "video",
"shape": 7,
"type": "VIDEO",
"link": null
},
{
"name": "files",
"shape": 7,
"type": "GEMINI_INPUT_FILES",
"link": null
}
],
"outputs": [
{
"name": "STRING",
"type": "STRING",
"links": [
102,
104
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "GeminiNode"
},
"widgets_values": [
"You are a vision-language model that converts one input image into a single English prompt for a text-to-image generator. Your goal is to let the generator recreate the image as exactly as possible. Use only objective, non-emotional language (no “beautiful”, “cool”, “dramatic”, etc.). Be as quantitative as you reasonably can: counts of objects, relative positions (left/right/top/bottom/center/foreground/background), relative sizes, viewpoint (eye-level, low angle, top-down, etc.), and approximate aspect ratio (e.g., horizontal 16:9, square 1:1, vertical 9:16). Always describe: main subjects (appearance, pose, clothing, accessories, relative positions), background and environment (indoor/outdoor, location type, important objects), lighting (type and direction), colors and tone (dominant colors, dark/bright, high/low contrast), and overall style (photo, anime, 3D render, flat illustration, etc.), plus any visible text or logos and where they appear. If the image looks photographic or like a realistic render, also mention a simple shot type (close-up, medium shot, full body, wide shot), rough focal length (e.g., 35mm, 50mm), and depth of field (shallow or deep) when this is clearly implied. Do not refer to “the input image” or give instructions; just state the desired image content. Output exactly one line: a single comma-separated English prompt, with no headings, bullet points, or explanation.",
"gemini-3-pro-preview",
12345,
"fixed",
"Status: Completed\nPrice: $0.0196\nTime elapsed: 17s"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 59,
"type": "PreviewAny",
"pos": [
492,
1.5167060232018699
],
"size": [
330,
111
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "source",
"type": "*",
"link": 104
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "PreviewAny"
},
"widgets_values": []
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
98,
53,
0,
3,
3,
"LATENT"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
100,
54,
0,
3,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
],
[
102,
57,
0,
6,
1,
"STRING"
],
[
103,
58,
0,
57,
0,
"IMAGE"
],
[
104,
57,
0,
59,
0,
"*"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.1000000000000005,
"offset": [
326.4552737849208,
400.4671673650518
]
},
"frontendVersion": "1.34.2",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
The performance of recent models is making this increasingly viable. As "the cheapest imitation of Subject transfer," it is worth trying once.
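The two-stage structure of image2prompt can be sketched as follows. This is a minimal wiring diagram, not a real API: `caption_image` stands in for any MLLM captioner (the Gemini node plays this role in the workflow above), and `generate_image` stands in for any text2image backend.

```python
from typing import Callable

def image_to_prompt_pipeline(
    image_path: str,
    caption_image: Callable[[str], str],
    generate_image: Callable[[str], bytes],
) -> bytes:
    """Two-stage image2prompt: describe the reference image, then
    regenerate from the description alone. Both callables are
    placeholders for whatever MLLM / text2image backend you use."""
    prompt = caption_image(image_path)   # stage 1: image -> text
    return generate_image(prompt)        # stage 2: text -> image

if __name__ == "__main__":
    # Stub backends, just to show the data flow (no real models involved):
    captioner = lambda path: "a red apple on a wooden table, centered, soft light"
    generator = lambda prompt: ("<image generated from: " + prompt + ">").encode()
    print(image_to_prompt_pipeline("apple.png", captioner, generator))
```

The quality ceiling of this method is exactly the product of the two stages: everything the captioner omits is lost, and everything the generator cannot render is lost again.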
SeeCoder / UnCLIP family
image2prompt is a two-stage process, image → text → embedding, whereas SeeCoder and the UnCLIP family go directly from image → embedding.
They build a vector equivalent to a text embedding from the image, and use it in place of the text encoder's output.

{
"last_node_id": 59,
"last_link_id": 102,
"nodes": [
{
"id": 3,
"type": "KSampler",
"pos": [
1230,
180
],
"size": {
"0": 278.28021240234375,
"1": 556.486328125
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 86
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 102
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 6
},
{
"name": "latent_image",
"type": "LATENT",
"link": 84
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
7
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KSampler"
},
"widgets_values": [
1007766865747969,
"randomize",
20,
8,
"dpmpp_2m",
"karras",
1
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1530,
190
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 90,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
9
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 57,
"type": "VAELoader",
"pos": [
1532,
290
],
"size": {
"0": 315,
"1": 58
},
"flags": {
"collapsed": true
},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
90
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"vae-ft-mse-840000-ema-pruned.safetensors"
]
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
0,
240
],
"size": {
"0": 315,
"1": 98
},
"flags": {},
"order": 1,
"mode": 0,
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
86
],
"slot_index": 0
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
87,
88
],
"slot_index": 1
},
{
"name": "VAE",
"type": "VAE",
"links": [],
"slot_index": 2
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"😎-v1.x\\AuroraONE_F16.safetensors"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
430,
430
],
"size": [
409.83612060546875,
83.2110595703125
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 88
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
6
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"(worst quality:1.2),text,3d,outline,blush"
],
"color": "#223",
"bgcolor": "#335"
},
{
"id": 54,
"type": "EmptyLatentImage",
"pos": [
827,
614
],
"size": {
"0": 315,
"1": 106
},
"flags": {},
"order": 2,
"mode": 0,
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
84
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "EmptyLatentImage"
},
"widgets_values": [
512,
768,
1
]
},
{
"id": 13,
"type": "CLIPTextEncode",
"pos": [
430,
300
],
"size": {
"0": 412.5623779296875,
"1": 76
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 87
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
99
],
"slot_index": 0
}
],
"title": "CLIP Text Encode (Trigger word)",
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"high quality,high detailed,anime illustration,shot from side"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 58,
"type": "ConditioningCombine",
"pos": [
882,
271
],
"size": [
228.39999389648438,
46
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "conditioning_1",
"type": "CONDITIONING",
"link": 98
},
{
"name": "conditioning_2",
"type": "CONDITIONING",
"link": 99
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
102
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "ConditioningCombine"
},
"color": "#322",
"bgcolor": "#533"
},
{
"id": 55,
"type": "SEECoderImageEncode",
"pos": [
551,
105
],
"size": {
"0": 315,
"1": 58
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 85,
"slot_index": 0
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
98
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "SEECoderImageEncode"
},
"widgets_values": [
"seecoder-anime-v1-0.safetensors"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 56,
"type": "LoadImage",
"pos": [
295,
-220
],
"size": [
210,
389.91945068359314
],
"flags": {},
"order": 3,
"mode": 0,
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
85
],
"shape": 3
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"apple.png",
"image"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 9,
"type": "SaveImage",
"pos": [
1783,
190
],
"size": [
441.322519450684,
711.7099524414066
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"properties": {},
"widgets_values": [
"ComfyUI"
]
}
],
"links": [
[
6,
7,
0,
3,
2,
"CONDITIONING"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
9,
8,
0,
9,
0,
"IMAGE"
],
[
84,
54,
0,
3,
3,
"LATENT"
],
[
85,
56,
0,
55,
0,
"IMAGE"
],
[
86,
4,
0,
3,
0,
"MODEL"
],
[
87,
4,
1,
13,
0,
"CLIP"
],
[
88,
4,
1,
7,
0,
"CLIP"
],
[
90,
57,
0,
8,
1,
"VAE"
],
[
98,
55,
0,
58,
0,
"CONDITIONING"
],
[
99,
13,
0,
58,
1,
"CONDITIONING"
],
[
102,
58,
0,
3,
1,
"CONDITIONING"
]
],
"groups": [],
"config": {},
"extra": {},
"version": 0.4
}
Compared with image2prompt, less information is lost in the "textualization" step, but because the conditioning can no longer be edited as text, it is less convenient to use.
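The core idea, image → embedding with no text in between, can be shown as a toy calculation. The dimensions below mimic SD1.x CLIP conditioning (77 tokens × 768 channels), and the projection weights are random stand-ins for what SeeCoder actually learns; nothing here is the real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# A CLIP-style image feature (1024-d is an assumed size for illustration).
image_embedding = rng.normal(size=(1024,))

# Learned projection into "prompt embedding" space (random stand-in here).
W = rng.normal(size=(1024, 77 * 768)) * 0.01

# The result has exactly the shape a sampler expects from the text encoder,
# so it can be plugged in where CLIPTextEncode's output would normally go.
conditioning = (image_embedding @ W).reshape(1, 77, 768)
print(conditioning.shape)
```

Because the conditioning is produced numerically rather than written as words, there is nothing for a human to edit, which is exactly the usability trade-off noted above.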
IP-Adapter
As a training-free approach to Subject transfer, this was the first technique to reach practical quality in real work.
IP-Adapter is an adapter that injects image-derived conditioning into an existing text2image model. It is widely used as the most prominent adapter after ControlNet.
It extracts a feature vector from the reference image and injects it inside the UNet (around the cross-attention layers) so that it is reflected in the generated image. Because it can be used together with a text prompt, you can divide the roles: specify the Subject with an image, and the scene or style with text.
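The injection mechanism is often described as "decoupled cross-attention": image features get their own key/value attention branch, whose output is added to the text branch with a weight. The toy below illustrates that structure in plain NumPy; the shapes and weights are illustrative, not the real IP-Adapter implementation.

```python
import numpy as np

def attention(q, k, v):
    """Plain scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def decoupled_cross_attention(q, text_kv, image_kv, scale=1.0):
    """IP-Adapter-style decoupled cross-attention (toy version).
    Image tokens get their own K/V branch, added to the text branch
    and weighted by `scale`; scale=0 recovers the unmodified model."""
    return attention(q, *text_kv) + scale * attention(q, *image_kv)

rng = np.random.default_rng(0)
d = 64
q = rng.normal(size=(16, d))                               # latent query tokens
text_kv = (rng.normal(size=(77, d)), rng.normal(size=(77, d)))   # text tokens
image_kv = (rng.normal(size=(4, d)), rng.normal(size=(4, d)))    # image tokens

out = decoupled_cross_attention(q, text_kv, image_kv, scale=0.8)
```

The `scale` knob is why IP-Adapter strength sliders work the way they do: at 0 the image reference is ignored entirely, and raising it trades prompt adherence for Subject fidelity.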
IC-LoRA / ACE++
DiT-family models, Flux foremost among them, have a latent ability to produce mutually consistent images.
IC-LoRA / ACE++ exploit this property for Subject transfer.

{
"id": "68ee8198-d33d-48ba-a3f6-65bf5c84d6e4",
"revision": 0,
"last_node_id": 26,
"last_link_id": 34,
"nodes": [
{
"id": 11,
"type": "UnetLoaderGGUF",
"pos": [
610,
40
],
"size": [
315,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
21
]
}
],
"properties": {
"cnr_id": "ComfyUI-GGUF",
"ver": "bc5223b0e37e053dbec2ea5e5f52c2fd4b8f712a",
"Node name for S&R": "UnetLoaderGGUF"
},
"widgets_values": [
"FLUX_gguf\\flux1-fill-dev-Q4_K_S.gguf"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 15,
"type": "VAELoader",
"pos": [
660,
410
],
"size": [
248.4499969482422,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
18,
23
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"FLUXvae.safetensors"
]
},
{
"id": 20,
"type": "VAEDecode",
"pos": [
1660,
188.83277893066406
],
"size": [
190,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 22
},
{
"name": "vae",
"type": "VAE",
"link": 23
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
27
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "VAEDecode"
}
},
{
"id": 12,
"type": "LoadImage",
"pos": [
296.1838684082031,
566.498291015625
],
"size": [
290,
498.96368408203125
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
24
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pexels-photo-15169599.jpg",
"image",
""
]
},
{
"id": 17,
"type": "ACEPlusLoraConditioning",
"pos": [
968.0706787109375,
210.35354614257812
],
"size": [
315,
138
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 16
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 17
},
{
"name": "vae",
"type": "VAE",
"link": 18
},
{
"name": "pixels",
"type": "IMAGE",
"link": 19
},
{
"name": "mask",
"type": "MASK",
"link": 20
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
13
]
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
14
]
},
{
"name": "latent",
"type": "LATENT",
"links": [
15
]
}
],
"properties": {
"Node name for S&R": "ACEPlusLoraConditioning"
},
"widgets_values": [
false
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 23,
"type": "PreviewImage",
"pos": [
2140,
190
],
"size": [
590,
580
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 31
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "PreviewImage"
},
"widgets_values": [
""
]
},
{
"id": 24,
"type": "PreviewImage",
"pos": [
988.9389038085938,
571.0610961914062
],
"size": [
435.3353271484375,
324.3360290527344
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 32
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "PreviewImage"
},
"widgets_values": [
""
]
},
{
"id": 21,
"type": "ACEPlusLoraProcessor",
"pos": [
630,
570
],
"size": [
315,
234
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "reference_image",
"shape": 7,
"type": "IMAGE",
"link": 24
},
{
"name": "edit_image",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "edit_mask",
"shape": 7,
"type": "MASK",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
19,
32
]
},
{
"name": "MASK",
"type": "MASK",
"links": [
20
]
},
{
"name": "OUT_H",
"type": "INT",
"links": [
29
]
},
{
"name": "OUT_W",
"type": "INT",
"links": [
28
]
},
{
"name": "SLICE_W",
"type": "INT",
"links": [
30
]
}
],
"properties": {
"Node name for S&R": "ACEPlusLoraProcessor"
},
"widgets_values": [
true,
1024,
1024,
"repainting",
3072
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 25,
"type": "CLIPTextEncode",
"pos": [
260,
170
],
"size": [
357.0466003417969,
137.17037963867188
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 33
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
10
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"A photograph of a woman wearing a yellow sweater, taken in front of a café in the UK, with a blurred background, intended for a magazine cover."
]
},
{
"id": 13,
"type": "FluxGuidance",
"pos": [
645.9932250976562,
176.34109497070312
],
"size": [
242.8545684814453,
58
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "conditioning",
"type": "CONDITIONING",
"link": 10
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
16
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "FluxGuidance"
},
"widgets_values": [
30
]
},
{
"id": 10,
"type": "DualCLIPLoader",
"pos": [
-97.66555786132812,
274.1638488769531
],
"size": [
315,
130
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
11,
33
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "DualCLIPLoader"
},
"widgets_values": [
"clip_l.safetensors",
"t5xxl_fp8_e4m3fn.safetensors",
"flux",
"default"
]
},
{
"id": 14,
"type": "CLIPTextEncode",
"pos": [
264.6689147949219,
366.498291015625
],
"size": [
397.89935302734375,
132.290771484375
],
"flags": {
"collapsed": true
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 11
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
17
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 16,
"type": "KSampler",
"pos": [
1314.6689453125,
188.83277893066406
],
"size": [
315,
262
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 12
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 13
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 14
},
{
"name": "latent_image",
"type": "LATENT",
"link": 15
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
22
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
30,
1,
"euler",
"normal",
1
]
},
{
"id": 22,
"type": "ImageCrop",
"pos": [
1891.829345703125,
190
],
"size": [
210,
130
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 27
},
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 28
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 29
},
{
"name": "x",
"type": "INT",
"widget": {
"name": "x"
},
"link": 30
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
31
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "ImageCrop"
},
"widgets_values": [
512,
512,
0,
0
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 18,
"type": "LoraLoaderModelOnly",
"pos": [
960,
40
],
"size": [
315,
82
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 21
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
12
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "LoraLoaderModelOnly"
},
"widgets_values": [
"ACE_Plus\\comfyui_portrait_lora64.safetensors",
1
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
10,
25,
0,
13,
0,
"CONDITIONING"
],
[
11,
10,
0,
14,
0,
"CLIP"
],
[
12,
18,
0,
16,
0,
"MODEL"
],
[
13,
17,
0,
16,
1,
"CONDITIONING"
],
[
14,
17,
1,
16,
2,
"CONDITIONING"
],
[
15,
17,
2,
16,
3,
"LATENT"
],
[
16,
13,
0,
17,
0,
"CONDITIONING"
],
[
17,
14,
0,
17,
1,
"CONDITIONING"
],
[
18,
15,
0,
17,
2,
"VAE"
],
[
19,
21,
0,
17,
3,
"IMAGE"
],
[
20,
21,
1,
17,
4,
"MASK"
],
[
21,
11,
0,
18,
0,
"MODEL"
],
[
22,
16,
0,
20,
0,
"LATENT"
],
[
23,
15,
0,
20,
1,
"VAE"
],
[
24,
12,
0,
21,
0,
"IMAGE"
],
[
27,
20,
0,
22,
0,
"IMAGE"
],
[
28,
21,
3,
22,
1,
"INT"
],
[
29,
21,
2,
22,
2,
"INT"
],
[
30,
21,
4,
22,
3,
"INT"
],
[
31,
22,
0,
23,
0,
"IMAGE"
],
[
32,
21,
0,
24,
0,
"IMAGE"
],
[
33,
10,
0,
25,
0,
"CLIP"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.4836049022304428,
"offset": [
121.19217889705396,
180.49827241415346
]
},
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
Place the reference image (containing the Subject) on the left side of the canvas, mask the entire right side, and have the model generate it (inpainting). The model fills in the right side while looking at the left, so it can generate a new image featuring the same Subject as on the left.
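The canvas-and-mask preparation can be sketched in a few lines of NumPy. This is a simplified stand-in for what the ACEPlusLoraProcessor node does in the workflow above (no resizing or padding), just to make the geometry concrete.

```python
import numpy as np

def make_side_by_side_inpaint_inputs(reference: np.ndarray):
    """Build the canvas and mask for the IC-LoRA / ACE++ trick:
    reference on the left half, blank right half, and a mask that is
    1 where the model should paint (the right half). Simplified; the
    real node also handles resizing."""
    h, w, c = reference.shape
    canvas = np.zeros((h, 2 * w, c), dtype=reference.dtype)
    canvas[:, :w] = reference                  # left: the Subject to copy
    mask = np.zeros((h, 2 * w), dtype=np.float32)
    mask[:, w:] = 1.0                          # right: region to generate
    return canvas, mask

ref = np.ones((8, 8, 3), dtype=np.float32)     # dummy reference image
canvas, mask = make_side_by_side_inpaint_inputs(ref)
```

After sampling, only the right half is kept (the ImageCrop node in the workflow does this, using the widths reported by the processor).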
Instruction-based image editing models
"Instruction-based image editing models" can also be used for Subject transfer.

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 125,
"last_link_id": 323,
"nodes": [
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
634.9767456054688,
-1.8326886892318726
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 282
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
123
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1000000000000005
]
},
{
"id": 63,
"type": "VAEEncode",
"pos": [
714.6403198242188,
673.7313842773438
],
"size": [
140,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 239
},
{
"name": "vae",
"type": "VAE",
"link": 115
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
112
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "VAEEncode"
},
"widgets_values": []
},
{
"id": 112,
"type": "CLIPLoader",
"pos": [
75.53079223632812,
277.016357421875
],
"size": [
270,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
290,
291
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_2.5_vl_7b_fp8_scaled.safetensors",
"qwen_image",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
107.53079223632812,
446.7167663574219
],
"size": [
238,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76,
115,
292,
293
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"qwen_image_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 114,
"type": "TextEncodeQwenImageEditPlus",
"pos": [
454.6401672363281,
419.63690185546875
],
"size": [
400,
200
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 291
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": 293
},
{
"name": "image1",
"shape": 7,
"type": "IMAGE",
"link": 295
},
{
"name": "image2",
"shape": 7,
"type": "IMAGE",
"link": 320
},
{
"name": "image3",
"shape": 7,
"type": "IMAGE",
"link": null
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
315
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.59",
"Node name for S&R": "TextEncodeQwenImageEditPlus"
},
"widgets_values": [
""
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 111,
"type": "UNETLoader",
"pos": [
330.1968994140625,
-1.8326886892318726
],
"size": [
276.62274169921875,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
282
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Qwen-Image\\qwen_image_edit_2509_fp8_e4m3fn.safetensors",
"fp8_e4m3fn"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 82,
"type": "ImageScaleToTotalPixels",
"pos": [
-224.63221740722656,
668.4074096679688
],
"size": [
270,
82
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 275
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
244
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "ImageScaleToTotalPixels"
},
"widgets_values": [
"nearest-exact",
1
]
},
{
"id": 97,
"type": "SaveImage",
"pos": [
1495.48046875,
143.6978759765625
],
"size": [
506.0589904785156,
566.5868530273438
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 254
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-84.94583892822266,
-171.1671905517578
],
"size": [
386.9856262207031,
251.33447265625
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [qwen_image_edit_2509_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image-Edit_ComfyUI/blob/main/split_files/diffusion_models/qwen_image_edit_2509_fp8_e4m3fn.safetensors)\n- [qwen_2.5_vl_7b_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/blob/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors)\n- [qwen_image_vae.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main/split_files/vae)\n\n\n```\n📂ComfyUI/\n└──📂models/\n ├── 📂diffusion_models/\n │ └── qwen_image_edit_2509_fp8_e4m3fn.safetensors\n ├── 📂text_encoders/\n │ └── qwen_2.5_vl_7b_fp8_scaled.safetensors\n └── 📂vae/\n └── qwen_image_vae.safetensors\n\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 99,
"type": "LoadImage",
"pos": [
-522.9654541015625,
668.4074096679688
],
"size": [
268.17022705078125,
414.46728515625
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
275
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pexels-photo-33109412 (1).jpg",
"image"
]
},
{
"id": 124,
"type": "LoadImage",
"pos": [
79.30519104003906,
1079.8746337890625
],
"size": [
268.17022705078125,
414.46728515625
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
320,
321
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pexels-photo-32490940.jpg",
"image"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1293.939697265625,
143.6978759765625
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
254
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 83,
"type": "ImageResizeKJv2",
"pos": [
75.53079223632812,
668.4074096679688
],
"size": [
270,
336
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 244
},
{
"name": "mask",
"shape": 7,
"type": "MASK",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
239,
294,
295
]
},
{
"name": "width",
"type": "INT",
"links": null
},
{
"name": "height",
"type": "INT",
"links": null
},
{
"name": "mask",
"type": "MASK",
"links": []
}
],
"properties": {
"cnr_id": "comfyui-kjnodes",
"ver": "e2ce0843d1183aea86ce6a1617426f492dcdc802",
"Node name for S&R": "ImageResizeKJv2"
},
"widgets_values": [
0,
0,
"nearest-exact",
"crop",
"0, 0, 0",
"center",
8,
"cpu"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
933.5941772460938,
143.6978759765625
],
"size": [
315,
262
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 123
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 314
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 315
},
{
"name": "latent_image",
"type": "LATENT",
"link": 112
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
20,
2.5,
"res_multistep",
"simple",
1
]
},
{
"id": 113,
"type": "TextEncodeQwenImageEditPlus",
"pos": [
454.6401672363281,
163.63690185546875
],
"size": [
400,
200
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 290
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": 292
},
{
"name": "image1",
"shape": 7,
"type": "IMAGE",
"link": 294
},
{
"name": "image2",
"shape": 7,
"type": "IMAGE",
"link": 321
},
{
"name": "image3",
"shape": 7,
"type": "IMAGE",
"link": null
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
314
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.59",
"Node name for S&R": "TextEncodeQwenImageEditPlus"
},
"widgets_values": [
"Please change the male's outfit in image1 to match the male's outfit in image2."
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
112,
63,
0,
3,
3,
"LATENT"
],
[
115,
39,
0,
63,
1,
"VAE"
],
[
123,
54,
0,
3,
0,
"MODEL"
],
[
239,
83,
0,
63,
0,
"IMAGE"
],
[
244,
82,
0,
83,
0,
"IMAGE"
],
[
254,
8,
0,
97,
0,
"IMAGE"
],
[
275,
99,
0,
82,
0,
"IMAGE"
],
[
282,
111,
0,
54,
0,
"MODEL"
],
[
290,
112,
0,
113,
0,
"CLIP"
],
[
291,
112,
0,
114,
0,
"CLIP"
],
[
292,
39,
0,
113,
1,
"VAE"
],
[
293,
39,
0,
114,
1,
"VAE"
],
[
294,
83,
0,
113,
2,
"IMAGE"
],
[
295,
83,
0,
114,
2,
"IMAGE"
],
[
314,
113,
0,
3,
1,
"CONDITIONING"
],
[
315,
114,
0,
3,
2,
"CONDITIONING"
],
[
320,
124,
0,
114,
3,
"IMAGE"
],
[
321,
124,
0,
113,
3,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.7513148009015777,
"offset": [
622.9654541015625,
271.1671905517578
]
},
"frontendVersion": "1.28.1",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
These models can edit an image from text instructions such as "put this dog in a different background" or "place this person in a forest."
Moreover, with a model that supports multiple reference images, you can even do things like replacing "the outfit of the person in image A" with "the outfit of the person in image B."
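For a code-level flavor of instruction-based editing, here is a minimal diffusers sketch using InstructPix2Pix, an earlier single-reference instruction editor with a stable public API; the Qwen-Image-Edit workflow above plays the same role with multi-image support. The file names are hypothetical, and running this requires a GPU and a model download.

```python
def edit_with_instruction(image, instruction: str):
    """Apply a text instruction to an image with InstructPix2Pix.
    `image_guidance_scale` controls how strongly the result sticks
    to the input image (higher = more faithful, less edited)."""
    import torch
    from diffusers import StableDiffusionInstructPix2PixPipeline

    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
    ).to("cuda")
    return pipe(
        instruction, image=image,
        num_inference_steps=20, image_guidance_scale=1.5,
    ).images[0]

if __name__ == "__main__":
    from PIL import Image
    src = Image.open("dog.png").convert("RGB")       # hypothetical input file
    out = edit_with_instruction(src, "put this dog on a snowy mountain")
    out.save("dog_edited.png")
```

The appeal of this family is that the "mask" and "reference" bookkeeping of the earlier methods collapses into a single sentence, at the cost of depending on how well the model understood the instruction.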