What is Subject Transfer?
Formally, this is the task known as "Subject-Driven Image Generation."
Here, "Subject" means not only people but also characters, stuffed animals, a specific dog, mascots, figures, and so on; loosely, "the thing shown in this image." Subject Transfer is the technology of generating new images that contain the same Subject as a reference image.
Technology that transfers ID (a person's face/identity) also falls under Subject Transfer, but it is usually handled as a special case; since many techniques are specialized for ID Transfer, it is treated separately.
LoRA
Needless to say, this is the method of training the model so it can draw things it could not draw before.
From its debut to the present, nothing has beaten it in flexibility and stability.
The big problem is that training is required; there is nothing casual about it.
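To show what using a trained subject LoRA looks like, here is a minimal sketch with diffusers; the checkpoint ID, the LoRA file name, and the trigger word are placeholders for whatever you actually trained.

```python
# Minimal sketch: generating with an already-trained subject LoRA in diffusers.
# "my_subject_lora" and the trigger word "sks_mascot" are placeholder names;
# substitute the LoRA you trained and the trigger word used during training.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load the subject LoRA trained on reference images of the subject.
pipe.load_lora_weights("path/to/lora_dir", weight_name="my_subject_lora.safetensors")

# The trigger word learned during training recalls the subject.
image = pipe("a photo of sks_mascot sitting in a cafe", num_inference_steps=30).images[0]
image.save("lora_subject.png")
```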
image2prompt
The most primitive approach is to generate a caption from the reference image and run text2image with that caption.
You might think, "With such a primitive method?", but it is theoretically possible if you have an MLLM that can describe the reference image perfectly and an image generation model that can reproduce that description perfectly.
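As a rough illustration of the two-step idea, here is a sketch with stand-in components: BLIP is used only as an example captioner and the file names are placeholders; the ComfyUI workflow below uses a far stronger MLLM (Gemini) instead.

```python
# image2prompt sketch: caption the reference image, then run text2image on the caption.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from diffusers import StableDiffusionXLPipeline

# 1) Image -> caption (BLIP here is just an example captioner)
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
captioner = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
reference = Image.open("reference.png").convert("RGB")  # placeholder path
inputs = processor(reference, return_tensors="pt")
caption = processor.decode(captioner.generate(**inputs, max_new_tokens=75)[0], skip_special_tokens=True)

# 2) Caption -> image
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe(caption).images[0].save("pseudo_subject_transfer.png")
```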

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 59,
"last_link_id": 104,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1252.432861328125,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
492,
394.392333984375
],
"size": [
418.3189392089844,
107.08506774902344
],
"flags": {
"collapsed": true
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
250.6552734375,
-167.9522705078125
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_turbo_bf16.safetensors",
"fp8_e4m3fn"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
898.7548217773438,
188.1918182373047
],
"size": [
315,
262
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 100
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 98
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
12345,
"fixed",
8,
1,
"euler",
"simple",
1
]
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
586.9390258789062,
-167.9522705078125
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
100
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
492,
175
],
"size": [
330.26959228515625,
142.00363159179688
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 102
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 39,
"type": "VAELoader",
"pos": [
898.7548217773438,
510.4016418457031
],
"size": [
315,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
]
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
120.78603616968121,
342.5854112036154
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
]
},
{
"id": 58,
"type": "LoadImage",
"pos": [
-226.4552737849208,
-0.14719505696391977
],
"size": [
298.080078125,
431
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
103
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"viewfilename=ComfyUI_temp_mohpt_00009_.png",
"image"
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-136.07276600955444,
-300.4671673650518
],
"size": [
349.13103718118725,
214.5148968572393
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_turbo_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── z_image_turbo_bf16.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
]
},
{
"id": 53,
"type": "EmptySD3LatentImage",
"pos": [
597.2695922851562,
482.05751390379885
],
"size": [
237,
106
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
98
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1442.0747874475098,
188.22962825237536
],
"size": [
510.21224258223606,
595.4940064248622
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 57,
"type": "GeminiNode",
"pos": [
131.26602226763393,
0.08407710682253366
],
"size": [
273,
266
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "images",
"shape": 7,
"type": "IMAGE",
"link": 103
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "video",
"shape": 7,
"type": "VIDEO",
"link": null
},
{
"name": "files",
"shape": 7,
"type": "GEMINI_INPUT_FILES",
"link": null
}
],
"outputs": [
{
"name": "STRING",
"type": "STRING",
"links": [
102,
104
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "GeminiNode"
},
"widgets_values": [
"You are a vision-language model that converts one input image into a single English prompt for a text-to-image generator. Your goal is to let the generator recreate the image as exactly as possible. Use only objective, non-emotional language (no “beautiful”, “cool”, “dramatic”, etc.). Be as quantitative as you reasonably can: counts of objects, relative positions (left/right/top/bottom/center/foreground/background), relative sizes, viewpoint (eye-level, low angle, top-down, etc.), and approximate aspect ratio (e.g., horizontal 16:9, square 1:1, vertical 9:16). Always describe: main subjects (appearance, pose, clothing, accessories, relative positions), background and environment (indoor/outdoor, location type, important objects), lighting (type and direction), colors and tone (dominant colors, dark/bright, high/low contrast), and overall style (photo, anime, 3D render, flat illustration, etc.), plus any visible text or logos and where they appear. If the image looks photographic or like a realistic render, also mention a simple shot type (close-up, medium shot, full body, wide shot), rough focal length (e.g., 35mm, 50mm), and depth of field (shallow or deep) when this is clearly implied. Do not refer to “the input image” or give instructions; just state the desired image content. Output exactly one line: a single comma-separated English prompt, with no headings, bullet points, or explanation.",
"gemini-3-pro-preview",
12345,
"fixed",
"Status: Completed\nPrice: $0.0196\nTime elapsed: 17s"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 59,
"type": "PreviewAny",
"pos": [
492,
1.5167060232018699
],
"size": [
330,
111
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "source",
"type": "*",
"link": 104
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "PreviewAny"
},
"widgets_values": []
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
98,
53,
0,
3,
3,
"LATENT"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
100,
54,
0,
3,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
],
[
102,
57,
0,
6,
1,
"STRING"
],
[
103,
58,
0,
57,
0,
"IMAGE"
],
[
104,
57,
0,
59,
0,
"*"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.1000000000000005,
"offset": [
326.4552737849208,
400.4671673650518
]
},
"frontendVersion": "1.34.2",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
The performance of recent models is making this increasingly viable. It is worth trying once as the cheapest form of pseudo-Subject Transfer.
SeeCoder / UnCLIP Family
Where image2prompt is a two-step process of "image → text → embedding," SeeCoder and the UnCLIP family perform "image → embedding" directly.
They create a vector from the image that plays the role of a text embedding and use it in place of the text encoder's output.
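As one publicly available member of this family, Stable unCLIP conditions generation directly on a CLIP image embedding; a minimal sketch follows (the model ID and file names are examples).

```python
# unCLIP-style conditioning sketch: the reference image is encoded by a CLIP
# image encoder and that embedding conditions the diffusion model directly,
# with no intermediate caption.
import torch
from diffusers import StableUnCLIPImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
).to("cuda")

reference = load_image("reference.png")  # placeholder path for the subject image

# A text prompt can still steer the scene; the image embedding carries the
# subject's appearance.
result = pipe(reference, prompt="standing in a snowy forest").images[0]
result.save("unclip_subject.png")
```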

{
"last_node_id": 59,
"last_link_id": 102,
"nodes": [
{
"id": 3,
"type": "KSampler",
"pos": [
1230,
180
],
"size": {
"0": 278.28021240234375,
"1": 556.486328125
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 86
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 102
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 6
},
{
"name": "latent_image",
"type": "LATENT",
"link": 84
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
7
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KSampler"
},
"widgets_values": [
1007766865747969,
"randomize",
20,
8,
"dpmpp_2m",
"karras",
1
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1530,
190
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 90,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
9
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 57,
"type": "VAELoader",
"pos": [
1532,
290
],
"size": {
"0": 315,
"1": 58
},
"flags": {
"collapsed": true
},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
90
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"vae-ft-mse-840000-ema-pruned.safetensors"
]
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
0,
240
],
"size": {
"0": 315,
"1": 98
},
"flags": {},
"order": 1,
"mode": 0,
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
86
],
"slot_index": 0
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
87,
88
],
"slot_index": 1
},
{
"name": "VAE",
"type": "VAE",
"links": [],
"slot_index": 2
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"😎-v1.x\\AuroraONE_F16.safetensors"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
430,
430
],
"size": [
409.83612060546875,
83.2110595703125
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 88
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
6
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"(worst quality:1.2),text,3d,outline,blush"
],
"color": "#223",
"bgcolor": "#335"
},
{
"id": 54,
"type": "EmptyLatentImage",
"pos": [
827,
614
],
"size": {
"0": 315,
"1": 106
},
"flags": {},
"order": 2,
"mode": 0,
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
84
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "EmptyLatentImage"
},
"widgets_values": [
512,
768,
1
]
},
{
"id": 13,
"type": "CLIPTextEncode",
"pos": [
430,
300
],
"size": {
"0": 412.5623779296875,
"1": 76
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 87
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
99
],
"slot_index": 0
}
],
"title": "CLIP Text Encode (Trigger word)",
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"high quality,high detailed,anime illustration,shot from side"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 58,
"type": "ConditioningCombine",
"pos": [
882,
271
],
"size": [
228.39999389648438,
46
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "conditioning_1",
"type": "CONDITIONING",
"link": 98
},
{
"name": "conditioning_2",
"type": "CONDITIONING",
"link": 99
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
102
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "ConditioningCombine"
},
"color": "#322",
"bgcolor": "#533"
},
{
"id": 55,
"type": "SEECoderImageEncode",
"pos": [
551,
105
],
"size": {
"0": 315,
"1": 58
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 85,
"slot_index": 0
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
98
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "SEECoderImageEncode"
},
"widgets_values": [
"seecoder-anime-v1-0.safetensors"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 56,
"type": "LoadImage",
"pos": [
295,
-220
],
"size": [
210,
389.91945068359314
],
"flags": {},
"order": 3,
"mode": 0,
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
85
],
"shape": 3
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"apple.png",
"image"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 9,
"type": "SaveImage",
"pos": [
1783,
190
],
"size": [
441.322519450684,
711.7099524414066
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"properties": {},
"widgets_values": [
"ComfyUI"
]
}
],
"links": [
[
6,
7,
0,
3,
2,
"CONDITIONING"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
9,
8,
0,
9,
0,
"IMAGE"
],
[
84,
54,
0,
3,
3,
"LATENT"
],
[
85,
56,
0,
55,
0,
"IMAGE"
],
[
86,
4,
0,
3,
0,
"MODEL"
],
[
87,
4,
1,
13,
0,
"CLIP"
],
[
88,
4,
1,
7,
0,
"CLIP"
],
[
90,
57,
0,
8,
1,
"VAE"
],
[
98,
55,
0,
58,
0,
"CONDITIONING"
],
[
99,
13,
0,
58,
1,
"CONDITIONING"
],
[
102,
58,
0,
3,
1,
"CONDITIONING"
]
],
"groups": [],
"config": {},
"extra": {},
"version": 0.4
}
Because it skips textualization, it loses less information than image2prompt, but usability is worse because the condition can no longer be edited as text.
IP-Adapter
This was the first technology to reach a practical level in production as a way of doing Subject Transfer without training.
IP-Adapter is an adapter that injects conditions derived from images into an existing text2image model. It was widely adopted as the representative adapter following ControlNet.
It extracts feature vectors from the reference image and injects them into the UNet (around the cross-attention layers) so they are reflected in the generated image. Since it can be used together with text prompts, you can split the roles: specify the Subject with the image and specify the scene and style with the text.
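A minimal diffusers sketch of that split (model IDs and file names are examples; a ComfyUI graph would use different nodes, but the roles are the same):

```python
# IP-Adapter sketch: features extracted from the reference image are injected
# into the UNet's cross-attention, while the text prompt controls scene/style.
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load IP-Adapter weights for SD 1.5 and set how strongly the image condition applies.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.7)

subject = load_image("reference.png")  # placeholder path

image = pipe(
    prompt="sitting on a park bench, golden hour",  # scene and style via text
    ip_adapter_image=subject,                       # Subject via image
    num_inference_steps=30,
).images[0]
image.save("ip_adapter_subject.png")
```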
IC-LoRA / ACE++
DiT-based models, including Flux, have a latent ability to generate mutually consistent images within a single canvas.
IC-LoRA and ACE++ exploit this property for Subject Transfer.

{
"id": "68ee8198-d33d-48ba-a3f6-65bf5c84d6e4",
"revision": 0,
"last_node_id": 26,
"last_link_id": 34,
"nodes": [
{
"id": 11,
"type": "UnetLoaderGGUF",
"pos": [
610,
40
],
"size": [
315,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
21
]
}
],
"properties": {
"cnr_id": "ComfyUI-GGUF",
"ver": "bc5223b0e37e053dbec2ea5e5f52c2fd4b8f712a",
"Node name for S&R": "UnetLoaderGGUF"
},
"widgets_values": [
"FLUX_gguf\\flux1-fill-dev-Q4_K_S.gguf"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 15,
"type": "VAELoader",
"pos": [
660,
410
],
"size": [
248.4499969482422,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
18,
23
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"FLUXvae.safetensors"
]
},
{
"id": 20,
"type": "VAEDecode",
"pos": [
1660,
188.83277893066406
],
"size": [
190,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 22
},
{
"name": "vae",
"type": "VAE",
"link": 23
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
27
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "VAEDecode"
}
},
{
"id": 12,
"type": "LoadImage",
"pos": [
296.1838684082031,
566.498291015625
],
"size": [
290,
498.96368408203125
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
24
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pexels-photo-15169599.jpg",
"image",
""
]
},
{
"id": 17,
"type": "ACEPlusLoraConditioning",
"pos": [
968.0706787109375,
210.35354614257812
],
"size": [
315,
138
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 16
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 17
},
{
"name": "vae",
"type": "VAE",
"link": 18
},
{
"name": "pixels",
"type": "IMAGE",
"link": 19
},
{
"name": "mask",
"type": "MASK",
"link": 20
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
13
]
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
14
]
},
{
"name": "latent",
"type": "LATENT",
"links": [
15
]
}
],
"properties": {
"Node name for S&R": "ACEPlusLoraConditioning"
},
"widgets_values": [
false
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 23,
"type": "PreviewImage",
"pos": [
2140,
190
],
"size": [
590,
580
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 31
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "PreviewImage"
},
"widgets_values": [
""
]
},
{
"id": 24,
"type": "PreviewImage",
"pos": [
988.9389038085938,
571.0610961914062
],
"size": [
435.3353271484375,
324.3360290527344
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 32
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "PreviewImage"
},
"widgets_values": [
""
]
},
{
"id": 21,
"type": "ACEPlusLoraProcessor",
"pos": [
630,
570
],
"size": [
315,
234
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "reference_image",
"shape": 7,
"type": "IMAGE",
"link": 24
},
{
"name": "edit_image",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "edit_mask",
"shape": 7,
"type": "MASK",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
19,
32
]
},
{
"name": "MASK",
"type": "MASK",
"links": [
20
]
},
{
"name": "OUT_H",
"type": "INT",
"links": [
29
]
},
{
"name": "OUT_W",
"type": "INT",
"links": [
28
]
},
{
"name": "SLICE_W",
"type": "INT",
"links": [
30
]
}
],
"properties": {
"Node name for S&R": "ACEPlusLoraProcessor"
},
"widgets_values": [
true,
1024,
1024,
"repainting",
3072
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 25,
"type": "CLIPTextEncode",
"pos": [
260,
170
],
"size": [
357.0466003417969,
137.17037963867188
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 33
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
10
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"A photograph of a woman wearing a yellow sweater, taken in front of a café in the UK, with a blurred background, intended for a magazine cover."
]
},
{
"id": 13,
"type": "FluxGuidance",
"pos": [
645.9932250976562,
176.34109497070312
],
"size": [
242.8545684814453,
58
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "conditioning",
"type": "CONDITIONING",
"link": 10
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
16
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "FluxGuidance"
},
"widgets_values": [
30
]
},
{
"id": 10,
"type": "DualCLIPLoader",
"pos": [
-97.66555786132812,
274.1638488769531
],
"size": [
315,
130
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
11,
33
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "DualCLIPLoader"
},
"widgets_values": [
"clip_l.safetensors",
"t5xxl_fp8_e4m3fn.safetensors",
"flux",
"default"
]
},
{
"id": 14,
"type": "CLIPTextEncode",
"pos": [
264.6689147949219,
366.498291015625
],
"size": [
397.89935302734375,
132.290771484375
],
"flags": {
"collapsed": true
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 11
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
17
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 16,
"type": "KSampler",
"pos": [
1314.6689453125,
188.83277893066406
],
"size": [
315,
262
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 12
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 13
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 14
},
{
"name": "latent_image",
"type": "LATENT",
"link": 15
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
22
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
30,
1,
"euler",
"normal",
1
]
},
{
"id": 22,
"type": "ImageCrop",
"pos": [
1891.829345703125,
190
],
"size": [
210,
130
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 27
},
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 28
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 29
},
{
"name": "x",
"type": "INT",
"widget": {
"name": "x"
},
"link": 30
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
31
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "ImageCrop"
},
"widgets_values": [
512,
512,
0,
0
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 18,
"type": "LoraLoaderModelOnly",
"pos": [
960,
40
],
"size": [
315,
82
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 21
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
12
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "LoraLoaderModelOnly"
},
"widgets_values": [
"ACE_Plus\\comfyui_portrait_lora64.safetensors",
1
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
10,
25,
0,
13,
0,
"CONDITIONING"
],
[
11,
10,
0,
14,
0,
"CLIP"
],
[
12,
18,
0,
16,
0,
"MODEL"
],
[
13,
17,
0,
16,
1,
"CONDITIONING"
],
[
14,
17,
1,
16,
2,
"CONDITIONING"
],
[
15,
17,
2,
16,
3,
"LATENT"
],
[
16,
13,
0,
17,
0,
"CONDITIONING"
],
[
17,
14,
0,
17,
1,
"CONDITIONING"
],
[
18,
15,
0,
17,
2,
"VAE"
],
[
19,
21,
0,
17,
3,
"IMAGE"
],
[
20,
21,
1,
17,
4,
"MASK"
],
[
21,
11,
0,
18,
0,
"MODEL"
],
[
22,
16,
0,
20,
0,
"LATENT"
],
[
23,
15,
0,
20,
1,
"VAE"
],
[
24,
12,
0,
21,
0,
"IMAGE"
],
[
27,
20,
0,
22,
0,
"IMAGE"
],
[
28,
21,
3,
22,
1,
"INT"
],
[
29,
21,
2,
22,
2,
"INT"
],
[
30,
21,
4,
22,
3,
"INT"
],
[
31,
22,
0,
23,
0,
"IMAGE"
],
[
32,
21,
0,
24,
0,
"IMAGE"
],
[
33,
10,
0,
25,
0,
"CLIP"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.4836049022304428,
"offset": [
121.19217889705396,
180.49827241415346
]
},
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
Place the reference image (containing the Subject) on the left half of the canvas, mask the entire right half, and generate (inpaint). Because the model fills in the right half while looking at the information on the left, it effectively generates a new image featuring the same Subject as the left half.
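The geometry is simple enough to sketch with PIL (file names are placeholders; the actual repaint is done by the FLUX.1 Fill + ACE++ LoRA graph above, and the ImageCrop node at the end cuts out the generated right half):

```python
# IC-LoRA / ACE++ canvas layout sketch: reference on the left, empty area on
# the right, and a mask that lets the model repaint only the right half.
from PIL import Image

reference = Image.open("reference.png").convert("RGB").resize((1024, 1024))  # placeholder

# Side-by-side canvas: left = reference (kept), right = region to be generated.
canvas = Image.new("RGB", (2048, 1024), (127, 127, 127))
canvas.paste(reference, (0, 0))

# Mask: white (255) = may be repainted, black (0) = must stay as-is.
mask = Image.new("L", (2048, 1024), 0)
mask.paste(255, (1024, 0, 2048, 1024))

canvas.save("icl_canvas.png")
mask.save("icl_mask.png")
# Feed canvas + mask + a prompt describing the desired right-hand scene into an
# inpainting model, then crop the right half of the result.
```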
Instruction-Based Image Editing Models
"Instruction-Based Image Editing Models" can also be used for Subject Transfer.

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 125,
"last_link_id": 323,
"nodes": [
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
634.9767456054688,
-1.8326886892318726
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 282
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
123
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1000000000000005
]
},
{
"id": 63,
"type": "VAEEncode",
"pos": [
714.6403198242188,
673.7313842773438
],
"size": [
140,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 239
},
{
"name": "vae",
"type": "VAE",
"link": 115
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
112
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "VAEEncode"
},
"widgets_values": []
},
{
"id": 112,
"type": "CLIPLoader",
"pos": [
75.53079223632812,
277.016357421875
],
"size": [
270,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
290,
291
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_2.5_vl_7b_fp8_scaled.safetensors",
"qwen_image",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
107.53079223632812,
446.7167663574219
],
"size": [
238,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76,
115,
292,
293
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"qwen_image_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 114,
"type": "TextEncodeQwenImageEditPlus",
"pos": [
454.6401672363281,
419.63690185546875
],
"size": [
400,
200
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 291
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": 293
},
{
"name": "image1",
"shape": 7,
"type": "IMAGE",
"link": 295
},
{
"name": "image2",
"shape": 7,
"type": "IMAGE",
"link": 320
},
{
"name": "image3",
"shape": 7,
"type": "IMAGE",
"link": null
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
315
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.59",
"Node name for S&R": "TextEncodeQwenImageEditPlus"
},
"widgets_values": [
""
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 111,
"type": "UNETLoader",
"pos": [
330.1968994140625,
-1.8326886892318726
],
"size": [
276.62274169921875,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
282
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Qwen-Image\\qwen_image_edit_2509_fp8_e4m3fn.safetensors",
"fp8_e4m3fn"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 82,
"type": "ImageScaleToTotalPixels",
"pos": [
-224.63221740722656,
668.4074096679688
],
"size": [
270,
82
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 275
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
244
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "ImageScaleToTotalPixels"
},
"widgets_values": [
"nearest-exact",
1
]
},
{
"id": 97,
"type": "SaveImage",
"pos": [
1495.48046875,
143.6978759765625
],
"size": [
506.0589904785156,
566.5868530273438
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 254
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-84.94583892822266,
-171.1671905517578
],
"size": [
386.9856262207031,
251.33447265625
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [qwen_image_edit_2509_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image-Edit_ComfyUI/blob/main/split_files/diffusion_models/qwen_image_edit_2509_fp8_e4m3fn.safetensors)\n- [qwen_2.5_vl_7b_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/blob/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors)\n- [qwen_image_vae.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main/split_files/vae)\n\n\n```\n📂ComfyUI/\n└──📂models/\n ├── 📂diffusion_models/\n │ └── qwen_image_edit_2509_fp8_e4m3fn.safetensors\n ├── 📂text_encoders/\n │ └── qwen_2.5_vl_7b_fp8.safetensors\n └── 📂vae/\n └── wan_2.1_vae.safetensors\n\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 99,
"type": "LoadImage",
"pos": [
-522.9654541015625,
668.4074096679688
],
"size": [
268.17022705078125,
414.46728515625
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
275
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pexels-photo-33109412 (1).jpg",
"image"
]
},
{
"id": 124,
"type": "LoadImage",
"pos": [
79.30519104003906,
1079.8746337890625
],
"size": [
268.17022705078125,
414.46728515625
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
320,
321
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pexels-photo-32490940.jpg",
"image"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1293.939697265625,
143.6978759765625
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
254
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 83,
"type": "ImageResizeKJv2",
"pos": [
75.53079223632812,
668.4074096679688
],
"size": [
270,
336
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 244
},
{
"name": "mask",
"shape": 7,
"type": "MASK",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
239,
294,
295
]
},
{
"name": "width",
"type": "INT",
"links": null
},
{
"name": "height",
"type": "INT",
"links": null
},
{
"name": "mask",
"type": "MASK",
"links": []
}
],
"properties": {
"cnr_id": "comfyui-kjnodes",
"ver": "e2ce0843d1183aea86ce6a1617426f492dcdc802",
"Node name for S&R": "ImageResizeKJv2"
},
"widgets_values": [
0,
0,
"nearest-exact",
"crop",
"0, 0, 0",
"center",
8,
"cpu"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
933.5941772460938,
143.6978759765625
],
"size": [
315,
262
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 123
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 314
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 315
},
{
"name": "latent_image",
"type": "LATENT",
"link": 112
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
20,
2.5,
"res_multistep",
"simple",
1
]
},
{
"id": 113,
"type": "TextEncodeQwenImageEditPlus",
"pos": [
454.6401672363281,
163.63690185546875
],
"size": [
400,
200
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 290
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": 292
},
{
"name": "image1",
"shape": 7,
"type": "IMAGE",
"link": 294
},
{
"name": "image2",
"shape": 7,
"type": "IMAGE",
"link": 321
},
{
"name": "image3",
"shape": 7,
"type": "IMAGE",
"link": null
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
314
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.59",
"Node name for S&R": "TextEncodeQwenImageEditPlus"
},
"widgets_values": [
"Please change the male's outfit in image1 to match the male's outfit in image2."
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
112,
63,
0,
3,
3,
"LATENT"
],
[
115,
39,
0,
63,
1,
"VAE"
],
[
123,
54,
0,
3,
0,
"MODEL"
],
[
239,
83,
0,
63,
0,
"IMAGE"
],
[
244,
82,
0,
83,
0,
"IMAGE"
],
[
254,
8,
0,
97,
0,
"IMAGE"
],
[
275,
99,
0,
82,
0,
"IMAGE"
],
[
282,
111,
0,
54,
0,
"MODEL"
],
[
290,
112,
0,
113,
0,
"CLIP"
],
[
291,
112,
0,
114,
0,
"CLIP"
],
[
292,
39,
0,
113,
1,
"VAE"
],
[
293,
39,
0,
114,
1,
"VAE"
],
[
294,
83,
0,
113,
2,
"IMAGE"
],
[
295,
83,
0,
114,
2,
"IMAGE"
],
[
314,
113,
0,
3,
1,
"CONDITIONING"
],
[
315,
114,
0,
3,
2,
"CONDITIONING"
],
[
320,
124,
0,
114,
3,
"IMAGE"
],
[
321,
124,
0,
113,
3,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.7513148009015777,
"offset": [
622.9654541015625,
271.1671905517578
]
},
"frontendVersion": "1.28.1",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
These models edit images according to text instructions such as "put this dog in a different background" or "place this person in a forest."
If the model supports multiple reference images, you can also do things like replacing the clothes of the person in image A with the clothes of the person in image B.
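For a sense of the interface, here is a minimal sketch using InstructPix2Pix, an early single-image instruction editor (file names are placeholders); newer editors such as Qwen-Image-Edit 2509 in the workflow above accept several reference images, but the pattern of "image(s) + natural-language instruction" is the same.

```python
# Instruction-based editing sketch with InstructPix2Pix.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

source = load_image("dog.png")  # placeholder path for the subject photo

edited = pipe(
    prompt="put this dog in a snowy forest",
    image=source,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # higher = stay closer to the input image
).images[0]
edited.save("edited_subject.png")
```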