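What is Subject transfer?
Formally, this task is called "Subject-Driven Image Generation".
A Subject is not limited to people. It covers anything that is "the thing shown in this image": a character, a stuffed toy, a specific dog, a mascot, a figurine, and so on.
Subject transfer is a technique for generating images that contain the same Subject as the one shown in a reference image.
Techniques that transfer ID (a person's face and identity) are part of Subject transfer, but ID is treated as a special case and many techniques are dedicated to it, so ID transfer is covered separately.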
LoRA
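Needless to say, LoRA is a way to train a model on something it cannot draw so that it becomes able to draw it.
From its introduction to the present, nothing has surpassed it in flexibility and stability.
The big drawback is that it requires training, so it is not something you can use casually.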
image2prompt
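The most naive approach is to generate a caption from the image and then run text2image with that caption.
You might wonder whether such a primitive method can work, but in principle it can, given an MLLM that describes the reference image perfectly and an image generation model that reproduces that description perfectly.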

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 59,
"last_link_id": 104,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1252.432861328125,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
492,
394.392333984375
],
"size": [
418.3189392089844,
107.08506774902344
],
"flags": {
"collapsed": true
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
250.6552734375,
-167.9522705078125
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_turbo_bf16.safetensors",
"fp8_e4m3fn"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
898.7548217773438,
188.1918182373047
],
"size": [
315,
262
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 100
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 98
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
12345,
"fixed",
8,
1,
"euler",
"simple",
1
]
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
586.9390258789062,
-167.9522705078125
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
100
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
492,
175
],
"size": [
330.26959228515625,
142.00363159179688
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 102
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 39,
"type": "VAELoader",
"pos": [
898.7548217773438,
510.4016418457031
],
"size": [
315,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
]
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
120.78603616968121,
342.5854112036154
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
]
},
{
"id": 58,
"type": "LoadImage",
"pos": [
-226.4552737849208,
-0.14719505696391977
],
"size": [
298.080078125,
431
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
103
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"viewfilename=ComfyUI_temp_mohpt_00009_.png",
"image"
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-136.07276600955444,
-300.4671673650518
],
"size": [
349.13103718118725,
214.5148968572393
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_turbo_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── z_image_turbo_bf16.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
]
},
{
"id": 53,
"type": "EmptySD3LatentImage",
"pos": [
597.2695922851562,
482.05751390379885
],
"size": [
237,
106
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
98
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1442.0747874475098,
188.22962825237536
],
"size": [
510.21224258223606,
595.4940064248622
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 57,
"type": "GeminiNode",
"pos": [
131.26602226763393,
0.08407710682253366
],
"size": [
273,
266
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "images",
"shape": 7,
"type": "IMAGE",
"link": 103
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "video",
"shape": 7,
"type": "VIDEO",
"link": null
},
{
"name": "files",
"shape": 7,
"type": "GEMINI_INPUT_FILES",
"link": null
}
],
"outputs": [
{
"name": "STRING",
"type": "STRING",
"links": [
102,
104
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "GeminiNode"
},
"widgets_values": [
"You are a vision-language model that converts one input image into a single English prompt for a text-to-image generator. Your goal is to let the generator recreate the image as exactly as possible. Use only objective, non-emotional language (no “beautiful”, “cool”, “dramatic”, etc.). Be as quantitative as you reasonably can: counts of objects, relative positions (left/right/top/bottom/center/foreground/background), relative sizes, viewpoint (eye-level, low angle, top-down, etc.), and approximate aspect ratio (e.g., horizontal 16:9, square 1:1, vertical 9:16). Always describe: main subjects (appearance, pose, clothing, accessories, relative positions), background and environment (indoor/outdoor, location type, important objects), lighting (type and direction), colors and tone (dominant colors, dark/bright, high/low contrast), and overall style (photo, anime, 3D render, flat illustration, etc.), plus any visible text or logos and where they appear. If the image looks photographic or like a realistic render, also mention a simple shot type (close-up, medium shot, full body, wide shot), rough focal length (e.g., 35mm, 50mm), and depth of field (shallow or deep) when this is clearly implied. Do not refer to “the input image” or give instructions; just state the desired image content. Output exactly one line: a single comma-separated English prompt, with no headings, bullet points, or explanation.",
"gemini-3-pro-preview",
12345,
"fixed",
"Status: Completed\nPrice: $0.0196\nTime elapsed: 17s"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 59,
"type": "PreviewAny",
"pos": [
492,
1.5167060232018699
],
"size": [
330,
111
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "source",
"type": "*",
"link": 104
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "PreviewAny"
},
"widgets_values": []
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
98,
53,
0,
3,
3,
"LATENT"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
100,
54,
0,
3,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
],
[
102,
57,
0,
6,
1,
"STRING"
],
[
103,
58,
0,
57,
0,
"IMAGE"
],
[
104,
57,
0,
59,
0,
"*"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.1000000000000005,
"offset": [
326.4552737849208,
400.4671673650518
]
},
"frontendVersion": "1.34.2",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
Recent models are starting to make this possible. As the cheapest "pseudo Subject transfer", it is worth trying at least once.
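The image2prompt workflow above can be read as a chain of typed links. The following is a minimal sketch, assuming each entry of `links` has the layout `[link_id, from_node, from_slot, to_node, to_slot, type]` (read off from the workflow above, not taken from official documentation). `NODE_TYPES`, `LINKS`, `trace_path`, and `describe` are names made up for this illustration, and only the nodes on the reference-image path are copied over.

```python
from collections import deque

# Trimmed copy of the image2prompt workflow above (assumption: only the
# nodes and links on the reference-image path are kept).
NODE_TYPES = {
    58: "LoadImage",
    57: "GeminiNode",
    6: "CLIPTextEncode",
    3: "KSampler",
    8: "VAEDecode",
    56: "SaveImage",
}

# Assumed layout of one link:
# [link_id, from_node, from_slot, to_node, to_slot, type]
LINKS = [
    [103, 58, 0, 57, 0, "IMAGE"],
    [102, 57, 0, 6, 1, "STRING"],
    [46, 6, 0, 3, 1, "CONDITIONING"],
    [35, 3, 0, 8, 0, "LATENT"],
    [101, 8, 0, 56, 0, "IMAGE"],
]


def trace_path(links, start, goal):
    """Breadth-first search over the links.

    Returns a list of (from_node, link_type, to_node) hops from start to
    goal, or None when goal cannot be reached.
    """
    outgoing = {}
    for _link_id, src, _src_slot, dst, _dst_slot, kind in links:
        outgoing.setdefault(src, []).append((dst, kind))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for dst, kind in outgoing.get(node, []):
            if dst not in seen:
                seen.add(dst)
                queue.append((dst, hops + [(node, kind, dst)]))
    return None


def describe(hops, node_types):
    """Render a non-empty hop list as 'A -TYPE-> B -TYPE-> C'."""
    parts = [node_types[hops[0][0]]]
    for _src, kind, dst in hops:
        parts.append(f"-{kind}->")
        parts.append(node_types[dst])
    return " ".join(parts)


if __name__ == "__main__":
    hops = trace_path(LINKS, start=58, goal=56)
    print(describe(hops, NODE_TYPES))
```

Tracing from LoadImage (58) to SaveImage (56) shows that the reference image reaches the sampler only as a STRING turned into CONDITIONING, so everything the caption fails to mention is lost at that hop.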
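SeeCoder / UnCLIP family
image2prompt takes two steps, image → text → embedding, whereas SeeCoder and UnCLIP-style methods go directly from image → embedding.
They build a vector equivalent to a text embedding from the image and use it in place of the text encoder's output.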

{
"last_node_id": 59,
"last_link_id": 102,
"nodes": [
{
"id": 3,
"type": "KSampler",
"pos": [
1230,
180
],
"size": {
"0": 278.28021240234375,
"1": 556.486328125
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 86
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 102
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 6
},
{
"name": "latent_image",
"type": "LATENT",
"link": 84
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
7
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KSampler"
},
"widgets_values": [
1007766865747969,
"randomize",
20,
8,
"dpmpp_2m",
"karras",
1
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1530,
190
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 90,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
9
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 57,
"type": "VAELoader",
"pos": [
1532,
290
],
"size": {
"0": 315,
"1": 58
},
"flags": {
"collapsed": true
},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
90
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"vae-ft-mse-840000-ema-pruned.safetensors"
]
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
0,
240
],
"size": {
"0": 315,
"1": 98
},
"flags": {},
"order": 1,
"mode": 0,
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
86
],
"slot_index": 0
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
87,
88
],
"slot_index": 1
},
{
"name": "VAE",
"type": "VAE",
"links": [],
"slot_index": 2
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"😎-v1.x\\AuroraONE_F16.safetensors"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
430,
430
],
"size": [
409.83612060546875,
83.2110595703125
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 88
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
6
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"(worst quality:1.2),text,3d,outline,blush"
],
"color": "#223",
"bgcolor": "#335"
},
{
"id": 54,
"type": "EmptyLatentImage",
"pos": [
827,
614
],
"size": {
"0": 315,
"1": 106
},
"flags": {},
"order": 2,
"mode": 0,
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
84
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "EmptyLatentImage"
},
"widgets_values": [
512,
768,
1
]
},
{
"id": 13,
"type": "CLIPTextEncode",
"pos": [
430,
300
],
"size": {
"0": 412.5623779296875,
"1": 76
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 87
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
99
],
"slot_index": 0
}
],
"title": "CLIP Text Encode (Trigger word)",
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"high quality,high detailed,anime illustration,shot from side"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 58,
"type": "ConditioningCombine",
"pos": [
882,
271
],
"size": [
228.39999389648438,
46
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "conditioning_1",
"type": "CONDITIONING",
"link": 98
},
{
"name": "conditioning_2",
"type": "CONDITIONING",
"link": 99
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
102
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "ConditioningCombine"
},
"color": "#322",
"bgcolor": "#533"
},
{
"id": 55,
"type": "SEECoderImageEncode",
"pos": [
551,
105
],
"size": {
"0": 315,
"1": 58
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 85,
"slot_index": 0
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
98
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "SEECoderImageEncode"
},
"widgets_values": [
"seecoder-anime-v1-0.safetensors"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 56,
"type": "LoadImage",
"pos": [
295,
-220
],
"size": [
210,
389.91945068359314
],
"flags": {},
"order": 3,
"mode": 0,
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
85
],
"shape": 3
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"apple.png",
"image"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 9,
"type": "SaveImage",
"pos": [
1783,
190
],
"size": [
441.322519450684,
711.7099524414066
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"properties": {},
"widgets_values": [
"ComfyUI"
]
}
],
"links": [
[
6,
7,
0,
3,
2,
"CONDITIONING"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
9,
8,
0,
9,
0,
"IMAGE"
],
[
84,
54,
0,
3,
3,
"LATENT"
],
[
85,
56,
0,
55,
0,
"IMAGE"
],
[
86,
4,
0,
3,
0,
"MODEL"
],
[
87,
4,
1,
13,
0,
"CLIP"
],
[
88,
4,
1,
7,
0,
"CLIP"
],
[
90,
57,
0,
8,
1,
"VAE"
],
[
98,
55,
0,
58,
0,
"CONDITIONING"
],
[
99,
13,
0,
58,
1,
"CONDITIONING"
],
[
102,
58,
0,
3,
1,
"CONDITIONING"
]
],
"groups": [],
"config": {},
"extra": {},
"version": 0.4
}
Compared with image2prompt, less information is lost in the conversion to text, but because there is no text to edit, it is less convenient to use.
IP-Adapter
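This was the first technique for "Subject transfer without training" to reach a practical level in real-world use.
IP-Adapter is an adapter that plugs "conditions derived from an image" into an existing text2image model. It was widely used as the leading adapter after ControlNet.
It extracts feature vectors from the reference image and injects them into the UNet (around Cross-Attention, for example) so that they are reflected in the generated image. Because it can be used together with a text prompt, you can split the roles: specify the Subject with an image, and the scene and style with text.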
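The injection around Cross-Attention is usually described as "decoupled cross-attention": the same queries attend to the text tokens and to the image tokens separately, and the two results are added with a weight. The following is a minimal sketch under that assumption, in plain Python. The function names and `image_scale` are hypothetical, and the learned projection matrices of a real adapter are left out, so keys and values are passed in directly.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def decoupled_cross_attention(queries, text_kv, image_kv, image_scale=1.0):
    """Text attention plus a separately computed, weighted image attention.

    text_kv and image_kv are (keys, values) pairs. image_scale=0.0 falls
    back to text-only conditioning (hypothetical parameter name).
    """
    text_out = attention(queries, *text_kv)
    image_out = attention(queries, *image_kv)
    return [
        [t + image_scale * i for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(text_out, image_out)
    ]


if __name__ == "__main__":
    queries = [[0.5, 0.5]]
    text_kv = ([[1.0, 0.0]], [[1.0, 2.0]])      # one text token
    image_kv = ([[0.0, 1.0]], [[10.0, 20.0]])   # one image token
    print(decoupled_cross_attention(queries, text_kv, image_kv, image_scale=0.5))
```

Keeping the two attention branches separate is what lets the text prompt keep controlling scene and style while the image branch carries the Subject. Lowering `image_scale` weakens the reference image without touching the text path.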
IC-LoRA / ACE++
DiT-based models such as Flux have a latent ability to produce consistent images.
IC-LoRA / ACE++ are Subject transfer methods that exploit this property.

{
"id": "68ee8198-d33d-48ba-a3f6-65bf5c84d6e4",
"revision": 0,
"last_node_id": 26,
"last_link_id": 34,
"nodes": [
{
"id": 11,
"type": "UnetLoaderGGUF",
"pos": [
610,
40
],
"size": [
315,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
21
]
}
],
"properties": {
"cnr_id": "ComfyUI-GGUF",
"ver": "bc5223b0e37e053dbec2ea5e5f52c2fd4b8f712a",
"Node name for S&R": "UnetLoaderGGUF"
},
"widgets_values": [
"FLUX_gguf\\flux1-fill-dev-Q4_K_S.gguf"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 15,
"type": "VAELoader",
"pos": [
660,
410
],
"size": [
248.4499969482422,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
18,
23
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"FLUXvae.safetensors"
]
},
{
"id": 20,
"type": "VAEDecode",
"pos": [
1660,
188.83277893066406
],
"size": [
190,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 22
},
{
"name": "vae",
"type": "VAE",
"link": 23
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
27
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "VAEDecode"
}
},
{
"id": 12,
"type": "LoadImage",
"pos": [
296.1838684082031,
566.498291015625
],
"size": [
290,
498.96368408203125
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
24
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pexels-photo-15169599.jpg",
"image",
""
]
},
{
"id": 17,
"type": "ACEPlusLoraConditioning",
"pos": [
968.0706787109375,
210.35354614257812
],
"size": [
315,
138
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 16
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 17
},
{
"name": "vae",
"type": "VAE",
"link": 18
},
{
"name": "pixels",
"type": "IMAGE",
"link": 19
},
{
"name": "mask",
"type": "MASK",
"link": 20
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
13
]
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
14
]
},
{
"name": "latent",
"type": "LATENT",
"links": [
15
]
}
],
"properties": {
"Node name for S&R": "ACEPlusLoraConditioning"
},
"widgets_values": [
false
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 23,
"type": "PreviewImage",
"pos": [
2140,
190
],
"size": [
590,
580
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 31
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "PreviewImage"
},
"widgets_values": [
""
]
},
{
"id": 24,
"type": "PreviewImage",
"pos": [
988.9389038085938,
571.0610961914062
],
"size": [
435.3353271484375,
324.3360290527344
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 32
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "PreviewImage"
},
"widgets_values": [
""
]
},
{
"id": 21,
"type": "ACEPlusLoraProcessor",
"pos": [
630,
570
],
"size": [
315,
234
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "reference_image",
"shape": 7,
"type": "IMAGE",
"link": 24
},
{
"name": "edit_image",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "edit_mask",
"shape": 7,
"type": "MASK",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
19,
32
]
},
{
"name": "MASK",
"type": "MASK",
"links": [
20
]
},
{
"name": "OUT_H",
"type": "INT",
"links": [
29
]
},
{
"name": "OUT_W",
"type": "INT",
"links": [
28
]
},
{
"name": "SLICE_W",
"type": "INT",
"links": [
30
]
}
],
"properties": {
"Node name for S&R": "ACEPlusLoraProcessor"
},
"widgets_values": [
true,
1024,
1024,
"repainting",
3072
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 25,
"type": "CLIPTextEncode",
"pos": [
260,
170
],
"size": [
357.0466003417969,
137.17037963867188
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 33
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
10
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"A photograph of a woman wearing a yellow sweater, taken in front of a café in the UK, with a blurred background, intended for a magazine cover."
]
},
{
"id": 13,
"type": "FluxGuidance",
"pos": [
645.9932250976562,
176.34109497070312
],
"size": [
242.8545684814453,
58
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "conditioning",
"type": "CONDITIONING",
"link": 10
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
16
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "FluxGuidance"
},
"widgets_values": [
30
]
},
{
"id": 10,
"type": "DualCLIPLoader",
"pos": [
-97.66555786132812,
274.1638488769531
],
"size": [
315,
130
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
11,
33
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "DualCLIPLoader"
},
"widgets_values": [
"clip_l.safetensors",
"t5xxl_fp8_e4m3fn.safetensors",
"flux",
"default"
]
},
{
"id": 14,
"type": "CLIPTextEncode",
"pos": [
264.6689147949219,
366.498291015625
],
"size": [
397.89935302734375,
132.290771484375
],
"flags": {
"collapsed": true
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 11
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
17
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 16,
"type": "KSampler",
"pos": [
1314.6689453125,
188.83277893066406
],
"size": [
315,
262
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 12
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 13
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 14
},
{
"name": "latent_image",
"type": "LATENT",
"link": 15
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
22
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
30,
1,
"euler",
"normal",
1
]
},
{
"id": 22,
"type": "ImageCrop",
"pos": [
1891.829345703125,
190
],
"size": [
210,
130
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 27
},
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 28
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 29
},
{
"name": "x",
"type": "INT",
"widget": {
"name": "x"
},
"link": 30
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
31
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "ImageCrop"
},
"widgets_values": [
512,
512,
0,
0
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 18,
"type": "LoraLoaderModelOnly",
"pos": [
960,
40
],
"size": [
315,
82
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 21
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
12
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "LoraLoaderModelOnly"
},
"widgets_values": [
"ACE_Plus\\comfyui_portrait_lora64.safetensors",
1
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
10,
25,
0,
13,
0,
"CONDITIONING"
],
[
11,
10,
0,
14,
0,
"CLIP"
],
[
12,
18,
0,
16,
0,
"MODEL"
],
[
13,
17,
0,
16,
1,
"CONDITIONING"
],
[
14,
17,
1,
16,
2,
"CONDITIONING"
],
[
15,
17,
2,
16,
3,
"LATENT"
],
[
16,
13,
0,
17,
0,
"CONDITIONING"
],
[
17,
14,
0,
17,
1,
"CONDITIONING"
],
[
18,
15,
0,
17,
2,
"VAE"
],
[
19,
21,
0,
17,
3,
"IMAGE"
],
[
20,
21,
1,
17,
4,
"MASK"
],
[
21,
11,
0,
18,
0,
"MODEL"
],
[
22,
16,
0,
20,
0,
"LATENT"
],
[
23,
15,
0,
20,
1,
"VAE"
],
[
24,
12,
0,
21,
0,
"IMAGE"
],
[
27,
20,
0,
22,
0,
"IMAGE"
],
[
28,
21,
3,
22,
1,
"INT"
],
[
29,
21,
2,
22,
2,
"INT"
],
[
30,
21,
4,
22,
3,
"INT"
],
[
31,
22,
0,
23,
0,
"IMAGE"
],
[
32,
21,
0,
24,
0,
"IMAGE"
],
[
33,
10,
0,
25,
0,
"CLIP"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.4836049022304428,
"offset": [
121.19217889705396,
180.49827241415346
]
},
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
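Place the reference image (containing the Subject) on the left side of the canvas, mask the entire right side, and have the model generate it (inpainting). Because the model fills in the right side while looking at the left, it can generate a new image that uses the same Subject as the left side.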
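The canvas layout described above can be sketched in plain Python. This is an illustration only, assuming the generated half has the same size as the reference and that images are simple 2D grids of floats. `build_side_by_side`, `fake_inpaint`, and `crop_generated` are hypothetical helpers, and `fake_inpaint` is a stand-in for the actual model.

```python
def build_side_by_side(reference, fill=0.0):
    """Put the reference on the left and an empty area on the right.

    Returns (canvas, mask, slice_x). mask is 1 where the model should
    generate and 0 where the reference must be kept. slice_x is the
    column where the generated half starts.
    """
    height = len(reference)
    width = len(reference[0])
    canvas = [list(row) + [fill] * width for row in reference]
    mask = [[0] * width + [1] * width for _ in range(height)]
    return canvas, mask, width


def fake_inpaint(canvas, mask, value=1.0):
    """Stand-in for the real model: writes `value` into masked pixels only."""
    return [
        [value if m else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(canvas, mask)
    ]


def crop_generated(canvas, slice_x):
    """Keep only the right-hand part, i.e. the newly generated image."""
    return [row[slice_x:] for row in canvas]


if __name__ == "__main__":
    reference = [[0.2, 0.4], [0.6, 0.8]]
    canvas, mask, slice_x = build_side_by_side(reference)
    result = fake_inpaint(canvas, mask)
    print(crop_generated(result, slice_x))
```

The final crop plays the same role as the ImageCrop node in the workflow above, which takes its `x` offset from the processor's `SLICE_W` output so that only the generated side is previewed.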
Instruction-based image editing models
Instruction-based image editing models can also be used for Subject transfer.

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 125,
"last_link_id": 323,
"nodes": [
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
634.9767456054688,
-1.8326886892318726
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 282
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
123
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1000000000000005
]
},
{
"id": 63,
"type": "VAEEncode",
"pos": [
714.6403198242188,
673.7313842773438
],
"size": [
140,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 239
},
{
"name": "vae",
"type": "VAE",
"link": 115
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
112
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "VAEEncode"
},
"widgets_values": []
},
{
"id": 112,
"type": "CLIPLoader",
"pos": [
75.53079223632812,
277.016357421875
],
"size": [
270,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
290,
291
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_2.5_vl_7b_fp8_scaled.safetensors",
"qwen_image",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
107.53079223632812,
446.7167663574219
],
"size": [
238,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76,
115,
292,
293
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"qwen_image_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 114,
"type": "TextEncodeQwenImageEditPlus",
"pos": [
454.6401672363281,
419.63690185546875
],
"size": [
400,
200
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 291
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": 293
},
{
"name": "image1",
"shape": 7,
"type": "IMAGE",
"link": 295
},
{
"name": "image2",
"shape": 7,
"type": "IMAGE",
"link": 320
},
{
"name": "image3",
"shape": 7,
"type": "IMAGE",
"link": null
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
315
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.59",
"Node name for S&R": "TextEncodeQwenImageEditPlus"
},
"widgets_values": [
""
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 111,
"type": "UNETLoader",
"pos": [
330.1968994140625,
-1.8326886892318726
],
"size": [
276.62274169921875,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
282
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Qwen-Image\\qwen_image_edit_2509_fp8_e4m3fn.safetensors",
"fp8_e4m3fn"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 82,
"type": "ImageScaleToTotalPixels",
"pos": [
-224.63221740722656,
668.4074096679688
],
"size": [
270,
82
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 275
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
244
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "ImageScaleToTotalPixels"
},
"widgets_values": [
"nearest-exact",
1
]
},
{
"id": 97,
"type": "SaveImage",
"pos": [
1495.48046875,
143.6978759765625
],
"size": [
506.0589904785156,
566.5868530273438
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 254
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-84.94583892822266,
-171.1671905517578
],
"size": [
386.9856262207031,
251.33447265625
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [qwen_image_edit_2509_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image-Edit_ComfyUI/blob/main/split_files/diffusion_models/qwen_image_edit_2509_fp8_e4m3fn.safetensors)\n- [qwen_2.5_vl_7b_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/blob/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors)\n- [qwen_image_vae.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main/split_files/vae)\n\n\n```\n📂ComfyUI/\n└──📂models/\n ├── 📂diffusion_models/\n │ └── qwen_image_edit_2509_fp8_e4m3fn.safetensors\n ├── 📂text_encoders/\n │ └── qwen_2.5_vl_7b_fp8_scaled.safetensors\n └── 📂vae/\n └── qwen_image_vae.safetensors\n\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 99,
"type": "LoadImage",
"pos": [
-522.9654541015625,
668.4074096679688
],
"size": [
268.17022705078125,
414.46728515625
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
275
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pexels-photo-33109412 (1).jpg",
"image"
]
},
{
"id": 124,
"type": "LoadImage",
"pos": [
79.30519104003906,
1079.8746337890625
],
"size": [
268.17022705078125,
414.46728515625
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
320,
321
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pexels-photo-32490940.jpg",
"image"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1293.939697265625,
143.6978759765625
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
254
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 83,
"type": "ImageResizeKJv2",
"pos": [
75.53079223632812,
668.4074096679688
],
"size": [
270,
336
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 244
},
{
"name": "mask",
"shape": 7,
"type": "MASK",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
239,
294,
295
]
},
{
"name": "width",
"type": "INT",
"links": null
},
{
"name": "height",
"type": "INT",
"links": null
},
{
"name": "mask",
"type": "MASK",
"links": []
}
],
"properties": {
"cnr_id": "comfyui-kjnodes",
"ver": "e2ce0843d1183aea86ce6a1617426f492dcdc802",
"Node name for S&R": "ImageResizeKJv2"
},
"widgets_values": [
0,
0,
"nearest-exact",
"crop",
"0, 0, 0",
"center",
8,
"cpu"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
933.5941772460938,
143.6978759765625
],
"size": [
315,
262
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 123
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 314
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 315
},
{
"name": "latent_image",
"type": "LATENT",
"link": 112
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
20,
2.5,
"res_multistep",
"simple",
1
]
},
{
"id": 113,
"type": "TextEncodeQwenImageEditPlus",
"pos": [
454.6401672363281,
163.63690185546875
],
"size": [
400,
200
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 290
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": 292
},
{
"name": "image1",
"shape": 7,
"type": "IMAGE",
"link": 294
},
{
"name": "image2",
"shape": 7,
"type": "IMAGE",
"link": 321
},
{
"name": "image3",
"shape": 7,
"type": "IMAGE",
"link": null
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
314
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.59",
"Node name for S&R": "TextEncodeQwenImageEditPlus"
},
"widgets_values": [
"Please change the male's outfit in image1 to match the male's outfit in image2."
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
112,
63,
0,
3,
3,
"LATENT"
],
[
115,
39,
0,
63,
1,
"VAE"
],
[
123,
54,
0,
3,
0,
"MODEL"
],
[
239,
83,
0,
63,
0,
"IMAGE"
],
[
244,
82,
0,
83,
0,
"IMAGE"
],
[
254,
8,
0,
97,
0,
"IMAGE"
],
[
275,
99,
0,
82,
0,
"IMAGE"
],
[
282,
111,
0,
54,
0,
"MODEL"
],
[
290,
112,
0,
113,
0,
"CLIP"
],
[
291,
112,
0,
114,
0,
"CLIP"
],
[
292,
39,
0,
113,
1,
"VAE"
],
[
293,
39,
0,
114,
1,
"VAE"
],
[
294,
83,
0,
113,
2,
"IMAGE"
],
[
295,
83,
0,
114,
2,
"IMAGE"
],
[
314,
113,
0,
3,
1,
"CONDITIONING"
],
[
315,
114,
0,
3,
2,
"CONDITIONING"
],
[
320,
124,
0,
114,
3,
"IMAGE"
],
[
321,
124,
0,
113,
3,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.7513148009015777,
"offset": [
622.9654541015625,
271.1671905517578
]
},
"frontendVersion": "1.28.1",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
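These models can edit an image from text instructions such as "put this dog on a different background" or "place this person in a forest".
With models that accept multiple reference images, you can even replace "the outfit of the person in image A" with "the outfit of the person in image B".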