What is Subject Transfer?
Formally, this is the task known as "Subject-Driven Image Generation."
Here, "Subject" means not only people but also characters, stuffed animals, a specific dog, mascots, figures, and so on; loosely, "the thing shown in this image." Subject Transfer is the technology of generating new images that contain the same Subject as a reference image.
Technology that transfers ID (a person's face/identity) also falls under Subject Transfer, but it is usually handled as a special case; since many techniques are specialized for ID Transfer, it is treated separately.
LoRA
Needless to say, this is the method of training the model so it can draw things it could not draw before.
From its debut to the present, nothing has beaten it in flexibility and stability.
The big problem is that training is required; there is nothing casual about it.
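To show what using a trained subject LoRA looks like, here is a minimal sketch with diffusers; the checkpoint ID, the LoRA file name, and the trigger word are placeholders for whatever you actually trained.

```python
# Minimal sketch: generating with an already-trained subject LoRA in diffusers.
# "my_subject_lora" and the trigger word "sks_mascot" are placeholder names;
# substitute the LoRA you trained and the trigger word used during training.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load the subject LoRA trained on reference images of the subject.
pipe.load_lora_weights("path/to/lora_dir", weight_name="my_subject_lora.safetensors")

# The trigger word learned during training recalls the subject.
image = pipe("a photo of sks_mascot sitting in a cafe", num_inference_steps=30).images[0]
image.save("lora_subject.png")
```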
image2prompt
The most primitive approach is to generate a caption from the reference image and run text2image with that caption.
You might think, "With such a primitive method?", but it is theoretically possible if you have an MLLM that can describe the reference image perfectly and an image generation model that can reproduce that description perfectly.
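As a rough illustration of the two-step idea, here is a sketch with stand-in components: BLIP is used only as an example captioner and the file names are placeholders; the ComfyUI workflow below uses a far stronger MLLM (Gemini) instead.

```python
# image2prompt sketch: caption the reference image, then run text2image on the caption.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from diffusers import StableDiffusionXLPipeline

# 1) Image -> caption (BLIP here is just an example captioner)
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
captioner = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
reference = Image.open("reference.png").convert("RGB")  # placeholder path
inputs = processor(reference, return_tensors="pt")
caption = processor.decode(captioner.generate(**inputs, max_new_tokens=75)[0], skip_special_tokens=True)

# 2) Caption -> image
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe(caption).images[0].save("pseudo_subject_transfer.png")
```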

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 59,
"last_link_id": 104,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1252.432861328125,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
492,
394.392333984375
],
"size": [
418.3189392089844,
107.08506774902344
],
"flags": {
"collapsed": true
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
250.6552734375,
-167.9522705078125
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Z-Image\\z_image_turbo_bf16.safetensors",
"fp8_e4m3fn"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
898.7548217773438,
188.1918182373047
],
"size": [
315,
262
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 100
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 98
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
12345,
"fixed",
8,
1,
"euler",
"simple",
1
]
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
586.9390258789062,
-167.9522705078125
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
100
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
492,
175
],
"size": [
330.26959228515625,
142.00363159179688
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
},
{
"name": "text",
"type": "STRING",
"widget": {
"name": "text"
},
"link": 102
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 39,
"type": "VAELoader",
"pos": [
898.7548217773438,
510.4016418457031
],
"size": [
315,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
]
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
120.78603616968121,
342.5854112036154
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
]
},
{
"id": 58,
"type": "LoadImage",
"pos": [
-226.4552737849208,
-0.14719505696391977
],
"size": [
298.080078125,
431
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
103
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"viewfilename=ComfyUI_temp_mohpt_00009_.png",
"image"
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-136.07276600955444,
-300.4671673650518
],
"size": [
349.13103718118725,
214.5148968572393
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_turbo_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── z_image_turbo_bf16.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
]
},
{
"id": 53,
"type": "EmptySD3LatentImage",
"pos": [
597.2695922851562,
482.05751390379885
],
"size": [
237,
106
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
98
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1442.0747874475098,
188.22962825237536
],
"size": [
510.21224258223606,
595.4940064248622
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 57,
"type": "GeminiNode",
"pos": [
131.26602226763393,
0.08407710682253366
],
"size": [
273,
266
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "images",
"shape": 7,
"type": "IMAGE",
"link": 103
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "video",
"shape": 7,
"type": "VIDEO",
"link": null
},
{
"name": "files",
"shape": 7,
"type": "GEMINI_INPUT_FILES",
"link": null
}
],
"outputs": [
{
"name": "STRING",
"type": "STRING",
"links": [
102,
104
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "GeminiNode"
},
"widgets_values": [
"You are a vision-language model that converts one input image into a single English prompt for a text-to-image generator. Your goal is to let the generator recreate the image as exactly as possible. Use only objective, non-emotional language (no “beautiful”, “cool”, “dramatic”, etc.). Be as quantitative as you reasonably can: counts of objects, relative positions (left/right/top/bottom/center/foreground/background), relative sizes, viewpoint (eye-level, low angle, top-down, etc.), and approximate aspect ratio (e.g., horizontal 16:9, square 1:1, vertical 9:16). Always describe: main subjects (appearance, pose, clothing, accessories, relative positions), background and environment (indoor/outdoor, location type, important objects), lighting (type and direction), colors and tone (dominant colors, dark/bright, high/low contrast), and overall style (photo, anime, 3D render, flat illustration, etc.), plus any visible text or logos and where they appear. If the image looks photographic or like a realistic render, also mention a simple shot type (close-up, medium shot, full body, wide shot), rough focal length (e.g., 35mm, 50mm), and depth of field (shallow or deep) when this is clearly implied. Do not refer to “the input image” or give instructions; just state the desired image content. Output exactly one line: a single comma-separated English prompt, with no headings, bullet points, or explanation.",
"gemini-3-pro-preview",
12345,
"fixed",
"Status: Completed\nPrice: $0.0196\nTime elapsed: 17s"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 59,
"type": "PreviewAny",
"pos": [
492,
1.5167060232018699
],
"size": [
330,
111
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "source",
"type": "*",
"link": 104
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75",
"Node name for S&R": "PreviewAny"
},
"widgets_values": []
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
98,
53,
0,
3,
3,
"LATENT"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
100,
54,
0,
3,
0,
"MODEL"
],
[
101,
8,
0,
56,
0,
"IMAGE"
],
[
102,
57,
0,
6,
1,
"STRING"
],
[
103,
58,
0,
57,
0,
"IMAGE"
],
[
104,
57,
0,
59,
0,
"*"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.1000000000000005,
"offset": [
326.4552737849208,
400.4671673650518
]
},
"frontendVersion": "1.34.2",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
The performance of recent models is making this increasingly viable. It is worth trying once as the cheapest form of pseudo-Subject Transfer.
SeeCoder / UnCLIP Family
Where image2prompt is a two-step process of "image → text → embedding," SeeCoder and the UnCLIP family perform "image → embedding" directly.
They create a vector from the image that plays the role of a text embedding and use it in place of the text encoder's output.
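As one publicly available member of this family, Stable unCLIP conditions generation directly on a CLIP image embedding; a minimal sketch follows (the model ID and file names are examples).

```python
# unCLIP-style conditioning sketch: the reference image is encoded by a CLIP
# image encoder and that embedding conditions the diffusion model directly,
# with no intermediate caption.
import torch
from diffusers import StableUnCLIPImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
).to("cuda")

reference = load_image("reference.png")  # placeholder path for the subject image

# A text prompt can still steer the scene; the image embedding carries the
# subject's appearance.
result = pipe(reference, prompt="standing in a snowy forest").images[0]
result.save("unclip_subject.png")
```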

{
"last_node_id": 59,
"last_link_id": 102,
"nodes": [
{
"id": 3,
"type": "KSampler",
"pos": [
1230,
180
],
"size": {
"0": 278.28021240234375,
"1": 556.486328125
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 86
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 102
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 6
},
{
"name": "latent_image",
"type": "LATENT",
"link": 84
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
7
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KSampler"
},
"widgets_values": [
1007766865747969,
"randomize",
20,
8,
"dpmpp_2m",
"karras",
1
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1530,
190
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 90,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
9
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 57,
"type": "VAELoader",
"pos": [
1532,
290
],
"size": {
"0": 315,
"1": 58
},
"flags": {
"collapsed": true
},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
90
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"vae-ft-mse-840000-ema-pruned.safetensors"
]
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
0,
240
],
"size": {
"0": 315,
"1": 98
},
"flags": {},
"order": 1,
"mode": 0,
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
86
],
"slot_index": 0
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
87,
88
],
"slot_index": 1
},
{
"name": "VAE",
"type": "VAE",
"links": [],
"slot_index": 2
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"😎-v1.x\\AuroraONE_F16.safetensors"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
430,
430
],
"size": [
409.83612060546875,
83.2110595703125
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 88
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
6
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"(worst quality:1.2),text,3d,outline,blush"
],
"color": "#223",
"bgcolor": "#335"
},
{
"id": 54,
"type": "EmptyLatentImage",
"pos": [
827,
614
],
"size": {
"0": 315,
"1": 106
},
"flags": {},
"order": 2,
"mode": 0,
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
84
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "EmptyLatentImage"
},
"widgets_values": [
512,
768,
1
]
},
{
"id": 13,
"type": "CLIPTextEncode",
"pos": [
430,
300
],
"size": {
"0": 412.5623779296875,
"1": 76
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 87
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
99
],
"slot_index": 0
}
],
"title": "CLIP Text Encode (Trigger word)",
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"high quality,high detailed,anime illustration,shot from side"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 58,
"type": "ConditioningCombine",
"pos": [
882,
271
],
"size": [
228.39999389648438,
46
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "conditioning_1",
"type": "CONDITIONING",
"link": 98
},
{
"name": "conditioning_2",
"type": "CONDITIONING",
"link": 99
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
102
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "ConditioningCombine"
},
"color": "#322",
"bgcolor": "#533"
},
{
"id": 55,
"type": "SEECoderImageEncode",
"pos": [
551,
105
],
"size": {
"0": 315,
"1": 58
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 85,
"slot_index": 0
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
98
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "SEECoderImageEncode"
},
"widgets_values": [
"seecoder-anime-v1-0.safetensors"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 56,
"type": "LoadImage",
"pos": [
295,
-220
],
"size": [
210,
389.91945068359314
],
"flags": {},
"order": 3,
"mode": 0,
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
85
],
"shape": 3
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"apple.png",
"image"
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 9,
"type": "SaveImage",
"pos": [
1783,
190
],
"size": [
441.322519450684,
711.7099524414066
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"properties": {},
"widgets_values": [
"ComfyUI"
]
}
],
"links": [
[
6,
7,
0,
3,
2,
"CONDITIONING"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
9,
8,
0,
9,
0,
"IMAGE"
],
[
84,
54,
0,
3,
3,
"LATENT"
],
[
85,
56,
0,
55,
0,
"IMAGE"
],
[
86,
4,
0,
3,
0,
"MODEL"
],
[
87,
4,
1,
13,
0,
"CLIP"
],
[
88,
4,
1,
7,
0,
"CLIP"
],
[
90,
57,
0,
8,
1,
"VAE"
],
[
98,
55,
0,
58,
0,
"CONDITIONING"
],
[
99,
13,
0,
58,
1,
"CONDITIONING"
],
[
102,
58,
0,
3,
1,
"CONDITIONING"
]
],
"groups": [],
"config": {},
"extra": {},
"version": 0.4
}
Because it skips textualization, it loses less information than image2prompt, but usability is worse because the condition can no longer be edited as text.
IP-Adapter
This was the first technology to reach a practical level in production as a way of doing Subject Transfer without training.
IP-Adapter is an adapter that injects conditions derived from images into an existing text2image model. It was widely adopted as the representative adapter following ControlNet.
It extracts feature vectors from the reference image and injects them into the UNet (around the cross-attention layers) so they are reflected in the generated image. Since it can be used together with text prompts, you can split the roles: specify the Subject with the image and specify the scene and style with the text.
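A minimal diffusers sketch of that split (model IDs and file names are examples; a ComfyUI graph would use different nodes, but the roles are the same):

```python
# IP-Adapter sketch: features extracted from the reference image are injected
# into the UNet's cross-attention, while the text prompt controls scene/style.
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load IP-Adapter weights for SD 1.5 and set how strongly the image condition applies.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.7)

subject = load_image("reference.png")  # placeholder path

image = pipe(
    prompt="sitting on a park bench, golden hour",  # scene and style via text
    ip_adapter_image=subject,                       # Subject via image
    num_inference_steps=30,
).images[0]
image.save("ip_adapter_subject.png")
```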
IC-LoRA / ACE++
DiT-based models, including Flux, have a latent ability to generate mutually consistent images within a single canvas.
IC-LoRA and ACE++ exploit this property for Subject Transfer.

{
"id": "68ee8198-d33d-48ba-a3f6-65bf5c84d6e4",
"revision": 0,
"last_node_id": 26,
"last_link_id": 34,
"nodes": [
{
"id": 11,
"type": "UnetLoaderGGUF",
"pos": [
610,
40
],
"size": [
315,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
21
]
}
],
"properties": {
"cnr_id": "ComfyUI-GGUF",
"ver": "bc5223b0e37e053dbec2ea5e5f52c2fd4b8f712a",
"Node name for S&R": "UnetLoaderGGUF"
},
"widgets_values": [
"FLUX_gguf\\flux1-fill-dev-Q4_K_S.gguf"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 15,
"type": "VAELoader",
"pos": [
660,
410
],
"size": [
248.4499969482422,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
18,
23
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"FLUXvae.safetensors"
]
},
{
"id": 20,
"type": "VAEDecode",
"pos": [
1660,
188.83277893066406
],
"size": [
190,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 22
},
{
"name": "vae",
"type": "VAE",
"link": 23
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
27
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "VAEDecode"
}
},
{
"id": 12,
"type": "LoadImage",
"pos": [
296.1838684082031,
566.498291015625
],
"size": [
290,
498.96368408203125
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
24
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pexels-photo-15169599.jpg",
"image",
""
]
},
{
"id": 17,
"type": "ACEPlusLoraConditioning",
"pos": [
968.0706787109375,
210.35354614257812
],
"size": [
315,
138
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 16
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 17
},
{
"name": "vae",
"type": "VAE",
"link": 18
},
{
"name": "pixels",
"type": "IMAGE",
"link": 19
},
{
"name": "mask",
"type": "MASK",
"link": 20
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
13
]
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
14
]
},
{
"name": "latent",
"type": "LATENT",
"links": [
15
]
}
],
"properties": {
"Node name for S&R": "ACEPlusLoraConditioning"
},
"widgets_values": [
false
],
"color": "#2a363b",
"bgcolor": "#3f5159"
},
{
"id": 23,
"type": "PreviewImage",
"pos": [
2140,
190
],
"size": [
590,
580
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 31
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "PreviewImage"
},
"widgets_values": [
""
]
},
{
"id": 24,
"type": "PreviewImage",
"pos": [
988.9389038085938,
571.0610961914062
],
"size": [
435.3353271484375,
324.3360290527344
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 32
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "PreviewImage"
},
"widgets_values": [
""
]
},
{
"id": 21,
"type": "ACEPlusLoraProcessor",
"pos": [
630,
570
],
"size": [
315,
234
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "reference_image",
"shape": 7,
"type": "IMAGE",
"link": 24
},
{
"name": "edit_image",
"shape": 7,
"type": "IMAGE",
"link": null
},
{
"name": "edit_mask",
"shape": 7,
"type": "MASK",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
19,
32
]
},
{
"name": "MASK",
"type": "MASK",
"links": [
20
]
},
{
"name": "OUT_H",
"type": "INT",
"links": [
29
]
},
{
"name": "OUT_W",
"type": "INT",
"links": [
28
]
},
{
"name": "SLICE_W",
"type": "INT",
"links": [
30
]
}
],
"properties": {
"Node name for S&R": "ACEPlusLoraProcessor"
},
"widgets_values": [
true,
1024,
1024,
"repainting",
3072
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 25,
"type": "CLIPTextEncode",
"pos": [
260,
170
],
"size": [
357.0466003417969,
137.17037963867188
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 33
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
10
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"A photograph of a woman wearing a yellow sweater, taken in front of a café in the UK, with a blurred background, intended for a magazine cover."
]
},
{
"id": 13,
"type": "FluxGuidance",
"pos": [
645.9932250976562,
176.34109497070312
],
"size": [
242.8545684814453,
58
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "conditioning",
"type": "CONDITIONING",
"link": 10
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
16
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "FluxGuidance"
},
"widgets_values": [
30
]
},
{
"id": 10,
"type": "DualCLIPLoader",
"pos": [
-97.66555786132812,
274.1638488769531
],
"size": [
315,
130
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
11,
33
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "DualCLIPLoader"
},
"widgets_values": [
"clip_l.safetensors",
"t5xxl_fp8_e4m3fn.safetensors",
"flux",
"default"
]
},
{
"id": 14,
"type": "CLIPTextEncode",
"pos": [
264.6689147949219,
366.498291015625
],
"size": [
397.89935302734375,
132.290771484375
],
"flags": {
"collapsed": true
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 11
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
17
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 16,
"type": "KSampler",
"pos": [
1314.6689453125,
188.83277893066406
],
"size": [
315,
262
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 12
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 13
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 14
},
{
"name": "latent_image",
"type": "LATENT",
"link": 15
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
22
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
30,
1,
"euler",
"normal",
1
]
},
{
"id": 22,
"type": "ImageCrop",
"pos": [
1891.829345703125,
190
],
"size": [
210,
130
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 27
},
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 28
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 29
},
{
"name": "x",
"type": "INT",
"widget": {
"name": "x"
},
"link": 30
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
31
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "ImageCrop"
},
"widgets_values": [
512,
512,
0,
0
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 18,
"type": "LoraLoaderModelOnly",
"pos": [
960,
40
],
"size": [
315,
82
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 21
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
12
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.27",
"Node name for S&R": "LoraLoaderModelOnly"
},
"widgets_values": [
"ACE_Plus\\comfyui_portrait_lora64.safetensors",
1
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
10,
25,
0,
13,
0,
"CONDITIONING"
],
[
11,
10,
0,
14,
0,
"CLIP"
],
[
12,
18,
0,
16,
0,
"MODEL"
],
[
13,
17,
0,
16,
1,
"CONDITIONING"
],
[
14,
17,
1,
16,
2,
"CONDITIONING"
],
[
15,
17,
2,
16,
3,
"LATENT"
],
[
16,
13,
0,
17,
0,
"CONDITIONING"
],
[
17,
14,
0,
17,
1,
"CONDITIONING"
],
[
18,
15,
0,
17,
2,
"VAE"
],
[
19,
21,
0,
17,
3,
"IMAGE"
],
[
20,
21,
1,
17,
4,
"MASK"
],
[
21,
11,
0,
18,
0,
"MODEL"
],
[
22,
16,
0,
20,
0,
"LATENT"
],
[
23,
15,
0,
20,
1,
"VAE"
],
[
24,
12,
0,
21,
0,
"IMAGE"
],
[
27,
20,
0,
22,
0,
"IMAGE"
],
[
28,
21,
3,
22,
1,
"INT"
],
[
29,
21,
2,
22,
2,
"INT"
],
[
30,
21,
4,
22,
3,
"INT"
],
[
31,
22,
0,
23,
0,
"IMAGE"
],
[
32,
21,
0,
24,
0,
"IMAGE"
],
[
33,
10,
0,
25,
0,
"CLIP"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.4836049022304428,
"offset": [
121.19217889705396,
180.49827241415346
]
},
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
Place the reference image (containing the Subject) on the left half of the canvas, mask the entire right half, and generate (inpaint). Because the model fills in the right half while looking at the information on the left, it effectively generates a new image featuring the same Subject as the left half.
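The geometry is simple enough to sketch with PIL (file names are placeholders; the actual repaint is done by the FLUX.1 Fill + ACE++ LoRA graph above, and the ImageCrop node at the end cuts out the generated right half):

```python
# IC-LoRA / ACE++ canvas layout sketch: reference on the left, empty area on
# the right, and a mask that lets the model repaint only the right half.
from PIL import Image

reference = Image.open("reference.png").convert("RGB").resize((1024, 1024))  # placeholder

# Side-by-side canvas: left = reference (kept), right = region to be generated.
canvas = Image.new("RGB", (2048, 1024), (127, 127, 127))
canvas.paste(reference, (0, 0))

# Mask: white (255) = may be repainted, black (0) = must stay as-is.
mask = Image.new("L", (2048, 1024), 0)
mask.paste(255, (1024, 0, 2048, 1024))

canvas.save("icl_canvas.png")
mask.save("icl_mask.png")
# Feed canvas + mask + a prompt describing the desired right-hand scene into an
# inpainting model, then crop the right half of the result.
```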
Instruction-Based Image Editing Models
"Instruction-Based Image Editing Models" can also be used for Subject Transfer.

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 125,
"last_link_id": 323,
"nodes": [
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
634.9767456054688,
-1.8326886892318726
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 282
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
123
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
3.1000000000000005
]
},
{
"id": 63,
"type": "VAEEncode",
"pos": [
714.6403198242188,
673.7313842773438
],
"size": [
140,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 239
},
{
"name": "vae",
"type": "VAE",
"link": 115
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
112
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "VAEEncode"
},
"widgets_values": []
},
{
"id": 112,
"type": "CLIPLoader",
"pos": [
75.53079223632812,
277.016357421875
],
"size": [
270,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
290,
291
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_2.5_vl_7b_fp8_scaled.safetensors",
"qwen_image",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
107.53079223632812,
446.7167663574219
],
"size": [
238,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76,
115,
292,
293
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"qwen_image_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 114,
"type": "TextEncodeQwenImageEditPlus",
"pos": [
454.6401672363281,
419.63690185546875
],
"size": [
400,
200
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 291
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": 293
},
{
"name": "image1",
"shape": 7,
"type": "IMAGE",
"link": 295
},
{
"name": "image2",
"shape": 7,
"type": "IMAGE",
"link": 320
},
{
"name": "image3",
"shape": 7,
"type": "IMAGE",
"link": null
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
315
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.59",
"Node name for S&R": "TextEncodeQwenImageEditPlus"
},
"widgets_values": [
""
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 111,
"type": "UNETLoader",
"pos": [
330.1968994140625,
-1.8326886892318726
],
"size": [
276.62274169921875,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
282
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Qwen-Image\\qwen_image_edit_2509_fp8_e4m3fn.safetensors",
"fp8_e4m3fn"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 82,
"type": "ImageScaleToTotalPixels",
"pos": [
-224.63221740722656,
668.4074096679688
],
"size": [
270,
82
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 275
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
244
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "ImageScaleToTotalPixels"
},
"widgets_values": [
"nearest-exact",
1
]
},
{
"id": 97,
"type": "SaveImage",
"pos": [
1495.48046875,
143.6978759765625
],
"size": [
506.0589904785156,
566.5868530273438
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 254
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-84.94583892822266,
-171.1671905517578
],
"size": [
386.9856262207031,
251.33447265625
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [qwen_image_edit_2509_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image-Edit_ComfyUI/blob/main/split_files/diffusion_models/qwen_image_edit_2509_fp8_e4m3fn.safetensors)\n- [qwen_2.5_vl_7b_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/blob/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors)\n- [qwen_image_vae.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main/split_files/vae)\n\n\n```\n📂ComfyUI/\n└──📂models/\n ├── 📂diffusion_models/\n │ └── qwen_image_edit_2509_fp8_e4m3fn.safetensors\n ├── 📂text_encoders/\n │ └── qwen_2.5_vl_7b_fp8.safetensors\n └── 📂vae/\n └── wan_2.1_vae.safetensors\n\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 99,
"type": "LoadImage",
"pos": [
-522.9654541015625,
668.4074096679688
],
"size": [
268.17022705078125,
414.46728515625
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
275
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pexels-photo-33109412 (1).jpg",
"image"
]
},
{
"id": 124,
"type": "LoadImage",
"pos": [
79.30519104003906,
1079.8746337890625
],
"size": [
268.17022705078125,
414.46728515625
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
320,
321
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.51",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pexels-photo-32490940.jpg",
"image"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1293.939697265625,
143.6978759765625
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
254
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 83,
"type": "ImageResizeKJv2",
"pos": [
75.53079223632812,
668.4074096679688
],
"size": [
270,
336
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 244
},
{
"name": "mask",
"shape": 7,
"type": "MASK",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
239,
294,
295
]
},
{
"name": "width",
"type": "INT",
"links": null
},
{
"name": "height",
"type": "INT",
"links": null
},
{
"name": "mask",
"type": "MASK",
"links": []
}
],
"properties": {
"cnr_id": "comfyui-kjnodes",
"ver": "e2ce0843d1183aea86ce6a1617426f492dcdc802",
"Node name for S&R": "ImageResizeKJv2"
},
"widgets_values": [
0,
0,
"nearest-exact",
"crop",
"0, 0, 0",
"center",
8,
"cpu"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
933.5941772460938,
143.6978759765625
],
"size": [
315,
262
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 123
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 314
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 315
},
{
"name": "latent_image",
"type": "LATENT",
"link": 112
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
20,
2.5,
"res_multistep",
"simple",
1
]
},
{
"id": 113,
"type": "TextEncodeQwenImageEditPlus",
"pos": [
454.6401672363281,
163.63690185546875
],
"size": [
400,
200
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 290
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": 292
},
{
"name": "image1",
"shape": 7,
"type": "IMAGE",
"link": 294
},
{
"name": "image2",
"shape": 7,
"type": "IMAGE",
"link": 321
},
{
"name": "image3",
"shape": 7,
"type": "IMAGE",
"link": null
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
314
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.59",
"Node name for S&R": "TextEncodeQwenImageEditPlus"
},
"widgets_values": [
"Please change the male's outfit in image1 to match the male's outfit in image2."
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
112,
63,
0,
3,
3,
"LATENT"
],
[
115,
39,
0,
63,
1,
"VAE"
],
[
123,
54,
0,
3,
0,
"MODEL"
],
[
239,
83,
0,
63,
0,
"IMAGE"
],
[
244,
82,
0,
83,
0,
"IMAGE"
],
[
254,
8,
0,
97,
0,
"IMAGE"
],
[
275,
99,
0,
82,
0,
"IMAGE"
],
[
282,
111,
0,
54,
0,
"MODEL"
],
[
290,
112,
0,
113,
0,
"CLIP"
],
[
291,
112,
0,
114,
0,
"CLIP"
],
[
292,
39,
0,
113,
1,
"VAE"
],
[
293,
39,
0,
114,
1,
"VAE"
],
[
294,
83,
0,
113,
2,
"IMAGE"
],
[
295,
83,
0,
114,
2,
"IMAGE"
],
[
314,
113,
0,
3,
1,
"CONDITIONING"
],
[
315,
114,
0,
3,
2,
"CONDITIONING"
],
[
320,
124,
0,
114,
3,
"IMAGE"
],
[
321,
124,
0,
113,
3,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.7513148009015777,
"offset": [
622.9654541015625,
271.1671905517578
]
},
"frontendVersion": "1.28.1",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
These models edit images according to text instructions such as "put this dog in a different background" or "place this person in a forest."
If the model supports multiple reference images, you can also do things like replacing the clothes of the person in image A with the clothes of the person in image B.
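For a sense of the interface, here is a minimal sketch using InstructPix2Pix, an early single-image instruction editor (file names are placeholders); newer editors such as Qwen-Image-Edit 2509 in the workflow above accept several reference images, but the pattern of "image(s) + natural-language instruction" is the same.

```python
# Instruction-based editing sketch with InstructPix2Pix.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

source = load_image("dog.png")  # placeholder path for the subject photo

edited = pipe(
    prompt="put this dog in a snowy forest",
    image=source,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # higher = stay closer to the input image
).images[0]
edited.save("edited_subject.png")
```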