What is Virtual Try-On?
If ID Transfer is "Subject Transfer specialized for people," then Virtual Try-On (VTON) is "Subject Transfer specialized for clothes."
Especially when it is used for product images, consistency matters:
- Patterns and details must not change
- The garment must fit the body shape and pose naturally
LoRA
As with anything, the most reliable and flexible method is to train a LoRA on the clothes.
By combining it with inpainting, you can dress a specific person in those clothes.
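As a rough illustration of that combination, here is a minimal diffusers sketch; the base model choice, the garment LoRA file, its trigger word, and the mask image are all assumptions standing in for your own assets.

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

# Any inpainting-capable base model works; SDXL inpainting is just an example here.
pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
).to("cuda")

# Hypothetical LoRA trained on product shots of the garment.
pipe.load_lora_weights("garment_lora.safetensors")

person = load_image("person.png")      # the person to dress
mask = load_image("clothes_mask.png")  # white over the clothing region to repaint

result = pipe(
    prompt="a woman wearing mygarment dress, photo",  # "mygarment" = hypothetical trigger word
    image=person,
    mask_image=mask,
    strength=0.99,           # repaint the masked region almost completely
    num_inference_steps=30,
).images[0]
result.save("tryon.png")
```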
catvton-flux
There are many models specialized for the VTON task (changing clothes); catvton-flux is a representative example.
The basic idea is the same as IC-LoRA / ACE++: a side-by-side layout.

```json
{
"last_node_id": 65,
"last_link_id": 147,
"nodes": [
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
153.81593322753906,
193.08474731445312
],
"size": [
397.89935302734375,
132.290771484375
],
"flags": {
"collapsed": false
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 63
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
81
],
"slot_index": 0
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1600,
40
],
"size": [
190,
46
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 60
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
102
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 23,
"type": "CLIPTextEncode",
"pos": [
150.60000610351562,
0.6999998688697815
],
"size": [
397.89935302734375,
120.82927703857422
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 62
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
41
],
"slot_index": 0
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"The pair of images highlights a clothing and its styling on a model, high resolution, 4K, 8K; [IMAGE1] Detailed product shot of a clothing [IMAGE2] The same cloth is worn by a model in a lifestyle setting."
]
},
{
"id": 52,
"type": "ImageToMask",
"pos": [
-160,
1050
],
"size": [
210,
58
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 115
}
],
"outputs": [
{
"name": "MASK",
"type": "MASK",
"links": [
118
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "ImageToMask"
},
"widgets_values": [
"red"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 53,
"type": "PreviewImage",
"pos": [
-160,
1180
],
"size": [
210,
246
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 116
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 51,
"type": "segformer_b2_clothes",
"pos": [
-510,
1050
],
"size": [
315,
346
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 146
}
],
"outputs": [
{
"name": "mask_image",
"type": "IMAGE",
"links": [
115,
116
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "segformer_b2_clothes"
},
"widgets_values": [
false,
false,
false,
true,
false,
false,
false,
false,
false,
false,
false,
false,
false
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 56,
"type": "LoadImage",
"pos": [
-870,
1010
],
"size": [
290,
510
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
146,
147
],
"slot_index": 0
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pexels-photo-3155565.jpg",
"image"
]
},
{
"id": 48,
"type": "LoraLoaderModelOnly",
"pos": [
866.4467163085938,
-173.85031127929688
],
"size": [
315,
82
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 107
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
108
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "LoraLoaderModelOnly"
},
"widgets_values": [
"Flux\\catvton-flux-lora-alpha.safetensors",
1
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 38,
"type": "InpaintModelConditioning",
"pos": [
900,
60
],
"size": [
311,
138
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 80
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 81
},
{
"name": "vae",
"type": "VAE",
"link": 82
},
{
"name": "pixels",
"type": "IMAGE",
"link": 109
},
{
"name": "mask",
"type": "MASK",
"link": 110
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
77
],
"slot_index": 0
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
78
],
"slot_index": 1
},
{
"name": "latent",
"type": "LATENT",
"links": [
88
],
"slot_index": 2
}
],
"properties": {
"Node name for S&R": "InpaintModelConditioning"
},
"widgets_values": [
true
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 26,
"type": "FluxGuidance",
"pos": [
603.0430908203125,
3.384554386138916
],
"size": [
242.8545684814453,
58
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "conditioning",
"type": "CONDITIONING",
"link": 41
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
80
],
"slot_index": 0,
"shape": 3
}
],
"properties": {
"Node name for S&R": "FluxGuidance"
},
"widgets_values": [
30
]
},
{
"id": 49,
"type": "AddMaskForICLora",
"pos": [
526.6661376953125,
418.3485412597656
],
"size": [
330,
246
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "first_image",
"type": "IMAGE",
"link": 145
},
{
"name": "first_mask",
"type": "MASK",
"link": null,
"shape": 7
},
{
"name": "second_image",
"type": "IMAGE",
"link": 147,
"shape": 7
},
{
"name": "second_mask",
"type": "MASK",
"link": 143,
"shape": 7
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
109,
133
],
"slot_index": 0
},
{
"name": "MASK",
"type": "MASK",
"links": [
110,
134
],
"slot_index": 1
},
{
"name": "x_offset",
"type": "INT",
"links": null
},
{
"name": "y_offset",
"type": "INT",
"links": null
},
{
"name": "target_width",
"type": "INT",
"links": null
},
{
"name": "target_height",
"type": "INT",
"links": null
},
{
"name": "total_width",
"type": "INT",
"links": null
},
{
"name": "total_height",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "AddMaskForICLora"
},
"widgets_values": [
"auto",
1536,
"#FF0000"
]
},
{
"id": 34,
"type": "DualCLIPLoader",
"pos": [
-212.79994201660156,
106.50000762939453
],
"size": [
315,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
62,
63
]
}
],
"properties": {
"Node name for S&R": "DualCLIPLoader"
},
"widgets_values": [
"clip_l.safetensors",
"t5xxl_fp8_e4m3fn.safetensors",
"flux"
]
},
{
"id": 32,
"type": "VAELoader",
"pos": [
597.4476928710938,
254.066162109375
],
"size": [
248.4499969482422,
58
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
60,
82
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"FLUXvae.safetensors"
]
},
{
"id": 57,
"type": "InvertMask",
"pos": [
70,
1050
],
"size": [
140,
26
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "mask",
"type": "MASK",
"link": 118
}
],
"outputs": [
{
"name": "MASK",
"type": "MASK",
"links": [
142
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "InvertMask"
},
"widgets_values": [],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 3,
"type": "KSampler",
"pos": [
1250,
40
],
"size": [
315,
262
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 108
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 77
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 78
},
{
"name": "latent_image",
"type": "LATENT",
"link": 88
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
7
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
30,
1,
"euler",
"normal",
1
]
},
{
"id": 55,
"type": "LoadImage",
"pos": [
-870,
430
],
"size": [
290,
498.96368408203125
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
145
],
"slot_index": 0
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"example_garment_00035_00.jpg",
"image"
]
},
{
"id": 65,
"type": "GrowMask",
"pos": [
236.827392578125,
1050
],
"size": [
222.82362365722656,
82
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "mask",
"type": "MASK",
"link": 142
}
],
"outputs": [
{
"name": "MASK",
"type": "MASK",
"links": [
143
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "GrowMask"
},
"widgets_values": [
20,
true
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 45,
"type": "PreviewImage",
"pos": [
1827.5045166015625,
40
],
"size": [
550,
420
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 102
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 63,
"type": "PreviewImage",
"pos": [
1081.99609375,
481.18829345703125
],
"size": [
361.63531494140625,
259.43603515625
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 141
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 62,
"type": "JoinImageWithAlpha",
"pos": [
885.78173828125,
481.5660705566406
],
"size": [
176.39999389648438,
46
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 133
},
{
"name": "alpha",
"type": "MASK",
"link": 134
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
141
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "JoinImageWithAlpha"
},
"widgets_values": []
},
{
"id": 31,
"type": "UNETLoader",
"pos": [
517.320556640625,
-173.85031127929688
],
"size": [
311,
82
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
107
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Flux\\flux1-fill-dev.safetensors",
"fp8_e4m3fn"
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
7,
3,
0,
8,
0,
"LATENT"
],
[
41,
23,
0,
26,
0,
"CONDITIONING"
],
[
60,
32,
0,
8,
1,
"VAE"
],
[
62,
34,
0,
23,
0,
"CLIP"
],
[
63,
34,
0,
7,
0,
"CLIP"
],
[
77,
38,
0,
3,
1,
"CONDITIONING"
],
[
78,
38,
1,
3,
2,
"CONDITIONING"
],
[
80,
26,
0,
38,
0,
"CONDITIONING"
],
[
81,
7,
0,
38,
1,
"CONDITIONING"
],
[
82,
32,
0,
38,
2,
"VAE"
],
[
88,
38,
2,
3,
3,
"LATENT"
],
[
102,
8,
0,
45,
0,
"IMAGE"
],
[
107,
31,
0,
48,
0,
"MODEL"
],
[
108,
48,
0,
3,
0,
"MODEL"
],
[
109,
49,
0,
38,
3,
"IMAGE"
],
[
110,
49,
1,
38,
4,
"MASK"
],
[
115,
51,
0,
52,
0,
"IMAGE"
],
[
116,
51,
0,
53,
0,
"IMAGE"
],
[
118,
52,
0,
57,
0,
"MASK"
],
[
133,
49,
0,
62,
0,
"IMAGE"
],
[
134,
49,
1,
62,
1,
"MASK"
],
[
141,
62,
0,
63,
0,
"IMAGE"
],
[
142,
57,
0,
65,
0,
"MASK"
],
[
143,
65,
0,
49,
3,
"MASK"
],
[
145,
55,
0,
49,
0,
"IMAGE"
],
[
146,
56,
0,
51,
0,
"IMAGE"
],
[
147,
56,
0,
49,
2,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.2284597357367277,
"offset": [
-695.7444488005788,
165.3071399176139
]
}
},
"version": 0.4
}
```
- Left side: Product image of the clothes
- Right side: Person image + a mask over the clothes they are currently wearing
Looking at both halves, the model regenerates the masked region so that the person on the right ends up wearing the clothes on the left.
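Under the hood this is plain image composition. Here is a rough PIL sketch of what the AddMaskForICLora node assembles (file names are placeholders; in the workflow above the clothes mask comes from segformer_b2_clothes):

```python
from PIL import Image

# Hypothetical inputs: a garment product shot, a person photo, and a binary
# mask that is white over the clothes the person is currently wearing.
garment = Image.open("garment.png").convert("RGB")
person = Image.open("person.png").convert("RGB")
clothes_mask = Image.open("clothes_mask.png").convert("L")

# Match heights, then put the garment on the left and the person on the right.
h = max(garment.height, person.height)
garment = garment.resize((garment.width * h // garment.height, h))
person = person.resize((person.width * h // person.height, h))
clothes_mask = clothes_mask.resize(person.size)

canvas = Image.new("RGB", (garment.width + person.width, h), "white")
canvas.paste(garment, (0, 0))
canvas.paste(person, (garment.width, 0))

# The inpainting mask is empty over the left half and covers the worn clothes
# on the right half, so the product shot stays intact as a reference while
# only the clothing region of the person is regenerated.
mask = Image.new("L", canvas.size, 0)
mask.paste(clothes_mask, (garment.width, 0))

canvas.save("side_by_side.png")
mask.save("side_by_side_mask.png")
```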
Instruction-Based Image Editing (Side-by-Side)
Instruction-based image editing models that do not support multi-reference cannot natively do things like "bring an element of image A into image B."
However, just as with IC-LoRA / ACE++, combining the side-by-side technique with a LoRA trained for this purpose gets you something similar.

```json
{
"id": "18404b37-92b0-4d11-a39c-ae941838eb83",
"revision": 0,
"last_node_id": 88,
"last_link_id": 144,
"nodes": [
{
"id": 33,
"type": "CLIPTextEncode",
"pos": [
517.7193603515625,
378
],
"size": [
336.888427734375,
103.97698974609375
],
"flags": {
"collapsed": true
},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 118
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
99
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 52,
"type": "VAEEncode",
"pos": [
719.3842163085938,
468.98004150390625
],
"size": [
140,
46
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 127
},
{
"name": "vae",
"type": "VAE",
"link": 77
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
76,
116
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "VAEEncode"
},
"widgets_values": []
},
{
"id": 77,
"type": "PreviewImage",
"pos": [
744.884033203125,
699.7015991210938
],
"size": [
406.20001220703125,
348.29998779296875
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 134
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.46",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 68,
"type": "FluxGuidance",
"pos": [
1115.2528076171875,
190
],
"size": [
211.3223114013672,
58
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "conditioning",
"type": "CONDITIONING",
"link": 114
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
115
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "FluxGuidance"
},
"widgets_values": [
30
]
},
{
"id": 75,
"type": "ImageStitch",
"pos": [
165.8333740234375,
468.98004150390625
],
"size": [
270,
150
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "image1",
"type": "IMAGE",
"link": 131
},
{
"name": "image2",
"shape": 7,
"type": "IMAGE",
"link": 132
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
130
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.46",
"Node name for S&R": "ImageStitch"
},
"widgets_values": [
"right",
true,
0,
"white"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1700.061767578125,
194.12423706054688
],
"size": [
140,
46
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 52
},
{
"name": "vae",
"type": "VAE",
"link": 62
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
135
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 78,
"type": "SaveImage",
"pos": [
1887.16259765625,
192.1141815185547
],
"size": [
733.415771484375,
629.1437377929688
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 135
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.56"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 73,
"type": "FluxKontextImageScale",
"pos": [
483.7315673828125,
468.98004150390625
],
"size": [
187.75448608398438,
26
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 130
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
127,
134
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.43",
"Node name for S&R": "FluxKontextImageScale"
},
"widgets_values": []
},
{
"id": 69,
"type": "DualCLIPLoader",
"pos": [
174.92930603027344,
261.95574951171875
],
"size": [
270,
130
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
117,
118
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "DualCLIPLoader"
},
"widgets_values": [
"clip_l.safetensors",
"t5xxl_fp8_e4m3fn.safetensors",
"flux",
"default"
]
},
{
"id": 43,
"type": "VAELoader",
"pos": [
462.1297302246094,
613.5346069335938
],
"size": [
234.05543518066406,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
62,
77
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
]
},
{
"id": 51,
"type": "ReferenceLatent",
"pos": [
893.0234375,
190
],
"size": [
197.712890625,
46
],
"flags": {
"collapsed": false
},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "conditioning",
"type": "CONDITIONING",
"link": 74
},
{
"name": "latent",
"shape": 7,
"type": "LATENT",
"link": 76
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
114
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "ReferenceLatent"
},
"widgets_values": []
},
{
"id": 74,
"type": "LoraLoaderModelOnly",
"pos": [
1055.795166015625,
38.038578033447266
],
"size": [
261.9280090332031,
82
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 128
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
129
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.46",
"Node name for S&R": "LoraLoaderModelOnly"
},
"widgets_values": [
"Flux.1 Kontext\\03\\Cross-Image Try-On Flux Kontext_v0.2.safetensors",
1
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 87,
"type": "MarkdownNote",
"pos": [
-454.7641906738281,
988.8134155273438
],
"size": [
275.29998779296875,
88
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"Load the reference image for the clothing."
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 88,
"type": "MarkdownNote",
"pos": [
-144.00006103515625,
1108.5716552734375
],
"size": [
275.29998779296875,
88
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"Load the image of the person whose clothes will be changed."
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
516.5379638671875,
190
],
"size": [
339.84503173828125,
123.01304626464844
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 117
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
74
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"Change all clothes on the right to match the left."
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 67,
"type": "MarkdownNote",
"pos": [
307.58013916015625,
-186.25665283203125
],
"size": [
406.0926818847656,
282.7126159667969
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n- [flux1-kontext-dev.gguf](https://huggingface.co/QuantStack/FLUX.1-Kontext-dev-GGUF/tree/main)\n- [Cross-Image Try-On Flux Kontext_v0.2.safetensors](https://huggingface.co/nomadoor/crossimage-tryon-fluxkontext/blob/main/Cross-Image%20Try-On%20Flux%20Kontext_v0.2.safetensors)\n- [clip_l.safetensors](https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/clip_l.safetensors)\n- [t5xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/t5xxl_fp8_e4m3fn_scaled.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/Omnigen2_ComfyUI_repackaged/tree/main/split_files/vae)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂clip/\n │ ├── clip_l.safetensors\n │ └── t5xxl_fp8_e4m3fn.safetensors\n ├── 📂loras/\n │ └── Cross-Image Try-On Flux Kontext_v0.2.safetensors\n ├── 📂unet/\n │ └── flux1-kontext-dev.gguf\n └── 📂vae/\n └── ae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 71,
"type": "UnetLoaderGGUF",
"pos": [
744.73046875,
38.038578033447266
],
"size": [
270,
58
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
128
]
}
],
"properties": {
"cnr_id": "ComfyUI-GGUF",
"ver": "b3ec875a68d94b758914fd48d30571d953bb7a54",
"Node name for S&R": "UnetLoaderGGUF"
},
"widgets_values": [
"FLUX_gguf\\flux1-kontext-dev-Q4_K_M.gguf"
]
},
{
"id": 31,
"type": "KSampler",
"pos": [
1355.8184814453125,
194.12423706054688
],
"size": [
315,
262
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 129
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 115
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 99
},
{
"name": "latent_image",
"type": "LATENT",
"link": 116
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
52
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "KSampler"
},
"widgets_values": [
603008709320546,
"randomize",
20,
1,
"euler",
"simple",
1
]
},
{
"id": 53,
"type": "LoadImage",
"pos": [
-450.5650329589844,
468.98004150390625
],
"size": [
277.51690673828125,
455.66180419921875
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
131
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pexels-photo-33163411.jpg",
"image"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 76,
"type": "LoadImage",
"pos": [
-142.36582946777344,
583.6825561523438
],
"size": [
277.51690673828125,
455.66180419921875
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
132
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"woman.png",
"image"
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
52,
31,
0,
8,
0,
"LATENT"
],
[
62,
43,
0,
8,
1,
"VAE"
],
[
74,
6,
0,
51,
0,
"CONDITIONING"
],
[
76,
52,
0,
51,
1,
"LATENT"
],
[
77,
43,
0,
52,
1,
"VAE"
],
[
99,
33,
0,
31,
2,
"CONDITIONING"
],
[
114,
51,
0,
68,
0,
"CONDITIONING"
],
[
115,
68,
0,
31,
1,
"CONDITIONING"
],
[
116,
52,
0,
31,
3,
"LATENT"
],
[
117,
69,
0,
6,
0,
"CLIP"
],
[
118,
69,
0,
33,
0,
"CLIP"
],
[
127,
73,
0,
52,
0,
"IMAGE"
],
[
128,
71,
0,
74,
0,
"MODEL"
],
[
129,
74,
0,
31,
0,
"MODEL"
],
[
130,
75,
0,
73,
0,
"IMAGE"
],
[
131,
53,
0,
75,
0,
"IMAGE"
],
[
132,
76,
0,
75,
1,
"IMAGE"
],
[
134,
73,
0,
77,
0,
"IMAGE"
],
[
135,
8,
0,
78,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.6830134553650709,
"offset": [
240.85639710021658,
74.53751462207948
]
},
"frontendVersion": "1.26.8",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
```
- nomadoor/crossimage-tryon-fluxkontext
- Left side: Reference image of the clothes
- Right side: Person image
Looking at both halves, the model generates an image in which the person on the right wears the clothes on the left; no mask is required.
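If you would rather script this outside ComfyUI, a minimal diffusers sketch of the same idea could look like the following; it assumes a diffusers version that ships FluxKontextPipeline and reuses the stitched canvas from the earlier PIL example.

```python
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")

# The cross-image try-on LoRA teaches Kontext the side-by-side convention.
pipe.load_lora_weights(
    "nomadoor/crossimage-tryon-fluxkontext",
    weight_name="Cross-Image Try-On Flux Kontext_v0.2.safetensors",
)

# Clothes on the left, person on the right.
stitched = load_image("side_by_side.png")

result = pipe(
    image=stitched,
    prompt="Change all clothes on the right to match the left.",
    guidance_scale=2.5,
    num_inference_steps=20,
).images[0]
result.save("tryon_kontext.png")
```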
I brought up the LoRA I trained partly because I wanted to brag, but a day later a much higher-performance LoRA for Qwen-Image-Edit was announced ☹️: Clothes Try On (Clothing Transfer) - Qwen Edit
The biggest advantage of using instruction-based image editing is that masks become unnecessary.
For example, to change a person wearing a miniskirt into jeans, normal VTON requires masking not only the miniskirt itself but also the legs that the jeans will cover, and automatically generating a mask that combines those two regions is very difficult.
Instruction-based image editing needs no mask, so you can change the clothes without worrying about any of that.
Instruction-Based Image Editing (Multi-Reference)
With an instruction-based image editing model that supports multi-reference, things are easy.
Just pass the person whose clothes you want to change and the garment to separate image inputs, give an instruction like "dress this person in these clothes," and the clothes are swapped.
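For example, with Qwen-Image-Edit-2509 the whole flow reduces to a few lines. A minimal sketch, assuming a recent diffusers build that ships QwenImageEditPlusPipeline (the multi-image variant) and placeholder file names:

```python
import torch
from diffusers import QwenImageEditPlusPipeline
from diffusers.utils import load_image

pipe = QwenImageEditPlusPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2509", torch_dtype=torch.bfloat16
).to("cuda")

person = load_image("person.png")    # image 1: the person whose clothes change
garment = load_image("garment.png")  # image 2: the clothes to dress them in

result = pipe(
    image=[person, garment],  # multi-reference: each picture gets its own slot
    prompt="Dress the person in image 1 in the clothes from image 2.",
    negative_prompt=" ",
    true_cfg_scale=4.0,
    num_inference_steps=30,
).images[0]
result.save("tryon_multiref.png")
```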