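What is Qwen-Image-Layered?
It is a diffusion model that decomposes an input image into an arbitrary number of layers.
Instruction-based image editing is popular these days, but it sometimes changes parts of the image that have nothing to do with the instruction. The task grew out of a simple motivation: split the image into layers, as designers have always done, and edit only the layer you care about.
It is also notable as the first general-purpose method that handles transparent (RGBA) images.
Earlier methods needed post-processing or special handling only at decode time, whereas this one takes the more straightforward approach of treating the data as RGBA images throughout.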
Model download
- diffusion_models
- text_encoders
- vae
- gguf (optional)
📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── qwen_image_layered_fp8mixed.safetensors
    ├── 📂text_encoders/
    │   └── qwen_2.5_vl_7b_fp8_scaled.safetensors
    ├── 📂unet/
    │   └── Qwen_Image_Layered-XXXX.gguf ← only when using gguf
    └── 📂vae/
        └── qwen_image_layered_vae.safetensors
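Before loading the workflow, it can help to confirm that the files sit where the tree above expects them. The following is a minimal sketch, not part of ComfyUI: the helper name `missing_models` and the `ComfyUI` root path are assumptions for illustration, and the optional gguf file is left out because its exact filename is not given here.

```python
from pathlib import Path
from typing import List

# Relative paths under ComfyUI/models/, taken from the directory tree above.
EXPECTED_MODELS = [
    "diffusion_models/qwen_image_layered_fp8mixed.safetensors",
    "text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors",
    "vae/qwen_image_layered_vae.safetensors",
]


def missing_models(comfy_root: Path) -> List[str]:
    """Return the expected model files that are not present under comfy_root/models."""
    models_dir = Path(comfy_root) / "models"
    return [rel for rel in EXPECTED_MODELS if not (models_dir / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at your own install directory.
    for rel in missing_models(Path("ComfyUI")):
        print(f"missing: models/{rel}")
```

If you keep models in subfolders (the workflow below references the diffusion model under a `Qwen-Image` subfolder), adjust the relative paths accordingly.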
workflow

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 87,
"last_link_id": 148,
"nodes": [
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"qwen_2.5_vl_7b_fp8_scaled.safetensors",
"qwen_image",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 57,
"type": "ReferenceLatent",
"pos": [
864.2781462760086,
186
],
"size": [
204.134765625,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "conditioning",
"type": "CONDITIONING",
"link": 103
},
{
"name": "latent",
"shape": 7,
"type": "LATENT",
"link": 110
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
104
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.6.0",
"Node name for S&R": "ReferenceLatent"
},
"widgets_values": []
},
{
"id": 58,
"type": "ReferenceLatent",
"pos": [
864.2781462760086,
405.392333984375
],
"size": [
204.134765625,
46
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "conditioning",
"type": "CONDITIONING",
"link": 102
},
{
"name": "latent",
"shape": 7,
"type": "LATENT",
"link": 109
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
105
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.6.0",
"Node name for S&R": "ReferenceLatent"
},
"widgets_values": []
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
838.0823302359695,
42.94671378647985
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
100
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.49",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
1
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
415.9506530761719,
405.392333984375
],
"size": [
418.3189392089844,
107.08506774902344
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
102
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text, worst quality, blurry, ugly"
]
},
{
"id": 64,
"type": "ImageScaleToTotalPixels",
"pos": [
249.72535062227473,
718.9234534762987
],
"size": [
229.5555480957031,
106
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 115
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
113,
114
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.6.0",
"Node name for S&R": "ImageScaleToTotalPixels"
},
"widgets_values": [
"nearest-exact",
0.5,
1
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
103
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"Intimate macro of a 33-year-old Brazilian dancer's feet en pointe, focus on toes and ballet shoe, studio lighting from above, shot on Sony FE 90mm f/2.8 macro, realistic worn shoe fabric texture, individual toe details visible through shoe, strained tendons, slight blood spot on shoe tip, dusty studio floor texture, ankle ribbons tied tight uphill"
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
497.22367921939565,
42.94671378647985
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Qwen-Image\\qwen_image_layered_fp8mixed.safetensors",
"fp8_e4m3fn"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
223.02005587937379,
578.5647381339587
],
"size": [
256.26084283860405,
58
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
116,
122
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"qwen_image_layered_vae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 61,
"type": "LoadImage",
"pos": [
-134.3561028852609,
718.9234534762987
],
"size": [
353.5766357421875,
459.44451904296864
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
115
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.6.0",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"pasted/image (113).png",
"image"
]
},
{
"id": 60,
"type": "VAEEncode",
"pos": [
512.2301683876235,
581.0691055180919
],
"size": [
171.72218557769065,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 113
},
{
"name": "vae",
"type": "VAE",
"link": 116
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
109,
110
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.6.0",
"Node name for S&R": "VAEEncode"
},
"widgets_values": []
},
{
"id": 63,
"type": "GetImageSize",
"pos": [
512.2301683876235,
718.9234534762987
],
"size": [
210,
136
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 114
}
],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
117
]
},
{
"name": "height",
"type": "INT",
"links": [
118
]
},
{
"name": "batch_size",
"type": "INT",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.6.0",
"Node name for S&R": "GetImageSize"
},
"widgets_values": []
},
{
"id": 66,
"type": "VAEDecode",
"pos": [
1696.6426615505557,
173.13380452764375
],
"size": [
166.0271370269786,
46
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 121
},
{
"name": "vae",
"type": "VAE",
"link": 122
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
120,
128,
129
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 77,
"type": "ImageFromBatch",
"pos": [
1562.2305676328322,
947.806885766381
],
"size": [
210,
82
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 129
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
131
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.6.0",
"Node name for S&R": "ImageFromBatch"
},
"widgets_values": [
2,
1
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
1104.4448189452391,
173.13380452764375
],
"size": [
315,
262
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 100
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 104
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 105
},
{
"name": "latent_image",
"type": "LATENT",
"link": 108
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
119
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
20,
2.5,
"euler",
"simple",
1
]
},
{
"id": 76,
"type": "ImageFromBatch",
"pos": [
1562.2305676328322,
797.301847591665
],
"size": [
210,
82
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 128
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
136
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.6.0",
"Node name for S&R": "ImageFromBatch"
},
"widgets_values": [
1,
1
]
},
{
"id": 67,
"type": "SaveImage",
"pos": [
1939.850523648282,
171.13269321533235
],
"size": [
428.5909735732416,
468.94454416638166
],
"flags": {
"collapsed": false
},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 120
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 65,
"type": "LatentCutToBatch",
"pos": [
1453.0437402478974,
173.13380452764375
],
"size": [
210,
82
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 119
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
121
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.6.0",
"Node name for S&R": "LatentCutToBatch"
},
"widgets_values": [
"t",
1
],
"color": "#332922",
"bgcolor": "#593930"
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
12.546970997699502,
-11.88447421897053
],
"size": [
345.70001220703125,
225.77000427246094
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n- [qwen_image_layered_fp8mixed.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image-Layered_ComfyUI/blob/main/split_files/diffusion_models/qwen_image_layered_fp8mixed.safetensors)\n- [qwen_2.5_vl_7b_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/blob/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors)\n- [qwen_image_layered_vae.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image-Layered_ComfyUI/blob/main/split_files/vae/qwen_image_layered_vae.safetensors)\n\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── qwen_image_layered_fp8mixed.safetensors\n ├── 📂text_encoders/\n │ └── qwen_2.5_vl_7b_fp8_scaled.safetensors\n └── 📂vae/\n └── qwen_image_layered_vae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 80,
"type": "PreviewImage",
"pos": [
2460.612938515339,
795.0211535441636
],
"size": [
353.88890380859357,
371.8889038085938
],
"flags": {},
"order": 24,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 135
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.6.0",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 59,
"type": "EmptyQwenImageLayeredLatentImage",
"pos": [
755.2170791040447,
693.0025348793122
],
"size": [
305.1563720703124,
130
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 117
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 118
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
108
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.6.0",
"Node name for S&R": "EmptyQwenImageLayeredLatentImage"
},
"widgets_values": [
640,
640,
2,
1
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 81,
"type": "SplitImageWithAlpha",
"pos": [
1792.8667692821584,
797.301847591665
],
"size": [
213.68285814424544,
46
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 136
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
144
]
},
{
"name": "MASK",
"type": "MASK",
"links": []
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.6.0",
"Node name for S&R": "SplitImageWithAlpha"
},
"widgets_values": []
},
{
"id": 87,
"type": "InvertMask",
"pos": [
2032.7321075290527,
965.5261917188795
],
"size": [
140,
26
],
"flags": {},
"order": 22,
"mode": 0,
"inputs": [
{
"name": "mask",
"type": "MASK",
"link": 147
}
],
"outputs": [
{
"name": "MASK",
"type": "MASK",
"links": [
148
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.6.0",
"Node name for S&R": "InvertMask"
}
},
{
"id": 79,
"type": "SplitImageWithAlpha",
"pos": [
1792.8667692821584,
947.806885766381
],
"size": [
213.68285814424544,
46
],
"flags": {},
"order": 21,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 131
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
143
]
},
{
"name": "MASK",
"type": "MASK",
"links": [
147
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.6.0",
"Node name for S&R": "SplitImageWithAlpha"
},
"widgets_values": []
},
{
"id": 74,
"type": "ImageCompositeMasked",
"pos": [
2200.3106540392373,
795.0211535441636
],
"size": [
228.33342285156277,
146
],
"flags": {},
"order": 23,
"mode": 0,
"inputs": [
{
"name": "destination",
"type": "IMAGE",
"link": 144
},
{
"name": "source",
"type": "IMAGE",
"link": 143
},
{
"name": "mask",
"shape": 7,
"type": "MASK",
"link": 148
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
135
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.6.0",
"Node name for S&R": "ImageCompositeMasked"
},
"widgets_values": [
0,
0,
false
]
}
],
"links": [
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
100,
54,
0,
3,
0,
"MODEL"
],
[
102,
7,
0,
58,
0,
"CONDITIONING"
],
[
103,
6,
0,
57,
0,
"CONDITIONING"
],
[
104,
57,
0,
3,
1,
"CONDITIONING"
],
[
105,
58,
0,
3,
2,
"CONDITIONING"
],
[
108,
59,
0,
3,
3,
"LATENT"
],
[
109,
60,
0,
58,
1,
"LATENT"
],
[
110,
60,
0,
57,
1,
"LATENT"
],
[
113,
64,
0,
60,
0,
"IMAGE"
],
[
114,
64,
0,
63,
0,
"IMAGE"
],
[
115,
61,
0,
64,
0,
"IMAGE"
],
[
116,
39,
0,
60,
1,
"VAE"
],
[
117,
63,
0,
59,
0,
"INT"
],
[
118,
63,
1,
59,
1,
"INT"
],
[
119,
3,
0,
65,
0,
"LATENT"
],
[
120,
66,
0,
67,
0,
"IMAGE"
],
[
121,
65,
0,
66,
0,
"LATENT"
],
[
122,
39,
0,
66,
1,
"VAE"
],
[
128,
66,
0,
76,
0,
"IMAGE"
],
[
129,
66,
0,
77,
0,
"IMAGE"
],
[
131,
77,
0,
79,
0,
"IMAGE"
],
[
135,
74,
0,
80,
0,
"IMAGE"
],
[
136,
76,
0,
81,
0,
"IMAGE"
],
[
143,
79,
0,
74,
1,
"IMAGE"
],
[
144,
81,
0,
74,
0,
"IMAGE"
],
[
147,
79,
1,
87,
0,
"MASK"
],
[
148,
87,
0,
74,
2,
"MASK"
]
],
"groups": [
{
"id": 1,
"title": "Image Composite",
"bounding": [
1552.2305676328322,
723.701847591665,
1277.625191808733,
456.85072813132865
],
"color": "#3f789e",
"font_size": 24,
"flags": {}
}
],
"config": {},
"extra": {
"ds": {
"scale": 0.6209213230591553,
"offset": [
306.76748492398815,
311.2185045299079
]
},
"frontendVersion": "1.36.12",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
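The JSON above is long, so a small script can make the graph easier to follow. This is a rough sketch that assumes each node's `order` field reflects execution order and that each entry in `links` is laid out as `[link_id, from_node, from_slot, to_node, to_slot, type]`, which matches the data shown above. The function names and the `SAMPLE` excerpt are made up for illustration.

```python
import json
from typing import Any, Dict, List, Tuple


def load_workflow(path: str) -> Dict[str, Any]:
    """Read a workflow JSON file exported from ComfyUI."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def execution_order(workflow: Dict[str, Any]) -> List[Tuple[int, str]]:
    """Return (node id, node type) pairs sorted by each node's "order" field."""
    nodes = sorted(workflow["nodes"], key=lambda node: node["order"])
    return [(node["id"], node["type"]) for node in nodes]


def downstream(workflow: Dict[str, Any], node_id: int) -> List[int]:
    """Return the ids of nodes that consume an output of node_id."""
    # link = [link_id, from_node, from_slot, to_node, to_slot, type]
    return sorted({link[3] for link in workflow["links"] if link[1] == node_id})


# A three-node excerpt of the workflow above, listed out of order on purpose.
SAMPLE = {
    "nodes": [
        {"id": 60, "type": "VAEEncode", "order": 9},
        {"id": 61, "type": "LoadImage", "order": 3},
        {"id": 64, "type": "ImageScaleToTotalPixels", "order": 8},
    ],
    "links": [
        [115, 61, 0, 64, 0, "IMAGE"],
        [113, 64, 0, 60, 0, "IMAGE"],
    ],
}

if __name__ == "__main__":
    for node_id, node_type in execution_order(SAMPLE):
        print(node_id, node_type, "->", downstream(SAMPLE, node_id))
```

Running the same functions on the full file (via `load_workflow`) shows the main path: load and resize the image, encode it, attach it as a reference latent, sample, cut the latent into a batch, decode, then composite.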
- Resizing the input image
  - You can go up to 1024px, but processing tends to get heavier as the number of layers grows, so the image is scaled to 0.5M pixels here.
- 🟩 Empty Qwen Image Layered Latent
  - layers: the number of layers to split the image into
  - Raising this value also increases memory use and processing time.
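- 🟫 LatentCutToBatch
  - It is hard to tell what this node does, but you can think of it as a "reshaping" step that exists for implementation reasons.
  - As its name suggests, the model outputs multiple images as "layers", but the current VAE Decode has no notion of layers, so this node converts them into a plain batch of N images.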
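- 🟦 Compositing the images back together (optional)
  - Splitting into 2 layers produces 3 RGBA images in total (the original image plus the decomposed layers).
  - Stacking the second and later images one after another with ImageCompositeMasked restores the original single image.
    - However, this node only handles RGB images, so each layer first has to be converted into an RGB image plus a mask.
    - cf. Masks and the alpha channel
  - It is tedious, but node-based UIs and layer systems are a poor match in general, not just in ComfyUI 😥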
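The recompositing step in the notes above comes down to a per-pixel blend: split each RGBA layer into RGB plus a mask, then lay it over what has been composited so far. The sketch below shows only that arithmetic in plain Python, under a few assumptions: layers are same-sized nested lists of RGBA tuples with values from 0.0 to 1.0, the bottom layer is treated as opaque, and alpha is used directly as the mask (1.0 shows the upper layer). The names `blend` and `flatten_layers` are made up for illustration. In the workflow itself the mask from SplitImageWithAlpha goes through InvertMask first, which suggests that node emits the inverse of alpha.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]
RGBA = Tuple[float, float, float, float]
Image = Sequence[Sequence[RGBA]]  # rows of RGBA pixels, values in 0.0-1.0


def blend(dst: RGB, src: RGB, mask: float) -> RGB:
    """Masked composite of one pixel: mask=1.0 shows src, mask=0.0 keeps dst."""
    return tuple(s * mask + d * (1.0 - mask) for s, d in zip(src, dst))


def flatten_layers(layers: Sequence[Image]) -> List[List[RGB]]:
    """Stack same-sized RGBA layers, bottom layer first, into one RGB image."""
    # The bottom layer is used as plain RGB; its alpha channel is dropped.
    result = [[pixel[:3] for pixel in row] for row in layers[0]]
    for layer in layers[1:]:
        for y, row in enumerate(layer):
            for x, (r, g, b, a) in enumerate(row):
                # Split RGBA into RGB + mask, then composite onto the result.
                result[y][x] = blend(result[y][x], (r, g, b), a)
    return result


if __name__ == "__main__":
    # A 1x3 example: opaque red below, blue above with alpha 1.0, 0.0 and 0.5.
    bottom = [[(1.0, 0.0, 0.0, 1.0)] * 3]
    top = [[(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 1.0, 0.5)]]
    print(flatten_layers([bottom, top]))
```

With more than two layers, the same loop simply continues upward, which corresponds to chaining one ImageCompositeMasked node per extra layer in the graph.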