PixelDiT
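PixelDiT is a pixel diffusion model released by NVIDIA.
Most image generation models since Stable Diffusion use a mechanism called the Latent Diffusion Model.
Computing an image pixel by pixel is expensive, so the image is first compressed into a small representation called a latent. This cuts the amount of computation and makes features such as shape, color, and composition easier to handle.
However, when the latent is turned back into pixels, fine details such as small text and patterns inevitably degrade.
A pixel diffusion model works on the image directly in pixel space, without going through a latent. By design, the reconstruction loss caused by a VAE therefore does not occur.
That leaves an obvious question: weren't latents there to cut computation in the first place? PixelDiT gets around this by not examining the whole image at full detail. It splits the image into patches for a coarse view and paints in the fine detail on the pixel side.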
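To get a rough feel for why working on patches helps, the sketch below compares the number of tokens a transformer would see with one token per pixel against one token per square patch. The patch size of 16 is an assumption chosen for illustration, not a value taken from PixelDiT itself.

```python
def token_count(width: int, height: int, patch: int = 1) -> int:
    """Number of tokens when the image is split into patch x patch tiles."""
    if width % patch or height % patch:
        raise ValueError("width and height must be multiples of the patch size")
    return (width // patch) * (height // patch)


# Hypothetical patch size; the real model may use a different value.
ASSUMED_PATCH = 16

per_pixel = token_count(1024, 1024)                 # one token per pixel
per_patch = token_count(1024, 1024, ASSUMED_PATCH)  # one token per patch

print(per_pixel)               # 1048576
print(per_patch)               # 4096
print(per_pixel // per_patch)  # 256
```

Under that assumption the coarse pass handles 256 times fewer tokens, which is the kind of saving a latent would otherwise provide.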
Downloading the models
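- diffusion_models
  - [pixeldit_1300m_1024px_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pixeldit_1300m_1024px_bf16.safetensors) (2.6 GB)
- text_encoders
  - [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)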
📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── pixeldit_1300m_1024px_bf16.safetensors
    └── 📂text_encoders/
        └── gemma_2_2b_it_elm_bf16.safetensors
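If you want to confirm the files ended up in the right place, a small check like the one below can help. It is only a sketch: the relative paths come from the directory tree above, while the `ComfyUI` root passed at the bottom is a placeholder for wherever your installation lives.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_FILES = [
    "models/diffusion_models/pixeldit_1300m_1024px_bf16.safetensors",
    "models/text_encoders/gemma_2_2b_it_elm_bf16.safetensors",
]


def missing_models(comfy_root: str) -> list[str]:
    """Return the expected model files that are not present under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point it at your own installation.
    for rel in missing_models("ComfyUI"):
        print(f"missing: {rel}")
```

An empty result means both files were found.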
text2image

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 75,
"last_link_id": 127,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1252.432861328125,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
413.6004778593708,
403.99281184374564
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": false
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"text, worst quality, blurry, ugly"
]
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"gemma_2_2b_it_elm_bf16.safetensors",
"pixeldit",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 71,
"type": "MarkdownNote",
"pos": [
-130.18155802626615,
-17.811007621292433
],
"size": [
351.89747511237124,
228.61658757745528
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n* diffusion_models\n\n * [pixeldit_1300m_1024px_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pixeldit_1300m_1024px_bf16.safetensors) (2.6 GB)\n\n* text_encoders\n\n * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── pixeldit_1300m_1024px_bf16.safetensors\n └── 📂text_encoders/\n └── gemma_2_2b_it_elm_bf16.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
269.35973351536364,
43.42716662131588
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
124
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pixeldit_1300m_1024px_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
977.9548217773436,
67.42716662131588
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pixel_space"
]
},
{
"id": 74,
"type": "ModelSamplingSD3",
"pos": [
608.2696075439453,
43.427166621315884
],
"size": [
226,
58
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 124
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
125
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3"
},
"widgets_values": [
4
]
},
{
"id": 73,
"type": "EmptyChromaRadianceLatentImage",
"pos": [
532.0091326445271,
575.75393284228
],
"size": [
300.8609375,
106
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 126
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 127
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
123
]
}
],
"properties": {
"Node name for S&R": "EmptyChromaRadianceLatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 75,
"type": "ResolutionSelector",
"pos": [
234.8164432684831,
575.75393284228
],
"size": [
270,
126
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
126
]
},
{
"name": "height",
"type": "INT",
"links": [
127
]
}
],
"properties": {
"Node name for S&R": "ResolutionSelector"
},
"widgets_values": [
"2:3 (Portrait Photo)",
1,
16
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.00001525878906,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A stylish editorial food photograph of a small round chocolate mousse cake on a rustic wooden table, warm cocoa brown velvet texture, delicate chocolate decoration on top, tiny white flowers as garnish, placed on a simple golden dessert plate, soft natural window light, shallow depth of field, dreamy foreground blur with green leaves, warm earthy tones, elegant patisserie atmosphere, cozy cafe mood, high-end dessert photography, cinematic bokeh, no text, no logo, no watermark, no typography"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
898.7548217773438,
188.1918182373047
],
"size": [
315,
262
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 125
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 123
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
12345,
"fixed",
30,
3,
"er_sde",
"simple",
1
]
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1443.3798111474612,
188.1918182373047
],
"size": [
390.01472165749783,
646.8217101795782
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
101,
8,
0,
56,
0,
"IMAGE"
],
[
123,
73,
0,
3,
3,
"LATENT"
],
[
124,
37,
0,
74,
0,
"MODEL"
],
[
125,
74,
0,
3,
0,
"MODEL"
],
[
126,
75,
0,
73,
0,
"INT"
],
[
127,
75,
1,
73,
1,
"INT"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.620921323059155,
"offset": [
539.7718098712515,
304.32781282959496
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
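Because this is a pixel diffusion model, Load VAE and VAE Decode are not really needed.
In ComfyUI, however, the workflow follows the existing format: you select pixel_space in Load VAE and connect it to VAE Decode.
It looks as though a VAE named pixel_space is doing the decoding, but think of it as the step that turns the KSampler output into an IMAGE.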
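To see this wiring without opening the editor, you can trace the links in the workflow JSON. The sketch below uses a trimmed excerpt of the workflow above (only the nodes around VAE Decode) and reads each link entry as `[link id, source node, source slot, target node, target slot, type]`, which is how the entries in that workflow line up. The helper name `upstream_of` is made up for this example.

```python
import json

# A trimmed excerpt of the workflow above: only the nodes and links around VAE Decode.
WORKFLOW_JSON = """
{
  "nodes": [
    {"id": 3, "type": "KSampler"},
    {"id": 39, "type": "VAELoader", "widgets_values": ["pixel_space"]},
    {"id": 8, "type": "VAEDecode",
     "inputs": [{"name": "samples", "link": 35}, {"name": "vae", "link": 76}]}
  ],
  "links": [
    [35, 3, 0, 8, 0, "LATENT"],
    [76, 39, 0, 8, 1, "VAE"]
  ]
}
"""


def upstream_of(workflow: dict, node_id: int) -> dict[str, str]:
    """Map each connected input of a node to the type of the node feeding it."""
    nodes = {node["id"]: node for node in workflow["nodes"]}
    # Each link is [link_id, from_node, from_slot, to_node, to_slot, data_type].
    sources = {link[0]: link[1] for link in workflow["links"]}
    result = {}
    for port in nodes[node_id].get("inputs", []):
        if port.get("link") is not None:
            result[port["name"]] = nodes[sources[port["link"]]]["type"]
    return result


workflow = json.loads(WORKFLOW_JSON)
print(upstream_of(workflow, 8))
# {'samples': 'KSampler', 'vae': 'VAELoader'}
```

The `vae` input comes from the Load VAE node whose only widget value is `pixel_space`.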
PiD
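PiD is a PixelDiT that is used in place of VAE Decode.
Normally, the generated latent goes through VAE Decode to become an image again. PiD instead hands that latent to PixelDiT, which handles both the reconstruction into an image and the upscaling in one go. It is a neat idea.
For example, you create a 1024×1024-equivalent latent with Z-Image-Turbo and pass it to PiD instead of running VAE Decode.
With a 1024_to_4096 PiD, the result is a 4096×4096 image.
In other words, you keep the generative power of the existing model while avoiding the loss of detail caused by VAE Decode.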
Downloading the models
- For SDXL
- For Qwen-Image
- For Flux.1 / Z-Image
- For Flux.2
📂ComfyUI/
└── 📂models/
    └── 📂diffusion_models/
        ├── pid_sdxl_1024_to_4096_4step_bf16.safetensors
        ├── pid_qwenimage_1024_to_4096_4step_bf16.safetensors
        ├── pid_flux1_512_to_2048_4step_bf16.safetensors
        ├── pid_flux1_1024_to_4096_4step_bf16.safetensors
        ├── pid_flux2_512_to_2048_4step_bf16.safetensors
        └── pid_flux2_1024_to_4096_4step_2606_bf16.safetensors
You do not need all of them. Place only the PiD that matches the base model you use.
Choosing a model
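There are two things to watch when choosing a PiD model.
- Base model type
  - The PiD has to match the latent type used by the original model.
  - For example, use the SDXL variant for SDXL and the Flux.1 variant for Z-Image.
- Upscale ratio
  - Model names contain a string such as `1024_to_4096`. It gives the input and output size, in other words the upscale ratio.
  - Loading the model does not upscale anything by itself. With `1024_to_4096`, for example, you pass a 1024px-equivalent latent / output to PiD and set the parameters so that a 4096px image comes out.
  - The aspect ratio is free as long as the overall resolution roughly matches.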
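The size part of the filename can also be read programmatically. The sketch below assumes every PiD filename contains an `<input>_to_<output>` segment like the ones listed above. The function name `pid_sizes` is made up for this example.

```python
import re

# Matches the "<input>_to_<output>" part of names such as
# pid_flux1_1024_to_4096_4step_bf16.safetensors
_SIZE_PATTERN = re.compile(r"_(\d+)_to_(\d+)_")


def pid_sizes(filename: str) -> tuple[int, int, int]:
    """Return (input_px, output_px, scale) parsed from a PiD model filename."""
    match = _SIZE_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"no '<in>_to_<out>' segment in {filename!r}")
    input_px, output_px = int(match.group(1)), int(match.group(2))
    return input_px, output_px, output_px // input_px


print(pid_sizes("pid_flux1_1024_to_4096_4step_bf16.safetensors"))  # (1024, 4096, 4)
print(pid_sizes("pid_flux2_512_to_2048_4step_bf16.safetensors"))   # (512, 2048, 4)
```

Both variants work out to a 4× scale per side; what changes is the input size each one expects.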
Z-Image-Turbo → PiD
Let's decode a Z-Image-Turbo latent with PiD.

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 80,
"last_link_id": 131,
"nodes": [
{
"id": 38,
"type": "CLIPLoader",
"pos": [
32.131015771484385,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
390.84235000000007,
405.392333984375
],
"size": [
418.3189392089844,
107.08506774902344
],
"flags": {
"collapsed": true
},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
""
]
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
579.7813758789064,
53.0477294921875
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
100
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingAuraFlow",
"cnr_id": "comfy-core",
"ver": "0.3.49"
},
"widgets_values": [
3.1
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-151.24897385253908,
-13.402286529541016
],
"size": [
349.13103718118725,
214.5148968572393
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_turbo_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── z_image_turbo_bf16.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 57,
"type": "VAEDecode",
"pos": [
2059.9064044746965,
1212.0613014712694
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 25,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 102
},
{
"name": "vae",
"type": "VAE",
"link": 103
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
110
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 62,
"type": "VAELoader",
"pos": [
1785.4283649239185,
1093.5838570132628
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
103
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pixel_space"
]
},
{
"id": 60,
"type": "CLIPLoader",
"pos": [
602.886208918055,
1349.5611731753713
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
104,
109
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"gemma_2_2b_it_elm_bf16.safetensors",
"pixeldit",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 58,
"type": "CLIPTextEncode",
"pos": [
962.6180210059409,
1452.9092950777112
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": true
},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 104
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
107
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
""
]
},
{
"id": 63,
"type": "CLIPTextEncode",
"pos": [
962.6180210059409,
1232.6023980872037
],
"size": [
361.1895922851561,
152.373631591797
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 109
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
112
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
""
]
},
{
"id": 70,
"type": "ComfyMathExpression",
"pos": [
1095.328699296338,
1510.6730178870523
],
"size": [
210,
128
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"label": "a",
"name": "values.a",
"type": "FLOAT,INT,BOOLEAN",
"link": 128
},
{
"label": "b",
"name": "values.b",
"shape": 7,
"type": "FLOAT,INT,BOOLEAN",
"link": null
}
],
"outputs": [
{
"name": "FLOAT",
"type": "FLOAT",
"links": null
},
{
"name": "INT",
"type": "INT",
"links": [
121
]
},
{
"name": "BOOL",
"type": "BOOLEAN",
"links": null
}
],
"properties": {
"Node name for S&R": "ComfyMathExpression"
},
"widgets_values": [
"a * 4"
]
},
{
"id": 74,
"type": "ComfyMathExpression",
"pos": [
1095.4485588130935,
1694.6636578512905
],
"size": [
210,
128
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"label": "a",
"name": "values.a",
"type": "FLOAT,INT,BOOLEAN",
"link": 131
},
{
"label": "b",
"name": "values.b",
"shape": 7,
"type": "FLOAT,INT,BOOLEAN",
"link": null
}
],
"outputs": [
{
"name": "FLOAT",
"type": "FLOAT",
"links": null
},
{
"name": "INT",
"type": "INT",
"links": [
122
]
},
{
"name": "BOOL",
"type": "BOOLEAN",
"links": null
}
],
"properties": {
"Node name for S&R": "ComfyMathExpression"
},
"widgets_values": [
"a * 4"
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
243.49762343749995,
53.0477294921875
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"Z-Image/z_image_turbo_bf16.safetensors",
"fp8_e4m3fn"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 61,
"type": "KSampler",
"pos": [
1706.228364923919,
1212.0613014712694
],
"size": [
315,
262
],
"flags": {},
"order": 24,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 124
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 113
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 107
},
{
"name": "latent_image",
"type": "LATENT",
"link": 108
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
102
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
1234,
"fixed",
4,
1,
"lcm",
"simple",
1
]
},
{
"id": 65,
"type": "SaveImage",
"pos": [
2250.8533542940327,
1212.0613014712694
],
"size": [
666.7297467636986,
558.9757191157356
],
"flags": {},
"order": 26,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 110
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 39,
"type": "VAELoader",
"pos": [
953.7971717773437,
68.20164184570308
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 5,
"mode": 4,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 64,
"type": "EmptyChromaRadianceLatentImage",
"pos": [
1337.8355683530854,
1508.893664010131
],
"size": [
300.8609375,
106
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 121
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 122
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
108
]
}
],
"properties": {
"Node name for S&R": "EmptyChromaRadianceLatentImage"
},
"widgets_values": [
896,
1152,
1
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
390.84235000000007,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A single retro-style Japanese pudding in a handmade ceramic pedestal bowl, smooth golden custard with rich caramel sauce, topped with light cream, a small strawberry piece, and tiny leaves, set on a rustic wooden table beside wooden spoons, soft window light, natural and airy mood, warm earthy colors, shallow focus, tasteful Japanese cafe aesthetic, simple and elegant dessert photography, one pudding only, no extra desserts, no text, no logo, no watermark"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1228.2752113281254,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 21,
"mode": 4,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
127
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 53,
"type": "EmptySD3LatentImage",
"pos": [
573.1119422851564,
473.02593102293815
],
"size": [
237,
106
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 129
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 130
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
98
]
}
],
"properties": {
"Node name for S&R": "EmptySD3LatentImage",
"cnr_id": "comfy-core",
"ver": "0.3.49"
},
"widgets_values": [
1104,
1472,
1
]
},
{
"id": 77,
"type": "ModelSamplingSD3",
"pos": [
1071.5054949571797,
863.9897043960142
],
"size": [
234.02274434806372,
58
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 125
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
126
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3"
},
"widgets_values": [
4
]
},
{
"id": 78,
"type": "PreviewImage",
"pos": [
1412.600554848482,
188.1918182373047
],
"size": [
475.4999999999998,
310.8999999999998
],
"flags": {},
"order": 23,
"mode": 4,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 127
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 66,
"type": "MarkdownNote",
"pos": [
319.0841671505862,
861.233126193333
],
"size": [
391.4749836827225,
249.70306513499378
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n* diffusion_models\n\n * [pid_flux1_512_to_2048_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_512_to_2048_4step_bf16.safetensors) (2.72 GB)\n * [pid_flux1_1024_to_4096_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_1024_to_4096_4step_bf16.safetensors) (2.72 GB)\n\n* text_encoders\n\n * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ ├── pid_flux1_512_to_2048_4step_bf16.safetensors\n │ └── pid_flux1_1024_to_4096_4step_bf16.safetensors\n └── 📂text_encoders/\n └── gemma_2_2b_it_elm_bf16.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 67,
"type": "PiDConditioning",
"pos": [
1368.6965058530855,
1232.6023980872037
],
"size": [
270,
102
],
"flags": {},
"order": 22,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 112
},
{
"name": "latent",
"type": "LATENT",
"link": 111
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
113
]
}
],
"properties": {
"Node name for S&R": "PiDConditioning"
},
"widgets_values": [
"flux",
0
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 79,
"type": "ResolutionSelector",
"pos": [
259.561201016606,
493.48155615066963
],
"size": [
270,
126
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
128,
129
]
},
{
"name": "height",
"type": "INT",
"links": [
130,
131
]
}
],
"properties": {
"Node name for S&R": "ResolutionSelector"
},
"widgets_values": [
"3:2 (Photo)",
1,
16
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
874.5971717773439,
188.1918182373047
],
"size": [
315,
262
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 100
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 98
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35,
111
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
12345,
"fixed",
8,
1,
"euler",
"simple",
1
]
},
{
"id": 59,
"type": "UNETLoader",
"pos": [
737.4906566223231,
863.9897043960142
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
125
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pid_flux1_1024_to_4096_4step_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 76,
"type": "ContextWindowsManual",
"pos": [
1339.486883153987,
863.9897043960142
],
"size": [
299.2096226990984,
298
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 126
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
124
]
}
],
"properties": {
"Node name for S&R": "ContextWindowsManual"
},
"widgets_values": [
1536,
384,
"standard_static",
1,
false,
"pyramid",
2,
false,
"",
false,
false
]
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
98,
53,
0,
3,
3,
"LATENT"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
100,
54,
0,
3,
0,
"MODEL"
],
[
102,
61,
0,
57,
0,
"LATENT"
],
[
103,
62,
0,
57,
1,
"VAE"
],
[
104,
60,
0,
58,
0,
"CLIP"
],
[
107,
58,
0,
61,
2,
"CONDITIONING"
],
[
108,
64,
0,
61,
3,
"LATENT"
],
[
109,
60,
0,
63,
0,
"CLIP"
],
[
110,
57,
0,
65,
0,
"IMAGE"
],
[
111,
3,
0,
67,
1,
"LATENT"
],
[
112,
63,
0,
67,
0,
"CONDITIONING"
],
[
113,
67,
0,
61,
1,
"CONDITIONING"
],
[
121,
70,
1,
64,
0,
"INT"
],
[
122,
74,
1,
64,
1,
"INT"
],
[
124,
76,
0,
61,
0,
"MODEL"
],
[
125,
59,
0,
77,
0,
"MODEL"
],
[
126,
77,
0,
76,
0,
"MODEL"
],
[
127,
8,
0,
78,
0,
"IMAGE"
],
[
128,
79,
0,
70,
0,
"INT"
],
[
129,
79,
0,
53,
0,
"INT"
],
[
130,
79,
1,
53,
1,
"INT"
],
[
131,
79,
1,
74,
0,
"INT"
]
],
"groups": [
{
"id": 1,
"title": "Z-Image-Turbo",
"bounding": [
-161.24897385253908,
-83.40228652954102,
2064.456875764055,
806.5944331355984
],
"color": "#3f789e",
"flags": {}
},
{
"id": 2,
"title": "Pid_1024→4096",
"bounding": [
290.6357989790571,
776.3072911794422,
2650.977159099303,
1063.3279125842512
],
"color": "#8A8",
"flags": {}
}
],
"config": {},
"extra": {
"ds": {
"scale": 0.430567643134249,
"offset": [
591.0919825277065,
375.87128501454
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
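- 🟦 The upper left is a regular Z-Image-Turbo workflow.
- 🟩 The output latent is not sent through VAE Decode. It is connected to `PiD Conditioning` on the PixelDiT side.
- This example uses the `1024_to_4096` model.
  - The Z-Image-Turbo side generates at roughly 1M pixels, and the PiD side is given four times that width and height.
- PiD is a 4-step distilled model, so `steps` is set to 4 and `cfg` to 1.0 here.
- The `Context Windows (Manual)` node is for what is usually called tiling. Use it when you run out of memory (OOM), or when the output breaks down on tall or wide images.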
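The sizing arithmetic can be sketched as follows: pick a base size near one megapixel for a given aspect ratio, then multiply each side by 4, as the `a * 4` expression nodes in the workflow above do. Treating one megapixel as 1024 × 1024 pixels and rounding each side to the nearest multiple of 16 are assumptions made for this sketch. It is not a description of how the resolution selector node computes its values.

```python
import math


def base_size(aspect_w: int, aspect_h: int,
              megapixels: float = 1.0, multiple: int = 16) -> tuple[int, int]:
    """Pick a width/height near the pixel budget for the given aspect ratio.

    Assumptions for this sketch: one megapixel is 1024 * 1024 pixels and each
    side is rounded to the nearest multiple. The actual node may differ.
    """
    budget = megapixels * 1024 * 1024
    width = math.sqrt(budget * aspect_w / aspect_h)
    height = width * aspect_h / aspect_w

    def snap(value: float) -> int:
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


def pid_output_size(width: int, height: int, scale: int = 4) -> tuple[int, int]:
    """Mirror the `a * 4` expressions that feed the PiD-side empty latent."""
    return width * scale, height * scale


w, h = base_size(3, 2)
print((w, h))                 # (1248, 832)
print(pid_output_size(w, h))  # (4992, 3328)
```

For a square image the same functions give 1024 × 1024 and 4096 × 4096, matching the model name.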
Upscaling an arbitrary image
What gets passed to PiD Conditioning is just a latent.
So there is no need to run text2image first: VAE-encode any image you like, pass it to PiD, and you can use it like an upscaler.
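Since the PiD variant expects an input of a particular size, an arbitrary image usually has to be resized before VAE Encode. The sketch below shows one way to work out that size while keeping the aspect ratio. It assumes one megapixel means 1024 × 1024 pixels, and the 4000 × 3000 photo is a made-up example.

```python
import math


def fit_to_pixel_budget(width: int, height: int,
                        megapixels: float = 1.0) -> tuple[int, int]:
    """Rescale a size to roughly the given pixel budget, keeping aspect ratio.

    Assumes one megapixel means 1024 * 1024 pixels; adjust if your resize
    node counts differently.
    """
    budget = megapixels * 1024 * 1024
    factor = math.sqrt(budget / (width * height))
    return round(width * factor), round(height * factor)


# A hypothetical 4000 x 3000 photo, shrunk before VAE Encode.
small = fit_to_pixel_budget(4000, 3000)
print(small)                         # (1182, 887)
print((small[0] * 4, small[1] * 4))  # (4728, 3548)
```

The second line is the size you would request on the PiD side when using a `1024_to_4096` model.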

{
"id": "1aa3b166-1861-429f-92ae-7ee12e64ab01",
"revision": 0,
"last_node_id": 89,
"last_link_id": 143,
"nodes": [
{
"id": 60,
"type": "CLIPLoader",
"pos": [
178.4554950121201,
811.9490780397168
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
104,
109
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"gemma_2_2b_it_elm_bf16.safetensors",
"pixeldit",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 63,
"type": "CLIPTextEncode",
"pos": [
538.1873071000066,
694.9903029515484
],
"size": [
361.1895922851561,
152.373631591797
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 109
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
112
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
""
]
},
{
"id": 70,
"type": "ComfyMathExpression",
"pos": [
671.8070430075916,
989.4245396947579
],
"size": [
210,
128
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"label": "a",
"name": "values.a",
"type": "FLOAT,INT,BOOLEAN",
"link": 138
},
{
"label": "b",
"name": "values.b",
"shape": 7,
"type": "FLOAT,INT,BOOLEAN",
"link": null
}
],
"outputs": [
{
"name": "FLOAT",
"type": "FLOAT",
"links": null
},
{
"name": "INT",
"type": "INT",
"links": [
121
]
},
{
"name": "BOOL",
"type": "BOOLEAN",
"links": null
}
],
"properties": {
"Node name for S&R": "ComfyMathExpression"
},
"widgets_values": [
"a * 4"
]
},
{
"id": 74,
"type": "ComfyMathExpression",
"pos": [
671.926902524347,
1173.415179658996
],
"size": [
210,
128
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"label": "a",
"name": "values.a",
"type": "FLOAT,INT,BOOLEAN",
"link": 139
},
{
"label": "b",
"name": "values.b",
"shape": 7,
"type": "FLOAT,INT,BOOLEAN",
"link": null
}
],
"outputs": [
{
"name": "FLOAT",
"type": "FLOAT",
"links": null
},
{
"name": "INT",
"type": "INT",
"links": [
122
]
},
{
"name": "BOOL",
"type": "BOOLEAN",
"links": null
}
],
"properties": {
"Node name for S&R": "ComfyMathExpression"
},
"widgets_values": [
"a * 4"
]
},
{
"id": 61,
"type": "KSampler",
"pos": [
1281.7976510179856,
674.4492063356142
],
"size": [
315,
262
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 124
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 113
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 107
},
{
"name": "latent_image",
"type": "LATENT",
"link": 108
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
102
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
1234,
"fixed",
4,
1,
"lcm",
"simple",
1
]
},
{
"id": 77,
"type": "ModelSamplingSD3",
"pos": [
647.074781051246,
326.3776092603588
],
"size": [
234.02274434806372,
58
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 125
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
126
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3"
},
"widgets_values": [
4
]
},
{
"id": 62,
"type": "VAELoader",
"pos": [
1360.9976510179852,
555.9717618776077
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
103
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pixel_space"
]
},
{
"id": 82,
"type": "ResizeImageMaskNode",
"pos": [
-187.68646898516153,
1095.0166207447435
],
"size": [
266.5849202168248,
106
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 134
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
135
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
1,
"nearest-exact"
]
},
{
"id": 80,
"type": "VAEEncode",
"pos": [
407.88598227132763,
1095.0166207447435
],
"size": [
170.05260120738637,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 136
},
{
"name": "vae",
"type": "VAE",
"link": 132
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
133
]
}
],
"properties": {
"Node name for S&R": "VAEEncode"
},
"widgets_values": []
},
{
"id": 58,
"type": "CLIPTextEncode",
"pos": [
540.3078029573284,
910.6321261399594
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": true
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 104
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
107
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
""
]
},
{
"id": 84,
"type": "GetImageSize",
"pos": [
409.91468906546555,
1197.5924621383035
],
"size": [
210,
136
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 137
}
],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
138
]
},
{
"name": "height",
"type": "INT",
"links": [
139
]
},
{
"name": "batch_size",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "GetImageSize"
},
"widgets_values": []
},
{
"id": 65,
"type": "SaveImage",
"pos": [
1826.4226403881014,
674.4492063356142
],
"size": [
644.1825674446068,
806.9942591157356
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 110
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 59,
"type": "UNETLoader",
"pos": [
313.05994271638855,
326.3776092603588
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
125
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pid_flux1_1024_to_4096_4step_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 81,
"type": "VAELoader",
"pos": [
85.79355151979729,
989.5464055577053
],
"size": [
287.64071438371656,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
132
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 57,
"type": "VAEDecode",
"pos": [
1635.4756905687632,
674.4492063356142
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 102
},
{
"name": "vae",
"type": "VAE",
"link": 103
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
110
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 76,
"type": "ContextWindowsManual",
"pos": [
915.0561692480533,
326.3776092603588
],
"size": [
299.2096226990984,
298
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 126
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
124
]
}
],
"properties": {
"Node name for S&R": "ContextWindowsManual"
},
"widgets_values": [
1536,
384,
"standard_static",
1,
false,
"pyramid",
2,
false,
"",
false,
false
]
},
{
"id": 79,
"type": "LoadImage",
"pos": [
-532.4361549440936,
1095.0166207447435
],
"size": [
316.7987915039063,
467.0000366210936
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
134
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"ComfyUI_00091_.png",
"image"
]
},
{
"id": 83,
"type": "ResizeImageMaskNode",
"pos": [
106.84934568668905,
1095.0166207447435
],
"size": [
266.5849202168248,
106
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 135
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
136,
137
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale to multiple",
16,
"nearest-exact"
]
},
{
"id": 67,
"type": "PiDConditioning",
"pos": [
944.265791947152,
694.9903029515484
],
"size": [
270,
102
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 112
},
{
"name": "latent",
"type": "LATENT",
"link": 133
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
113
]
}
],
"properties": {
"Node name for S&R": "PiDConditioning"
},
"widgets_values": [
"flux",
0
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 89,
"type": "MarkdownNote",
"pos": [
-135.8324089797664,
326.3776092603588
],
"size": [
413.71462239515324,
313.08611572179626
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n* diffusion_models\n\n * [pid_flux1_512_to_2048_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_512_to_2048_4step_bf16.safetensors) (2.72 GB)\n * [pid_flux1_1024_to_4096_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_1024_to_4096_4step_bf16.safetensors) (2.72 GB)\n\n* text_encoders\n\n * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n* vae\n\n * [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors) (335 MB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ ├── pid_flux1_512_to_2048_4step_bf16.safetensors\n │ └── pid_flux1_1024_to_4096_4step_bf16.safetensors\n ├── 📂text_encoders/\n │ └── gemma_2_2b_it_elm_bf16.safetensors\n └── 📂vae/\n └── ae.safetensors\n\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 64,
"type": "EmptyChromaRadianceLatentImage",
"pos": [
914.3139120643391,
987.6451858178366
],
"size": [
300.8609375,
106
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 121
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 122
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
108
]
}
],
"properties": {
"Node name for S&R": "EmptyChromaRadianceLatentImage"
},
"widgets_values": [
896,
1152,
1
]
}
],
"links": [
[
102,
61,
0,
57,
0,
"LATENT"
],
[
103,
62,
0,
57,
1,
"VAE"
],
[
104,
60,
0,
58,
0,
"CLIP"
],
[
107,
58,
0,
61,
2,
"CONDITIONING"
],
[
108,
64,
0,
61,
3,
"LATENT"
],
[
109,
60,
0,
63,
0,
"CLIP"
],
[
110,
57,
0,
65,
0,
"IMAGE"
],
[
112,
63,
0,
67,
0,
"CONDITIONING"
],
[
113,
67,
0,
61,
1,
"CONDITIONING"
],
[
121,
70,
1,
64,
0,
"INT"
],
[
122,
74,
1,
64,
1,
"INT"
],
[
124,
76,
0,
61,
0,
"MODEL"
],
[
125,
59,
0,
77,
0,
"MODEL"
],
[
126,
77,
0,
76,
0,
"MODEL"
],
[
132,
81,
0,
80,
1,
"VAE"
],
[
133,
80,
0,
67,
1,
"LATENT"
],
[
134,
79,
0,
82,
0,
"IMAGE"
],
[
135,
82,
0,
83,
0,
"IMAGE"
],
[
136,
83,
0,
80,
0,
"IMAGE"
],
[
137,
83,
0,
84,
0,
"IMAGE"
],
[
138,
84,
0,
70,
0,
"INT"
],
[
139,
84,
1,
74,
0,
"INT"
]
],
"groups": [
{
"id": 2,
"title": "Pid_1024→4096",
"bounding": [
-554.3908824317602,
238.6952183841798,
3081.2835698845024,
1345.6484688397118
],
"color": "#8A8",
"flags": {}
}
],
"config": {},
"extra": {
"ds": {
"scale": 0.5209868481924432,
"offset": [
749.0102116191479,
6.514142817800455
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
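- Resize the input image to roughly 1M pixels, with both sides rounded to a multiple of 16
- Read the width and height after resizing, and use 4× those values as the PiD output size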
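The size arithmetic in this workflow can be sketched in a few lines of Python. This is a rough illustration, not ComfyUI's actual implementation: the function name `pid_sizes` is hypothetical, and it assumes that "1M pixels" means 1024 × 1024 and that each side is rounded to the nearest multiple of 16. The real nodes may round differently.

```python
import math


def pid_sizes(width, height, target_pixels=1024 * 1024, multiple=16, scale=4):
    """Return ((resized_w, resized_h), (output_w, output_h)).

    Hypothetical helper mirroring the workflow's resize steps:
    1. scale the image to about `target_pixels` while keeping the aspect ratio,
    2. snap each side to a multiple of `multiple` (nearest, by assumption),
    3. multiply the snapped size by `scale` to get the PiD output size.
    """
    # Uniform scale factor that brings width * height close to target_pixels.
    factor = math.sqrt(target_pixels / (width * height))

    # Snap to the nearest multiple, never going below one multiple.
    resized_w = max(multiple, round(width * factor / multiple) * multiple)
    resized_h = max(multiple, round(height * factor / multiple) * multiple)

    return (resized_w, resized_h), (resized_w * scale, resized_h * scale)


if __name__ == "__main__":
    resized, output = pid_sizes(1920, 1080)
    print("resized:", resized)
    print("PiD output:", output)
```

Because the output size is derived from the resized image rather than the original, the aspect ratio can shift slightly when a side is snapped to a multiple of 16.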
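Each PiD model is paired with a different VAE, so the input must be encoded with the VAE that matches the PiD model in use.
It is tempting to use the newer Flux.2 VAE, but it shifts the colors significantly, so this workflow uses the stable combination of the Flux.1 PiD model and ae.safetensors.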
- ae.safetensors (335 MB)
📂ComfyUI/
└── 📂models/
    └── 📂vae/
        └── ae.safetensors
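What this workflow does is essentially redraw the image, so it is better described as an enhancer than an upscaler.
It is not well suited to uses that require faithful reproduction of the original.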