PixelDiT
PixelDiT is a pixel diffusion model released by NVIDIA.
Many image generation models after Stable Diffusion use a mechanism called a Latent Diffusion Model.
Calculating an image pixel by pixel is expensive, so these models first compress the image into a smaller representation called a latent. This reduces computation while making it easier to handle features like shape, color, and composition.
However, when the latent is converted back into pixels, fine details such as small text and patterns can degrade.
A pixel diffusion model works directly with the image in pixel space instead of going through a latent. Because of that, VAE reconstruction loss does not occur in the same way.
That raises the obvious question: wasn't the latent there to reduce computation? PixelDiT addresses this by splitting the image into patches: it models the overall image coarsely at the patch level and fills in fine detail at the pixel level.
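To see why patches matter, it helps to count how many tokens a transformer would have to attend over. The sketch below is only back-of-the-envelope arithmetic: the patch sizes are illustrative assumptions, not PixelDiT's actual settings, and `token_count` / `attention_pairs` are hypothetical helper names.

```python
def token_count(width: int, height: int, patch: int) -> int:
    """Number of non-overlapping patch tokens for an image of the given size."""
    if width % patch or height % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (width // patch) * (height // patch)


def attention_pairs(tokens: int) -> int:
    """Self-attention compares every token with every token, so cost grows as tokens**2."""
    return tokens * tokens


if __name__ == "__main__":
    size = 1024
    # Illustrative patch sizes only (assumption); patch=1 means one token per pixel.
    for patch in (1, 8, 16):
        n = token_count(size, size, patch)
        print(f"patch={patch:>2}: {n:>9,} tokens, {attention_pairs(n):,} attention pairs")
```

With one token per pixel, a 1024×1024 image has 1,048,576 tokens; with 16×16 patches it has 4,096. That gap is why global attention is practical at the patch level and not at the pixel level.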
Model Download
- diffusion_models
  - pixeldit_1300m_1024px_bf16.safetensors (2.6 GB)
- text_encoders
  - gemma_2_2b_it_elm_bf16.safetensors (5.23 GB)
📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── pixeldit_1300m_1024px_bf16.safetensors
    └── 📂text_encoders/
        └── gemma_2_2b_it_elm_bf16.safetensors
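If you want to confirm the files landed in the right folders before loading the workflow, a small check like the following can help. It is a minimal sketch: the relative paths come from the folder layout above, while the `ComfyUI` root and the `missing_models` helper name are assumptions to adapt to your own install.

```python
from pathlib import Path

# Relative paths taken from the folder layout above.
REQUIRED_FILES = [
    "models/diffusion_models/pixeldit_1300m_1024px_bf16.safetensors",
    "models/text_encoders/gemma_2_2b_it_elm_bf16.safetensors",
]


def missing_models(comfy_root: str) -> list[str]:
    """Return the required files that are not present under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; change it to your own path.
    missing = missing_models("ComfyUI")
    for rel in missing:
        print(f"missing: {rel}")
    if not missing:
        print("all PixelDiT model files found")
```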
text2image

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 75,
"last_link_id": 127,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1252.432861328125,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
413.6004778593708,
403.99281184374564
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": false
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"text, worst quality, blurry, ugly"
]
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"gemma_2_2b_it_elm_bf16.safetensors",
"pixeldit",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 71,
"type": "MarkdownNote",
"pos": [
-130.18155802626615,
-17.811007621292433
],
"size": [
351.89747511237124,
228.61658757745528
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n* diffusion_models\n\n * [pixeldit_1300m_1024px_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pixeldit_1300m_1024px_bf16.safetensors) (2.6 GB)\n\n* text_encoders\n\n * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── pixeldit_1300m_1024px_bf16.safetensors\n └── 📂text_encoders/\n └── gemma_2_2b_it_elm_bf16.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
269.35973351536364,
43.42716662131588
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
124
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pixeldit_1300m_1024px_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
977.9548217773436,
67.42716662131588
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pixel_space"
]
},
{
"id": 74,
"type": "ModelSamplingSD3",
"pos": [
608.2696075439453,
43.427166621315884
],
"size": [
226,
58
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 124
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
125
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3"
},
"widgets_values": [
4
]
},
{
"id": 73,
"type": "EmptyChromaRadianceLatentImage",
"pos": [
532.0091326445271,
575.75393284228
],
"size": [
300.8609375,
106
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 126
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 127
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
123
]
}
],
"properties": {
"Node name for S&R": "EmptyChromaRadianceLatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 75,
"type": "ResolutionSelector",
"pos": [
234.8164432684831,
575.75393284228
],
"size": [
270,
126
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
126
]
},
{
"name": "height",
"type": "INT",
"links": [
127
]
}
],
"properties": {
"Node name for S&R": "ResolutionSelector"
},
"widgets_values": [
"2:3 (Portrait Photo)",
1,
16
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.00001525878906,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A stylish editorial food photograph of a small round chocolate mousse cake on a rustic wooden table, warm cocoa brown velvet texture, delicate chocolate decoration on top, tiny white flowers as garnish, placed on a simple golden dessert plate, soft natural window light, shallow depth of field, dreamy foreground blur with green leaves, warm earthy tones, elegant patisserie atmosphere, cozy cafe mood, high-end dessert photography, cinematic bokeh, no text, no logo, no watermark, no typography"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
898.7548217773438,
188.1918182373047
],
"size": [
315,
262
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 125
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 123
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
12345,
"fixed",
30,
3,
"er_sde",
"simple",
1
]
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1443.3798111474612,
188.1918182373047
],
"size": [
390.01472165749783,
646.8217101795782
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
101,
8,
0,
56,
0,
"IMAGE"
],
[
123,
73,
0,
3,
3,
"LATENT"
],
[
124,
37,
0,
74,
0,
"MODEL"
],
[
125,
74,
0,
3,
0,
"MODEL"
],
[
126,
75,
0,
73,
0,
"INT"
],
[
127,
75,
1,
73,
1,
"INT"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.620921323059155,
"offset": [
539.7718098712515,
304.32781282959496
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
Because this is a pixel diffusion model, it does not inherently need Load VAE or VAE Decode.
In ComfyUI, however, the workflow still follows the existing format: select pixel_space in Load VAE, then connect it to VAE Decode.
It may look as if the image is being decoded with a VAE called pixel_space, but think of it simply as the step that turns KSampler's output into an IMAGE.
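You can confirm this wiring from the workflow JSON itself. The sketch below uses a trimmed copy of the workflow above (only three nodes and two links are kept) and follows the links into VAE Decode. The `upstream_nodes` helper is hypothetical; the link layout `[link_id, from_node, from_slot, to_node, to_slot, type]` is read off the workflow as exported here.

```python
# A trimmed copy of the workflow above: only the fields needed to follow links.
workflow = {
    "nodes": [
        {"id": 3, "type": "KSampler", "widgets_values": []},
        {"id": 39, "type": "VAELoader", "widgets_values": ["pixel_space"]},
        {"id": 8, "type": "VAEDecode", "widgets_values": []},
    ],
    # Each link is [link_id, from_node, from_slot, to_node, to_slot, type].
    "links": [
        [35, 3, 0, 8, 0, "LATENT"],
        [76, 39, 0, 8, 1, "VAE"],
    ],
}


def upstream_nodes(wf: dict, node_type: str) -> dict[str, dict]:
    """Map each incoming link type of the first node of `node_type` to its source node."""
    nodes = {n["id"]: n for n in wf["nodes"]}
    target = next(n for n in wf["nodes"] if n["type"] == node_type)
    return {
        link[5]: nodes[link[1]]
        for link in wf["links"]
        if link[3] == target["id"]
    }


sources = upstream_nodes(workflow, "VAEDecode")
print(sources["VAE"]["type"], sources["VAE"]["widgets_values"])  # VAELoader ['pixel_space']
print(sources["LATENT"]["type"])  # KSampler
```

The VAE input of VAE Decode comes from a Load VAE node whose only setting is `pixel_space`, and its samples input comes straight from KSampler.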
PiD
PiD is PixelDiT used in place of VAE Decode.
Normally, the generated latent is passed through VAE Decode to become an image. With PiD, that latent is passed to PixelDiT instead, so restoration into an image and upscaling are handled together.
For example, Z-Image-Turbo can generate the latent for a 1024×1024 image and pass it to PiD instead of decoding it with its regular VAE.
With a 1024_to_4096 PiD model, the result is output as a 4096×4096 image.
In short, you can use the generation ability of an existing model while avoiding fine-detail degradation from VAE Decode.
Model Download
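- For SDXL
  - pid_sdxl_1024_to_4096_4step_bf16.safetensors
- For Qwen-Image
  - pid_qwenimage_1024_to_4096_4step_bf16.safetensors
- For Flux.1 / Z-Image
  - pid_flux1_512_to_2048_4step_bf16.safetensors
  - pid_flux1_1024_to_4096_4step_bf16.safetensors
- For Flux.2
  - pid_flux2_512_to_2048_4step_bf16.safetensors
  - pid_flux2_1024_to_4096_4step_2606_bf16.safetensors
📂ComfyUI/
└── 📂models/
    └── 📂diffusion_models/
        ├── pid_sdxl_1024_to_4096_4step_bf16.safetensors
        ├── pid_qwenimage_1024_to_4096_4step_bf16.safetensors
        ├── pid_flux1_512_to_2048_4step_bf16.safetensors
        ├── pid_flux1_1024_to_4096_4step_bf16.safetensors
        ├── pid_flux2_512_to_2048_4step_bf16.safetensors
        └── pid_flux2_1024_to_4096_4step_2606_bf16.safetensors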
You do not need to install all of them. Place only the PiD model that matches the base model you use.
Choosing a Model
There are two points to watch when choosing a PiD model.
- Base model type
  - It needs to match the latent type used by the original model.
  - Use the SDXL version for SDXL, and the Flux.1 version for Z-Image.
- Scale
  - Model names include strings such as 1024_to_4096; this indicates the scale.
  - It does not upscale automatically just because you choose the model. For 1024_to_4096, pass a latent / output around 1024px to PiD, then set the parameters so that PiD outputs a 4096px image.
  - The aspect ratio is flexible as long as the rough resolution matches.
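Because the scale is encoded in the file name, the output size you need to set can be derived from it. The following is a minimal sketch under that assumption: `pid_scale` and `pid_output_size` are hypothetical helpers, and the regular expression only covers the `<src>_to_<dst>` naming pattern seen in the file names listed in this section.

```python
import re

# Matches the "<src>_to_<dst>" part of names such as
# pid_flux1_1024_to_4096_4step_bf16.safetensors
_PID_NAME = re.compile(r"_(\d+)_to_(\d+)_")


def pid_scale(model_name: str) -> int:
    """Return the per-side scale factor implied by a PiD model file name."""
    match = _PID_NAME.search(model_name)
    if match is None:
        raise ValueError(f"no '<src>_to_<dst>' pattern in {model_name!r}")
    src, dst = int(match.group(1)), int(match.group(2))
    if dst % src:
        raise ValueError(f"{dst} is not a whole multiple of {src}")
    return dst // src


def pid_output_size(model_name: str, width: int, height: int) -> tuple[int, int]:
    """Output width/height to configure for a base image of width x height."""
    scale = pid_scale(model_name)
    return width * scale, height * scale


name = "pid_flux1_1024_to_4096_4step_bf16.safetensors"
print(pid_scale(name))                   # 4
print(pid_output_size(name, 1024, 1024)) # (4096, 4096)
print(pid_output_size(name, 896, 1152))  # (3584, 4608)
```

The last line shows the aspect-ratio point: a non-square base image of roughly the same resolution is simply multiplied by the same factor on each side.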
Z-Image-Turbo → PiD
Let's decode a Z-Image-Turbo latent with PiD.

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 80,
"last_link_id": 131,
"nodes": [
{
"id": 38,
"type": "CLIPLoader",
"pos": [
32.131015771484385,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
390.84235000000007,
405.392333984375
],
"size": [
418.3189392089844,
107.08506774902344
],
"flags": {
"collapsed": true
},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
""
]
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
579.7813758789064,
53.0477294921875
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
100
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingAuraFlow",
"cnr_id": "comfy-core",
"ver": "0.3.49"
},
"widgets_values": [
3.1
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-151.24897385253908,
-13.402286529541016
],
"size": [
349.13103718118725,
214.5148968572393
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_turbo_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── z_image_turbo_bf16.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 57,
"type": "VAEDecode",
"pos": [
2059.9064044746965,
1212.0613014712694
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 25,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 102
},
{
"name": "vae",
"type": "VAE",
"link": 103
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
110
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 62,
"type": "VAELoader",
"pos": [
1785.4283649239185,
1093.5838570132628
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
103
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pixel_space"
]
},
{
"id": 60,
"type": "CLIPLoader",
"pos": [
602.886208918055,
1349.5611731753713
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
104,
109
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"gemma_2_2b_it_elm_bf16.safetensors",
"pixeldit",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 58,
"type": "CLIPTextEncode",
"pos": [
962.6180210059409,
1452.9092950777112
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": true
},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 104
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
107
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
""
]
},
{
"id": 63,
"type": "CLIPTextEncode",
"pos": [
962.6180210059409,
1232.6023980872037
],
"size": [
361.1895922851561,
152.373631591797
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 109
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
112
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
""
]
},
{
"id": 70,
"type": "ComfyMathExpression",
"pos": [
1095.328699296338,
1510.6730178870523
],
"size": [
210,
128
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"label": "a",
"name": "values.a",
"type": "FLOAT,INT,BOOLEAN",
"link": 128
},
{
"label": "b",
"name": "values.b",
"shape": 7,
"type": "FLOAT,INT,BOOLEAN",
"link": null
}
],
"outputs": [
{
"name": "FLOAT",
"type": "FLOAT",
"links": null
},
{
"name": "INT",
"type": "INT",
"links": [
121
]
},
{
"name": "BOOL",
"type": "BOOLEAN",
"links": null
}
],
"properties": {
"Node name for S&R": "ComfyMathExpression"
},
"widgets_values": [
"a * 4"
]
},
{
"id": 74,
"type": "ComfyMathExpression",
"pos": [
1095.4485588130935,
1694.6636578512905
],
"size": [
210,
128
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"label": "a",
"name": "values.a",
"type": "FLOAT,INT,BOOLEAN",
"link": 131
},
{
"label": "b",
"name": "values.b",
"shape": 7,
"type": "FLOAT,INT,BOOLEAN",
"link": null
}
],
"outputs": [
{
"name": "FLOAT",
"type": "FLOAT",
"links": null
},
{
"name": "INT",
"type": "INT",
"links": [
122
]
},
{
"name": "BOOL",
"type": "BOOLEAN",
"links": null
}
],
"properties": {
"Node name for S&R": "ComfyMathExpression"
},
"widgets_values": [
"a * 4"
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
243.49762343749995,
53.0477294921875
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"Z-Image/z_image_turbo_bf16.safetensors",
"fp8_e4m3fn"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 61,
"type": "KSampler",
"pos": [
1706.228364923919,
1212.0613014712694
],
"size": [
315,
262
],
"flags": {},
"order": 24,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 124
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 113
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 107
},
{
"name": "latent_image",
"type": "LATENT",
"link": 108
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
102
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
1234,
"fixed",
4,
1,
"lcm",
"simple",
1
]
},
{
"id": 65,
"type": "SaveImage",
"pos": [
2250.8533542940327,
1212.0613014712694
],
"size": [
666.7297467636986,
558.9757191157356
],
"flags": {},
"order": 26,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 110
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 39,
"type": "VAELoader",
"pos": [
953.7971717773437,
68.20164184570308
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 5,
"mode": 4,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 64,
"type": "EmptyChromaRadianceLatentImage",
"pos": [
1337.8355683530854,
1508.893664010131
],
"size": [
300.8609375,
106
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 121
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 122
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
108
]
}
],
"properties": {
"Node name for S&R": "EmptyChromaRadianceLatentImage"
},
"widgets_values": [
896,
1152,
1
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
390.84235000000007,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A single retro-style Japanese pudding in a handmade ceramic pedestal bowl, smooth golden custard with rich caramel sauce, topped with light cream, a small strawberry piece, and tiny leaves, set on a rustic wooden table beside wooden spoons, soft window light, natural and airy mood, warm earthy colors, shallow focus, tasteful Japanese cafe aesthetic, simple and elegant dessert photography, one pudding only, no extra desserts, no text, no logo, no watermark"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1228.2752113281254,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 21,
"mode": 4,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
127
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 53,
"type": "EmptySD3LatentImage",
"pos": [
573.1119422851564,
473.02593102293815
],
"size": [
237,
106
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 129
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 130
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
98
]
}
],
"properties": {
"Node name for S&R": "EmptySD3LatentImage",
"cnr_id": "comfy-core",
"ver": "0.3.49"
},
"widgets_values": [
1104,
1472,
1
]
},
{
"id": 77,
"type": "ModelSamplingSD3",
"pos": [
1071.5054949571797,
863.9897043960142
],
"size": [
234.02274434806372,
58
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 125
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
126
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3"
},
"widgets_values": [
4
]
},
{
"id": 78,
"type": "PreviewImage",
"pos": [
1412.600554848482,
188.1918182373047
],
"size": [
475.4999999999998,
310.8999999999998
],
"flags": {},
"order": 23,
"mode": 4,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 127
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 66,
"type": "MarkdownNote",
"pos": [
319.0841671505862,
861.233126193333
],
"size": [
391.4749836827225,
249.70306513499378
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n* diffusion_models\n\n * [pid_flux1_512_to_2048_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_512_to_2048_4step_bf16.safetensors) (2.72 GB)\n * [pid_flux1_1024_to_4096_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_1024_to_4096_4step_bf16.safetensors) (2.72 GB)\n\n* text_encoders\n\n * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ ├── pid_flux1_512_to_2048_4step_bf16.safetensors\n │ └── pid_flux1_1024_to_4096_4step_bf16.safetensors\n └── 📂text_encoders/\n └── gemma_2_2b_it_elm_bf16.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 67,
"type": "PiDConditioning",
"pos": [
1368.6965058530855,
1232.6023980872037
],
"size": [
270,
102
],
"flags": {},
"order": 22,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 112
},
{
"name": "latent",
"type": "LATENT",
"link": 111
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
113
]
}
],
"properties": {
"Node name for S&R": "PiDConditioning"
},
"widgets_values": [
"flux",
0
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 79,
"type": "ResolutionSelector",
"pos": [
259.561201016606,
493.48155615066963
],
"size": [
270,
126
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
128,
129
]
},
{
"name": "height",
"type": "INT",
"links": [
130,
131
]
}
],
"properties": {
"Node name for S&R": "ResolutionSelector"
},
"widgets_values": [
"3:2 (Photo)",
1,
16
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
874.5971717773439,
188.1918182373047
],
"size": [
315,
262
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 100
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 98
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35,
111
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
12345,
"fixed",
8,
1,
"euler",
"simple",
1
]
},
{
"id": 59,
"type": "UNETLoader",
"pos": [
737.4906566223231,
863.9897043960142
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
125
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pid_flux1_1024_to_4096_4step_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 76,
"type": "ContextWindowsManual",
"pos": [
1339.486883153987,
863.9897043960142
],
"size": [
299.2096226990984,
298
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 126
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
124
]
}
],
"properties": {
"Node name for S&R": "ContextWindowsManual"
},
"widgets_values": [
1536,
384,
"standard_static",
1,
false,
"pyramid",
2,
false,
"",
false,
false
]
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
98,
53,
0,
3,
3,
"LATENT"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
100,
54,
0,
3,
0,
"MODEL"
],
[
102,
61,
0,
57,
0,
"LATENT"
],
[
103,
62,
0,
57,
1,
"VAE"
],
[
104,
60,
0,
58,
0,
"CLIP"
],
[
107,
58,
0,
61,
2,
"CONDITIONING"
],
[
108,
64,
0,
61,
3,
"LATENT"
],
[
109,
60,
0,
63,
0,
"CLIP"
],
[
110,
57,
0,
65,
0,
"IMAGE"
],
[
111,
3,
0,
67,
1,
"LATENT"
],
[
112,
63,
0,
67,
0,
"CONDITIONING"
],
[
113,
67,
0,
61,
1,
"CONDITIONING"
],
[
121,
70,
1,
64,
0,
"INT"
],
[
122,
74,
1,
64,
1,
"INT"
],
[
124,
76,
0,
61,
0,
"MODEL"
],
[
125,
59,
0,
77,
0,
"MODEL"
],
[
126,
77,
0,
76,
0,
"MODEL"
],
[
127,
8,
0,
78,
0,
"IMAGE"
],
[
128,
79,
0,
70,
0,
"INT"
],
[
129,
79,
0,
53,
0,
"INT"
],
[
130,
79,
1,
53,
1,
"INT"
],
[
131,
79,
1,
74,
0,
"INT"
]
],
"groups": [
{
"id": 1,
"title": "Z-Image-Turbo",
"bounding": [
-161.24897385253908,
-83.40228652954102,
2064.456875764055,
806.5944331355984
],
"color": "#3f789e",
"flags": {}
},
{
"id": 2,
"title": "Pid_1024→4096",
"bounding": [
290.6357989790571,
776.3072911794422,
2650.977159099303,
1063.3279125842512
],
"color": "#8A8",
"flags": {}
}
],
"config": {},
"extra": {
"ds": {
"scale": 0.430567643134249,
"offset": [
591.0919825277065,
375.87128501454
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
- 🟦 The upper-left part is a normal Z-Image-Turbo workflow.
- 🟩 Instead of sending the output latent to VAE Decode, connect it to PixelDiT's PiD Conditioning.
  - This example uses the 1024_to_4096 model.
    - Z-Image-Turbo generates at around 1M pixels, and PiD is set to output at 4× that resolution per side.
  - PiD is a 4-step distilled model, so this workflow uses steps 4 and cfg 1.0.
  - The Context Windows (Manual) node is for tiling. Use it when you run into OOM, or when tall / wide images come out rough.
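To give a feel for what tiling with overlap means, here is a generic sketch of how overlapping windows can cover a long axis. It is not ComfyUI's implementation: `window_starts` is a hypothetical helper, and the numbers 1536 and 384 are simply the first two widget values of the Context Windows (Manual) node in the workflow above, treated here as abstract positions because their unit is not stated.

```python
def window_starts(length: int, window: int, overlap: int) -> list[int]:
    """Start offsets of overlapping windows that together cover [0, length)."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than the window")
    if length <= window:
        return [0]  # everything fits in a single window
    stride = window - overlap
    starts = list(range(0, length - window, stride))
    starts.append(length - window)  # final window is aligned to the end
    return starts


# 1536 / 384 are taken from the workflow's widget values (unit assumed abstract).
print(window_starts(4096, 1536, 384))  # [0, 1152, 2304, 2560]
print(window_starts(1024, 1536, 384))  # [0]
```

Each window is processed on its own, which bounds peak memory, and the overlapping regions give neighbouring windows shared content to blend across.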
Upscaling Any Image
What gets passed to PiD Conditioning is just a latent.
So the previous step does not need to be text2image. You can VAE Encode any image you like, pass it to PiD, and use it like an upscaler.
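Since the PiD model expects input near its base resolution, an arbitrary image is first resized to about 1 megapixel and the output size is then set to 4× per side. The arithmetic can be sketched as follows. This is an illustration under assumptions: `scale_to_total_pixels` is a hypothetical helper, 1 megapixel is taken as 1024×1024 pixels, and snapping to a multiple of 16 is an assumption rather than something the resize node is known to do.

```python
import math


def scale_to_total_pixels(
    width: int, height: int, megapixels: float = 1.0, multiple: int = 16
) -> tuple[int, int]:
    """Resize dimensions to roughly `megapixels`, keeping the aspect ratio.

    Sides are snapped to a multiple of `multiple` (an assumption for this sketch).
    """
    target = megapixels * 1024 * 1024
    scale = math.sqrt(target / (width * height))

    def snap(value: int) -> int:
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)


base_w, base_h = scale_to_total_pixels(3000, 2000)
print(base_w, base_h)          # 1248 832
print(base_w * 4, base_h * 4)  # 4992 3328
```

The first pair is the size of the image that gets VAE-encoded, and the second is the output size to configure for a 1024_to_4096 PiD model.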

{
"id": "1aa3b166-1861-429f-92ae-7ee12e64ab01",
"revision": 0,
"last_node_id": 89,
"last_link_id": 143,
"nodes": [
{
"id": 60,
"type": "CLIPLoader",
"pos": [
178.4554950121201,
811.9490780397168
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
104,
109
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"gemma_2_2b_it_elm_bf16.safetensors",
"pixeldit",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 63,
"type": "CLIPTextEncode",
"pos": [
538.1873071000066,
694.9903029515484
],
"size": [
361.1895922851561,
152.373631591797
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 109
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
112
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
""
]
},
{
"id": 70,
"type": "ComfyMathExpression",
"pos": [
671.8070430075916,
989.4245396947579
],
"size": [
210,
128
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"label": "a",
"name": "values.a",
"type": "FLOAT,INT,BOOLEAN",
"link": 138
},
{
"label": "b",
"name": "values.b",
"shape": 7,
"type": "FLOAT,INT,BOOLEAN",
"link": null
}
],
"outputs": [
{
"name": "FLOAT",
"type": "FLOAT",
"links": null
},
{
"name": "INT",
"type": "INT",
"links": [
121
]
},
{
"name": "BOOL",
"type": "BOOLEAN",
"links": null
}
],
"properties": {
"Node name for S&R": "ComfyMathExpression"
},
"widgets_values": [
"a * 4"
]
},
{
"id": 74,
"type": "ComfyMathExpression",
"pos": [
671.926902524347,
1173.415179658996
],
"size": [
210,
128
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"label": "a",
"name": "values.a",
"type": "FLOAT,INT,BOOLEAN",
"link": 139
},
{
"label": "b",
"name": "values.b",
"shape": 7,
"type": "FLOAT,INT,BOOLEAN",
"link": null
}
],
"outputs": [
{
"name": "FLOAT",
"type": "FLOAT",
"links": null
},
{
"name": "INT",
"type": "INT",
"links": [
122
]
},
{
"name": "BOOL",
"type": "BOOLEAN",
"links": null
}
],
"properties": {
"Node name for S&R": "ComfyMathExpression"
},
"widgets_values": [
"a * 4"
]
},
{
"id": 61,
"type": "KSampler",
"pos": [
1281.7976510179856,
674.4492063356142
],
"size": [
315,
262
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 124
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 113
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 107
},
{
"name": "latent_image",
"type": "LATENT",
"link": 108
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
102
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
1234,
"fixed",
4,
1,
"lcm",
"simple",
1
]
},
{
"id": 77,
"type": "ModelSamplingSD3",
"pos": [
647.074781051246,
326.3776092603588
],
"size": [
234.02274434806372,
58
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 125
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
126
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3"
},
"widgets_values": [
4
]
},
{
"id": 62,
"type": "VAELoader",
"pos": [
1360.9976510179852,
555.9717618776077
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
103
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pixel_space"
]
},
{
"id": 82,
"type": "ResizeImageMaskNode",
"pos": [
-187.68646898516153,
1095.0166207447435
],
"size": [
266.5849202168248,
106
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 134
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
135
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
1,
"nearest-exact"
]
},
{
"id": 80,
"type": "VAEEncode",
"pos": [
407.88598227132763,
1095.0166207447435
],
"size": [
170.05260120738637,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 136
},
{
"name": "vae",
"type": "VAE",
"link": 132
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
133
]
}
],
"properties": {
"Node name for S&R": "VAEEncode"
},
"widgets_values": []
},
{
"id": 58,
"type": "CLIPTextEncode",
"pos": [
540.3078029573284,
910.6321261399594
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": true
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 104
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
107
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
""
]
},
{
"id": 84,
"type": "GetImageSize",
"pos": [
409.91468906546555,
1197.5924621383035
],
"size": [
210,
136
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 137
}
],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
138
]
},
{
"name": "height",
"type": "INT",
"links": [
139
]
},
{
"name": "batch_size",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "GetImageSize"
},
"widgets_values": []
},
{
"id": 65,
"type": "SaveImage",
"pos": [
1826.4226403881014,
674.4492063356142
],
"size": [
644.1825674446068,
806.9942591157356
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 110
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 59,
"type": "UNETLoader",
"pos": [
313.05994271638855,
326.3776092603588
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
125
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pid_flux1_1024_to_4096_4step_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 81,
"type": "VAELoader",
"pos": [
85.79355151979729,
989.5464055577053
],
"size": [
287.64071438371656,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
132
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 57,
"type": "VAEDecode",
"pos": [
1635.4756905687632,
674.4492063356142
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 102
},
{
"name": "vae",
"type": "VAE",
"link": 103
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
110
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 76,
"type": "ContextWindowsManual",
"pos": [
915.0561692480533,
326.3776092603588
],
"size": [
299.2096226990984,
298
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 126
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
124
]
}
],
"properties": {
"Node name for S&R": "ContextWindowsManual"
},
"widgets_values": [
1536,
384,
"standard_static",
1,
false,
"pyramid",
2,
false,
"",
false,
false
]
},
{
"id": 79,
"type": "LoadImage",
"pos": [
-532.4361549440936,
1095.0166207447435
],
"size": [
316.7987915039063,
467.0000366210936
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
134
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"ComfyUI_00091_.png",
"image"
]
},
{
"id": 83,
"type": "ResizeImageMaskNode",
"pos": [
106.84934568668905,
1095.0166207447435
],
"size": [
266.5849202168248,
106
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 135
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
136,
137
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale to multiple",
16,
"nearest-exact"
]
},
{
"id": 67,
"type": "PiDConditioning",
"pos": [
944.265791947152,
694.9903029515484
],
"size": [
270,
102
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 112
},
{
"name": "latent",
"type": "LATENT",
"link": 133
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
113
]
}
],
"properties": {
"Node name for S&R": "PiDConditioning"
},
"widgets_values": [
"flux",
0
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 89,
"type": "MarkdownNote",
"pos": [
-135.8324089797664,
326.3776092603588
],
"size": [
413.71462239515324,
313.08611572179626
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n* diffusion_models\n\n * [pid_flux1_512_to_2048_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_512_to_2048_4step_bf16.safetensors) (2.72 GB)\n * [pid_flux1_1024_to_4096_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_1024_to_4096_4step_bf16.safetensors) (2.72 GB)\n\n* text_encoders\n\n * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n* vae\n\n * [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors) (335 MB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ ├── pid_flux1_512_to_2048_4step_bf16.safetensors\n │ └── pid_flux1_1024_to_4096_4step_bf16.safetensors\n ├── 📂text_encoders/\n │ └── gemma_2_2b_it_elm_bf16.safetensors\n └── 📂vae/\n └── ae.safetensors\n\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 64,
"type": "EmptyChromaRadianceLatentImage",
"pos": [
914.3139120643391,
987.6451858178366
],
"size": [
300.8609375,
106
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 121
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 122
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
108
]
}
],
"properties": {
"Node name for S&R": "EmptyChromaRadianceLatentImage"
},
"widgets_values": [
896,
1152,
1
]
}
],
"links": [
[
102,
61,
0,
57,
0,
"LATENT"
],
[
103,
62,
0,
57,
1,
"VAE"
],
[
104,
60,
0,
58,
0,
"CLIP"
],
[
107,
58,
0,
61,
2,
"CONDITIONING"
],
[
108,
64,
0,
61,
3,
"LATENT"
],
[
109,
60,
0,
63,
0,
"CLIP"
],
[
110,
57,
0,
65,
0,
"IMAGE"
],
[
112,
63,
0,
67,
0,
"CONDITIONING"
],
[
113,
67,
0,
61,
1,
"CONDITIONING"
],
[
121,
70,
1,
64,
0,
"INT"
],
[
122,
74,
1,
64,
1,
"INT"
],
[
124,
76,
0,
61,
0,
"MODEL"
],
[
125,
59,
0,
77,
0,
"MODEL"
],
[
126,
77,
0,
76,
0,
"MODEL"
],
[
132,
81,
0,
80,
1,
"VAE"
],
[
133,
80,
0,
67,
1,
"LATENT"
],
[
134,
79,
0,
82,
0,
"IMAGE"
],
[
135,
82,
0,
83,
0,
"IMAGE"
],
[
136,
83,
0,
80,
0,
"IMAGE"
],
[
137,
83,
0,
84,
0,
"IMAGE"
],
[
138,
84,
0,
70,
0,
"INT"
],
[
139,
84,
1,
74,
0,
"INT"
]
],
"groups": [
{
"id": 2,
"title": "Pid_1024→4096",
"bounding": [
-554.3908824317602,
238.6952183841798,
3081.2835698845024,
1345.6484688397118
],
"color": "#8A8",
"flags": {}
}
],
"config": {},
"extra": {
"ds": {
"scale": 0.5209868481924432,
"offset": [
749.0102116191479,
6.514142817800455
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
- Resize the input image to around 1M pixels, with dimensions that are multiples of 16
- Get the resized height and width, multiply them by 4, and use those values as the PiD output size
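The two steps above come down to a little arithmetic. Here is a minimal Python sketch, assuming "around 1M pixels" means 1024 × 1024 pixels and that each side is rounded to the nearest multiple of 16. The resize nodes in the workflow may round or crop differently, and the function name `pid_sizes` and its parameters are made up for illustration.

```python
import math


def pid_sizes(width, height, target_pixels=1024 * 1024, multiple=16, upscale=4):
    """Return ((in_w, in_h), (out_w, out_h)) for a PiD upscale pass.

    Assumptions (not taken from the ComfyUI node source):
    - the aspect ratio is kept while scaling to roughly target_pixels
    - each side is rounded to the nearest multiple of `multiple`
    """
    # Uniform scale factor that brings the pixel count to about target_pixels.
    scale = math.sqrt(target_pixels / (width * height))

    # Snap each side to a multiple of 16, never going below one multiple.
    in_w = max(multiple, round(width * scale / multiple) * multiple)
    in_h = max(multiple, round(height * scale / multiple) * multiple)

    # The PiD output size is the resized input size multiplied by 4.
    return (in_w, in_h), (in_w * upscale, in_h * upscale)


if __name__ == "__main__":
    print(pid_sizes(1024, 1024))  # ((1024, 1024), (4096, 4096))
    print(pid_sizes(1920, 1080))  # ((1360, 768), (5440, 3072))
```

For a 1920 × 1080 input this gives 1360 × 768 as the PiD input size and 5440 × 3072 as the output size.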
Each PiD model expects a matching VAE, so encode the input image with the VAE that matches the PiD model you load.
It is tempting to use the newer Flux.2 VAE, but it shifts the colors noticeably. This workflow uses the more stable combination of the Flux.1 PiD model and ae.safetensors.
- ae.safetensors (335 MB)
📂ComfyUI/
└── 📂models/
└── 📂vae/
└── ae.safetensors
PiD essentially redraws the image, so it behaves more like an enhancement step than a conventional upscaler.
It is a poor fit when the output must faithfully reproduce the input.