PixelDiT / PiD

PixelDiT

PixelDiT is a pixel diffusion model released by NVIDIA.

Since Stable Diffusion, many image generation models have been built as latent diffusion models.

Generating an image pixel by pixel is computationally expensive, so these models first use a VAE to compress the image into a smaller representation called a latent. This reduces computation and makes it easier for the model to work with high-level features such as shape, color, and composition.
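
To put rough numbers on "smaller", the sketch below counts the values in a 1024×1024 RGB image and compares them with two typical VAE latent layouts. The 8× spatial downsampling with 4 channels (SDXL-style) and with 16 channels (Flux.1-style) are assumptions chosen for illustration; the helper names are hypothetical.

```python
def pixel_values(width: int, height: int, channels: int = 3) -> int:
    """Number of values in an RGB image."""
    return width * height * channels


def latent_values(width: int, height: int, downsample: int, channels: int) -> int:
    """Number of values in a VAE latent with the given spatial downsampling (assumed layout)."""
    return (width // downsample) * (height // downsample) * channels


w = h = 1024
rgb = pixel_values(w, h)                                # 3,145,728 values
sdxl = latent_values(w, h, downsample=8, channels=4)    # 65,536 values (SDXL-style, assumed)
flux = latent_values(w, h, downsample=8, channels=16)   # 262,144 values (Flux.1-style, assumed)

print(f"RGB pixels : {rgb:,}")
print(f"SDXL latent: {sdxl:,} ({rgb // sdxl}x smaller)")
print(f"Flux latent: {flux:,} ({rgb // flux}x smaller)")
```

Under these assumptions the latent holds 12× to 48× fewer values than the RGB image, which is where most of the savings of latent diffusion come from.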

However, when the VAE decodes the latent back into pixels, fine details such as small text and intricate patterns can degrade.

A pixel diffusion model works directly on the image in pixel space instead of going through a latent. Because the output never passes through a VAE, there is no VAE reconstruction loss.

That raises an obvious question: wasn't the latent there to reduce computation in the first place? PixelDiT addresses this by working at two levels. It splits the image into patches to handle the overall composition coarsely, while a pixel-level stage draws the fine details.
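
The reason patches help is easy to see by counting tokens. The sketch below compares treating every pixel as a token with cutting the image into square patches. The patch size of 16 is an assumption for illustration, not a value taken from this article, and `token_count` is a hypothetical helper.

```python
def token_count(width: int, height: int, patch: int) -> int:
    """Number of tokens when the image is cut into non-overlapping patch x patch squares."""
    if width % patch or height % patch:
        raise ValueError("width and height must be multiples of the patch size")
    return (width // patch) * (height // patch)


W = H = 1024
PATCH = 16  # assumed patch size, for illustration only

per_pixel = token_count(W, H, 1)     # one token per pixel
patched = token_count(W, H, PATCH)   # one token per 16x16 patch

print(per_pixel, patched)                # 1048576 4096
print((per_pixel // patched) ** 2)       # 65536: how many times fewer attention pairs
```

Self-attention cost grows with the square of the token count, so 256× fewer tokens means roughly 65,536× fewer token pairs at the patch level. Fine detail then has to come from somewhere else, which is the job of the pixel-level stage.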

Model Download

diffusion_models
- pixeldit_1300m_1024px_bf16.safetensors (2.6 GB)
text_encoders
- gemma_2_2b_it_elm_bf16.safetensors (5.23 GB)

📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── pixeldit_1300m_1024px_bf16.safetensors
    └── 📂text_encoders/
        └── gemma_2_2b_it_elm_bf16.safetensors
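
If the loaders do not list the files, a quick check of the folder layout can help. This is a minimal sketch using only the standard library; the `ComfyUI` root path and the `missing_models` name are hypothetical, and it only checks the exact layout shown above (files placed in subfolders will be reported as missing).

```python
from pathlib import Path

# Expected locations relative to the ComfyUI root, taken from the tree above.
EXPECTED = {
    "models/diffusion_models": ["pixeldit_1300m_1024px_bf16.safetensors"],
    "models/text_encoders": ["gemma_2_2b_it_elm_bf16.safetensors"],
}


def missing_models(comfy_root: Path) -> list[Path]:
    """Return the expected model files that are not present under comfy_root."""
    missing = []
    for folder, names in EXPECTED.items():
        for name in names:
            path = comfy_root / folder / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    root = Path("ComfyUI")  # hypothetical: point this at your own install
    for path in missing_models(root):
        print("missing:", path)
```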

text2image

PixelDiT_text2image.json

{
  "id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
  "revision": 0,
  "last_node_id": 75,
  "last_link_id": 127,
  "nodes": [
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1252.432861328125,
        188.1918182373047
      ],
      "size": [
        157.56002807617188,
        46
      ],
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 35
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 76
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            101
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": []
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        413.6004778593708,
        403.99281184374564
      ],
      "size": [
        419.26959228515625,
        107.08506774902344
      ],
      "flags": {
        "collapsed": false
      },
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 75
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            52
          ]
        }
      ],
      "title": "CLIP Text Encode (Negative Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "text, worst quality, blurry, ugly"
      ]
    },
    {
      "id": 38,
      "type": "CLIPLoader",
      "pos": [
        56.288665771484375,
        312.74468994140625
      ],
      "size": [
        301.3524169921875,
        106
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 0,
          "links": [
            74,
            75
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "gemma_2_2b_it_elm_bf16.safetensors",
        "pixeldit",
        "default"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 71,
      "type": "MarkdownNote",
      "pos": [
        -130.18155802626615,
        -17.811007621292433
      ],
      "size": [
        351.89747511237124,
        228.61658757745528
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "## models\n\n* diffusion_models\n\n  * [pixeldit_1300m_1024px_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pixeldit_1300m_1024px_bf16.safetensors) (2.6 GB)\n\n* text_encoders\n\n  * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n    ├── 📂diffusion_models/\n    │   └── pixeldit_1300m_1024px_bf16.safetensors\n    └── 📂text_encoders/\n         └── gemma_2_2b_it_elm_bf16.safetensors\n```"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 37,
      "type": "UNETLoader",
      "pos": [
        269.35973351536364,
        43.42716662131588
      ],
      "size": [
        305.3782043457031,
        82
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            124
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "UNETLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pixeldit_1300m_1024px_bf16.safetensors",
        "default"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 39,
      "type": "VAELoader",
      "pos": [
        977.9548217773436,
        67.42716662131588
      ],
      "size": [
        235.80000000000018,
        58
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 0,
          "links": [
            76
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pixel_space"
      ]
    },
    {
      "id": 74,
      "type": "ModelSamplingSD3",
      "pos": [
        608.2696075439453,
        43.427166621315884
      ],
      "size": [
        226,
        58
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 124
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            125
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ModelSamplingSD3"
      },
      "widgets_values": [
        4
      ]
    },
    {
      "id": 73,
      "type": "EmptyChromaRadianceLatentImage",
      "pos": [
        532.0091326445271,
        575.75393284228
      ],
      "size": [
        300.8609375,
        106
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "width",
          "type": "INT",
          "widget": {
            "name": "width"
          },
          "link": 126
        },
        {
          "name": "height",
          "type": "INT",
          "widget": {
            "name": "height"
          },
          "link": 127
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            123
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "EmptyChromaRadianceLatentImage"
      },
      "widgets_values": [
        1024,
        1024,
        1
      ]
    },
    {
      "id": 75,
      "type": "ResolutionSelector",
      "pos": [
        234.8164432684831,
        575.75393284228
      ],
      "size": [
        270,
        126
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "width",
          "type": "INT",
          "links": [
            126
          ]
        },
        {
          "name": "height",
          "type": "INT",
          "links": [
            127
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ResolutionSelector"
      },
      "widgets_values": [
        "2:3 (Portrait Photo)",
        1,
        16
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        415.00001525878906,
        186
      ],
      "size": [
        419.26959228515625,
        156.00363159179688
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 74
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            46
          ]
        }
      ],
      "title": "CLIP Text Encode (Positive Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "A stylish editorial food photograph of a small round chocolate mousse cake on a rustic wooden table, warm cocoa brown velvet texture, delicate chocolate decoration on top, tiny white flowers as garnish, placed on a simple golden dessert plate, soft natural window light, shallow depth of field, dreamy foreground blur with green leaves, warm earthy tones, elegant patisserie atmosphere, cozy cafe mood, high-end dessert photography, cinematic bokeh, no text, no logo, no watermark, no typography"
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        898.7548217773438,
        188.1918182373047
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 125
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 46
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 52
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 123
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            35
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        12345,
        "fixed",
        30,
        3,
        "er_sde",
        "simple",
        1
      ]
    },
    {
      "id": 56,
      "type": "SaveImage",
      "pos": [
        1443.3798111474612,
        188.1918182373047
      ],
      "size": [
        390.01472165749783,
        646.8217101795782
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 101
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.75"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    }
  ],
  "links": [
    [
      35,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      46,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      52,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      74,
      38,
      0,
      6,
      0,
      "CLIP"
    ],
    [
      75,
      38,
      0,
      7,
      0,
      "CLIP"
    ],
    [
      76,
      39,
      0,
      8,
      1,
      "VAE"
    ],
    [
      101,
      8,
      0,
      56,
      0,
      "IMAGE"
    ],
    [
      123,
      73,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      124,
      37,
      0,
      74,
      0,
      "MODEL"
    ],
    [
      125,
      74,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      126,
      75,
      0,
      73,
      0,
      "INT"
    ],
    [
      127,
      75,
      1,
      73,
      1,
      "INT"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.620921323059155,
      "offset": [
        539.7718098712515,
        304.32781282959496
      ]
    },
    "frontendVersion": "1.45.15",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}
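
The JSON is easier to follow once the `links` array is decoded. In ComfyUI's UI workflow export, each link is stored as `[link_id, origin_node_id, origin_slot, target_node_id, target_slot, type]`. The sketch below relies on that layout and turns each link into a readable line; `describe_links` is a hypothetical helper.

```python
import json


def describe_links(workflow: dict) -> list[str]:
    """Render each link as 'Source[slot] -> Target[slot] (TYPE)'.

    Assumes the UI export format, where each link is
    [link_id, origin_node_id, origin_slot, target_node_id, target_slot, type].
    """
    node_type = {node["id"]: node["type"] for node in workflow["nodes"]}
    lines = []
    for _link_id, src, src_slot, dst, dst_slot, kind in workflow["links"]:
        lines.append(f"{node_type[src]}[{src_slot}] -> {node_type[dst]}[{dst_slot}] ({kind})")
    return lines


if __name__ == "__main__":
    with open("PixelDiT_text2image.json", encoding="utf-8") as f:
        for line in describe_links(json.load(f)):
            print(line)
```

For example, link 76 decodes to `VAELoader[0] -> VAEDecode[1] (VAE)`: the Load VAE node set to pixel_space feeds the `vae` input of VAE Decode.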

Because this is a pixel diffusion model, it does not inherently need Load VAE or VAE Decode.

In ComfyUI, however, the workflow still follows the existing format: select pixel_space in Load VAE, then connect it to VAE Decode.

It may look as if the image is being decoded by a VAE called pixel_space, but think of it instead as a conversion step that turns the KSampler's LATENT output, which already contains pixels, into an IMAGE.
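
Conceptually, that conversion only needs to rescale and reorder the values. The plain-Python sketch below assumes the sampler output is channel-first (`[C][H][W]`) with values in [-1, 1], and that an IMAGE is channel-last (`[H][W][C]`) in [0, 1]. It illustrates the idea and is not ComfyUI's actual code; `pixel_space_decode` is a hypothetical name.

```python
def pixel_space_decode(samples):
    """Convert a channel-first [C][H][W] array in [-1, 1] to channel-last [H][W][C] in [0, 1].

    Illustration only: assumes the sampler output already holds RGB pixels
    in [-1, 1], so "decoding" is just a rescale, a clamp, and a reorder.
    """
    channels = len(samples)
    height = len(samples[0])
    width = len(samples[0][0])
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Rescale [-1, 1] -> [0, 1] and clamp any overshoot from sampling.
            row.append([min(max((samples[c][y][x] + 1.0) / 2.0, 0.0), 1.0) for c in range(channels)])
        image.append(row)
    return image


# A 3-channel, 1x2 image: the first pixel is black, the second overshoots slightly.
latent = [[[-1.0, 1.2]], [[-1.0, 1.0]], [[-1.0, 0.0]]]
print(pixel_space_decode(latent))  # [[[0.0, 0.0, 0.0], [1.0, 1.0, 0.5]]]
```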

PiD

PiD uses PixelDiT in place of VAE Decode.

Normally, the generated latent is passed through VAE Decode to become an image. With PiD, the latent is passed to PixelDiT instead, which converts it into an image and upscales it in a single step.

For example, you can generate a 1024×1024 latent with Z-Image-Turbo and send it to PiD instead of decoding it with the Z-Image VAE. With a 1024_to_4096 PiD model, the result is a 4096×4096 image.

In short, you keep the generative ability of an existing model while avoiding the fine-detail loss caused by VAE Decode.
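
The size arithmetic is simple: a 1024_to_4096 model multiplies each side by 4, so the pixel count grows 16×. This mirrors the `a * 4` ComfyMathExpression nodes in Z-Image-Turbo_to_PiD_4k.json. The helper names below are hypothetical.

```python
def pid_output_size(width: int, height: int, scale: int = 4) -> tuple[int, int]:
    """Output size to request from PiD: the base size multiplied by the model's scale factor."""
    return width * scale, height * scale


def megapixels(width: int, height: int) -> float:
    """Image size in millions of pixels."""
    return width * height / 1_000_000


base = (1024, 1024)
out = pid_output_size(*base)   # 1024_to_4096 -> factor 4 per side
print(out)                     # (4096, 4096)
print(round(megapixels(*base), 2), round(megapixels(*out), 2))  # 1.05 16.78
```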

Model Download

For SDXL
- pid_sdxl_1024_to_4096_4step_bf16.safetensors (2.72 GB)
For Qwen-Image
- pid_qwenimage_1024_to_4096_4step_bf16.safetensors (2.72 GB)
For Flux.1 / Z-Image
- pid_flux1_512_to_2048_4step_bf16.safetensors (2.72 GB)
- pid_flux1_1024_to_4096_4step_bf16.safetensors (2.72 GB)
For Flux.2
- pid_flux2_512_to_2048_4step_bf16.safetensors (2.73 GB)
- pid_flux2_1024_to_4096_4step_2606_bf16.safetensors (2.73 GB)

📂ComfyUI/
└── 📂models/
    └── 📂diffusion_models/
        ├── pid_sdxl_1024_to_4096_4step_bf16.safetensors
        ├── pid_qwenimage_1024_to_4096_4step_bf16.safetensors
        ├── pid_flux1_512_to_2048_4step_bf16.safetensors
        ├── pid_flux1_1024_to_4096_4step_bf16.safetensors
        ├── pid_flux2_512_to_2048_4step_bf16.safetensors
        └── pid_flux2_1024_to_4096_4step_2606_bf16.safetensors

You do not need to install all of them. Place only the PiD model that matches the base model you use.

Choosing a Model

There are two points to watch when choosing a PiD model.

Base model type
- The PiD model must match the latent format of the base model.
- Use the SDXL version for SDXL, and the Flux.1 version for Z-Image, which uses the same VAE as Flux.1.
Scale
- The string in the model name, such as 1024_to_4096, gives the scale: roughly 1024px in, 4096px out.
- Choosing the model does not upscale anything on its own. For 1024_to_4096, feed PiD a latent of around 1024px and set the output size so that PiD produces a roughly 4096px image (the Z-Image-Turbo workflow does this by multiplying the base width and height by 4 with ComfyMathExpression nodes).
- The aspect ratio is flexible as long as the rough resolution matches.
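
Both rules can be written as a small lookup. The sketch below groups the downloadable PiD files by latent family, maps Z-Image to the Flux.1 family as described above, reads the input resolution from the file name, and picks the file whose input resolution is closest to the base image's long side. The family keys, function names, and the "closest long side" heuristic are all assumptions for illustration.

```python
import re

# PiD checkpoints from the download list, grouped by the latent family they accept.
PID_MODELS = {
    "sdxl": ["pid_sdxl_1024_to_4096_4step_bf16.safetensors"],
    "qwen-image": ["pid_qwenimage_1024_to_4096_4step_bf16.safetensors"],
    "flux1": [
        "pid_flux1_512_to_2048_4step_bf16.safetensors",
        "pid_flux1_1024_to_4096_4step_bf16.safetensors",
    ],
    "flux2": [
        "pid_flux2_512_to_2048_4step_bf16.safetensors",
        "pid_flux2_1024_to_4096_4step_2606_bf16.safetensors",
    ],
}
# Base models that reuse another family's latent space (Z-Image -> Flux.1).
ALIASES = {"z-image": "flux1"}


def scale_of(filename: str) -> tuple[int, int]:
    """Read (input, output) resolution from a name like '..._1024_to_4096_...'."""
    match = re.search(r"_(\d+)_to_(\d+)_", filename)
    if match is None:
        raise ValueError(f"no scale found in {filename!r}")
    return int(match.group(1)), int(match.group(2))


def choose_pid(base_model: str, base_long_side: int) -> str:
    """Pick the PiD file whose input resolution is closest to the base image's long side."""
    family = ALIASES.get(base_model, base_model)
    candidates = PID_MODELS[family]
    return min(candidates, key=lambda name: abs(scale_of(name)[0] - base_long_side))


print(choose_pid("z-image", 1024))  # pid_flux1_1024_to_4096_4step_bf16.safetensors
```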

Z-Image-Turbo → PiD

Let's decode a Z-Image-Turbo latent with PiD.

Z-Image-Turbo_to_PiD_4k.json

{
  "id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
  "revision": 0,
  "last_node_id": 80,
  "last_link_id": 131,
  "nodes": [
    {
      "id": 38,
      "type": "CLIPLoader",
      "pos": [
        32.131015771484385,
        312.74468994140625
      ],
      "size": [
        301.3524169921875,
        106
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 0,
          "links": [
            74,
            75
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "qwen_3_4b.safetensors",
        "lumina2",
        "default"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        390.84235000000007,
        405.392333984375
      ],
      "size": [
        418.3189392089844,
        107.08506774902344
      ],
      "flags": {
        "collapsed": true
      },
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 75
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            52
          ]
        }
      ],
      "title": "CLIP Text Encode (Negative Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 54,
      "type": "ModelSamplingAuraFlow",
      "pos": [
        579.7813758789064,
        53.0477294921875
      ],
      "size": [
        230.33058166503906,
        58
      ],
      "flags": {},
      "order": 13,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 99
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            100
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ModelSamplingAuraFlow",
        "cnr_id": "comfy-core",
        "ver": "0.3.49"
      },
      "widgets_values": [
        3.1
      ]
    },
    {
      "id": 55,
      "type": "MarkdownNote",
      "pos": [
        -151.24897385253908,
        -13.402286529541016
      ],
      "size": [
        349.13103718118725,
        214.5148968572393
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "## models\n- [z_image_turbo_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n      ├── 📂diffusion_models/\n      │   └── z_image_turbo_bf16.safetensors\n      ├── 📂text_encoders/\n      │   └── qwen_3_4b.safetensors\n      └── 📂vae/\n           └── ae.safetensors\n```"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 57,
      "type": "VAEDecode",
      "pos": [
        2059.9064044746965,
        1212.0613014712694
      ],
      "size": [
        157.56002807617188,
        46
      ],
      "flags": {},
      "order": 25,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 102
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 103
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            110
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": []
    },
    {
      "id": 62,
      "type": "VAELoader",
      "pos": [
        1785.4283649239185,
        1093.5838570132628
      ],
      "size": [
        235.80000000000018,
        58
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 0,
          "links": [
            103
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pixel_space"
      ]
    },
    {
      "id": 60,
      "type": "CLIPLoader",
      "pos": [
        602.886208918055,
        1349.5611731753713
      ],
      "size": [
        301.3524169921875,
        106
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 0,
          "links": [
            104,
            109
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "gemma_2_2b_it_elm_bf16.safetensors",
        "pixeldit",
        "default"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 58,
      "type": "CLIPTextEncode",
      "pos": [
        962.6180210059409,
        1452.9092950777112
      ],
      "size": [
        419.26959228515625,
        107.08506774902344
      ],
      "flags": {
        "collapsed": true
      },
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 104
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            107
          ]
        }
      ],
      "title": "CLIP Text Encode (Negative Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 63,
      "type": "CLIPTextEncode",
      "pos": [
        962.6180210059409,
        1232.6023980872037
      ],
      "size": [
        361.1895922851561,
        152.373631591797
      ],
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 109
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            112
          ]
        }
      ],
      "title": "CLIP Text Encode (Positive Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 70,
      "type": "ComfyMathExpression",
      "pos": [
        1095.328699296338,
        1510.6730178870523
      ],
      "size": [
        210,
        128
      ],
      "flags": {},
      "order": 14,
      "mode": 0,
      "inputs": [
        {
          "label": "a",
          "name": "values.a",
          "type": "FLOAT,INT,BOOLEAN",
          "link": 128
        },
        {
          "label": "b",
          "name": "values.b",
          "shape": 7,
          "type": "FLOAT,INT,BOOLEAN",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "FLOAT",
          "type": "FLOAT",
          "links": null
        },
        {
          "name": "INT",
          "type": "INT",
          "links": [
            121
          ]
        },
        {
          "name": "BOOL",
          "type": "BOOLEAN",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "ComfyMathExpression"
      },
      "widgets_values": [
        "a * 4"
      ]
    },
    {
      "id": 74,
      "type": "ComfyMathExpression",
      "pos": [
        1095.4485588130935,
        1694.6636578512905
      ],
      "size": [
        210,
        128
      ],
      "flags": {},
      "order": 16,
      "mode": 0,
      "inputs": [
        {
          "label": "a",
          "name": "values.a",
          "type": "FLOAT,INT,BOOLEAN",
          "link": 131
        },
        {
          "label": "b",
          "name": "values.b",
          "shape": 7,
          "type": "FLOAT,INT,BOOLEAN",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "FLOAT",
          "type": "FLOAT",
          "links": null
        },
        {
          "name": "INT",
          "type": "INT",
          "links": [
            122
          ]
        },
        {
          "name": "BOOL",
          "type": "BOOLEAN",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "ComfyMathExpression"
      },
      "widgets_values": [
        "a * 4"
      ]
    },
    {
      "id": 37,
      "type": "UNETLoader",
      "pos": [
        243.49762343749995,
        53.0477294921875
      ],
      "size": [
        305.3782043457031,
        82
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            99
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "UNETLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "Z-Image/z_image_turbo_bf16.safetensors",
        "fp8_e4m3fn"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 61,
      "type": "KSampler",
      "pos": [
        1706.228364923919,
        1212.0613014712694
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 24,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 124
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 113
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 107
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 108
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            102
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        1234,
        "fixed",
        4,
        1,
        "lcm",
        "simple",
        1
      ]
    },
    {
      "id": 65,
      "type": "SaveImage",
      "pos": [
        2250.8533542940327,
        1212.0613014712694
      ],
      "size": [
        666.7297467636986,
        558.9757191157356
      ],
      "flags": {},
      "order": 26,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 110
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.75"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    },
    {
      "id": 39,
      "type": "VAELoader",
      "pos": [
        953.7971717773437,
        68.20164184570308
      ],
      "size": [
        235.80000000000018,
        58
      ],
      "flags": {},
      "order": 5,
      "mode": 4,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 0,
          "links": [
            76
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "ae.safetensors"
      ],
      "color": "#322",
      "bgcolor": "#533"
    },
    {
      "id": 64,
      "type": "EmptyChromaRadianceLatentImage",
      "pos": [
        1337.8355683530854,
        1508.893664010131
      ],
      "size": [
        300.8609375,
        106
      ],
      "flags": {},
      "order": 19,
      "mode": 0,
      "inputs": [
        {
          "name": "width",
          "type": "INT",
          "widget": {
            "name": "width"
          },
          "link": 121
        },
        {
          "name": "height",
          "type": "INT",
          "widget": {
            "name": "height"
          },
          "link": 122
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            108
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "EmptyChromaRadianceLatentImage"
      },
      "widgets_values": [
        896,
        1152,
        1
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        390.84235000000007,
        186
      ],
      "size": [
        419.26959228515625,
        156.00363159179688
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 74
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            46
          ]
        }
      ],
      "title": "CLIP Text Encode (Positive Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "A single retro-style Japanese pudding in a handmade ceramic pedestal bowl, smooth golden custard with rich caramel sauce, topped with light cream, a small strawberry piece, and tiny leaves, set on a rustic wooden table beside wooden spoons, soft window light, natural and airy mood, warm earthy colors, shallow focus, tasteful Japanese cafe aesthetic, simple and elegant dessert photography, one pudding only, no extra desserts, no text, no logo, no watermark"
      ]
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1228.2752113281254,
        188.1918182373047
      ],
      "size": [
        157.56002807617188,
        46
      ],
      "flags": {},
      "order": 21,
      "mode": 4,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 35
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 76
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            127
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": []
    },
    {
      "id": 53,
      "type": "EmptySD3LatentImage",
      "pos": [
        573.1119422851564,
        473.02593102293815
      ],
      "size": [
        237,
        106
      ],
      "flags": {},
      "order": 15,
      "mode": 0,
      "inputs": [
        {
          "name": "width",
          "type": "INT",
          "widget": {
            "name": "width"
          },
          "link": 129
        },
        {
          "name": "height",
          "type": "INT",
          "widget": {
            "name": "height"
          },
          "link": 130
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            98
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "EmptySD3LatentImage",
        "cnr_id": "comfy-core",
        "ver": "0.3.49"
      },
      "widgets_values": [
        1104,
        1472,
        1
      ]
    },
    {
      "id": 77,
      "type": "ModelSamplingSD3",
      "pos": [
        1071.5054949571797,
        863.9897043960142
      ],
      "size": [
        234.02274434806372,
        58
      ],
      "flags": {},
      "order": 17,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 125
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            126
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ModelSamplingSD3"
      },
      "widgets_values": [
        4
      ]
    },
    {
      "id": 78,
      "type": "PreviewImage",
      "pos": [
        1412.600554848482,
        188.1918182373047
      ],
      "size": [
        475.4999999999998,
        310.8999999999998
      ],
      "flags": {},
      "order": 23,
      "mode": 4,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 127
        }
      ],
      "outputs": [],
      "properties": {
        "Node name for S&R": "PreviewImage"
      },
      "widgets_values": []
    },
    {
      "id": 66,
      "type": "MarkdownNote",
      "pos": [
        319.0841671505862,
        861.233126193333
      ],
      "size": [
        391.4749836827225,
        249.70306513499378
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "## models\n\n* diffusion_models\n\n  * [pid_flux1_512_to_2048_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_512_to_2048_4step_bf16.safetensors) (2.72 GB)\n  *  [pid_flux1_1024_to_4096_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_1024_to_4096_4step_bf16.safetensors) (2.72 GB)\n\n* text_encoders\n\n  * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n    ├── 📂diffusion_models/\n    │    ├── pid_flux1_512_to_2048_4step_bf16.safetensors\n    │    └── pid_flux1_1024_to_4096_4step_bf16.safetensors\n    └── 📂text_encoders/\n          └── gemma_2_2b_it_elm_bf16.safetensors\n```"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 67,
      "type": "PiDConditioning",
      "pos": [
        1368.6965058530855,
        1232.6023980872037
      ],
      "size": [
        270,
        102
      ],
      "flags": {},
      "order": 22,
      "mode": 0,
      "inputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 112
        },
        {
          "name": "latent",
          "type": "LATENT",
          "link": 111
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            113
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "PiDConditioning"
      },
      "widgets_values": [
        "flux",
        0
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 79,
      "type": "ResolutionSelector",
      "pos": [
        259.561201016606,
        493.48155615066963
      ],
      "size": [
        270,
        126
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "width",
          "type": "INT",
          "links": [
            128,
            129
          ]
        },
        {
          "name": "height",
          "type": "INT",
          "links": [
            130,
            131
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ResolutionSelector"
      },
      "widgets_values": [
        "3:2 (Photo)",
        1,
        16
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        874.5971717773439,
        188.1918182373047
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 18,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 100
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 46
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 52
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 98
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            35,
            111
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        12345,
        "fixed",
        8,
        1,
        "euler",
        "simple",
        1
      ]
    },
    {
      "id": 59,
      "type": "UNETLoader",
      "pos": [
        737.4906566223231,
        863.9897043960142
      ],
      "size": [
        305.3782043457031,
        82
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            125
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "UNETLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pid_flux1_1024_to_4096_4step_bf16.safetensors",
        "default"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 76,
      "type": "ContextWindowsManual",
      "pos": [
        1339.486883153987,
        863.9897043960142
      ],
      "size": [
        299.2096226990984,
        298
      ],
      "flags": {},
      "order": 20,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 126
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            124
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ContextWindowsManual"
      },
      "widgets_values": [
        1536,
        384,
        "standard_static",
        1,
        false,
        "pyramid",
        2,
        false,
        "",
        false,
        false
      ]
    }
  ],
  "links": [
    [
      35,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      46,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      52,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      74,
      38,
      0,
      6,
      0,
      "CLIP"
    ],
    [
      75,
      38,
      0,
      7,
      0,
      "CLIP"
    ],
    [
      76,
      39,
      0,
      8,
      1,
      "VAE"
    ],
    [
      98,
      53,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      99,
      37,
      0,
      54,
      0,
      "MODEL"
    ],
    [
      100,
      54,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      102,
      61,
      0,
      57,
      0,
      "LATENT"
    ],
    [
      103,
      62,
      0,
      57,
      1,
      "VAE"
    ],
    [
      104,
      60,
      0,
      58,
      0,
      "CLIP"
    ],
    [
      107,
      58,
      0,
      61,
      2,
      "CONDITIONING"
    ],
    [
      108,
      64,
      0,
      61,
      3,
      "LATENT"
    ],
    [
      109,
      60,
      0,
      63,
      0,
      "CLIP"
    ],
    [
      110,
      57,
      0,
      65,
      0,
      "IMAGE"
    ],
    [
      111,
      3,
      0,
      67,
      1,
      "LATENT"
    ],
    [
      112,
      63,
      0,
      67,
      0,
      "CONDITIONING"
    ],
    [
      113,
      67,
      0,
      61,
      1,
      "CONDITIONING"
    ],
    [
      121,
      70,
      1,
      64,
      0,
      "INT"
    ],
    [
      122,
      74,
      1,
      64,
      1,
      "INT"
    ],
    [
      124,
      76,
      0,
      61,
      0,
      "MODEL"
    ],
    [
      125,
      59,
      0,
      77,
      0,
      "MODEL"
    ],
    [
      126,
      77,
      0,
      76,
      0,
      "MODEL"
    ],
    [
      127,
      8,
      0,
      78,
      0,
      "IMAGE"
    ],
    [
      128,
      79,
      0,
      70,
      0,
      "INT"
    ],
    [
      129,
      79,
      0,
      53,
      0,
      "INT"
    ],
    [
      130,
      79,
      1,
      53,
      1,
      "INT"
    ],
    [
      131,
      79,
      1,
      74,
      0,
      "INT"
    ]
  ],
  "groups": [
    {
      "id": 1,
      "title": "Z-Image-Turbo",
      "bounding": [
        -161.24897385253908,
        -83.40228652954102,
        2064.456875764055,
        806.5944331355984
      ],
      "color": "#3f789e",
      "flags": {}
    },
    {
      "id": 2,
      "title": "Pid_1024→4096",
      "bounding": [
        290.6357989790571,
        776.3072911794422,
        2650.977159099303,
        1063.3279125842512
      ],
      "color": "#8A8",
      "flags": {}
    }
  ],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.430567643134249,
      "offset": [
        591.0919825277065,
        375.87128501454
      ]
    },
    "frontendVersion": "1.45.15",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

🟦 The upper-left group is a standard Z-Image-Turbo workflow.
🟩 Instead of sending its output latent to VAE Decode, connect it to PixelDiT's PiD Conditioning node.
- This example uses the 1024_to_4096 model.
- Z-Image-Turbo generates at around 1M pixels, and PiD is set to output at 4× that width and height (16× the pixel count).
- PiD is a 4-step distilled model, so its KSampler uses 4 steps and CFG 1.0.
- The Context Windows (Manual) node handles tiling. Use it if you run out of memory (OOM), or if very tall or very wide images come out rough.
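To illustrate what tiling with overlapping windows means, here is a minimal Python sketch. It assumes, without confirmation from the node's documentation, that the first two ContextWindowsManual widgets in the JSON (1536 and 384) act as a window length and an overlap along one image axis, and that the "pyramid" fuse method weights each window most heavily at its center. The functions `static_windows`, `pyramid_weights`, and `blend_1d` are hypothetical illustrations, not ComfyUI's implementation.

```python
from typing import Callable, List, Tuple


def static_windows(size: int, length: int = 1536, overlap: int = 384) -> List[Tuple[int, int]]:
    """Split [0, size) into fixed-length windows that overlap by `overlap`."""
    if size <= length:
        return [(0, size)]
    stride = length - overlap
    starts = list(range(0, size - length + 1, stride))
    # Add one extra window flush with the end so the tail is always covered.
    if starts[-1] + length < size:
        starts.append(size - length)
    return [(s, s + length) for s in starts]


def pyramid_weights(length: int) -> List[float]:
    """Weights that rise linearly toward the window center and fall at the edges."""
    return [float(min(i + 1, length - i)) for i in range(length)]


def blend_1d(
    size: int,
    denoise_window: Callable[[int, int], List[float]],
    length: int = 1536,
    overlap: int = 384,
) -> List[float]:
    """Run a per-window function and cross-fade the overlaps with pyramid weights."""
    acc = [0.0] * size
    weight_sum = [0.0] * size
    for start, end in static_windows(size, length, overlap):
        # Hypothetical stand-in for one model call on one tile.
        values = denoise_window(start, end)
        for offset, (v, w) in enumerate(zip(values, pyramid_weights(end - start))):
            acc[start + offset] += v * w
            weight_sum[start + offset] += w
    # Normalize so every position is a weighted average of the windows covering it.
    return [a / w for a, w in zip(acc, weight_sum)]


if __name__ == "__main__":
    # e.g. a 1472 px side scaled 4x by PiD gives 5888 px along that axis.
    for span in static_windows(5888):
        print(span)
```

Under these assumptions, a 5888 px axis is covered by five 1536 px windows, and each pixel in an overlap is a weighted blend of two windows, so seams fade instead of showing hard edges. Smaller windows typically lower peak memory at the cost of more model calls.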

Upscaling Any Image

What gets passed to PiD Conditioning is just a latent.

So the previous step does not need to be text2image. You can VAE Encode any image you like, pass it to PiD, and use it like an upscaler.

PiD_flux1_4x_enhance.json

{
  "id": "1aa3b166-1861-429f-92ae-7ee12e64ab01",
  "revision": 0,
  "last_node_id": 89,
  "last_link_id": 143,
  "nodes": [
    {
      "id": 60,
      "type": "CLIPLoader",
      "pos": [
        178.4554950121201,
        811.9490780397168
      ],
      "size": [
        301.3524169921875,
        106
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 0,
          "links": [
            104,
            109
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "gemma_2_2b_it_elm_bf16.safetensors",
        "pixeldit",
        "default"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 63,
      "type": "CLIPTextEncode",
      "pos": [
        538.1873071000066,
        694.9903029515484
      ],
      "size": [
        361.1895922851561,
        152.373631591797
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 109
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            112
          ]
        }
      ],
      "title": "CLIP Text Encode (Positive Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 70,
      "type": "ComfyMathExpression",
      "pos": [
        671.8070430075916,
        989.4245396947579
      ],
      "size": [
        210,
        128
      ],
      "flags": {},
      "order": 15,
      "mode": 0,
      "inputs": [
        {
          "label": "a",
          "name": "values.a",
          "type": "FLOAT,INT,BOOLEAN",
          "link": 138
        },
        {
          "label": "b",
          "name": "values.b",
          "shape": 7,
          "type": "FLOAT,INT,BOOLEAN",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "FLOAT",
          "type": "FLOAT",
          "links": null
        },
        {
          "name": "INT",
          "type": "INT",
          "links": [
            121
          ]
        },
        {
          "name": "BOOL",
          "type": "BOOLEAN",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "ComfyMathExpression"
      },
      "widgets_values": [
        "a * 4"
      ]
    },
    {
      "id": 74,
      "type": "ComfyMathExpression",
      "pos": [
        671.926902524347,
        1173.415179658996
      ],
      "size": [
        210,
        128
      ],
      "flags": {},
      "order": 16,
      "mode": 0,
      "inputs": [
        {
          "label": "a",
          "name": "values.a",
          "type": "FLOAT,INT,BOOLEAN",
          "link": 139
        },
        {
          "label": "b",
          "name": "values.b",
          "shape": 7,
          "type": "FLOAT,INT,BOOLEAN",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "FLOAT",
          "type": "FLOAT",
          "links": null
        },
        {
          "name": "INT",
          "type": "INT",
          "links": [
            122
          ]
        },
        {
          "name": "BOOL",
          "type": "BOOLEAN",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "ComfyMathExpression"
      },
      "widgets_values": [
        "a * 4"
      ]
    },
    {
      "id": 61,
      "type": "KSampler",
      "pos": [
        1281.7976510179856,
        674.4492063356142
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 18,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 124
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 113
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 107
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 108
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            102
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        1234,
        "fixed",
        4,
        1,
        "lcm",
        "simple",
        1
      ]
    },
    {
      "id": 77,
      "type": "ModelSamplingSD3",
      "pos": [
        647.074781051246,
        326.3776092603588
      ],
      "size": [
        234.02274434806372,
        58
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 125
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            126
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ModelSamplingSD3"
      },
      "widgets_values": [
        4
      ]
    },
    {
      "id": 62,
      "type": "VAELoader",
      "pos": [
        1360.9976510179852,
        555.9717618776077
      ],
      "size": [
        235.80000000000018,
        58
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 0,
          "links": [
            103
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pixel_space"
      ]
    },
    {
      "id": 82,
      "type": "ResizeImageMaskNode",
      "pos": [
        -187.68646898516153,
        1095.0166207447435
      ],
      "size": [
        266.5849202168248,
        106
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "input",
          "type": "IMAGE,MASK",
          "link": 134
        }
      ],
      "outputs": [
        {
          "name": "resized",
          "type": "IMAGE",
          "links": [
            135
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ResizeImageMaskNode"
      },
      "widgets_values": [
        "scale total pixels",
        1,
        "nearest-exact"
      ]
    },
    {
      "id": 80,
      "type": "VAEEncode",
      "pos": [
        407.88598227132763,
        1095.0166207447435
      ],
      "size": [
        170.05260120738637,
        46
      ],
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 136
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 132
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            133
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAEEncode"
      },
      "widgets_values": []
    },
    {
      "id": 58,
      "type": "CLIPTextEncode",
      "pos": [
        540.3078029573284,
        910.6321261399594
      ],
      "size": [
        419.26959228515625,
        107.08506774902344
      ],
      "flags": {
        "collapsed": true
      },
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 104
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            107
          ]
        }
      ],
      "title": "CLIP Text Encode (Negative Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 84,
      "type": "GetImageSize",
      "pos": [
        409.91468906546555,
        1197.5924621383035
      ],
      "size": [
        210,
        136
      ],
      "flags": {},
      "order": 13,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 137
        }
      ],
      "outputs": [
        {
          "name": "width",
          "type": "INT",
          "links": [
            138
          ]
        },
        {
          "name": "height",
          "type": "INT",
          "links": [
            139
          ]
        },
        {
          "name": "batch_size",
          "type": "INT",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "GetImageSize"
      },
      "widgets_values": []
    },
    {
      "id": 65,
      "type": "SaveImage",
      "pos": [
        1826.4226403881014,
        674.4492063356142
      ],
      "size": [
        644.1825674446068,
        806.9942591157356
      ],
      "flags": {},
      "order": 20,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 110
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.75"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    },
    {
      "id": 59,
      "type": "UNETLoader",
      "pos": [
        313.05994271638855,
        326.3776092603588
      ],
      "size": [
        305.3782043457031,
        82
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            125
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "UNETLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pid_flux1_1024_to_4096_4step_bf16.safetensors",
        "default"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 81,
      "type": "VAELoader",
      "pos": [
        85.79355151979729,
        989.5464055577053
      ],
      "size": [
        287.64071438371656,
        58
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 0,
          "links": [
            132
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "ae.safetensors"
      ],
      "color": "#322",
      "bgcolor": "#533"
    },
    {
      "id": 57,
      "type": "VAEDecode",
      "pos": [
        1635.4756905687632,
        674.4492063356142
      ],
      "size": [
        157.56002807617188,
        46
      ],
      "flags": {},
      "order": 19,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 102
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 103
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            110
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": []
    },
    {
      "id": 76,
      "type": "ContextWindowsManual",
      "pos": [
        915.0561692480533,
        326.3776092603588
      ],
      "size": [
        299.2096226990984,
        298
      ],
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 126
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            124
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ContextWindowsManual"
      },
      "widgets_values": [
        1536,
        384,
        "standard_static",
        1,
        false,
        "pyramid",
        2,
        false,
        "",
        false,
        false
      ]
    },
    {
      "id": 79,
      "type": "LoadImage",
      "pos": [
        -532.4361549440936,
        1095.0166207447435
      ],
      "size": [
        316.7987915039063,
        467.0000366210936
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            134
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "ComfyUI_00091_.png",
        "image"
      ]
    },
    {
      "id": 83,
      "type": "ResizeImageMaskNode",
      "pos": [
        106.84934568668905,
        1095.0166207447435
      ],
      "size": [
        266.5849202168248,
        106
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "input",
          "type": "IMAGE,MASK",
          "link": 135
        }
      ],
      "outputs": [
        {
          "name": "resized",
          "type": "IMAGE",
          "links": [
            136,
            137
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ResizeImageMaskNode"
      },
      "widgets_values": [
        "scale to multiple",
        16,
        "nearest-exact"
      ]
    },
    {
      "id": 67,
      "type": "PiDConditioning",
      "pos": [
        944.265791947152,
        694.9903029515484
      ],
      "size": [
        270,
        102
      ],
      "flags": {},
      "order": 14,
      "mode": 0,
      "inputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 112
        },
        {
          "name": "latent",
          "type": "LATENT",
          "link": 133
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            113
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "PiDConditioning"
      },
      "widgets_values": [
        "flux",
        0
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 89,
      "type": "MarkdownNote",
      "pos": [
        -135.8324089797664,
        326.3776092603588
      ],
      "size": [
        413.71462239515324,
        313.08611572179626
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "## models\n\n* diffusion_models\n\n  * [pid_flux1_512_to_2048_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_512_to_2048_4step_bf16.safetensors) (2.72 GB)\n  *  [pid_flux1_1024_to_4096_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_1024_to_4096_4step_bf16.safetensors) (2.72 GB)\n\n* text_encoders\n\n  * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n* vae\n\n  * [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors) (335 MB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n    ├── 📂diffusion_models/\n    │    ├── pid_flux1_512_to_2048_4step_bf16.safetensors\n    │    └── pid_flux1_1024_to_4096_4step_bf16.safetensors\n    ├── 📂text_encoders/\n    │    └── gemma_2_2b_it_elm_bf16.safetensors\n    └── 📂vae/\n          └── ae.safetensors\n\n```"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 64,
      "type": "EmptyChromaRadianceLatentImage",
      "pos": [
        914.3139120643391,
        987.6451858178366
      ],
      "size": [
        300.8609375,
        106
      ],
      "flags": {},
      "order": 17,
      "mode": 0,
      "inputs": [
        {
          "name": "width",
          "type": "INT",
          "widget": {
            "name": "width"
          },
          "link": 121
        },
        {
          "name": "height",
          "type": "INT",
          "widget": {
            "name": "height"
          },
          "link": 122
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            108
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "EmptyChromaRadianceLatentImage"
      },
      "widgets_values": [
        896,
        1152,
        1
      ]
    }
  ],
  "links": [
    [
      102,
      61,
      0,
      57,
      0,
      "LATENT"
    ],
    [
      103,
      62,
      0,
      57,
      1,
      "VAE"
    ],
    [
      104,
      60,
      0,
      58,
      0,
      "CLIP"
    ],
    [
      107,
      58,
      0,
      61,
      2,
      "CONDITIONING"
    ],
    [
      108,
      64,
      0,
      61,
      3,
      "LATENT"
    ],
    [
      109,
      60,
      0,
      63,
      0,
      "CLIP"
    ],
    [
      110,
      57,
      0,
      65,
      0,
      "IMAGE"
    ],
    [
      112,
      63,
      0,
      67,
      0,
      "CONDITIONING"
    ],
    [
      113,
      67,
      0,
      61,
      1,
      "CONDITIONING"
    ],
    [
      121,
      70,
      1,
      64,
      0,
      "INT"
    ],
    [
      122,
      74,
      1,
      64,
      1,
      "INT"
    ],
    [
      124,
      76,
      0,
      61,
      0,
      "MODEL"
    ],
    [
      125,
      59,
      0,
      77,
      0,
      "MODEL"
    ],
    [
      126,
      77,
      0,
      76,
      0,
      "MODEL"
    ],
    [
      132,
      81,
      0,
      80,
      1,
      "VAE"
    ],
    [
      133,
      80,
      0,
      67,
      1,
      "LATENT"
    ],
    [
      134,
      79,
      0,
      82,
      0,
      "IMAGE"
    ],
    [
      135,
      82,
      0,
      83,
      0,
      "IMAGE"
    ],
    [
      136,
      83,
      0,
      80,
      0,
      "IMAGE"
    ],
    [
      137,
      83,
      0,
      84,
      0,
      "IMAGE"
    ],
    [
      138,
      84,
      0,
      70,
      0,
      "INT"
    ],
    [
      139,
      84,
      1,
      74,
      0,
      "INT"
    ]
  ],
  "groups": [
    {
      "id": 2,
      "title": "Pid_1024→4096",
      "bounding": [
        -554.3908824317602,
        238.6952183841798,
        3081.2835698845024,
        1345.6484688397118
      ],
      "color": "#8A8",
      "flags": {}
    }
  ],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.5209868481924432,
      "offset": [
        749.0102116191479,
        6.514142817800455
      ]
    },
    "frontendVersion": "1.45.15",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

- Resize the input image to around 1M pixels, with width and height that are multiples of 16.
- Read the resized width and height, multiply each by 4, and use the results as the PiD output size.
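The two steps above amount to simple arithmetic. The sketch below assumes that "1M pixels" means 1024 × 1024 and that the multiple-of-16 step rounds each side down; the actual nodes may round differently, and `pid_sizes` is a hypothetical helper, not a ComfyUI API.

```python
import math
from typing import Tuple

Size = Tuple[int, int]  # (width, height)


def pid_sizes(width: int, height: int, megapixels: float = 1.0,
              multiple: int = 16, pid_scale: int = 4) -> Tuple[Size, Size]:
    """Return (encode_size, pid_output_size) for an input image."""
    # Step 1: scale uniformly so width * height is about `megapixels`
    # (assumption: 1 "megapixel" = 1024 * 1024 pixels).
    factor = math.sqrt(megapixels * 1024 * 1024 / (width * height))
    w = round(width * factor)
    h = round(height * factor)
    # Step 2: snap each side down to a multiple of 16 (assumed rounding mode).
    w = max(multiple, w // multiple * multiple)
    h = max(multiple, h // multiple * multiple)
    # Step 3: multiply by 4, mirroring the two "a * 4" math nodes in the workflow.
    return (w, h), (w * pid_scale, h * pid_scale)


if __name__ == "__main__":
    encode_size, output_size = pid_sizes(3000, 2000)
    print("encode at:", encode_size)   # size fed to VAE Encode
    print("PiD output:", output_size)  # size of the empty pixel-space latent
```

Under these assumptions, a 3000 × 2000 photo is encoded at 1248 × 832 and PiD renders at 4992 × 3328. A 1024 × 1024 input maps to 4096 × 4096, which matches the 1024_to_4096 model name.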

Each PiD model is paired with a specific VAE, so encode the input image with the VAE that matches your PiD model.

It is tempting to use the newer Flux.2 VAE, but it changes the colors quite a lot. Here, the more stable Flux.1 PiD + ae.safetensors combination is used.
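Comparing tensor shapes helps show why the VAE has to match. This sketch assumes the Flux.1 VAE (ae.safetensors) layout of 16 latent channels at 1/8 resolution, and that the pixel-space canvas from EmptyChromaRadianceLatentImage is a 3-channel RGB tensor at the full output size. `pid_tensor_shapes` is a hypothetical helper for illustration only.

```python
from typing import Tuple

Shape = Tuple[int, int, int, int]  # (batch, channels, height, width)


def pid_tensor_shapes(width: int, height: int, pid_scale: int = 4,
                      vae_channels: int = 16, vae_downsample: int = 8) -> Tuple[Shape, Shape]:
    # Conditioning input: the image encoded by the Flux.1 VAE
    # (assumed 16 channels, 8x smaller on each side).
    cond = (1, vae_channels, height // vae_downsample, width // vae_downsample)
    # Sampling canvas: RGB pixels at the PiD output size (assumed 3 channels, 4x larger).
    canvas = (1, 3, height * pid_scale, width * pid_scale)
    return cond, canvas


if __name__ == "__main__":
    cond, canvas = pid_tensor_shapes(1248, 832)
    print("conditioning latent:", cond)
    print("pixel canvas:", canvas)
    print("side ratio:", canvas[3] // cond[3])
```

Under these assumptions, PiD reads a 16 × 104 × 156 latent and paints a 3 × 3328 × 4992 image, so each latent cell expands to a 32 × 32 pixel block. A latent from a different VAE may carry a different channel count or value statistics than the model was trained on, which is consistent with the color shifts described above.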

ae.safetensors (335 MB)

📂ComfyUI/
└── 📂models/
    └── 📂vae/
        └── ae.safetensors

Because PiD essentially redraws the image, it behaves more like an enhancement pass than a conventional upscaler.
It is not well suited to cases that require faithful reproduction of the original.
