PixelDiT / PiD

PixelDiT

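PixelDiT is a pixel diffusion model released by NVIDIA.

Many image generation models since Stable Diffusion are built on the Latent Diffusion Model approach.

Computing an image pixel by pixel is expensive, so the model first compresses the image into a smaller latent. This reduces the amount of computation and also makes features such as shape, color, and composition easier to handle.

However, fine details such as small text and patterns still tend to degrade when the latent is decoded back into pixels.

A pixel diffusion model skips the latent and works on the image directly in pixel space. As a result, the degradation introduced by VAE decoding cannot occur in the same way, by construction.

But wasn't the latent introduced precisely because the computation is so heavy? PixelDiT's answer is to split the image into patches, taking a coarse view of the whole image while filling in fine detail on the pixel side.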
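To make the cost argument concrete, here is a rough token-count sketch. The 8x VAE downsampling, the 2x2 latent patch, and the 16x16 pixel patch are illustrative assumptions, not values taken from this article; check the model card for the real figures.

```python
def token_count(width: int, height: int, downsample: int, patch: int) -> int:
    """Number of transformer tokens for an image of the given size.

    downsample: spatial reduction applied before patching (1 = pixel space).
    patch: side length of one square patch, measured after downsampling.
    """
    step = downsample * patch
    if width % step or height % step:
        raise ValueError("image size must be divisible by downsample * patch")
    return (width // step) * (height // step)


# Assumed, illustrative settings (not from the article):
latent_tokens = token_count(1024, 1024, downsample=8, patch=2)  # latent diffusion
pixel_tokens = token_count(1024, 1024, downsample=1, patch=16)  # pixel-space patches
naive_tokens = token_count(1024, 1024, downsample=1, patch=1)   # one token per pixel

print(latent_tokens, pixel_tokens, naive_tokens)
```

Under these assumed numbers, large pixel patches bring the sequence length down to the same order as a latent model, while one token per pixel would be far larger. The detail inside each patch then has to be restored on the pixel side.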

Downloading the models

diffusion_models
- pixeldit_1300m_1024px_bf16.safetensors (2.6 GB)
text_encoders
- gemma_2_2b_it_elm_bf16.safetensors (5.23 GB)

📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── pixeldit_1300m_1024px_bf16.safetensors
    └── 📂text_encoders/
        └── gemma_2_2b_it_elm_bf16.safetensors
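If a workflow fails to find a model, a quick check of the folder layout can help. The following is a minimal sketch that uses only the file names from the tree above; the `missing_models` helper and the `"ComfyUI"` root path are placeholders for illustration, not part of ComfyUI.

```python
from pathlib import Path

# Expected layout, copied from the tree above.
EXPECTED = {
    "diffusion_models": ["pixeldit_1300m_1024px_bf16.safetensors"],
    "text_encoders": ["gemma_2_2b_it_elm_bf16.safetensors"],
}


def missing_models(comfy_root, expected=EXPECTED):
    """Return the expected model paths that do not exist under comfy_root."""
    models_dir = Path(comfy_root) / "models"
    missing = []
    for folder, names in expected.items():
        for name in names:
            path = models_dir / folder / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point it at your own install directory.
    for path in missing_models("ComfyUI"):
        print("missing:", path)
```

An empty result means both files are where the loaders expect them.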

text2image

PixelDiT_text2image.json

{
  "id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
  "revision": 0,
  "last_node_id": 75,
  "last_link_id": 127,
  "nodes": [
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1252.432861328125,
        188.1918182373047
      ],
      "size": [
        157.56002807617188,
        46
      ],
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 35
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 76
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            101
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": []
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        413.6004778593708,
        403.99281184374564
      ],
      "size": [
        419.26959228515625,
        107.08506774902344
      ],
      "flags": {
        "collapsed": false
      },
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 75
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            52
          ]
        }
      ],
      "title": "CLIP Text Encode (Negative Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "text, worst quality, blurry, ugly"
      ]
    },
    {
      "id": 38,
      "type": "CLIPLoader",
      "pos": [
        56.288665771484375,
        312.74468994140625
      ],
      "size": [
        301.3524169921875,
        106
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 0,
          "links": [
            74,
            75
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "gemma_2_2b_it_elm_bf16.safetensors",
        "pixeldit",
        "default"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 71,
      "type": "MarkdownNote",
      "pos": [
        -130.18155802626615,
        -17.811007621292433
      ],
      "size": [
        351.89747511237124,
        228.61658757745528
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "## models\n\n* diffusion_models\n\n  * [pixeldit_1300m_1024px_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pixeldit_1300m_1024px_bf16.safetensors) (2.6 GB)\n\n* text_encoders\n\n  * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n    ├── 📂diffusion_models/\n    │   └── pixeldit_1300m_1024px_bf16.safetensors\n    └── 📂text_encoders/\n         └── gemma_2_2b_it_elm_bf16.safetensors\n```"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 37,
      "type": "UNETLoader",
      "pos": [
        269.35973351536364,
        43.42716662131588
      ],
      "size": [
        305.3782043457031,
        82
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            124
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "UNETLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pixeldit_1300m_1024px_bf16.safetensors",
        "default"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 39,
      "type": "VAELoader",
      "pos": [
        977.9548217773436,
        67.42716662131588
      ],
      "size": [
        235.80000000000018,
        58
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 0,
          "links": [
            76
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pixel_space"
      ]
    },
    {
      "id": 74,
      "type": "ModelSamplingSD3",
      "pos": [
        608.2696075439453,
        43.427166621315884
      ],
      "size": [
        226,
        58
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 124
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            125
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ModelSamplingSD3"
      },
      "widgets_values": [
        4
      ]
    },
    {
      "id": 73,
      "type": "EmptyChromaRadianceLatentImage",
      "pos": [
        532.0091326445271,
        575.75393284228
      ],
      "size": [
        300.8609375,
        106
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "width",
          "type": "INT",
          "widget": {
            "name": "width"
          },
          "link": 126
        },
        {
          "name": "height",
          "type": "INT",
          "widget": {
            "name": "height"
          },
          "link": 127
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            123
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "EmptyChromaRadianceLatentImage"
      },
      "widgets_values": [
        1024,
        1024,
        1
      ]
    },
    {
      "id": 75,
      "type": "ResolutionSelector",
      "pos": [
        234.8164432684831,
        575.75393284228
      ],
      "size": [
        270,
        126
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "width",
          "type": "INT",
          "links": [
            126
          ]
        },
        {
          "name": "height",
          "type": "INT",
          "links": [
            127
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ResolutionSelector"
      },
      "widgets_values": [
        "2:3 (Portrait Photo)",
        1,
        16
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        415.00001525878906,
        186
      ],
      "size": [
        419.26959228515625,
        156.00363159179688
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 74
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            46
          ]
        }
      ],
      "title": "CLIP Text Encode (Positive Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "A stylish editorial food photograph of a small round chocolate mousse cake on a rustic wooden table, warm cocoa brown velvet texture, delicate chocolate decoration on top, tiny white flowers as garnish, placed on a simple golden dessert plate, soft natural window light, shallow depth of field, dreamy foreground blur with green leaves, warm earthy tones, elegant patisserie atmosphere, cozy cafe mood, high-end dessert photography, cinematic bokeh, no text, no logo, no watermark, no typography"
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        898.7548217773438,
        188.1918182373047
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 125
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 46
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 52
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 123
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            35
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        12345,
        "fixed",
        30,
        3,
        "er_sde",
        "simple",
        1
      ]
    },
    {
      "id": 56,
      "type": "SaveImage",
      "pos": [
        1443.3798111474612,
        188.1918182373047
      ],
      "size": [
        390.01472165749783,
        646.8217101795782
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 101
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.75"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    }
  ],
  "links": [
    [
      35,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      46,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      52,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      74,
      38,
      0,
      6,
      0,
      "CLIP"
    ],
    [
      75,
      38,
      0,
      7,
      0,
      "CLIP"
    ],
    [
      76,
      39,
      0,
      8,
      1,
      "VAE"
    ],
    [
      101,
      8,
      0,
      56,
      0,
      "IMAGE"
    ],
    [
      123,
      73,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      124,
      37,
      0,
      74,
      0,
      "MODEL"
    ],
    [
      125,
      74,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      126,
      75,
      0,
      73,
      0,
      "INT"
    ],
    [
      127,
      75,
      1,
      73,
      1,
      "INT"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.620921323059155,
      "offset": [
        539.7718098712515,
        304.32781282959496
      ]
    },
    "frontendVersion": "1.45.15",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

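Because this is a pixel diffusion model, Load VAE and VAE Decode are not actually needed.

In ComfyUI, however, to fit the existing workflow structure, you need to select pixel_space in Load VAE and connect it to VAE Decode.

It looks as if a VAE named pixel_space is doing the decoding, but you can think of this step as simply taking the KSampler output as an IMAGE.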
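One way to confirm that a workflow is wired this way is to read the exported JSON and list what each Load VAE node selects. This is a minimal sketch, assuming the same export format as PixelDiT_text2image.json above (a top-level "nodes" list whose entries carry "type" and "widgets_values"); the `vae_selections` helper is made up for illustration.

```python
import json


def vae_selections(workflow: dict) -> list:
    """Return the first widget value of every VAELoader node in a workflow."""
    return [
        node["widgets_values"][0]
        for node in workflow.get("nodes", [])
        if node.get("type") == "VAELoader" and node.get("widgets_values")
    ]


# A trimmed-down stand-in for the exported workflow file.
sample = json.loads("""
{
  "nodes": [
    {"id": 39, "type": "VAELoader", "widgets_values": ["pixel_space"]},
    {"id": 8, "type": "VAEDecode", "widgets_values": []}
  ]
}
""")

print(vae_selections(sample))  # ['pixel_space']
```

To run it against a real file, load the workflow with `json.load` and pass the resulting dict to the same function.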

PiD

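PiD is a PixelDiT used in place of VAE Decode.

Normally, the generated latent is turned back into an image by VAE Decode. PiD instead hands that latent to PixelDiT, which performs image reconstruction and upscaling in one pass.

For example, generate a latent at 1024×1024 with Z-Image-Turbo, then hand it to PiD before the VAE Decode step. With a 1024_to_4096 PiD model, the output is a 4096×4096 image.

In other words, you can keep the generation ability of an existing model while avoiding the loss of detail caused by VAE Decode.

Downloading the models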

For SDXL
- pid_sdxl_1024_to_4096_4step_bf16.safetensors (2.72 GB)
For Qwen-Image
- pid_qwenimage_1024_to_4096_4step_bf16.safetensors (2.72 GB)
For Flux.1 / Z-Image
- pid_flux1_512_to_2048_4step_bf16.safetensors (2.72 GB)
- pid_flux1_1024_to_4096_4step_bf16.safetensors (2.72 GB)
For Flux.2
- pid_flux2_512_to_2048_4step_bf16.safetensors (2.73 GB)
- pid_flux2_1024_to_4096_4step_2606_bf16.safetensors (2.73 GB)

📂ComfyUI/
└── 📂models/
    └── 📂diffusion_models/
        ├── pid_sdxl_1024_to_4096_4step_bf16.safetensors
        ├── pid_qwenimage_1024_to_4096_4step_bf16.safetensors
        ├── pid_flux1_512_to_2048_4step_bf16.safetensors
        ├── pid_flux1_1024_to_4096_4step_bf16.safetensors
        ├── pid_flux2_512_to_2048_4step_bf16.safetensors
        └── pid_flux2_1024_to_4096_4step_2606_bf16.safetensors

You do not need to install all of them. Only the PiD files that match the base model you use are required.
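The pairing between base model and PiD file can be written down as a small lookup table. This is a sketch only: the file names are copied from the list above, while the dictionary keys and the `pid_files_for` helper are hypothetical names chosen for this example.

```python
# File names copied from the list above, grouped by base-model family.
PID_MODELS = {
    "sdxl": ["pid_sdxl_1024_to_4096_4step_bf16.safetensors"],
    "qwen-image": ["pid_qwenimage_1024_to_4096_4step_bf16.safetensors"],
    "flux1": [
        "pid_flux1_512_to_2048_4step_bf16.safetensors",
        "pid_flux1_1024_to_4096_4step_bf16.safetensors",
    ],
    "flux2": [
        "pid_flux2_512_to_2048_4step_bf16.safetensors",
        "pid_flux2_1024_to_4096_4step_2606_bf16.safetensors",
    ],
}

# Z-Image shares the Flux.1 entries, as listed above.
ALIASES = {"z-image": "flux1"}


def pid_files_for(base_model: str) -> list:
    """Return the PiD files listed for a base-model family."""
    key = base_model.lower()
    key = ALIASES.get(key, key)
    if key not in PID_MODELS:
        raise KeyError(f"no PiD model listed for {base_model!r}")
    return PID_MODELS[key]


print(pid_files_for("Z-Image"))
```

For Z-Image this returns the two Flux.1 files, so those are the only PiD downloads needed for the Z-Image-Turbo workflow.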

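Choosing a model

There are two things to watch when choosing a PiD model.

Base model type
- It must match the latent type used by the base model.
- For SDXL, use the SDXL version; for Z-Image, use the Flux.1 version.
Upscale factor
- Model names contain a string such as 1024_to_4096. It gives the input and output resolution, and therefore the upscale factor (4x in this case).
- Selecting the model does not upscale anything automatically. With 1024_to_4096, for example, you pass PiD a latent / output of roughly 1024px and set the parameters so that PiD outputs a 4096px image.
- The resolution only needs to match approximately, and the aspect ratio can be adjusted freely.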
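The target size you set for PiD follows directly from the file name. The sketch below reads the `<in>_to_<out>` part of the name and multiplies the base resolution by the resulting factor, which is the same arithmetic as the `a * 4` expression nodes in the Z-Image-Turbo_to_PiD_4k.json workflow in this article. The function names are hypothetical, and the naming pattern is assumed from the files listed earlier.

```python
import re


def pid_scale(filename: str) -> int:
    """Read the '<in>_to_<out>' part of a PiD file name and return out // in."""
    match = re.search(r"_(\d+)_to_(\d+)_", filename)
    if match is None:
        raise ValueError(f"no '<in>_to_<out>' pattern in {filename!r}")
    src, dst = int(match.group(1)), int(match.group(2))
    return dst // src


def pid_output_size(width: int, height: int, filename: str) -> tuple:
    """Scale the base resolution by the factor encoded in the file name."""
    scale = pid_scale(filename)
    return width * scale, height * scale


name = "pid_flux1_1024_to_4096_4step_bf16.safetensors"
print(pid_scale(name))                    # 4
print(pid_output_size(1024, 1024, name))  # (4096, 4096)
print(pid_output_size(896, 1152, name))   # (3584, 4608)
```

The last call shows a non-square case: the aspect ratio is kept, and both sides are simply multiplied by the factor.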

Z-Image-Turbo → PiD

Let's try decoding a Z-Image-Turbo latent with PiD.

Z-Image-Turbo_to_PiD_4k.json

{
  "id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
  "revision": 0,
  "last_node_id": 80,
  "last_link_id": 131,
  "nodes": [
    {
      "id": 38,
      "type": "CLIPLoader",
      "pos": [
        32.131015771484385,
        312.74468994140625
      ],
      "size": [
        301.3524169921875,
        106
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 0,
          "links": [
            74,
            75
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "qwen_3_4b.safetensors",
        "lumina2",
        "default"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        390.84235000000007,
        405.392333984375
      ],
      "size": [
        418.3189392089844,
        107.08506774902344
      ],
      "flags": {
        "collapsed": true
      },
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 75
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            52
          ]
        }
      ],
      "title": "CLIP Text Encode (Negative Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 54,
      "type": "ModelSamplingAuraFlow",
      "pos": [
        579.7813758789064,
        53.0477294921875
      ],
      "size": [
        230.33058166503906,
        58
      ],
      "flags": {},
      "order": 13,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 99
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            100
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ModelSamplingAuraFlow",
        "cnr_id": "comfy-core",
        "ver": "0.3.49"
      },
      "widgets_values": [
        3.1
      ]
    },
    {
      "id": 55,
      "type": "MarkdownNote",
      "pos": [
        -151.24897385253908,
        -13.402286529541016
      ],
      "size": [
        349.13103718118725,
        214.5148968572393
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "## models\n- [z_image_turbo_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n      ├── 📂diffusion_models/\n      │   └── z_image_turbo_bf16.safetensors\n      ├── 📂text_encoders/\n      │   └── qwen_3_4b.safetensors\n      └── 📂vae/\n           └── ae.safetensors\n```"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 57,
      "type": "VAEDecode",
      "pos": [
        2059.9064044746965,
        1212.0613014712694
      ],
      "size": [
        157.56002807617188,
        46
      ],
      "flags": {},
      "order": 25,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 102
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 103
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            110
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": []
    },
    {
      "id": 62,
      "type": "VAELoader",
      "pos": [
        1785.4283649239185,
        1093.5838570132628
      ],
      "size": [
        235.80000000000018,
        58
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 0,
          "links": [
            103
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pixel_space"
      ]
    },
    {
      "id": 60,
      "type": "CLIPLoader",
      "pos": [
        602.886208918055,
        1349.5611731753713
      ],
      "size": [
        301.3524169921875,
        106
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 0,
          "links": [
            104,
            109
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "gemma_2_2b_it_elm_bf16.safetensors",
        "pixeldit",
        "default"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 58,
      "type": "CLIPTextEncode",
      "pos": [
        962.6180210059409,
        1452.9092950777112
      ],
      "size": [
        419.26959228515625,
        107.08506774902344
      ],
      "flags": {
        "collapsed": true
      },
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 104
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            107
          ]
        }
      ],
      "title": "CLIP Text Encode (Negative Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 63,
      "type": "CLIPTextEncode",
      "pos": [
        962.6180210059409,
        1232.6023980872037
      ],
      "size": [
        361.1895922851561,
        152.373631591797
      ],
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 109
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            112
          ]
        }
      ],
      "title": "CLIP Text Encode (Positive Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 70,
      "type": "ComfyMathExpression",
      "pos": [
        1095.328699296338,
        1510.6730178870523
      ],
      "size": [
        210,
        128
      ],
      "flags": {},
      "order": 14,
      "mode": 0,
      "inputs": [
        {
          "label": "a",
          "name": "values.a",
          "type": "FLOAT,INT,BOOLEAN",
          "link": 128
        },
        {
          "label": "b",
          "name": "values.b",
          "shape": 7,
          "type": "FLOAT,INT,BOOLEAN",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "FLOAT",
          "type": "FLOAT",
          "links": null
        },
        {
          "name": "INT",
          "type": "INT",
          "links": [
            121
          ]
        },
        {
          "name": "BOOL",
          "type": "BOOLEAN",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "ComfyMathExpression"
      },
      "widgets_values": [
        "a * 4"
      ]
    },
    {
      "id": 74,
      "type": "ComfyMathExpression",
      "pos": [
        1095.4485588130935,
        1694.6636578512905
      ],
      "size": [
        210,
        128
      ],
      "flags": {},
      "order": 16,
      "mode": 0,
      "inputs": [
        {
          "label": "a",
          "name": "values.a",
          "type": "FLOAT,INT,BOOLEAN",
          "link": 131
        },
        {
          "label": "b",
          "name": "values.b",
          "shape": 7,
          "type": "FLOAT,INT,BOOLEAN",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "FLOAT",
          "type": "FLOAT",
          "links": null
        },
        {
          "name": "INT",
          "type": "INT",
          "links": [
            122
          ]
        },
        {
          "name": "BOOL",
          "type": "BOOLEAN",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "ComfyMathExpression"
      },
      "widgets_values": [
        "a * 4"
      ]
    },
    {
      "id": 37,
      "type": "UNETLoader",
      "pos": [
        243.49762343749995,
        53.0477294921875
      ],
      "size": [
        305.3782043457031,
        82
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            99
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "UNETLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "Z-Image/z_image_turbo_bf16.safetensors",
        "fp8_e4m3fn"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 61,
      "type": "KSampler",
      "pos": [
        1706.228364923919,
        1212.0613014712694
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 24,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 124
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 113
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 107
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 108
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            102
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        1234,
        "fixed",
        4,
        1,
        "lcm",
        "simple",
        1
      ]
    },
    {
      "id": 65,
      "type": "SaveImage",
      "pos": [
        2250.8533542940327,
        1212.0613014712694
      ],
      "size": [
        666.7297467636986,
        558.9757191157356
      ],
      "flags": {},
      "order": 26,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 110
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.75"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    },
    {
      "id": 39,
      "type": "VAELoader",
      "pos": [
        953.7971717773437,
        68.20164184570308
      ],
      "size": [
        235.80000000000018,
        58
      ],
      "flags": {},
      "order": 5,
      "mode": 4,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 0,
          "links": [
            76
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "ae.safetensors"
      ],
      "color": "#322",
      "bgcolor": "#533"
    },
    {
      "id": 64,
      "type": "EmptyChromaRadianceLatentImage",
      "pos": [
        1337.8355683530854,
        1508.893664010131
      ],
      "size": [
        300.8609375,
        106
      ],
      "flags": {},
      "order": 19,
      "mode": 0,
      "inputs": [
        {
          "name": "width",
          "type": "INT",
          "widget": {
            "name": "width"
          },
          "link": 121
        },
        {
          "name": "height",
          "type": "INT",
          "widget": {
            "name": "height"
          },
          "link": 122
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            108
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "EmptyChromaRadianceLatentImage"
      },
      "widgets_values": [
        896,
        1152,
        1
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        390.84235000000007,
        186
      ],
      "size": [
        419.26959228515625,
        156.00363159179688
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 74
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            46
          ]
        }
      ],
      "title": "CLIP Text Encode (Positive Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "A single retro-style Japanese pudding in a handmade ceramic pedestal bowl, smooth golden custard with rich caramel sauce, topped with light cream, a small strawberry piece, and tiny leaves, set on a rustic wooden table beside wooden spoons, soft window light, natural and airy mood, warm earthy colors, shallow focus, tasteful Japanese cafe aesthetic, simple and elegant dessert photography, one pudding only, no extra desserts, no text, no logo, no watermark"
      ]
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1228.2752113281254,
        188.1918182373047
      ],
      "size": [
        157.56002807617188,
        46
      ],
      "flags": {},
      "order": 21,
      "mode": 4,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 35
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 76
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            127
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": []
    },
    {
      "id": 53,
      "type": "EmptySD3LatentImage",
      "pos": [
        573.1119422851564,
        473.02593102293815
      ],
      "size": [
        237,
        106
      ],
      "flags": {},
      "order": 15,
      "mode": 0,
      "inputs": [
        {
          "name": "width",
          "type": "INT",
          "widget": {
            "name": "width"
          },
          "link": 129
        },
        {
          "name": "height",
          "type": "INT",
          "widget": {
            "name": "height"
          },
          "link": 130
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            98
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "EmptySD3LatentImage",
        "cnr_id": "comfy-core",
        "ver": "0.3.49"
      },
      "widgets_values": [
        1104,
        1472,
        1
      ]
    },
    {
      "id": 77,
      "type": "ModelSamplingSD3",
      "pos": [
        1071.5054949571797,
        863.9897043960142
      ],
      "size": [
        234.02274434806372,
        58
      ],
      "flags": {},
      "order": 17,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 125
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            126
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ModelSamplingSD3"
      },
      "widgets_values": [
        4
      ]
    },
    {
      "id": 78,
      "type": "PreviewImage",
      "pos": [
        1412.600554848482,
        188.1918182373047
      ],
      "size": [
        475.4999999999998,
        310.8999999999998
      ],
      "flags": {},
      "order": 23,
      "mode": 4,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 127
        }
      ],
      "outputs": [],
      "properties": {
        "Node name for S&R": "PreviewImage"
      },
      "widgets_values": []
    },
    {
      "id": 66,
      "type": "MarkdownNote",
      "pos": [
        319.0841671505862,
        861.233126193333
      ],
      "size": [
        391.4749836827225,
        249.70306513499378
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "## models\n\n* diffusion_models\n\n  * [pid_flux1_512_to_2048_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_512_to_2048_4step_bf16.safetensors) (2.72 GB)\n  *  [pid_flux1_1024_to_4096_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_1024_to_4096_4step_bf16.safetensors) (2.72 GB)\n\n* text_encoders\n\n  * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n    ├── 📂diffusion_models/\n    │    ├── pid_flux1_512_to_2048_4step_bf16.safetensors\n    │    └── pid_flux1_1024_to_4096_4step_bf16.safetensors\n    └── 📂text_encoders/\n          └── gemma_2_2b_it_elm_bf16.safetensors\n```"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 67,
      "type": "PiDConditioning",
      "pos": [
        1368.6965058530855,
        1232.6023980872037
      ],
      "size": [
        270,
        102
      ],
      "flags": {},
      "order": 22,
      "mode": 0,
      "inputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 112
        },
        {
          "name": "latent",
          "type": "LATENT",
          "link": 111
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            113
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "PiDConditioning"
      },
      "widgets_values": [
        "flux",
        0
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 79,
      "type": "ResolutionSelector",
      "pos": [
        259.561201016606,
        493.48155615066963
      ],
      "size": [
        270,
        126
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "width",
          "type": "INT",
          "links": [
            128,
            129
          ]
        },
        {
          "name": "height",
          "type": "INT",
          "links": [
            130,
            131
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ResolutionSelector"
      },
      "widgets_values": [
        "3:2 (Photo)",
        1,
        16
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        874.5971717773439,
        188.1918182373047
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 18,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 100
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 46
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 52
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 98
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            35,
            111
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        12345,
        "fixed",
        8,
        1,
        "euler",
        "simple",
        1
      ]
    },
    {
      "id": 59,
      "type": "UNETLoader",
      "pos": [
        737.4906566223231,
        863.9897043960142
      ],
      "size": [
        305.3782043457031,
        82
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            125
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "UNETLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pid_flux1_1024_to_4096_4step_bf16.safetensors",
        "default"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 76,
      "type": "ContextWindowsManual",
      "pos": [
        1339.486883153987,
        863.9897043960142
      ],
      "size": [
        299.2096226990984,
        298
      ],
      "flags": {},
      "order": 20,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 126
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            124
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ContextWindowsManual"
      },
      "widgets_values": [
        1536,
        384,
        "standard_static",
        1,
        false,
        "pyramid",
        2,
        false,
        "",
        false,
        false
      ]
    }
  ],
  "links": [
    [
      35,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      46,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      52,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      74,
      38,
      0,
      6,
      0,
      "CLIP"
    ],
    [
      75,
      38,
      0,
      7,
      0,
      "CLIP"
    ],
    [
      76,
      39,
      0,
      8,
      1,
      "VAE"
    ],
    [
      98,
      53,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      99,
      37,
      0,
      54,
      0,
      "MODEL"
    ],
    [
      100,
      54,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      102,
      61,
      0,
      57,
      0,
      "LATENT"
    ],
    [
      103,
      62,
      0,
      57,
      1,
      "VAE"
    ],
    [
      104,
      60,
      0,
      58,
      0,
      "CLIP"
    ],
    [
      107,
      58,
      0,
      61,
      2,
      "CONDITIONING"
    ],
    [
      108,
      64,
      0,
      61,
      3,
      "LATENT"
    ],
    [
      109,
      60,
      0,
      63,
      0,
      "CLIP"
    ],
    [
      110,
      57,
      0,
      65,
      0,
      "IMAGE"
    ],
    [
      111,
      3,
      0,
      67,
      1,
      "LATENT"
    ],
    [
      112,
      63,
      0,
      67,
      0,
      "CONDITIONING"
    ],
    [
      113,
      67,
      0,
      61,
      1,
      "CONDITIONING"
    ],
    [
      121,
      70,
      1,
      64,
      0,
      "INT"
    ],
    [
      122,
      74,
      1,
      64,
      1,
      "INT"
    ],
    [
      124,
      76,
      0,
      61,
      0,
      "MODEL"
    ],
    [
      125,
      59,
      0,
      77,
      0,
      "MODEL"
    ],
    [
      126,
      77,
      0,
      76,
      0,
      "MODEL"
    ],
    [
      127,
      8,
      0,
      78,
      0,
      "IMAGE"
    ],
    [
      128,
      79,
      0,
      70,
      0,
      "INT"
    ],
    [
      129,
      79,
      0,
      53,
      0,
      "INT"
    ],
    [
      130,
      79,
      1,
      53,
      1,
      "INT"
    ],
    [
      131,
      79,
      1,
      74,
      0,
      "INT"
    ]
  ],
  "groups": [
    {
      "id": 1,
      "title": "Z-Image-Turbo",
      "bounding": [
        -161.24897385253908,
        -83.40228652954102,
        2064.456875764055,
        806.5944331355984
      ],
      "color": "#3f789e",
      "flags": {}
    },
    {
      "id": 2,
      "title": "Pid_1024→4096",
      "bounding": [
        290.6357989790571,
        776.3072911794422,
        2650.977159099303,
        1063.3279125842512
      ],
      "color": "#8A8",
      "flags": {}
    }
  ],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.430567643134249,
      "offset": [
        591.0919825277065,
        375.87128501454
      ]
    },
    "frontendVersion": "1.45.15",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

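- 🟦 The upper left is an ordinary Z-Image-Turbo workflow.
- 🟩 Instead of going through VAE Decode, the output latent is connected to PiD Conditioning on the PixelDiT side.
- This example uses the 1024_to_4096 model.
- The Z-Image-Turbo side generates at about 1M pixels, and the PiD side is set to 4 times that width and height.
- PiD is a 4-step distilled model, so steps is set to 4 and cfg to 1.0.
- The Context Windows (Manual) node handles tiling. Use it when you run out of memory (OOM), or when tall or wide images come out coarse.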
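
As a rough illustration of what tiling with overlap means, the sketch below splits one image axis into overlapping windows. It assumes that the first two widget values of the Context Windows (Manual) node in the workflow (1536 and 384) are a window length and an overlap in pixels. That reading and the helper name `tile_windows` are assumptions made for illustration; this is not ComfyUI's actual implementation.

```python
def tile_windows(length: int, window: int, overlap: int) -> list[tuple[int, int]]:
    """Split [0, length) into windows of size `window` that overlap by at least `overlap`."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    if length <= window:
        return [(0, length)]  # a single window already covers the whole axis
    stride = window - overlap
    starts = list(range(0, length - window, stride))
    starts.append(length - window)  # clamp the last window to the image edge
    return [(start, start + window) for start in starts]


# Hypothetical example: a 4096-pixel axis with the 1536 / 384 values from the workflow.
windows = tile_windows(4096, 1536, 384)
print(windows)
```

Each window is processed on its own, which is why a smaller window lowers peak memory at the cost of more passes.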

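Upscaling an arbitrary image

What gets passed to PiD Conditioning is just an ordinary latent.

So the preceding stage does not have to be text2image. VAE Encode any image, hand the resulting latent to PiD, and you can use it like an upscaler.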
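
To check this wiring in a workflow file, the sketch below lists which node types feed a given node. It assumes each entry of `links` follows the layout `[link_id, source_node, source_slot, target_node, target_slot, type]`, which matches the workflow JSON on this page. The helper name `direct_inputs` and the trimmed-down `workflow` dict are hypothetical.

```python
def direct_inputs(workflow: dict, node_id: int) -> list[tuple[str, str]]:
    """Return (source node type, data type) for every link that ends at node_id."""
    node_types = {node["id"]: node["type"] for node in workflow["nodes"]}
    return [
        (node_types.get(source, "unknown"), data_type)
        for _link_id, source, _src_slot, target, _dst_slot, data_type in workflow["links"]
        if target == node_id
    ]


# Trimmed-down excerpt of the enhance workflow: only the nodes around PiDConditioning.
workflow = {
    "nodes": [
        {"id": 63, "type": "CLIPTextEncode"},
        {"id": 80, "type": "VAEEncode"},
        {"id": 67, "type": "PiDConditioning"},
    ],
    "links": [
        [112, 63, 0, 67, 0, "CONDITIONING"],
        [133, 80, 0, 67, 1, "LATENT"],
    ],
}

inputs = direct_inputs(workflow, 67)
print(inputs)
```

In the enhance workflow the `LATENT` input of PiDConditioning comes from VAEEncode, whereas in the text2image workflow it comes from a KSampler.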

PiD_flux1_4x_enhance.json

{
  "id": "1aa3b166-1861-429f-92ae-7ee12e64ab01",
  "revision": 0,
  "last_node_id": 89,
  "last_link_id": 143,
  "nodes": [
    {
      "id": 60,
      "type": "CLIPLoader",
      "pos": [
        178.4554950121201,
        811.9490780397168
      ],
      "size": [
        301.3524169921875,
        106
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 0,
          "links": [
            104,
            109
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "gemma_2_2b_it_elm_bf16.safetensors",
        "pixeldit",
        "default"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 63,
      "type": "CLIPTextEncode",
      "pos": [
        538.1873071000066,
        694.9903029515484
      ],
      "size": [
        361.1895922851561,
        152.373631591797
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 109
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            112
          ]
        }
      ],
      "title": "CLIP Text Encode (Positive Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 70,
      "type": "ComfyMathExpression",
      "pos": [
        671.8070430075916,
        989.4245396947579
      ],
      "size": [
        210,
        128
      ],
      "flags": {},
      "order": 15,
      "mode": 0,
      "inputs": [
        {
          "label": "a",
          "name": "values.a",
          "type": "FLOAT,INT,BOOLEAN",
          "link": 138
        },
        {
          "label": "b",
          "name": "values.b",
          "shape": 7,
          "type": "FLOAT,INT,BOOLEAN",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "FLOAT",
          "type": "FLOAT",
          "links": null
        },
        {
          "name": "INT",
          "type": "INT",
          "links": [
            121
          ]
        },
        {
          "name": "BOOL",
          "type": "BOOLEAN",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "ComfyMathExpression"
      },
      "widgets_values": [
        "a * 4"
      ]
    },
    {
      "id": 74,
      "type": "ComfyMathExpression",
      "pos": [
        671.926902524347,
        1173.415179658996
      ],
      "size": [
        210,
        128
      ],
      "flags": {},
      "order": 16,
      "mode": 0,
      "inputs": [
        {
          "label": "a",
          "name": "values.a",
          "type": "FLOAT,INT,BOOLEAN",
          "link": 139
        },
        {
          "label": "b",
          "name": "values.b",
          "shape": 7,
          "type": "FLOAT,INT,BOOLEAN",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "FLOAT",
          "type": "FLOAT",
          "links": null
        },
        {
          "name": "INT",
          "type": "INT",
          "links": [
            122
          ]
        },
        {
          "name": "BOOL",
          "type": "BOOLEAN",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "ComfyMathExpression"
      },
      "widgets_values": [
        "a * 4"
      ]
    },
    {
      "id": 61,
      "type": "KSampler",
      "pos": [
        1281.7976510179856,
        674.4492063356142
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 18,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 124
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 113
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 107
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 108
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            102
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        1234,
        "fixed",
        4,
        1,
        "lcm",
        "simple",
        1
      ]
    },
    {
      "id": 77,
      "type": "ModelSamplingSD3",
      "pos": [
        647.074781051246,
        326.3776092603588
      ],
      "size": [
        234.02274434806372,
        58
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 125
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            126
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ModelSamplingSD3"
      },
      "widgets_values": [
        4
      ]
    },
    {
      "id": 62,
      "type": "VAELoader",
      "pos": [
        1360.9976510179852,
        555.9717618776077
      ],
      "size": [
        235.80000000000018,
        58
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 0,
          "links": [
            103
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pixel_space"
      ]
    },
    {
      "id": 82,
      "type": "ResizeImageMaskNode",
      "pos": [
        -187.68646898516153,
        1095.0166207447435
      ],
      "size": [
        266.5849202168248,
        106
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "input",
          "type": "IMAGE,MASK",
          "link": 134
        }
      ],
      "outputs": [
        {
          "name": "resized",
          "type": "IMAGE",
          "links": [
            135
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ResizeImageMaskNode"
      },
      "widgets_values": [
        "scale total pixels",
        1,
        "nearest-exact"
      ]
    },
    {
      "id": 80,
      "type": "VAEEncode",
      "pos": [
        407.88598227132763,
        1095.0166207447435
      ],
      "size": [
        170.05260120738637,
        46
      ],
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 136
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 132
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            133
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAEEncode"
      },
      "widgets_values": []
    },
    {
      "id": 58,
      "type": "CLIPTextEncode",
      "pos": [
        540.3078029573284,
        910.6321261399594
      ],
      "size": [
        419.26959228515625,
        107.08506774902344
      ],
      "flags": {
        "collapsed": true
      },
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 104
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            107
          ]
        }
      ],
      "title": "CLIP Text Encode (Negative Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 84,
      "type": "GetImageSize",
      "pos": [
        409.91468906546555,
        1197.5924621383035
      ],
      "size": [
        210,
        136
      ],
      "flags": {},
      "order": 13,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 137
        }
      ],
      "outputs": [
        {
          "name": "width",
          "type": "INT",
          "links": [
            138
          ]
        },
        {
          "name": "height",
          "type": "INT",
          "links": [
            139
          ]
        },
        {
          "name": "batch_size",
          "type": "INT",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "GetImageSize"
      },
      "widgets_values": []
    },
    {
      "id": 65,
      "type": "SaveImage",
      "pos": [
        1826.4226403881014,
        674.4492063356142
      ],
      "size": [
        644.1825674446068,
        806.9942591157356
      ],
      "flags": {},
      "order": 20,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 110
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.75"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    },
    {
      "id": 59,
      "type": "UNETLoader",
      "pos": [
        313.05994271638855,
        326.3776092603588
      ],
      "size": [
        305.3782043457031,
        82
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            125
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "UNETLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pid_flux1_1024_to_4096_4step_bf16.safetensors",
        "default"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 81,
      "type": "VAELoader",
      "pos": [
        85.79355151979729,
        989.5464055577053
      ],
      "size": [
        287.64071438371656,
        58
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 0,
          "links": [
            132
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "ae.safetensors"
      ],
      "color": "#322",
      "bgcolor": "#533"
    },
    {
      "id": 57,
      "type": "VAEDecode",
      "pos": [
        1635.4756905687632,
        674.4492063356142
      ],
      "size": [
        157.56002807617188,
        46
      ],
      "flags": {},
      "order": 19,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 102
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 103
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            110
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": []
    },
    {
      "id": 76,
      "type": "ContextWindowsManual",
      "pos": [
        915.0561692480533,
        326.3776092603588
      ],
      "size": [
        299.2096226990984,
        298
      ],
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 126
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            124
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ContextWindowsManual"
      },
      "widgets_values": [
        1536,
        384,
        "standard_static",
        1,
        false,
        "pyramid",
        2,
        false,
        "",
        false,
        false
      ]
    },
    {
      "id": 79,
      "type": "LoadImage",
      "pos": [
        -532.4361549440936,
        1095.0166207447435
      ],
      "size": [
        316.7987915039063,
        467.0000366210936
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            134
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "ComfyUI_00091_.png",
        "image"
      ]
    },
    {
      "id": 83,
      "type": "ResizeImageMaskNode",
      "pos": [
        106.84934568668905,
        1095.0166207447435
      ],
      "size": [
        266.5849202168248,
        106
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "input",
          "type": "IMAGE,MASK",
          "link": 135
        }
      ],
      "outputs": [
        {
          "name": "resized",
          "type": "IMAGE",
          "links": [
            136,
            137
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ResizeImageMaskNode"
      },
      "widgets_values": [
        "scale to multiple",
        16,
        "nearest-exact"
      ]
    },
    {
      "id": 67,
      "type": "PiDConditioning",
      "pos": [
        944.265791947152,
        694.9903029515484
      ],
      "size": [
        270,
        102
      ],
      "flags": {},
      "order": 14,
      "mode": 0,
      "inputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 112
        },
        {
          "name": "latent",
          "type": "LATENT",
          "link": 133
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            113
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "PiDConditioning"
      },
      "widgets_values": [
        "flux",
        0
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 89,
      "type": "MarkdownNote",
      "pos": [
        -135.8324089797664,
        326.3776092603588
      ],
      "size": [
        413.71462239515324,
        313.08611572179626
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "## models\n\n* diffusion_models\n\n  * [pid_flux1_512_to_2048_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_512_to_2048_4step_bf16.safetensors) (2.72 GB)\n  *  [pid_flux1_1024_to_4096_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_1024_to_4096_4step_bf16.safetensors) (2.72 GB)\n\n* text_encoders\n\n  * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n* vae\n\n  * [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors) (335 MB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n    ├── 📂diffusion_models/\n    │    ├── pid_flux1_512_to_2048_4step_bf16.safetensors\n    │    └── pid_flux1_1024_to_4096_4step_bf16.safetensors\n    ├── 📂text_encoders/\n    │    └── gemma_2_2b_it_elm_bf16.safetensors\n    └── 📂vae/\n          └── ae.safetensors\n\n```"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 64,
      "type": "EmptyChromaRadianceLatentImage",
      "pos": [
        914.3139120643391,
        987.6451858178366
      ],
      "size": [
        300.8609375,
        106
      ],
      "flags": {},
      "order": 17,
      "mode": 0,
      "inputs": [
        {
          "name": "width",
          "type": "INT",
          "widget": {
            "name": "width"
          },
          "link": 121
        },
        {
          "name": "height",
          "type": "INT",
          "widget": {
            "name": "height"
          },
          "link": 122
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            108
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "EmptyChromaRadianceLatentImage"
      },
      "widgets_values": [
        896,
        1152,
        1
      ]
    }
  ],
  "links": [
    [
      102,
      61,
      0,
      57,
      0,
      "LATENT"
    ],
    [
      103,
      62,
      0,
      57,
      1,
      "VAE"
    ],
    [
      104,
      60,
      0,
      58,
      0,
      "CLIP"
    ],
    [
      107,
      58,
      0,
      61,
      2,
      "CONDITIONING"
    ],
    [
      108,
      64,
      0,
      61,
      3,
      "LATENT"
    ],
    [
      109,
      60,
      0,
      63,
      0,
      "CLIP"
    ],
    [
      110,
      57,
      0,
      65,
      0,
      "IMAGE"
    ],
    [
      112,
      63,
      0,
      67,
      0,
      "CONDITIONING"
    ],
    [
      113,
      67,
      0,
      61,
      1,
      "CONDITIONING"
    ],
    [
      121,
      70,
      1,
      64,
      0,
      "INT"
    ],
    [
      122,
      74,
      1,
      64,
      1,
      "INT"
    ],
    [
      124,
      76,
      0,
      61,
      0,
      "MODEL"
    ],
    [
      125,
      59,
      0,
      77,
      0,
      "MODEL"
    ],
    [
      126,
      77,
      0,
      76,
      0,
      "MODEL"
    ],
    [
      132,
      81,
      0,
      80,
      1,
      "VAE"
    ],
    [
      133,
      80,
      0,
      67,
      1,
      "LATENT"
    ],
    [
      134,
      79,
      0,
      82,
      0,
      "IMAGE"
    ],
    [
      135,
      82,
      0,
      83,
      0,
      "IMAGE"
    ],
    [
      136,
      83,
      0,
      80,
      0,
      "IMAGE"
    ],
    [
      137,
      83,
      0,
      84,
      0,
      "IMAGE"
    ],
    [
      138,
      84,
      0,
      70,
      0,
      "INT"
    ],
    [
      139,
      84,
      1,
      74,
      0,
      "INT"
    ]
  ],
  "groups": [
    {
      "id": 2,
      "title": "Pid_1024→4096",
      "bounding": [
        -554.3908824317602,
        238.6952183841798,
        3081.2835698845024,
        1345.6484688397118
      ],
      "color": "#8A8",
      "flags": {}
    }
  ],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.5209868481924432,
      "offset": [
        749.0102116191479,
        6.514142817800455
      ]
    },
    "frontendVersion": "1.45.15",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

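Resize the input image to about 1M pixels and make its width and height multiples of 16.
Take the width and height after resizing and multiply each by 4 to get the output size on the PiD side.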
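
The size arithmetic in these two steps can be sketched as follows. The function name `pid_sizes` is hypothetical, "1M pixels" is taken here to mean 1024 × 1024, and rounding to the nearest multiple of 16 is an assumption; the actual resize nodes may round differently.

```python
import math


def pid_sizes(width: int, height: int, megapixels: float = 1.0,
              multiple: int = 16, scale: int = 4) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return ((resized_w, resized_h), (output_w, output_h)) for the enhance workflow."""
    target_pixels = megapixels * 1024 * 1024  # assumption: 1M pixels = 1024 * 1024
    factor = math.sqrt(target_pixels / (width * height))  # keeps the aspect ratio

    def snap(value: float) -> int:
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    resized = (snap(width * factor), snap(height * factor))
    output = (resized[0] * scale, resized[1] * scale)  # same as "a * 4" in the workflow
    return resized, output


resized, output = pid_sizes(1920, 1080)
print(resized, output)
```

For a 1920 × 1080 input this gives a 1360 × 768 intermediate image and a 5440 × 3072 output under the stated assumptions.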

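Each PiD model is paired with a different VAE, so the image has to be encoded with the VAE that matches the PiD model.

You may be tempted to use the newer Flux.2 VAE, but the colors shift considerably. This example uses the more stable combination of the Flux.1 PiD model and ae.safetensors.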

ae.safetensors (335 MB)

📂ComfyUI/
└── 📂models/
    └── 📂vae/
        └── ae.safetensors

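What PiD actually does is redraw the image, so it is better described as an enhancer than as an upscaler.
It is not well suited to uses that require faithful reproduction.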
