PixelDiT / PiD

PixelDiT

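PixelDiT is a pixel diffusion model released by NVIDIA.

Most image generation models since Stable Diffusion are built on a mechanism called the Latent Diffusion Model.

Computing an image one pixel at a time is expensive, so these models first compress it into a small representation called a latent. That cuts the amount of computation while also making features such as shape, color, and composition easier to work with.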
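To get a feel for the savings, here is a rough count of how many values the model has to denoise in each space. The 8x spatial downsampling and 4 latent channels are the figures of the original Stable Diffusion VAE and are assumptions here; other models use different numbers.

```python
# Rough size comparison between pixel space and latent space.
# Assumption: 8x spatial downsampling and 4 latent channels, as in the
# original Stable Diffusion VAE. Other models use different values.

def pixel_values(width: int, height: int, channels: int = 3) -> int:
    """Number of values in an RGB image."""
    return width * height * channels


def latent_values(width: int, height: int, downsample: int = 8, channels: int = 4) -> int:
    """Number of values in the latent for an image of the given size."""
    return (width // downsample) * (height // downsample) * channels


pixels = pixel_values(1024, 1024)    # 3,145,728
latents = latent_values(1024, 1024)  # 65,536
print(f"pixel space : {pixels:,} values")
print(f"latent space: {latents:,} values")
print(f"ratio       : {pixels // latents}x fewer values to denoise")
```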

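The catch is that when the latent is converted back to pixels, fine details such as small text and patterns inevitably degrade.

A pixel diffusion model works on the image directly in pixel space, without going through a latent. As a result, the reconstruction loss introduced by a VAE cannot occur by design.

That leaves an obvious question: wasn't the latent there to reduce computation in the first place? PixelDiT deals with this by not looking at the whole image in fine detail at once. It splits the image into patches to get a coarse view, and fills in the fine details on the pixel side.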
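A back-of-the-envelope count shows why working on patches helps. This is only an illustration of the general idea, not PixelDiT's actual architecture; the 16x16 patch size is an assumption picked for the example.

```python
# Token counts for attention over a 1024x1024 image.
# Assumption: a 16x16 patch size, chosen only for illustration.

def token_count(width: int, height: int, patch: int) -> int:
    """Number of tokens when the image is cut into patch x patch tiles."""
    if width % patch or height % patch:
        raise ValueError("image size must be a multiple of the patch size")
    return (width // patch) * (height // patch)


per_pixel = token_count(1024, 1024, patch=1)   # one token per pixel
per_patch = token_count(1024, 1024, patch=16)  # one token per 16x16 patch

# Self-attention compares every token with every other token,
# so its cost grows with the square of the token count.
print(f"per-pixel tokens: {per_pixel:,}")
print(f"per-patch tokens: {per_patch:,}")
print(f"attention pairs shrink by {(per_pixel // per_patch) ** 2:,}x")
```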

Downloading the models

diffusion_models
- pixeldit_1300m_1024px_bf16.safetensors (2.6 GB)
text_encoders
- gemma_2_2b_it_elm_bf16.safetensors (5.23 GB)

📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── pixeldit_1300m_1024px_bf16.safetensors
    └── 📂text_encoders/
        └── gemma_2_2b_it_elm_bf16.safetensors
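If you want to double-check the placement before launching ComfyUI, a small script along these lines can help. It is a minimal sketch: `missing_models` is a hypothetical helper, and the `ComfyUI` root path is a placeholder for your own install directory.

```python
from pathlib import Path

# Relative paths taken from the folder layout above.
EXPECTED = [
    "models/diffusion_models/pixeldit_1300m_1024px_bf16.safetensors",
    "models/text_encoders/gemma_2_2b_it_elm_bf16.safetensors",
]


def missing_models(comfy_root: str, expected: list[str] = EXPECTED) -> list[str]:
    """Return the expected model files that are not present under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point it at your own install directory.
    for rel in missing_models("ComfyUI"):
        print(f"missing: {rel}")
```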

text2image

PixelDiT_text2image.json

{
  "id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
  "revision": 0,
  "last_node_id": 75,
  "last_link_id": 127,
  "nodes": [
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1252.432861328125,
        188.1918182373047
      ],
      "size": [
        157.56002807617188,
        46
      ],
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 35
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 76
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            101
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": []
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        413.6004778593708,
        403.99281184374564
      ],
      "size": [
        419.26959228515625,
        107.08506774902344
      ],
      "flags": {
        "collapsed": false
      },
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 75
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            52
          ]
        }
      ],
      "title": "CLIP Text Encode (Negative Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "text, worst quality, blurry, ugly"
      ]
    },
    {
      "id": 38,
      "type": "CLIPLoader",
      "pos": [
        56.288665771484375,
        312.74468994140625
      ],
      "size": [
        301.3524169921875,
        106
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 0,
          "links": [
            74,
            75
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "gemma_2_2b_it_elm_bf16.safetensors",
        "pixeldit",
        "default"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 71,
      "type": "MarkdownNote",
      "pos": [
        -130.18155802626615,
        -17.811007621292433
      ],
      "size": [
        351.89747511237124,
        228.61658757745528
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "## models\n\n* diffusion_models\n\n  * [pixeldit_1300m_1024px_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pixeldit_1300m_1024px_bf16.safetensors) (2.6 GB)\n\n* text_encoders\n\n  * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n    ├── 📂diffusion_models/\n    │   └── pixeldit_1300m_1024px_bf16.safetensors\n    └── 📂text_encoders/\n         └── gemma_2_2b_it_elm_bf16.safetensors\n```"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 37,
      "type": "UNETLoader",
      "pos": [
        269.35973351536364,
        43.42716662131588
      ],
      "size": [
        305.3782043457031,
        82
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            124
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "UNETLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pixeldit_1300m_1024px_bf16.safetensors",
        "default"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 39,
      "type": "VAELoader",
      "pos": [
        977.9548217773436,
        67.42716662131588
      ],
      "size": [
        235.80000000000018,
        58
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 0,
          "links": [
            76
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pixel_space"
      ]
    },
    {
      "id": 74,
      "type": "ModelSamplingSD3",
      "pos": [
        608.2696075439453,
        43.427166621315884
      ],
      "size": [
        226,
        58
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 124
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            125
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ModelSamplingSD3"
      },
      "widgets_values": [
        4
      ]
    },
    {
      "id": 73,
      "type": "EmptyChromaRadianceLatentImage",
      "pos": [
        532.0091326445271,
        575.75393284228
      ],
      "size": [
        300.8609375,
        106
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "width",
          "type": "INT",
          "widget": {
            "name": "width"
          },
          "link": 126
        },
        {
          "name": "height",
          "type": "INT",
          "widget": {
            "name": "height"
          },
          "link": 127
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            123
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "EmptyChromaRadianceLatentImage"
      },
      "widgets_values": [
        1024,
        1024,
        1
      ]
    },
    {
      "id": 75,
      "type": "ResolutionSelector",
      "pos": [
        234.8164432684831,
        575.75393284228
      ],
      "size": [
        270,
        126
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "width",
          "type": "INT",
          "links": [
            126
          ]
        },
        {
          "name": "height",
          "type": "INT",
          "links": [
            127
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ResolutionSelector"
      },
      "widgets_values": [
        "2:3 (Portrait Photo)",
        1,
        16
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        415.00001525878906,
        186
      ],
      "size": [
        419.26959228515625,
        156.00363159179688
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 74
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            46
          ]
        }
      ],
      "title": "CLIP Text Encode (Positive Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "A stylish editorial food photograph of a small round chocolate mousse cake on a rustic wooden table, warm cocoa brown velvet texture, delicate chocolate decoration on top, tiny white flowers as garnish, placed on a simple golden dessert plate, soft natural window light, shallow depth of field, dreamy foreground blur with green leaves, warm earthy tones, elegant patisserie atmosphere, cozy cafe mood, high-end dessert photography, cinematic bokeh, no text, no logo, no watermark, no typography"
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        898.7548217773438,
        188.1918182373047
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 125
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 46
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 52
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 123
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            35
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        12345,
        "fixed",
        30,
        3,
        "er_sde",
        "simple",
        1
      ]
    },
    {
      "id": 56,
      "type": "SaveImage",
      "pos": [
        1443.3798111474612,
        188.1918182373047
      ],
      "size": [
        390.01472165749783,
        646.8217101795782
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 101
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.75"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    }
  ],
  "links": [
    [
      35,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      46,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      52,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      74,
      38,
      0,
      6,
      0,
      "CLIP"
    ],
    [
      75,
      38,
      0,
      7,
      0,
      "CLIP"
    ],
    [
      76,
      39,
      0,
      8,
      1,
      "VAE"
    ],
    [
      101,
      8,
      0,
      56,
      0,
      "IMAGE"
    ],
    [
      123,
      73,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      124,
      37,
      0,
      74,
      0,
      "MODEL"
    ],
    [
      125,
      74,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      126,
      75,
      0,
      73,
      0,
      "INT"
    ],
    [
      127,
      75,
      1,
      73,
      1,
      "INT"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.620921323059155,
      "offset": [
        539.7718098712515,
        304.32781282959496
      ]
    },
    "frontendVersion": "1.45.15",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

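Since this is a pixel diffusion model, neither Load VAE nor VAE Decode is needed in principle.

In ComfyUI, however, the graph has to follow the existing workflow format, so you select pixel_space in Load VAE and connect it to VAE Decode.

It looks as though a VAE called pixel_space is doing the decoding, but think of it simply as the step that turns the KSampler output into an IMAGE.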
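To see this wiring in the JSON itself, you can read the `links` array. The sketch below uses a cut-down excerpt of the workflow above; the field order of each link is read off that JSON, and `describe_links` is a hypothetical helper, not part of ComfyUI.

```python
# A cut-down excerpt of the workflow above: only the nodes and links
# that lead from the sampler to the saved image.
workflow = {
    "nodes": [
        {"id": 3, "type": "KSampler"},
        {"id": 39, "type": "VAELoader"},
        {"id": 8, "type": "VAEDecode"},
        {"id": 56, "type": "SaveImage"},
    ],
    # Each link: [link_id, from_node, from_slot, to_node, to_slot, type]
    "links": [
        [35, 3, 0, 8, 0, "LATENT"],
        [76, 39, 0, 8, 1, "VAE"],
        [101, 8, 0, 56, 0, "IMAGE"],
    ],
}


def describe_links(wf: dict) -> list[str]:
    """Turn each link into a readable 'source -> target (type)' string."""
    node_type = {node["id"]: node["type"] for node in wf["nodes"]}
    return [
        f"{node_type[src]} -> {node_type[dst]} ({kind})"
        for _link_id, src, _src_slot, dst, _dst_slot, kind in wf["links"]
    ]


for line in describe_links(workflow):
    print(line)
```

The KSampler still emits a LATENT-typed output, and VAE Decode with pixel_space is what converts it to the IMAGE type that Save Image accepts.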

PiD

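PiD is a PixelDiT that is used in place of VAE Decode.

Normally, a generated latent is turned back into an image with VAE Decode. PiD's neat idea is to hand that latent to PixelDiT instead and let it do the reconstruction and the upscaling in one go.

For example, you generate a latent for a 1024×1024 image with Z-Image-Turbo and pass it to PiD before it is VAE-decoded. A 1024_to_4096 PiD then outputs it as a 4096×4096 image.

In other words, you keep the generative power of the existing model while avoiding the loss of fine detail caused by VAE Decode.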

Downloading the models

For SDXL
- pid_sdxl_1024_to_4096_4step_bf16.safetensors (2.72 GB)
For Qwen-Image
- pid_qwenimage_1024_to_4096_4step_bf16.safetensors (2.72 GB)
For Flux.1 / Z-Image
- pid_flux1_512_to_2048_4step_bf16.safetensors (2.72 GB)
- pid_flux1_1024_to_4096_4step_bf16.safetensors (2.72 GB)
For Flux.2
- pid_flux2_512_to_2048_4step_bf16.safetensors (2.73 GB)
- pid_flux2_1024_to_4096_4step_2606_bf16.safetensors (2.73 GB)

📂ComfyUI/
└── 📂models/
    └── 📂diffusion_models/
        ├── pid_sdxl_1024_to_4096_4step_bf16.safetensors
        ├── pid_qwenimage_1024_to_4096_4step_bf16.safetensors
        ├── pid_flux1_512_to_2048_4step_bf16.safetensors
        ├── pid_flux1_1024_to_4096_4step_bf16.safetensors
        ├── pid_flux2_512_to_2048_4step_bf16.safetensors
        └── pid_flux2_1024_to_4096_4step_2606_bf16.safetensors

You do not need all of them. Place only the PiD that matches the base model you use.
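The download list can be written down as a simple lookup table. This is just a convenience sketch; `PID_MODELS`, `ALIASES`, and `pid_files_for` are hypothetical names, and the keys are labels chosen for this example.

```python
# File names copied from the download list above, grouped by base model.
PID_MODELS = {
    "sdxl": ["pid_sdxl_1024_to_4096_4step_bf16.safetensors"],
    "qwen-image": ["pid_qwenimage_1024_to_4096_4step_bf16.safetensors"],
    "flux.1": [
        "pid_flux1_512_to_2048_4step_bf16.safetensors",
        "pid_flux1_1024_to_4096_4step_bf16.safetensors",
    ],
    "flux.2": [
        "pid_flux2_512_to_2048_4step_bf16.safetensors",
        "pid_flux2_1024_to_4096_4step_2606_bf16.safetensors",
    ],
}

# Z-Image shares the Flux.1 entry, as in the list above.
ALIASES = {"z-image": "flux.1"}


def pid_files_for(base_model: str) -> list[str]:
    """Return the PiD files listed for a base model (case-insensitive)."""
    key = base_model.strip().lower()
    key = ALIASES.get(key, key)
    if key not in PID_MODELS:
        raise KeyError(f"no PiD model listed for {base_model!r}")
    return PID_MODELS[key]


print(pid_files_for("Z-Image"))
```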

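Choosing a model

There are two things to watch when picking a PiD model.

Base model type
- The PiD has to match the latent type used by the original model.
- For example, use the SDXL one for SDXL, and the Flux.1 one for Z-Image.
Upscale factor
- The model name contains a string such as 1024_to_4096. This gives the input and output resolution, and therefore the upscale factor (4× in this case).
- Using the model does not upscale anything automatically. With 1024_to_4096, for example, you pass PiD a latent / output equivalent to 1024px and set the parameters so that a 4096px image comes out.
- As long as the overall resolution is roughly right, the aspect ratio is up to you.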
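The arithmetic behind the upscale factor can be sketched as follows. `parse_pid_name` and `output_size` are hypothetical helpers, and 896×1152 is just a sample non-square resolution in the 1024px class.

```python
import re


def parse_pid_name(filename: str) -> tuple[int, int]:
    """Extract (input_px, output_px) from a name like '..._1024_to_4096_...'."""
    match = re.search(r"_(\d+)_to_(\d+)_", filename)
    if match is None:
        raise ValueError(f"no '<in>_to_<out>' pattern in {filename!r}")
    return int(match.group(1)), int(match.group(2))


def output_size(width: int, height: int, filename: str) -> tuple[int, int]:
    """Scale the base resolution by the factor implied by the file name."""
    input_px, output_px = parse_pid_name(filename)
    scale = output_px // input_px
    return width * scale, height * scale


name = "pid_flux1_1024_to_4096_4step_bf16.safetensors"
print(parse_pid_name(name))          # (1024, 4096)
print(output_size(896, 1152, name))  # (3584, 4608)
```

The resulting width and height are what you would set on the PiD side, so that the output is four times the base resolution on each axis.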

Z-Image-Turbo → PiD

Let's decode a Z-Image-Turbo latent with PiD.

Z-Image-Turbo_to_PiD_4k.json

{
  "id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
  "revision": 0,
  "last_node_id": 80,
  "last_link_id": 131,
  "nodes": [
    {
      "id": 38,
      "type": "CLIPLoader",
      "pos": [
        32.131015771484385,
        312.74468994140625
      ],
      "size": [
        301.3524169921875,
        106
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 0,
          "links": [
            74,
            75
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "qwen_3_4b.safetensors",
        "lumina2",
        "default"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        390.84235000000007,
        405.392333984375
      ],
      "size": [
        418.3189392089844,
        107.08506774902344
      ],
      "flags": {
        "collapsed": true
      },
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 75
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            52
          ]
        }
      ],
      "title": "CLIP Text Encode (Negative Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 54,
      "type": "ModelSamplingAuraFlow",
      "pos": [
        579.7813758789064,
        53.0477294921875
      ],
      "size": [
        230.33058166503906,
        58
      ],
      "flags": {},
      "order": 13,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 99
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            100
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ModelSamplingAuraFlow",
        "cnr_id": "comfy-core",
        "ver": "0.3.49"
      },
      "widgets_values": [
        3.1
      ]
    },
    {
      "id": 55,
      "type": "MarkdownNote",
      "pos": [
        -151.24897385253908,
        -13.402286529541016
      ],
      "size": [
        349.13103718118725,
        214.5148968572393
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "## models\n- [z_image_turbo_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n      ├── 📂diffusion_models/\n      │   └── z_image_turbo_bf16.safetensors\n      ├── 📂text_encoders/\n      │   └── qwen_3_4b.safetensors\n      └── 📂vae/\n           └── ae.safetensors\n```"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 57,
      "type": "VAEDecode",
      "pos": [
        2059.9064044746965,
        1212.0613014712694
      ],
      "size": [
        157.56002807617188,
        46
      ],
      "flags": {},
      "order": 25,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 102
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 103
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            110
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": []
    },
    {
      "id": 62,
      "type": "VAELoader",
      "pos": [
        1785.4283649239185,
        1093.5838570132628
      ],
      "size": [
        235.80000000000018,
        58
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 0,
          "links": [
            103
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pixel_space"
      ]
    },
    {
      "id": 60,
      "type": "CLIPLoader",
      "pos": [
        602.886208918055,
        1349.5611731753713
      ],
      "size": [
        301.3524169921875,
        106
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 0,
          "links": [
            104,
            109
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "gemma_2_2b_it_elm_bf16.safetensors",
        "pixeldit",
        "default"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 58,
      "type": "CLIPTextEncode",
      "pos": [
        962.6180210059409,
        1452.9092950777112
      ],
      "size": [
        419.26959228515625,
        107.08506774902344
      ],
      "flags": {
        "collapsed": true
      },
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 104
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            107
          ]
        }
      ],
      "title": "CLIP Text Encode (Negative Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 63,
      "type": "CLIPTextEncode",
      "pos": [
        962.6180210059409,
        1232.6023980872037
      ],
      "size": [
        361.1895922851561,
        152.373631591797
      ],
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 109
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            112
          ]
        }
      ],
      "title": "CLIP Text Encode (Positive Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 70,
      "type": "ComfyMathExpression",
      "pos": [
        1095.328699296338,
        1510.6730178870523
      ],
      "size": [
        210,
        128
      ],
      "flags": {},
      "order": 14,
      "mode": 0,
      "inputs": [
        {
          "label": "a",
          "name": "values.a",
          "type": "FLOAT,INT,BOOLEAN",
          "link": 128
        },
        {
          "label": "b",
          "name": "values.b",
          "shape": 7,
          "type": "FLOAT,INT,BOOLEAN",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "FLOAT",
          "type": "FLOAT",
          "links": null
        },
        {
          "name": "INT",
          "type": "INT",
          "links": [
            121
          ]
        },
        {
          "name": "BOOL",
          "type": "BOOLEAN",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "ComfyMathExpression"
      },
      "widgets_values": [
        "a * 4"
      ]
    },
    {
      "id": 74,
      "type": "ComfyMathExpression",
      "pos": [
        1095.4485588130935,
        1694.6636578512905
      ],
      "size": [
        210,
        128
      ],
      "flags": {},
      "order": 16,
      "mode": 0,
      "inputs": [
        {
          "label": "a",
          "name": "values.a",
          "type": "FLOAT,INT,BOOLEAN",
          "link": 131
        },
        {
          "label": "b",
          "name": "values.b",
          "shape": 7,
          "type": "FLOAT,INT,BOOLEAN",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "FLOAT",
          "type": "FLOAT",
          "links": null
        },
        {
          "name": "INT",
          "type": "INT",
          "links": [
            122
          ]
        },
        {
          "name": "BOOL",
          "type": "BOOLEAN",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "ComfyMathExpression"
      },
      "widgets_values": [
        "a * 4"
      ]
    },
    {
      "id": 37,
      "type": "UNETLoader",
      "pos": [
        243.49762343749995,
        53.0477294921875
      ],
      "size": [
        305.3782043457031,
        82
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            99
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "UNETLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "Z-Image/z_image_turbo_bf16.safetensors",
        "fp8_e4m3fn"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 61,
      "type": "KSampler",
      "pos": [
        1706.228364923919,
        1212.0613014712694
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 24,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 124
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 113
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 107
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 108
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            102
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        1234,
        "fixed",
        4,
        1,
        "lcm",
        "simple",
        1
      ]
    },
    {
      "id": 65,
      "type": "SaveImage",
      "pos": [
        2250.8533542940327,
        1212.0613014712694
      ],
      "size": [
        666.7297467636986,
        558.9757191157356
      ],
      "flags": {},
      "order": 26,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 110
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.75"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    },
    {
      "id": 39,
      "type": "VAELoader",
      "pos": [
        953.7971717773437,
        68.20164184570308
      ],
      "size": [
        235.80000000000018,
        58
      ],
      "flags": {},
      "order": 5,
      "mode": 4,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 0,
          "links": [
            76
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "ae.safetensors"
      ],
      "color": "#322",
      "bgcolor": "#533"
    },
    {
      "id": 64,
      "type": "EmptyChromaRadianceLatentImage",
      "pos": [
        1337.8355683530854,
        1508.893664010131
      ],
      "size": [
        300.8609375,
        106
      ],
      "flags": {},
      "order": 19,
      "mode": 0,
      "inputs": [
        {
          "name": "width",
          "type": "INT",
          "widget": {
            "name": "width"
          },
          "link": 121
        },
        {
          "name": "height",
          "type": "INT",
          "widget": {
            "name": "height"
          },
          "link": 122
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            108
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "EmptyChromaRadianceLatentImage"
      },
      "widgets_values": [
        896,
        1152,
        1
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        390.84235000000007,
        186
      ],
      "size": [
        419.26959228515625,
        156.00363159179688
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 74
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            46
          ]
        }
      ],
      "title": "CLIP Text Encode (Positive Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "A single retro-style Japanese pudding in a handmade ceramic pedestal bowl, smooth golden custard with rich caramel sauce, topped with light cream, a small strawberry piece, and tiny leaves, set on a rustic wooden table beside wooden spoons, soft window light, natural and airy mood, warm earthy colors, shallow focus, tasteful Japanese cafe aesthetic, simple and elegant dessert photography, one pudding only, no extra desserts, no text, no logo, no watermark"
      ]
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1228.2752113281254,
        188.1918182373047
      ],
      "size": [
        157.56002807617188,
        46
      ],
      "flags": {},
      "order": 21,
      "mode": 4,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 35
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 76
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            127
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": []
    },
    {
      "id": 53,
      "type": "EmptySD3LatentImage",
      "pos": [
        573.1119422851564,
        473.02593102293815
      ],
      "size": [
        237,
        106
      ],
      "flags": {},
      "order": 15,
      "mode": 0,
      "inputs": [
        {
          "name": "width",
          "type": "INT",
          "widget": {
            "name": "width"
          },
          "link": 129
        },
        {
          "name": "height",
          "type": "INT",
          "widget": {
            "name": "height"
          },
          "link": 130
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            98
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "EmptySD3LatentImage",
        "cnr_id": "comfy-core",
        "ver": "0.3.49"
      },
      "widgets_values": [
        1104,
        1472,
        1
      ]
    },
    {
      "id": 77,
      "type": "ModelSamplingSD3",
      "pos": [
        1071.5054949571797,
        863.9897043960142
      ],
      "size": [
        234.02274434806372,
        58
      ],
      "flags": {},
      "order": 17,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 125
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            126
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ModelSamplingSD3"
      },
      "widgets_values": [
        4
      ]
    },
    {
      "id": 78,
      "type": "PreviewImage",
      "pos": [
        1412.600554848482,
        188.1918182373047
      ],
      "size": [
        475.4999999999998,
        310.8999999999998
      ],
      "flags": {},
      "order": 23,
      "mode": 4,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 127
        }
      ],
      "outputs": [],
      "properties": {
        "Node name for S&R": "PreviewImage"
      },
      "widgets_values": []
    },
    {
      "id": 66,
      "type": "MarkdownNote",
      "pos": [
        319.0841671505862,
        861.233126193333
      ],
      "size": [
        391.4749836827225,
        249.70306513499378
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "## models\n\n* diffusion_models\n\n  * [pid_flux1_512_to_2048_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_512_to_2048_4step_bf16.safetensors) (2.72 GB)\n  *  [pid_flux1_1024_to_4096_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_1024_to_4096_4step_bf16.safetensors) (2.72 GB)\n\n* text_encoders\n\n  * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n    ├── 📂diffusion_models/\n    │    ├── pid_flux1_512_to_2048_4step_bf16.safetensors\n    │    └── pid_flux1_1024_to_4096_4step_bf16.safetensors\n    └── 📂text_encoders/\n          └── gemma_2_2b_it_elm_bf16.safetensors\n```"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 67,
      "type": "PiDConditioning",
      "pos": [
        1368.6965058530855,
        1232.6023980872037
      ],
      "size": [
        270,
        102
      ],
      "flags": {},
      "order": 22,
      "mode": 0,
      "inputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 112
        },
        {
          "name": "latent",
          "type": "LATENT",
          "link": 111
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            113
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "PiDConditioning"
      },
      "widgets_values": [
        "flux",
        0
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 79,
      "type": "ResolutionSelector",
      "pos": [
        259.561201016606,
        493.48155615066963
      ],
      "size": [
        270,
        126
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "width",
          "type": "INT",
          "links": [
            128,
            129
          ]
        },
        {
          "name": "height",
          "type": "INT",
          "links": [
            130,
            131
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ResolutionSelector"
      },
      "widgets_values": [
        "3:2 (Photo)",
        1,
        16
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        874.5971717773439,
        188.1918182373047
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 18,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 100
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 46
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 52
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 98
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            35,
            111
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        12345,
        "fixed",
        8,
        1,
        "euler",
        "simple",
        1
      ]
    },
    {
      "id": 59,
      "type": "UNETLoader",
      "pos": [
        737.4906566223231,
        863.9897043960142
      ],
      "size": [
        305.3782043457031,
        82
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            125
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "UNETLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pid_flux1_1024_to_4096_4step_bf16.safetensors",
        "default"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 76,
      "type": "ContextWindowsManual",
      "pos": [
        1339.486883153987,
        863.9897043960142
      ],
      "size": [
        299.2096226990984,
        298
      ],
      "flags": {},
      "order": 20,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 126
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            124
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ContextWindowsManual"
      },
      "widgets_values": [
        1536,
        384,
        "standard_static",
        1,
        false,
        "pyramid",
        2,
        false,
        "",
        false,
        false
      ]
    }
  ],
  "links": [
    [
      35,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      46,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      52,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      74,
      38,
      0,
      6,
      0,
      "CLIP"
    ],
    [
      75,
      38,
      0,
      7,
      0,
      "CLIP"
    ],
    [
      76,
      39,
      0,
      8,
      1,
      "VAE"
    ],
    [
      98,
      53,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      99,
      37,
      0,
      54,
      0,
      "MODEL"
    ],
    [
      100,
      54,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      102,
      61,
      0,
      57,
      0,
      "LATENT"
    ],
    [
      103,
      62,
      0,
      57,
      1,
      "VAE"
    ],
    [
      104,
      60,
      0,
      58,
      0,
      "CLIP"
    ],
    [
      107,
      58,
      0,
      61,
      2,
      "CONDITIONING"
    ],
    [
      108,
      64,
      0,
      61,
      3,
      "LATENT"
    ],
    [
      109,
      60,
      0,
      63,
      0,
      "CLIP"
    ],
    [
      110,
      57,
      0,
      65,
      0,
      "IMAGE"
    ],
    [
      111,
      3,
      0,
      67,
      1,
      "LATENT"
    ],
    [
      112,
      63,
      0,
      67,
      0,
      "CONDITIONING"
    ],
    [
      113,
      67,
      0,
      61,
      1,
      "CONDITIONING"
    ],
    [
      121,
      70,
      1,
      64,
      0,
      "INT"
    ],
    [
      122,
      74,
      1,
      64,
      1,
      "INT"
    ],
    [
      124,
      76,
      0,
      61,
      0,
      "MODEL"
    ],
    [
      125,
      59,
      0,
      77,
      0,
      "MODEL"
    ],
    [
      126,
      77,
      0,
      76,
      0,
      "MODEL"
    ],
    [
      127,
      8,
      0,
      78,
      0,
      "IMAGE"
    ],
    [
      128,
      79,
      0,
      70,
      0,
      "INT"
    ],
    [
      129,
      79,
      0,
      53,
      0,
      "INT"
    ],
    [
      130,
      79,
      1,
      53,
      1,
      "INT"
    ],
    [
      131,
      79,
      1,
      74,
      0,
      "INT"
    ]
  ],
  "groups": [
    {
      "id": 1,
      "title": "Z-Image-Turbo",
      "bounding": [
        -161.24897385253908,
        -83.40228652954102,
        2064.456875764055,
        806.5944331355984
      ],
      "color": "#3f789e",
      "flags": {}
    },
    {
      "id": 2,
      "title": "Pid_1024→4096",
      "bounding": [
        290.6357989790571,
        776.3072911794422,
        2650.977159099303,
        1063.3279125842512
      ],
      "color": "#8A8",
      "flags": {}
    }
  ],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.430567643134249,
      "offset": [
        591.0919825277065,
        375.87128501454
      ]
    },
    "frontendVersion": "1.45.15",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

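🟦 The top-left group is an ordinary Z-Image-Turbo workflow.
- 🟩 Its output latent is not passed through VAE Decode; it is connected directly to PiD Conditioning on the PixelDiT side.
This example uses the 1024_to_4096 model.
- The Z-Image-Turbo side generates at roughly 1M pixels, and the PiD side is given 4x that width and height.
PiD is a 4-step distilled model, so steps is set to 4 and cfg to 1.0 here.
The Context Windows (Manual) node is, in effect, for tiling. Use it when you run out of memory (OOM) or when the output breaks down on tall or wide images.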
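
As a rough illustration of what tiling with overlap means, the sketch below computes where overlapping windows would start along one image axis. It is not ComfyUI's implementation: `window_starts` is a hypothetical helper, and reading the node's first two widget values (1536 and 384) as window length and overlap is an assumption.

```python
def window_starts(length: int, window: int, overlap: int) -> list[int]:
    """Start offsets of overlapping windows that together cover [0, length).

    Hypothetical helper for illustration only; the real node may place
    its windows differently.
    """
    if window <= overlap:
        raise ValueError("window must be larger than overlap")
    if length <= window:
        # The whole axis fits in a single window, so no tiling is needed.
        return [0]
    stride = window - overlap
    starts = list(range(0, length - window, stride))
    # Make the last window end exactly at the edge of the axis.
    starts.append(length - window)
    return starts


# Assumed values: a 4096-pixel axis, window 1536, overlap 384.
print(window_starts(4096, 1536, 384))  # [0, 1152, 2304, 2560]
```

Neighbouring windows share at least `overlap` pixels, which is what lets the overlapping regions be blended instead of leaving visible seams.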

Upscaling an arbitrary image

What PiD Conditioning receives is just a latent.

So there is no need to run text2image first: VAE Encode any image you like, pass the resulting latent to PiD, and you can use it like an upscaler.
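
To check which node feeds PiD Conditioning in a given workflow file, a small script along these lines can help. It is a minimal sketch: `upstream_of` is a hypothetical helper, the link layout `[link_id, source_node, source_slot, target_node, target_slot, type]` is read off the workflow JSON on this page, and the inline `workflow` dict is a hand-trimmed excerpt rather than a complete workflow.

```python
def upstream_of(workflow: dict, node_type: str, input_name: str) -> list[str]:
    """Return the types of the nodes wired into `input_name` of each `node_type` node."""
    nodes = {node["id"]: node for node in workflow["nodes"]}
    # Assumed link layout: [link_id, src_node, src_slot, dst_node, dst_slot, type]
    links = {link[0]: link for link in workflow["links"]}
    sources = []
    for node in workflow["nodes"]:
        if node["type"] != node_type:
            continue
        for inp in node.get("inputs", []):
            if inp["name"] == input_name and inp.get("link") is not None:
                source_id = links[inp["link"]][1]
                sources.append(nodes[source_id]["type"])
    return sources


# Hand-trimmed excerpt of the enhance workflow (ids and links as in the JSON).
workflow = {
    "nodes": [
        {"id": 63, "type": "CLIPTextEncode", "inputs": []},
        {"id": 80, "type": "VAEEncode", "inputs": []},
        {
            "id": 67,
            "type": "PiDConditioning",
            "inputs": [
                {"name": "positive", "type": "CONDITIONING", "link": 112},
                {"name": "latent", "type": "LATENT", "link": 133},
            ],
        },
    ],
    "links": [
        [112, 63, 0, 67, 0, "CONDITIONING"],
        [133, 80, 0, 67, 1, "LATENT"],
    ],
}

print(upstream_of(workflow, "PiDConditioning", "latent"))  # ['VAEEncode']
```

To run it on a saved workflow, load the file with `json.load` and pass the resulting dict in place of the excerpt.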

PiD_flux1_4x_enhance.json

{
  "id": "1aa3b166-1861-429f-92ae-7ee12e64ab01",
  "revision": 0,
  "last_node_id": 89,
  "last_link_id": 143,
  "nodes": [
    {
      "id": 60,
      "type": "CLIPLoader",
      "pos": [
        178.4554950121201,
        811.9490780397168
      ],
      "size": [
        301.3524169921875,
        106
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 0,
          "links": [
            104,
            109
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "gemma_2_2b_it_elm_bf16.safetensors",
        "pixeldit",
        "default"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 63,
      "type": "CLIPTextEncode",
      "pos": [
        538.1873071000066,
        694.9903029515484
      ],
      "size": [
        361.1895922851561,
        152.373631591797
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 109
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            112
          ]
        }
      ],
      "title": "CLIP Text Encode (Positive Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 70,
      "type": "ComfyMathExpression",
      "pos": [
        671.8070430075916,
        989.4245396947579
      ],
      "size": [
        210,
        128
      ],
      "flags": {},
      "order": 15,
      "mode": 0,
      "inputs": [
        {
          "label": "a",
          "name": "values.a",
          "type": "FLOAT,INT,BOOLEAN",
          "link": 138
        },
        {
          "label": "b",
          "name": "values.b",
          "shape": 7,
          "type": "FLOAT,INT,BOOLEAN",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "FLOAT",
          "type": "FLOAT",
          "links": null
        },
        {
          "name": "INT",
          "type": "INT",
          "links": [
            121
          ]
        },
        {
          "name": "BOOL",
          "type": "BOOLEAN",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "ComfyMathExpression"
      },
      "widgets_values": [
        "a * 4"
      ]
    },
    {
      "id": 74,
      "type": "ComfyMathExpression",
      "pos": [
        671.926902524347,
        1173.415179658996
      ],
      "size": [
        210,
        128
      ],
      "flags": {},
      "order": 16,
      "mode": 0,
      "inputs": [
        {
          "label": "a",
          "name": "values.a",
          "type": "FLOAT,INT,BOOLEAN",
          "link": 139
        },
        {
          "label": "b",
          "name": "values.b",
          "shape": 7,
          "type": "FLOAT,INT,BOOLEAN",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "FLOAT",
          "type": "FLOAT",
          "links": null
        },
        {
          "name": "INT",
          "type": "INT",
          "links": [
            122
          ]
        },
        {
          "name": "BOOL",
          "type": "BOOLEAN",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "ComfyMathExpression"
      },
      "widgets_values": [
        "a * 4"
      ]
    },
    {
      "id": 61,
      "type": "KSampler",
      "pos": [
        1281.7976510179856,
        674.4492063356142
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 18,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 124
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 113
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 107
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 108
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            102
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        1234,
        "fixed",
        4,
        1,
        "lcm",
        "simple",
        1
      ]
    },
    {
      "id": 77,
      "type": "ModelSamplingSD3",
      "pos": [
        647.074781051246,
        326.3776092603588
      ],
      "size": [
        234.02274434806372,
        58
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 125
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            126
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ModelSamplingSD3"
      },
      "widgets_values": [
        4
      ]
    },
    {
      "id": 62,
      "type": "VAELoader",
      "pos": [
        1360.9976510179852,
        555.9717618776077
      ],
      "size": [
        235.80000000000018,
        58
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 0,
          "links": [
            103
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pixel_space"
      ]
    },
    {
      "id": 82,
      "type": "ResizeImageMaskNode",
      "pos": [
        -187.68646898516153,
        1095.0166207447435
      ],
      "size": [
        266.5849202168248,
        106
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "input",
          "type": "IMAGE,MASK",
          "link": 134
        }
      ],
      "outputs": [
        {
          "name": "resized",
          "type": "IMAGE",
          "links": [
            135
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ResizeImageMaskNode"
      },
      "widgets_values": [
        "scale total pixels",
        1,
        "nearest-exact"
      ]
    },
    {
      "id": 80,
      "type": "VAEEncode",
      "pos": [
        407.88598227132763,
        1095.0166207447435
      ],
      "size": [
        170.05260120738637,
        46
      ],
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 136
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 132
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            133
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAEEncode"
      },
      "widgets_values": []
    },
    {
      "id": 58,
      "type": "CLIPTextEncode",
      "pos": [
        540.3078029573284,
        910.6321261399594
      ],
      "size": [
        419.26959228515625,
        107.08506774902344
      ],
      "flags": {
        "collapsed": true
      },
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 104
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            107
          ]
        }
      ],
      "title": "CLIP Text Encode (Negative Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 84,
      "type": "GetImageSize",
      "pos": [
        409.91468906546555,
        1197.5924621383035
      ],
      "size": [
        210,
        136
      ],
      "flags": {},
      "order": 13,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 137
        }
      ],
      "outputs": [
        {
          "name": "width",
          "type": "INT",
          "links": [
            138
          ]
        },
        {
          "name": "height",
          "type": "INT",
          "links": [
            139
          ]
        },
        {
          "name": "batch_size",
          "type": "INT",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "GetImageSize"
      },
      "widgets_values": []
    },
    {
      "id": 65,
      "type": "SaveImage",
      "pos": [
        1826.4226403881014,
        674.4492063356142
      ],
      "size": [
        644.1825674446068,
        806.9942591157356
      ],
      "flags": {},
      "order": 20,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 110
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.75"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    },
    {
      "id": 59,
      "type": "UNETLoader",
      "pos": [
        313.05994271638855,
        326.3776092603588
      ],
      "size": [
        305.3782043457031,
        82
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            125
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "UNETLoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "pid_flux1_1024_to_4096_4step_bf16.safetensors",
        "default"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 81,
      "type": "VAELoader",
      "pos": [
        85.79355151979729,
        989.5464055577053
      ],
      "size": [
        287.64071438371656,
        58
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 0,
          "links": [
            132
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "ae.safetensors"
      ],
      "color": "#322",
      "bgcolor": "#533"
    },
    {
      "id": 57,
      "type": "VAEDecode",
      "pos": [
        1635.4756905687632,
        674.4492063356142
      ],
      "size": [
        157.56002807617188,
        46
      ],
      "flags": {},
      "order": 19,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 102
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 103
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            110
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode",
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": []
    },
    {
      "id": 76,
      "type": "ContextWindowsManual",
      "pos": [
        915.0561692480533,
        326.3776092603588
      ],
      "size": [
        299.2096226990984,
        298
      ],
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 126
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            124
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ContextWindowsManual"
      },
      "widgets_values": [
        1536,
        384,
        "standard_static",
        1,
        false,
        "pyramid",
        2,
        false,
        "",
        false,
        false
      ]
    },
    {
      "id": 79,
      "type": "LoadImage",
      "pos": [
        -532.4361549440936,
        1095.0166207447435
      ],
      "size": [
        316.7987915039063,
        467.0000366210936
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            134
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "ComfyUI_00091_.png",
        "image"
      ]
    },
    {
      "id": 83,
      "type": "ResizeImageMaskNode",
      "pos": [
        106.84934568668905,
        1095.0166207447435
      ],
      "size": [
        266.5849202168248,
        106
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "input",
          "type": "IMAGE,MASK",
          "link": 135
        }
      ],
      "outputs": [
        {
          "name": "resized",
          "type": "IMAGE",
          "links": [
            136,
            137
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ResizeImageMaskNode"
      },
      "widgets_values": [
        "scale to multiple",
        16,
        "nearest-exact"
      ]
    },
    {
      "id": 67,
      "type": "PiDConditioning",
      "pos": [
        944.265791947152,
        694.9903029515484
      ],
      "size": [
        270,
        102
      ],
      "flags": {},
      "order": 14,
      "mode": 0,
      "inputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 112
        },
        {
          "name": "latent",
          "type": "LATENT",
          "link": 133
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            113
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "PiDConditioning"
      },
      "widgets_values": [
        "flux",
        0
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 89,
      "type": "MarkdownNote",
      "pos": [
        -135.8324089797664,
        326.3776092603588
      ],
      "size": [
        413.71462239515324,
        313.08611572179626
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "## models\n\n* diffusion_models\n\n  * [pid_flux1_512_to_2048_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_512_to_2048_4step_bf16.safetensors) (2.72 GB)\n  *  [pid_flux1_1024_to_4096_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_1024_to_4096_4step_bf16.safetensors) (2.72 GB)\n\n* text_encoders\n\n  * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n* vae\n\n  * [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors) (335 MB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n    ├── 📂diffusion_models/\n    │    ├── pid_flux1_512_to_2048_4step_bf16.safetensors\n    │    └── pid_flux1_1024_to_4096_4step_bf16.safetensors\n    ├── 📂text_encoders/\n    │    └── gemma_2_2b_it_elm_bf16.safetensors\n    └── 📂vae/\n          └── ae.safetensors\n\n```"
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 64,
      "type": "EmptyChromaRadianceLatentImage",
      "pos": [
        914.3139120643391,
        987.6451858178366
      ],
      "size": [
        300.8609375,
        106
      ],
      "flags": {},
      "order": 17,
      "mode": 0,
      "inputs": [
        {
          "name": "width",
          "type": "INT",
          "widget": {
            "name": "width"
          },
          "link": 121
        },
        {
          "name": "height",
          "type": "INT",
          "widget": {
            "name": "height"
          },
          "link": 122
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            108
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "EmptyChromaRadianceLatentImage"
      },
      "widgets_values": [
        896,
        1152,
        1
      ]
    }
  ],
  "links": [
    [
      102,
      61,
      0,
      57,
      0,
      "LATENT"
    ],
    [
      103,
      62,
      0,
      57,
      1,
      "VAE"
    ],
    [
      104,
      60,
      0,
      58,
      0,
      "CLIP"
    ],
    [
      107,
      58,
      0,
      61,
      2,
      "CONDITIONING"
    ],
    [
      108,
      64,
      0,
      61,
      3,
      "LATENT"
    ],
    [
      109,
      60,
      0,
      63,
      0,
      "CLIP"
    ],
    [
      110,
      57,
      0,
      65,
      0,
      "IMAGE"
    ],
    [
      112,
      63,
      0,
      67,
      0,
      "CONDITIONING"
    ],
    [
      113,
      67,
      0,
      61,
      1,
      "CONDITIONING"
    ],
    [
      121,
      70,
      1,
      64,
      0,
      "INT"
    ],
    [
      122,
      74,
      1,
      64,
      1,
      "INT"
    ],
    [
      124,
      76,
      0,
      61,
      0,
      "MODEL"
    ],
    [
      125,
      59,
      0,
      77,
      0,
      "MODEL"
    ],
    [
      126,
      77,
      0,
      76,
      0,
      "MODEL"
    ],
    [
      132,
      81,
      0,
      80,
      1,
      "VAE"
    ],
    [
      133,
      80,
      0,
      67,
      1,
      "LATENT"
    ],
    [
      134,
      79,
      0,
      82,
      0,
      "IMAGE"
    ],
    [
      135,
      82,
      0,
      83,
      0,
      "IMAGE"
    ],
    [
      136,
      83,
      0,
      80,
      0,
      "IMAGE"
    ],
    [
      137,
      83,
      0,
      84,
      0,
      "IMAGE"
    ],
    [
      138,
      84,
      0,
      70,
      0,
      "INT"
    ],
    [
      139,
      84,
      1,
      74,
      0,
      "INT"
    ]
  ],
  "groups": [
    {
      "id": 2,
      "title": "Pid_1024→4096",
      "bounding": [
        -554.3908824317602,
        238.6952183841798,
        3081.2835698845024,
        1345.6484688397118
      ],
      "color": "#8A8",
      "flags": {}
    }
  ],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.5209868481924432,
      "offset": [
        749.0102116191479,
        6.514142817800455
      ]
    },
    "frontendVersion": "1.45.15",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

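Resize the input image to roughly 1M pixels, with both sides a multiple of 16.
Take the resized width and height, multiply each by 4, and use the result as the output size on the PiD side.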
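
The size arithmetic in these two steps can be sketched as follows. This is an approximation under stated assumptions, not the nodes' actual code: `pid_sizes` is a hypothetical helper, "1M pixels" is taken to mean 1024 × 1024, and each side is rounded down to a multiple of 16, whereas the Resize Image/Mask node may round differently.

```python
import math


def pid_sizes(width: int, height: int, target_pixels: int = 1024 * 1024,
              multiple: int = 16, factor: int = 4):
    """Return ((input_w, input_h), (output_w, output_h)) for the PiD stage."""
    # Uniform scale that brings the image to about `target_pixels` in total.
    scale = math.sqrt(target_pixels / (width * height))

    def snap(value: int) -> int:
        # Assumption: round down to the nearest multiple, never below one multiple.
        return max(multiple, int(value * scale) // multiple * multiple)

    in_w, in_h = snap(width), snap(height)
    # The PiD side is given `factor` times the resized width and height.
    return (in_w, in_h), (in_w * factor, in_h * factor)


print(pid_sizes(1024, 1024))  # ((1024, 1024), (4096, 4096))
print(pid_sizes(3000, 2000))  # ((1248, 832), (4992, 3328))
```

In the workflow itself the same multiplication is done by the two math expression nodes set to `a * 4`, whose results feed the width and height of the empty latent.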

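Each PiD model is paired with a specific VAE, so the image must be encoded with the VAE that matches the PiD model in use.

It is tempting to use the newer Flux.2 VAE, but it shifts the colors considerably, so this workflow uses the stable combination of the Flux.1 PiD model and ae.safetensors.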

ae.safetensors (335 MB)

📂ComfyUI/
└── 📂models/
    └── 📂vae/
        └── ae.safetensors

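What this does is essentially a redraw, so it is closer to an enhancer than an upscaler.
It is not well suited to uses that need a faithful reproduction of the source image.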

