PixelDiT
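PixelDiT is a pixel diffusion model released by NVIDIA.
Many image generation models since Stable Diffusion use a mechanism called the Latent Diffusion Model.
Computing an image pixel by pixel is expensive, so the model first compresses the image into a smaller latent. This reduces the amount of computation and also makes features such as shape, color, and composition easier to handle.
However, when the latent is decoded back to pixels, fine details such as small text and patterns still tend to degrade.
A pixel diffusion model skips the latent and works on the image directly in pixel space. As a result, the degradation introduced by VAE decoding cannot occur in the same way, by construction.
But wasn't the latent introduced precisely because the computation is so heavy? PixelDiT's approach is to cut the image into patches, look at the whole picture coarsely, and fill in the details on the pixel side.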
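To put the cost argument in numbers, the sketch below compares how many values a model has to handle for a 1024×1024 RGB image in pixel space and in a latent. The 8× spatial downsampling and the 4 latent channels are the figures of the original Stable Diffusion VAE and serve only as an illustrative assumption; other models use different values, and PixelDiT's own patch size is not covered here.

```python
def element_count(width: int, height: int, channels: int) -> int:
    """Number of scalar values in a width x height tensor with `channels` channels."""
    return width * height * channels


def latent_shape(width: int, height: int, downsample: int = 8, channels: int = 4):
    """Latent (width, height, channels) for an assumed VAE.

    downsample=8 and channels=4 match the original Stable Diffusion VAE.
    They are illustrative assumptions, not PixelDiT parameters.
    """
    return width // downsample, height // downsample, channels


pixels = element_count(1024, 1024, 3)              # RGB image in pixel space
latent = element_count(*latent_shape(1024, 1024))  # same image as a latent

print(f"pixel space : {pixels:,} values")
print(f"latent space: {latent:,} values")
print(f"ratio       : {pixels / latent:.0f}x")
```

Under these assumptions the pixel-space tensor holds 48 times as many values as the latent, which is the gap a pixel diffusion model has to close by other means, such as working on patches.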
Downloading the models
- diffusion_models
  - pixeldit_1300m_1024px_bf16.safetensors (2.6 GB)
- text_encoders
  - gemma_2_2b_it_elm_bf16.safetensors (5.23 GB)
📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── pixeldit_1300m_1024px_bf16.safetensors
    └── 📂text_encoders/
        └── gemma_2_2b_it_elm_bf16.safetensors
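Before loading the workflow, it can help to confirm that both files are where the tree above expects them. The following is a minimal sketch using only the Python standard library; the helper name `missing_models` and the root path `"ComfyUI"` are placeholders for this example, not part of ComfyUI.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_MODELS = [
    "models/diffusion_models/pixeldit_1300m_1024px_bf16.safetensors",
    "models/text_encoders/gemma_2_2b_it_elm_bf16.safetensors",
]


def missing_models(comfy_root, expected=EXPECTED_MODELS):
    """Return the expected model paths that do not exist under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point it at your own installation.
    for rel in missing_models("ComfyUI"):
        print(f"missing: {rel}")
```

An empty result means both files are in place; anything printed is a path to fix before starting ComfyUI.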
text2image

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 75,
"last_link_id": 127,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1252.432861328125,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
101
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
413.6004778593708,
403.99281184374564
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": false
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"text, worst quality, blurry, ugly"
]
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
56.288665771484375,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"gemma_2_2b_it_elm_bf16.safetensors",
"pixeldit",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 71,
"type": "MarkdownNote",
"pos": [
-130.18155802626615,
-17.811007621292433
],
"size": [
351.89747511237124,
228.61658757745528
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n* diffusion_models\n\n * [pixeldit_1300m_1024px_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pixeldit_1300m_1024px_bf16.safetensors) (2.6 GB)\n\n* text_encoders\n\n * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── pixeldit_1300m_1024px_bf16.safetensors\n └── 📂text_encoders/\n └── gemma_2_2b_it_elm_bf16.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
269.35973351536364,
43.42716662131588
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
124
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pixeldit_1300m_1024px_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 39,
"type": "VAELoader",
"pos": [
977.9548217773436,
67.42716662131588
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pixel_space"
]
},
{
"id": 74,
"type": "ModelSamplingSD3",
"pos": [
608.2696075439453,
43.427166621315884
],
"size": [
226,
58
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 124
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
125
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3"
},
"widgets_values": [
4
]
},
{
"id": 73,
"type": "EmptyChromaRadianceLatentImage",
"pos": [
532.0091326445271,
575.75393284228
],
"size": [
300.8609375,
106
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 126
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 127
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
123
]
}
],
"properties": {
"Node name for S&R": "EmptyChromaRadianceLatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 75,
"type": "ResolutionSelector",
"pos": [
234.8164432684831,
575.75393284228
],
"size": [
270,
126
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
126
]
},
{
"name": "height",
"type": "INT",
"links": [
127
]
}
],
"properties": {
"Node name for S&R": "ResolutionSelector"
},
"widgets_values": [
"2:3 (Portrait Photo)",
1,
16
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415.00001525878906,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A stylish editorial food photograph of a small round chocolate mousse cake on a rustic wooden table, warm cocoa brown velvet texture, delicate chocolate decoration on top, tiny white flowers as garnish, placed on a simple golden dessert plate, soft natural window light, shallow depth of field, dreamy foreground blur with green leaves, warm earthy tones, elegant patisserie atmosphere, cozy cafe mood, high-end dessert photography, cinematic bokeh, no text, no logo, no watermark, no typography"
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
898.7548217773438,
188.1918182373047
],
"size": [
315,
262
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 125
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 123
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
12345,
"fixed",
30,
3,
"er_sde",
"simple",
1
]
},
{
"id": 56,
"type": "SaveImage",
"pos": [
1443.3798111474612,
188.1918182373047
],
"size": [
390.01472165749783,
646.8217101795782
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 101
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
101,
8,
0,
56,
0,
"IMAGE"
],
[
123,
73,
0,
3,
3,
"LATENT"
],
[
124,
37,
0,
74,
0,
"MODEL"
],
[
125,
74,
0,
3,
0,
"MODEL"
],
[
126,
75,
0,
73,
0,
"INT"
],
[
127,
75,
1,
73,
1,
"INT"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.620921323059155,
"offset": [
539.7718098712515,
304.32781282959496
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
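Because this is a pixel diffusion model, Load VAE and VAE Decode are not needed in principle.
In ComfyUI, however, to fit the existing workflow format, you select pixel_space in Load VAE and connect it to VAE Decode.
It looks as if a VAE named pixel_space is doing the decoding, but you can think of it as the step that obtains the IMAGE output from KSampler.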
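One way to check this setting in an exported workflow is to read the JSON and list what each Load VAE node has selected. The sketch below relies only on the fields visible in the workflow above (`nodes`, `type`, `mode`, `widgets_values`). Treating `mode` 0 as "active" and every other value as inactive is an assumption of this example, and `active_vae_names` is a hypothetical helper name.

```python
import json


def active_vae_names(workflow: dict) -> list:
    """Return the VAE selected in every active VAELoader node.

    Nodes with mode 0 are treated as active; other modes are skipped.
    That mapping is an assumption of this sketch.
    """
    return [
        node["widgets_values"][0]
        for node in workflow.get("nodes", [])
        if node.get("type") == "VAELoader"
        and node.get("mode", 0) == 0
        and node.get("widgets_values")
    ]


# A trimmed stand-in for the exported workflow, keeping only the fields used here.
sample = json.loads("""
{"nodes": [
  {"id": 39, "type": "VAELoader", "mode": 0, "widgets_values": ["pixel_space"]},
  {"id": 8,  "type": "VAEDecode", "mode": 0, "widgets_values": []}
]}
""")

print(active_vae_names(sample))  # ['pixel_space']
```

For a real file, replace the inline sample with `json.load(open(path, encoding="utf-8"))`; a result of `['pixel_space']` means VAE Decode is only converting the sampler output into an IMAGE.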
PiD
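PiD is a PixelDiT used as a replacement for VAE Decode.
Normally, the generated latent is restored to an image by VAE Decode. PiD instead hands this latent to PixelDiT, which reconstructs and upscales the image in a single pass.
For example, generate a 1024×1024 latent with Z-Image-Turbo, then hand it to PiD before VAE Decode would run.
With a 1024_to_4096 PiD, the output is a 4096×4096 image.
In other words, you can keep the generation ability of an existing model while avoiding the loss of detail caused by VAE Decode.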
Downloading the models
- For SDXL
- For Qwen-Image
- For Flux.1 / Z-Image
- For Flux.2
📂ComfyUI/
└── 📂models/
    └── 📂diffusion_models/
        ├── pid_sdxl_1024_to_4096_4step_bf16.safetensors
        ├── pid_qwenimage_1024_to_4096_4step_bf16.safetensors
        ├── pid_flux1_512_to_2048_4step_bf16.safetensors
        ├── pid_flux1_1024_to_4096_4step_bf16.safetensors
        ├── pid_flux2_512_to_2048_4step_bf16.safetensors
        └── pid_flux2_1024_to_4096_4step_2606_bf16.safetensors
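You do not need to install all of them. Place only the PiD that matches the base model you use.
Choosing a model
There are two things to watch when choosing a PiD model.
- Base model type
  - It must match the latent type used by the original model.
  - For SDXL use the SDXL version; for Z-Image use the Flux.1 version.
- Upscale factor
  - Model names contain a string such as 1024_to_4096, which indicates the upscale factor.
  - Selecting the model does not upscale anything automatically. With 1024_to_4096, for example, you hand PiD a latent / output of around 1024px and set the parameters so that PiD outputs a 4096px image.
  - The resolution only needs to match roughly, and the aspect ratio can be adjusted freely.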
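The two selection rules above can be sketched as a small lookup. The file names come from the directory tree in the download section; the function `pick_pid_model`, the alias table, and the "closest input resolution wins" rule are assumptions made for this illustration, not part of ComfyUI or PiD.

```python
# File names as listed in the directory tree, grouped by latent family
# and keyed by the input resolution in the file name.
PID_MODELS = {
    "sdxl": {1024: "pid_sdxl_1024_to_4096_4step_bf16.safetensors"},
    "qwenimage": {1024: "pid_qwenimage_1024_to_4096_4step_bf16.safetensors"},
    "flux1": {
        512: "pid_flux1_512_to_2048_4step_bf16.safetensors",
        1024: "pid_flux1_1024_to_4096_4step_bf16.safetensors",
    },
    "flux2": {
        512: "pid_flux2_512_to_2048_4step_bf16.safetensors",
        1024: "pid_flux2_1024_to_4096_4step_2606_bf16.safetensors",
    },
}

# Base models that reuse another family's latent type (Z-Image -> Flux.1).
LATENT_FAMILY_ALIASES = {"z-image": "flux1", "z-image-turbo": "flux1"}


def pick_pid_model(base_model: str, approx_side_px: int) -> str:
    """Pick the PiD file whose input resolution is closest to approx_side_px."""
    key = base_model.lower()
    family = LATENT_FAMILY_ALIASES.get(key, key)
    if family not in PID_MODELS:
        raise ValueError(f"no PiD model listed for base model {base_model!r}")
    variants = PID_MODELS[family]
    nearest = min(variants, key=lambda side: abs(side - approx_side_px))
    return variants[nearest]


print(pick_pid_model("Z-Image-Turbo", 1024))
print(pick_pid_model("flux2", 600))
```

The first call resolves Z-Image-Turbo to the Flux.1 family and returns the 1024_to_4096 file; the second returns the 512_to_2048 Flux.2 file because 600px is closer to 512 than to 1024.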
Z-Image-Turbo → PiD
Let's try decoding a Z-Image-Turbo latent with PiD.

{
"id": "d8034549-7e0a-40f1-8c2e-de3ffc6f1cae",
"revision": 0,
"last_node_id": 80,
"last_link_id": 131,
"nodes": [
{
"id": 38,
"type": "CLIPLoader",
"pos": [
32.131015771484385,
312.74468994140625
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
74,
75
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"qwen_3_4b.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
390.84235000000007,
405.392333984375
],
"size": [
418.3189392089844,
107.08506774902344
],
"flags": {
"collapsed": true
},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
52
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
""
]
},
{
"id": 54,
"type": "ModelSamplingAuraFlow",
"pos": [
579.7813758789064,
53.0477294921875
],
"size": [
230.33058166503906,
58
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 99
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
100
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingAuraFlow",
"cnr_id": "comfy-core",
"ver": "0.3.49"
},
"widgets_values": [
3.1
]
},
{
"id": 55,
"type": "MarkdownNote",
"pos": [
-151.24897385253908,
-13.402286529541016
],
"size": [
349.13103718118725,
214.5148968572393
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [z_image_turbo_bf16.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors)\n- [qwen_3_4b.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/text_encoders/qwen_3_4b.safetensors)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ └── z_image_turbo_bf16.safetensors\n ├── 📂text_encoders/\n │ └── qwen_3_4b.safetensors\n └── 📂vae/\n └── ae.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 57,
"type": "VAEDecode",
"pos": [
2059.9064044746965,
1212.0613014712694
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 25,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 102
},
{
"name": "vae",
"type": "VAE",
"link": 103
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
110
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 62,
"type": "VAELoader",
"pos": [
1785.4283649239185,
1093.5838570132628
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
103
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pixel_space"
]
},
{
"id": 60,
"type": "CLIPLoader",
"pos": [
602.886208918055,
1349.5611731753713
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
104,
109
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"gemma_2_2b_it_elm_bf16.safetensors",
"pixeldit",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 58,
"type": "CLIPTextEncode",
"pos": [
962.6180210059409,
1452.9092950777112
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": true
},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 104
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
107
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
""
]
},
{
"id": 63,
"type": "CLIPTextEncode",
"pos": [
962.6180210059409,
1232.6023980872037
],
"size": [
361.1895922851561,
152.373631591797
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 109
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
112
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
""
]
},
{
"id": 70,
"type": "ComfyMathExpression",
"pos": [
1095.328699296338,
1510.6730178870523
],
"size": [
210,
128
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"label": "a",
"name": "values.a",
"type": "FLOAT,INT,BOOLEAN",
"link": 128
},
{
"label": "b",
"name": "values.b",
"shape": 7,
"type": "FLOAT,INT,BOOLEAN",
"link": null
}
],
"outputs": [
{
"name": "FLOAT",
"type": "FLOAT",
"links": null
},
{
"name": "INT",
"type": "INT",
"links": [
121
]
},
{
"name": "BOOL",
"type": "BOOLEAN",
"links": null
}
],
"properties": {
"Node name for S&R": "ComfyMathExpression"
},
"widgets_values": [
"a * 4"
]
},
{
"id": 74,
"type": "ComfyMathExpression",
"pos": [
1095.4485588130935,
1694.6636578512905
],
"size": [
210,
128
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"label": "a",
"name": "values.a",
"type": "FLOAT,INT,BOOLEAN",
"link": 131
},
{
"label": "b",
"name": "values.b",
"shape": 7,
"type": "FLOAT,INT,BOOLEAN",
"link": null
}
],
"outputs": [
{
"name": "FLOAT",
"type": "FLOAT",
"links": null
},
{
"name": "INT",
"type": "INT",
"links": [
122
]
},
{
"name": "BOOL",
"type": "BOOLEAN",
"links": null
}
],
"properties": {
"Node name for S&R": "ComfyMathExpression"
},
"widgets_values": [
"a * 4"
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
243.49762343749995,
53.0477294921875
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
99
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"Z-Image/z_image_turbo_bf16.safetensors",
"fp8_e4m3fn"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 61,
"type": "KSampler",
"pos": [
1706.228364923919,
1212.0613014712694
],
"size": [
315,
262
],
"flags": {},
"order": 24,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 124
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 113
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 107
},
{
"name": "latent_image",
"type": "LATENT",
"link": 108
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
102
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
1234,
"fixed",
4,
1,
"lcm",
"simple",
1
]
},
{
"id": 65,
"type": "SaveImage",
"pos": [
2250.8533542940327,
1212.0613014712694
],
"size": [
666.7297467636986,
558.9757191157356
],
"flags": {},
"order": 26,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 110
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 39,
"type": "VAELoader",
"pos": [
953.7971717773437,
68.20164184570308
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 5,
"mode": 4,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
76
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 64,
"type": "EmptyChromaRadianceLatentImage",
"pos": [
1337.8355683530854,
1508.893664010131
],
"size": [
300.8609375,
106
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 121
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 122
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
108
]
}
],
"properties": {
"Node name for S&R": "EmptyChromaRadianceLatentImage"
},
"widgets_values": [
896,
1152,
1
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
390.84235000000007,
186
],
"size": [
419.26959228515625,
156.00363159179688
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
46
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"A single retro-style Japanese pudding in a handmade ceramic pedestal bowl, smooth golden custard with rich caramel sauce, topped with light cream, a small strawberry piece, and tiny leaves, set on a rustic wooden table beside wooden spoons, soft window light, natural and airy mood, warm earthy colors, shallow focus, tasteful Japanese cafe aesthetic, simple and elegant dessert photography, one pudding only, no extra desserts, no text, no logo, no watermark"
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1228.2752113281254,
188.1918182373047
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 21,
"mode": 4,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
127
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 53,
"type": "EmptySD3LatentImage",
"pos": [
573.1119422851564,
473.02593102293815
],
"size": [
237,
106
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 129
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 130
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
98
]
}
],
"properties": {
"Node name for S&R": "EmptySD3LatentImage",
"cnr_id": "comfy-core",
"ver": "0.3.49"
},
"widgets_values": [
1104,
1472,
1
]
},
{
"id": 77,
"type": "ModelSamplingSD3",
"pos": [
1071.5054949571797,
863.9897043960142
],
"size": [
234.02274434806372,
58
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 125
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
126
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3"
},
"widgets_values": [
4
]
},
{
"id": 78,
"type": "PreviewImage",
"pos": [
1412.600554848482,
188.1918182373047
],
"size": [
475.4999999999998,
310.8999999999998
],
"flags": {},
"order": 23,
"mode": 4,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 127
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 66,
"type": "MarkdownNote",
"pos": [
319.0841671505862,
861.233126193333
],
"size": [
391.4749836827225,
249.70306513499378
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n* diffusion_models\n\n * [pid_flux1_512_to_2048_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_512_to_2048_4step_bf16.safetensors) (2.72 GB)\n * [pid_flux1_1024_to_4096_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_1024_to_4096_4step_bf16.safetensors) (2.72 GB)\n\n* text_encoders\n\n * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ ├── pid_flux1_512_to_2048_4step_bf16.safetensors\n │ └── pid_flux1_1024_to_4096_4step_bf16.safetensors\n └── 📂text_encoders/\n └── gemma_2_2b_it_elm_bf16.safetensors\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 67,
"type": "PiDConditioning",
"pos": [
1368.6965058530855,
1232.6023980872037
],
"size": [
270,
102
],
"flags": {},
"order": 22,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 112
},
{
"name": "latent",
"type": "LATENT",
"link": 111
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
113
]
}
],
"properties": {
"Node name for S&R": "PiDConditioning"
},
"widgets_values": [
"flux",
0
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 79,
"type": "ResolutionSelector",
"pos": [
259.561201016606,
493.48155615066963
],
"size": [
270,
126
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
128,
129
]
},
{
"name": "height",
"type": "INT",
"links": [
130,
131
]
}
],
"properties": {
"Node name for S&R": "ResolutionSelector"
},
"widgets_values": [
"3:2 (Photo)",
1,
16
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
874.5971717773439,
188.1918182373047
],
"size": [
315,
262
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 100
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 98
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
35,
111
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
12345,
"fixed",
8,
1,
"euler",
"simple",
1
]
},
{
"id": 59,
"type": "UNETLoader",
"pos": [
737.4906566223231,
863.9897043960142
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
125
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pid_flux1_1024_to_4096_4step_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 76,
"type": "ContextWindowsManual",
"pos": [
1339.486883153987,
863.9897043960142
],
"size": [
299.2096226990984,
298
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 126
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
124
]
}
],
"properties": {
"Node name for S&R": "ContextWindowsManual"
},
"widgets_values": [
1536,
384,
"standard_static",
1,
false,
"pyramid",
2,
false,
"",
false,
false
]
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
98,
53,
0,
3,
3,
"LATENT"
],
[
99,
37,
0,
54,
0,
"MODEL"
],
[
100,
54,
0,
3,
0,
"MODEL"
],
[
102,
61,
0,
57,
0,
"LATENT"
],
[
103,
62,
0,
57,
1,
"VAE"
],
[
104,
60,
0,
58,
0,
"CLIP"
],
[
107,
58,
0,
61,
2,
"CONDITIONING"
],
[
108,
64,
0,
61,
3,
"LATENT"
],
[
109,
60,
0,
63,
0,
"CLIP"
],
[
110,
57,
0,
65,
0,
"IMAGE"
],
[
111,
3,
0,
67,
1,
"LATENT"
],
[
112,
63,
0,
67,
0,
"CONDITIONING"
],
[
113,
67,
0,
61,
1,
"CONDITIONING"
],
[
121,
70,
1,
64,
0,
"INT"
],
[
122,
74,
1,
64,
1,
"INT"
],
[
124,
76,
0,
61,
0,
"MODEL"
],
[
125,
59,
0,
77,
0,
"MODEL"
],
[
126,
77,
0,
76,
0,
"MODEL"
],
[
127,
8,
0,
78,
0,
"IMAGE"
],
[
128,
79,
0,
70,
0,
"INT"
],
[
129,
79,
0,
53,
0,
"INT"
],
[
130,
79,
1,
53,
1,
"INT"
],
[
131,
79,
1,
74,
0,
"INT"
]
],
"groups": [
{
"id": 1,
"title": "Z-Image-Turbo",
"bounding": [
-161.24897385253908,
-83.40228652954102,
2064.456875764055,
806.5944331355984
],
"color": "#3f789e",
"flags": {}
},
{
"id": 2,
"title": "Pid_1024→4096",
"bounding": [
290.6357989790571,
776.3072911794422,
2650.977159099303,
1063.3279125842512
],
"color": "#8A8",
"flags": {}
}
],
"config": {},
"extra": {
"ds": {
"scale": 0.430567643134249,
"offset": [
591.0919825277065,
375.87128501454
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
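- 🟦 The top left is an ordinary Z-Image-Turbo workflow.
- 🟩 The output latent does not go through VAE Decode; it is connected to PiD Conditioning on the PixelDiT side.
- The 1024_to_4096 model is used here.
  - The Z-Image-Turbo side generates at about 1M pixels, and the PiD side is set to 4 times that resolution.
- PiD is a 4-step distilled model, so steps is set to 4 and cfg to 1.0.
- The Context Windows (Manual) node is used for tiling. Use it when you run out of memory (OOM), or when tall or wide images come out coarse.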
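The workflow multiplies the base width and height by 4 with two `a * 4` expression nodes. As a rough sketch, the same factor can be read from the model file name instead of being typed by hand. The helper names below are hypothetical, and the 1216×832 base size is only an example of a roughly 1M-pixel resolution.

```python
import re


def pid_scale_factor(model_name: str) -> int:
    """Read the upscale factor from a name such as '..._1024_to_4096_...'."""
    match = re.search(r"(\d+)_to_(\d+)", model_name)
    if match is None:
        raise ValueError(f"no '<in>_to_<out>' pattern in {model_name!r}")
    src, dst = int(match.group(1)), int(match.group(2))
    if dst % src != 0:
        raise ValueError(f"{dst} is not a whole multiple of {src}")
    return dst // src


def pid_output_size(base_width: int, base_height: int, model_name: str):
    """Pixel-space size to request on the PiD side for a given base size."""
    factor = pid_scale_factor(model_name)
    return base_width * factor, base_height * factor


name = "pid_flux1_1024_to_4096_4step_bf16.safetensors"
print(pid_scale_factor(name))             # 4
print(pid_output_size(1216, 832, name))   # (4864, 3328)
```

Deriving the factor from the name keeps the PiD-side resolution consistent if you later switch to a 512_to_2048 model, which also happens to be a factor of 4.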
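Upscaling an arbitrary image
What is passed to PiD Conditioning is just an ordinary latent.
So the preceding stage does not have to be text2image. If you VAE Encode any image and hand the result to PiD, you can use it like an upscaler.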
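Since the PiD model expects an input of roughly the resolution in its name, an arbitrary image usually has to be resized before VAE Encode. The sketch below computes such a size while keeping the aspect ratio. It is not the algorithm of any ComfyUI node: counting 1 megapixel as 1024×1024 pixels and rounding each side to a multiple of 16 are assumptions of this example, and `scale_to_megapixels` is a hypothetical helper.

```python
import math


def scale_to_megapixels(width: int, height: int,
                        megapixels: float = 1.0, multiple: int = 16):
    """Return (width, height) with about `megapixels` total pixels.

    Assumptions of this sketch: 1 megapixel = 1024 * 1024 pixels, and each
    side is rounded to the nearest multiple of `multiple`.
    """
    target_pixels = megapixels * 1024 * 1024
    scale = math.sqrt(target_pixels / (width * height))
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h


# A 3000x2000 photo is brought down to about 1M pixels before VAE Encode.
base_w, base_h = scale_to_megapixels(3000, 2000)
print(base_w, base_h)          # 1248 832
print(base_w * 4, base_h * 4)  # size to request from a 1024_to_4096 PiD
```

The second line of output is the pixel-space size for the PiD side when a 4× model is used.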

{
"id": "1aa3b166-1861-429f-92ae-7ee12e64ab01",
"revision": 0,
"last_node_id": 89,
"last_link_id": 143,
"nodes": [
{
"id": 60,
"type": "CLIPLoader",
"pos": [
178.4554950121201,
811.9490780397168
],
"size": [
301.3524169921875,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 0,
"links": [
104,
109
]
}
],
"properties": {
"Node name for S&R": "CLIPLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"gemma_2_2b_it_elm_bf16.safetensors",
"pixeldit",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 63,
"type": "CLIPTextEncode",
"pos": [
538.1873071000066,
694.9903029515484
],
"size": [
361.1895922851561,
152.373631591797
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 109
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
112
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
""
]
},
{
"id": 70,
"type": "ComfyMathExpression",
"pos": [
671.8070430075916,
989.4245396947579
],
"size": [
210,
128
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"label": "a",
"name": "values.a",
"type": "FLOAT,INT,BOOLEAN",
"link": 138
},
{
"label": "b",
"name": "values.b",
"shape": 7,
"type": "FLOAT,INT,BOOLEAN",
"link": null
}
],
"outputs": [
{
"name": "FLOAT",
"type": "FLOAT",
"links": null
},
{
"name": "INT",
"type": "INT",
"links": [
121
]
},
{
"name": "BOOL",
"type": "BOOLEAN",
"links": null
}
],
"properties": {
"Node name for S&R": "ComfyMathExpression"
},
"widgets_values": [
"a * 4"
]
},
{
"id": 74,
"type": "ComfyMathExpression",
"pos": [
671.926902524347,
1173.415179658996
],
"size": [
210,
128
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"label": "a",
"name": "values.a",
"type": "FLOAT,INT,BOOLEAN",
"link": 139
},
{
"label": "b",
"name": "values.b",
"shape": 7,
"type": "FLOAT,INT,BOOLEAN",
"link": null
}
],
"outputs": [
{
"name": "FLOAT",
"type": "FLOAT",
"links": null
},
{
"name": "INT",
"type": "INT",
"links": [
122
]
},
{
"name": "BOOL",
"type": "BOOLEAN",
"links": null
}
],
"properties": {
"Node name for S&R": "ComfyMathExpression"
},
"widgets_values": [
"a * 4"
]
},
{
"id": 61,
"type": "KSampler",
"pos": [
1281.7976510179856,
674.4492063356142
],
"size": [
315,
262
],
"flags": {},
"order": 18,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 124
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 113
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 107
},
{
"name": "latent_image",
"type": "LATENT",
"link": 108
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
102
]
}
],
"properties": {
"Node name for S&R": "KSampler",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
1234,
"fixed",
4,
1,
"lcm",
"simple",
1
]
},
{
"id": 77,
"type": "ModelSamplingSD3",
"pos": [
647.074781051246,
326.3776092603588
],
"size": [
234.02274434806372,
58
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 125
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
126
]
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3"
},
"widgets_values": [
4
]
},
{
"id": 62,
"type": "VAELoader",
"pos": [
1360.9976510179852,
555.9717618776077
],
"size": [
235.80000000000018,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
103
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pixel_space"
]
},
{
"id": 82,
"type": "ResizeImageMaskNode",
"pos": [
-187.68646898516153,
1095.0166207447435
],
"size": [
266.5849202168248,
106
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 134
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
135
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale total pixels",
1,
"nearest-exact"
]
},
{
"id": 80,
"type": "VAEEncode",
"pos": [
407.88598227132763,
1095.0166207447435
],
"size": [
170.05260120738637,
46
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 136
},
{
"name": "vae",
"type": "VAE",
"link": 132
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
133
]
}
],
"properties": {
"Node name for S&R": "VAEEncode"
},
"widgets_values": []
},
{
"id": 58,
"type": "CLIPTextEncode",
"pos": [
540.3078029573284,
910.6321261399594
],
"size": [
419.26959228515625,
107.08506774902344
],
"flags": {
"collapsed": true
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 104
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
107
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
""
]
},
{
"id": 84,
"type": "GetImageSize",
"pos": [
409.91468906546555,
1197.5924621383035
],
"size": [
210,
136
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 137
}
],
"outputs": [
{
"name": "width",
"type": "INT",
"links": [
138
]
},
{
"name": "height",
"type": "INT",
"links": [
139
]
},
{
"name": "batch_size",
"type": "INT",
"links": null
}
],
"properties": {
"Node name for S&R": "GetImageSize"
},
"widgets_values": []
},
{
"id": 65,
"type": "SaveImage",
"pos": [
1826.4226403881014,
674.4492063356142
],
"size": [
644.1825674446068,
806.9942591157356
],
"flags": {},
"order": 20,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 110
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.75"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 59,
"type": "UNETLoader",
"pos": [
313.05994271638855,
326.3776092603588
],
"size": [
305.3782043457031,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
125
]
}
],
"properties": {
"Node name for S&R": "UNETLoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"pid_flux1_1024_to_4096_4step_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 81,
"type": "VAELoader",
"pos": [
85.79355151979729,
989.5464055577053
],
"size": [
287.64071438371656,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"slot_index": 0,
"links": [
132
]
}
],
"properties": {
"Node name for S&R": "VAELoader",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 57,
"type": "VAEDecode",
"pos": [
1635.4756905687632,
674.4492063356142
],
"size": [
157.56002807617188,
46
],
"flags": {},
"order": 19,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 102
},
{
"name": "vae",
"type": "VAE",
"link": 103
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
110
]
}
],
"properties": {
"Node name for S&R": "VAEDecode",
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": []
},
{
"id": 76,
"type": "ContextWindowsManual",
"pos": [
915.0561692480533,
326.3776092603588
],
"size": [
299.2096226990984,
298
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 126
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
124
]
}
],
"properties": {
"Node name for S&R": "ContextWindowsManual"
},
"widgets_values": [
1536,
384,
"standard_static",
1,
false,
"pyramid",
2,
false,
"",
false,
false
]
},
{
"id": 79,
"type": "LoadImage",
"pos": [
-532.4361549440936,
1095.0166207447435
],
"size": [
316.7987915039063,
467.0000366210936
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
134
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"ComfyUI_00091_.png",
"image"
]
},
{
"id": 83,
"type": "ResizeImageMaskNode",
"pos": [
106.84934568668905,
1095.0166207447435
],
"size": [
266.5849202168248,
106
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "input",
"type": "IMAGE,MASK",
"link": 135
}
],
"outputs": [
{
"name": "resized",
"type": "IMAGE",
"links": [
136,
137
]
}
],
"properties": {
"Node name for S&R": "ResizeImageMaskNode"
},
"widgets_values": [
"scale to multiple",
16,
"nearest-exact"
]
},
{
"id": 67,
"type": "PiDConditioning",
"pos": [
944.265791947152,
694.9903029515484
],
"size": [
270,
102
],
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 112
},
{
"name": "latent",
"type": "LATENT",
"link": 133
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
113
]
}
],
"properties": {
"Node name for S&R": "PiDConditioning"
},
"widgets_values": [
"flux",
0
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 89,
"type": "MarkdownNote",
"pos": [
-135.8324089797664,
326.3776092603588
],
"size": [
413.71462239515324,
313.08611572179626
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n\n* diffusion_models\n\n * [pid_flux1_512_to_2048_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_512_to_2048_4step_bf16.safetensors) (2.72 GB)\n * [pid_flux1_1024_to_4096_4step_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/diffusion_models/pid_flux1_1024_to_4096_4step_bf16.safetensors) (2.72 GB)\n\n* text_encoders\n\n * [gemma_2_2b_it_elm_bf16.safetensors](https://huggingface.co/Comfy-Org/PixelDiT/blob/main/text_encoders/gemma_2_2b_it_elm_bf16.safetensors) (5.23 GB)\n\n* vae\n\n * [ae.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/vae/ae.safetensors) (335 MB)\n\n\n```text\n📂ComfyUI/\n└── 📂models/\n ├── 📂diffusion_models/\n │ ├── pid_flux1_512_to_2048_4step_bf16.safetensors\n │ └── pid_flux1_1024_to_4096_4step_bf16.safetensors\n ├── 📂text_encoders/\n │ └── gemma_2_2b_it_elm_bf16.safetensors\n └── 📂vae/\n └── ae.safetensors\n\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 64,
"type": "EmptyChromaRadianceLatentImage",
"pos": [
914.3139120643391,
987.6451858178366
],
"size": [
300.8609375,
106
],
"flags": {},
"order": 17,
"mode": 0,
"inputs": [
{
"name": "width",
"type": "INT",
"widget": {
"name": "width"
},
"link": 121
},
{
"name": "height",
"type": "INT",
"widget": {
"name": "height"
},
"link": 122
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
108
]
}
],
"properties": {
"Node name for S&R": "EmptyChromaRadianceLatentImage"
},
"widgets_values": [
896,
1152,
1
]
}
],
"links": [
[
102,
61,
0,
57,
0,
"LATENT"
],
[
103,
62,
0,
57,
1,
"VAE"
],
[
104,
60,
0,
58,
0,
"CLIP"
],
[
107,
58,
0,
61,
2,
"CONDITIONING"
],
[
108,
64,
0,
61,
3,
"LATENT"
],
[
109,
60,
0,
63,
0,
"CLIP"
],
[
110,
57,
0,
65,
0,
"IMAGE"
],
[
112,
63,
0,
67,
0,
"CONDITIONING"
],
[
113,
67,
0,
61,
1,
"CONDITIONING"
],
[
121,
70,
1,
64,
0,
"INT"
],
[
122,
74,
1,
64,
1,
"INT"
],
[
124,
76,
0,
61,
0,
"MODEL"
],
[
125,
59,
0,
77,
0,
"MODEL"
],
[
126,
77,
0,
76,
0,
"MODEL"
],
[
132,
81,
0,
80,
1,
"VAE"
],
[
133,
80,
0,
67,
1,
"LATENT"
],
[
134,
79,
0,
82,
0,
"IMAGE"
],
[
135,
82,
0,
83,
0,
"IMAGE"
],
[
136,
83,
0,
80,
0,
"IMAGE"
],
[
137,
83,
0,
84,
0,
"IMAGE"
],
[
138,
84,
0,
70,
0,
"INT"
],
[
139,
84,
1,
74,
0,
"INT"
]
],
"groups": [
{
"id": 2,
"title": "Pid_1024→4096",
"bounding": [
-554.3908824317602,
238.6952183841798,
3081.2835698845024,
1345.6484688397118
],
"color": "#8A8",
"flags": {}
}
],
"config": {},
"extra": {
"ds": {
"scale": 0.5209868481924432,
"offset": [
749.0102116191479,
6.514142817800455
]
},
"frontendVersion": "1.45.15",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
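- Resize the input image to roughly 1 megapixel, and make both dimensions multiples of 16.
- Read the width and height after resizing and multiply each by 4 to get the output size on the PiD side.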
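The size arithmetic in the two steps above can be sketched in a few lines of Python. This is only an illustration, not the code the ComfyUI nodes run: the function name `pid_sizes`, the 1024 × 1024 pixel target, and rounding to the nearest multiple are assumptions, and the actual nodes may round differently.

```python
import math


def pid_sizes(width, height, target_pixels=1024 * 1024, multiple=16, scale=4):
    """Return ((in_w, in_h), (out_w, out_h)) for a PiD enhance pass.

    Hypothetical helper: target_pixels and the rounding mode are assumptions.
    """
    # Uniform factor that brings the image area to about target_pixels
    # while keeping the aspect ratio.
    factor = math.sqrt(target_pixels / (width * height))

    # Snap each side to the nearest multiple of `multiple`, never below one multiple.
    in_w = max(multiple, round(width * factor / multiple) * multiple)
    in_h = max(multiple, round(height * factor / multiple) * multiple)

    # The PiD side generates at `scale` times the resized input.
    return (in_w, in_h), (in_w * scale, in_h * scale)


if __name__ == "__main__":
    (in_w, in_h), (out_w, out_h) = pid_sizes(1920, 1080)
    print(f"encode at {in_w}x{in_h}, generate at {out_w}x{out_h}")
```

With a 1024 × 1024 input the resize is a no-op and the output side becomes 4096 × 4096, which matches the `1024_to_4096` model name.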
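Each PiD model is paired with a different VAE, so the Encode step has to use the VAE that matches the PiD model.
It is tempting to use the newer Flux.2 VAE, but the colors shift considerably. This workflow uses the more stable combination: the PiD model for Flux.1 together with ae.safetensors.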
- ae.safetensors (335 MB)
📂ComfyUI/
└── 📂models/
└── 📂vae/
└── ae.safetensors
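In essence this redraws the image, so it is better described as an enhancer than as an upscaler.
It is not well suited to uses that require faithful reproduction.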