什么是 text2image?
输入文本提示词,生成图像。
从本质上讲,是通过 文本提示词这一条件 来控制扩散模型。
这是接下来制作的所有工作流的基础。让我们慢慢来看。
图像生成 AI 的机制
虽然非常简单,但在这里有解说。
完全不了解图像生成 AI 机制的人,请简单浏览一下。应该会更能理解参数的含义。
模型的下载
我们将使用作为一切开端的 Stable Diffusion 1.5 进行解说。
📂ComfyUI/
└── 📂models/
└── 📂checkpoints/
└── v1-5-pruned-emaonly-fp16.safetensors
工作流
{
"id": "8b9f7796-0873-4025-be3c-0f997f67f866",
"revision": 0,
"last_node_id": 9,
"last_link_id": 9,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1209,
188
],
"size": [
210,
46
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 8
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
9
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 3,
"type": "KSampler",
"pos": [
863,
186
],
"size": [
315,
262
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 1
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 4
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 6
},
{
"name": "latent_image",
"type": "LATENT",
"link": 2
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
7
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
20,
8,
"euler",
"normal",
1
]
},
{
"id": 9,
"type": "SaveImage",
"pos": [
1451,
189
],
"size": [
354.2876035004722,
433.23967321788405
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415,
186
],
"size": [
411.95503173828126,
151.0030493164063
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
4
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"beautiful scenery nature glass bottle landscape, , purple galaxy bottle,"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
416.1970166015625,
392.37848510742185
],
"size": [
410.75801513671877,
158.82607910156253
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
6
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text, watermark"
]
},
{
"id": 5,
"type": "EmptyLatentImage",
"pos": [
582.1350317382813,
606.5799999999999
],
"size": [
244.81999999999994,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
2
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "EmptyLatentImage"
},
"widgets_values": [
512,
512,
1
]
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
38.10000000000001,
363.8900000000004
],
"size": [
315,
98
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
1
]
},
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 1,
"links": [
3,
5
]
},
{
"name": "VAE",
"type": "VAE",
"slot_index": 2,
"links": [
8
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"v1-5-pruned-emaonly-fp16.safetensors"
]
}
],
"links": [
[
1,
4,
0,
3,
0,
"MODEL"
],
[
2,
5,
0,
3,
3,
"LATENT"
],
[
3,
4,
1,
6,
0,
"CLIP"
],
[
4,
6,
0,
3,
1,
"CONDITIONING"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
6,
7,
0,
3,
2,
"CONDITIONING"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
8,
4,
2,
8,
1,
"VAE"
],
[
9,
8,
0,
9,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.7513148009015777,
"offset": [
61.89999999999999,
-86
]
},
"frontendVersion": "1.34.6",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
关于各节点
更改 VAE
说实话,Stable Diffusion 1.5 的 VAE 性能并不是很好。如果使用微调模型,有时会生成颜色奇怪的图像。
之后,发布了改良版的 VAE。此外还发布了各种各样的 VAE,但 Stable Diffusion 1.5 的 VAE 只要使用这个几乎不会出问题。
VAE 的下载
📂ComfyUI/
└── 📂models/
└── 📂vae/
└── vae-ft-mse-840000-ema-pruned.safetensors
工作流
SD1.5_text2image_vae-ft-mse-840000.json
{
"id": "8b9f7796-0873-4025-be3c-0f997f67f866",
"revision": 0,
"last_node_id": 10,
"last_link_id": 10,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1209,
188
],
"size": [
210,
46
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 10
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
9
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 3,
"type": "KSampler",
"pos": [
863,
186
],
"size": [
315,
262
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 1
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 4
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 6
},
{
"name": "latent_image",
"type": "LATENT",
"link": 2
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
7
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
20,
8,
"euler",
"normal",
1
]
},
{
"id": 9,
"type": "SaveImage",
"pos": [
1451,
189
],
"size": [
354.2876035004722,
433.23967321788405
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415,
186
],
"size": [
411.95503173828126,
151.0030493164063
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
4
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"beautiful scenery nature glass bottle landscape, , purple galaxy bottle,"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
416.1970166015625,
392.37848510742185
],
"size": [
410.75801513671877,
158.82607910156253
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
6
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text, watermark"
]
},
{
"id": 5,
"type": "EmptyLatentImage",
"pos": [
582.1350317382813,
606.5799999999999
],
"size": [
244.81999999999994,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
2
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "EmptyLatentImage"
},
"widgets_values": [
512,
512,
1
]
},
{
"id": 10,
"type": "VAELoader",
"pos": [
896.9256198347109,
69.4815990090115
],
"size": [
281.0743801652891,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
10
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"vae-ft-mse-840000-ema-pruned.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
38.10000000000001,
363.8900000000004
],
"size": [
315,
98
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
1
]
},
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 1,
"links": [
3,
5
]
},
{
"name": "VAE",
"type": "VAE",
"slot_index": 2,
"links": []
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"v1-5-pruned-emaonly-fp16.safetensors"
]
}
],
"links": [
[
1,
4,
0,
3,
0,
"MODEL"
],
[
2,
5,
0,
3,
3,
"LATENT"
],
[
3,
4,
1,
6,
0,
"CLIP"
],
[
4,
6,
0,
3,
1,
"CONDITIONING"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
6,
7,
0,
3,
2,
"CONDITIONING"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
9,
8,
0,
9,
0,
"IMAGE"
],
[
10,
10,
0,
8,
1,
"VAE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.9090909090909091,
"offset": [
61.89999999999999,
30.518400990988496
]
},
"frontendVersion": "1.34.6",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
- 🟥 添加
Load VAE 节点,选择刚才下载的 VAE。
今后的工作流将以这个为基础。
初学者容易受挫的疑问
虽然被理所当然地对待,但仔细一想,有一些是图像生成特有的、不可思议的东西。
关于这些在别的页面简单地进行了解说。
提示词的写法
在 Stable Diffuision 1.5 / SDXL 中使用的文本编码器 CLIP,即使恭维也不能说性能很好。因此,为了生成期望的图像,有着被称为提示词工程学或咒语的技巧。
标签罗列
因为 CLIP 其实 读不懂 文章,所以即使写成文章形式的提示词也没什么意义。
1girl, solo, upper body, looking at viewer, smile, outdoors, sunset
因此,像这样以单纯的 标签罗列 的形式书写提示词的情况很多。
另外,动漫系模型直接使用了名为 Danbooru 网站的图像・以及用于整理该图像的标签。因此,过去常做去查在 Danbooru 中使用的标签并直接使用。
质量咒语
(best quality, masterpiece, ultra detailed, 8k, HDR, sharp focus, highly detailed)
像这样,只顾着在最前面写 似乎能提高质量 的词。
现在想来不知道有没有意义,但因为不知道哪个词有多大效果,所以就一味地写了。
Negative Prompt
bad anatomy, extra fingers, extra limbs, blurry, lowres, jpeg artifacts, ...
与刚才相反,把似乎会降低质量的词一味地写在负面提示词中。这些又有多少效果呢……
关注度记法
通过将 (red:1.05) / (blue:0.9) 这样的数值设定给提示词的各单词,可以更改该单词的重要性。
CLIP 会更重视前面的文本,所以写在后半部分的文本几乎会被无视。而且,原本就有很生效的词,和不太生效的词。
为了手动调整这些,使用这个关注度记法。
但是,这能顺利进行的前提终究是 CLIP 理解那个词的时候。
给可能不仅知道的词加上 (Ghoti:999) 之类的也没意义。
- 将光标放在想改变关注度的词上,按
Ctrl + 箭头↑/↓,可以用 0.05 为单位进行调整。