什么是 Textual Inversion?
即使想让它生成某个图像,很多时候也很难用文本很好地进行说明。
Textual Inversion 是将这种“难以用文本表现的外观和概念”,作为新单词让其记住的技术。
- 首先,准备 1 个像
<my_keyword>这样的虚拟单词。 - 将那个单词和几张〜几十张氛围相似的图像作为一套展示给模型。
- 模型学习“这些图像共通的特征”,并将那个信息埋入
<my_keyword>这 1 个词中。
著名的例子有 easynegative 和 badhandv4 等。
通过收集大量的“失败图像”让其学习,将“图像生成中常发生的失败”的概念总结为一句话。
但是,这无法让原本的模型从零画出完全不会画的东西。
终究只是 模型原本知道,但不知道该用什么指令表达的东西。
那种情况下,需要 LoRA 或全量微调等,重新学习模型本体的手法。
由 Textual Inversion 制作的这 1 个词份的数据,习惯上被称为 embedding。
几乎不再被使用
Textual Inversion 虽然有学习轻便的优点,但现在几乎被 LoRA 取代了。
easynegative 和 badhandv4 等,也因为模型本身的性能提高了,基本不需要了。
例外的是,Checkpoint 或 LoRA 的作者,为了接近预想的输出,有时会一起分发专用的 embeddings。
为了引出模型的性能,使用者需要正确输入“作者预想的提示词”。但是,不限于所有使用者都会遵从那个,每次写细致的提示词也很麻烦。
于是,预先准备 embeddings,通过让使用者加入那个单词,来保证最低品质。
应用了 embedding 的 text2image
embedding 的下载
试着使用名为 Porsche 911 Turbo 的 embedding 吧。
- Porsche 911 Turbo
-
📂ComfyUI/ └── 📂models/ └── 📂embeddings/ └── porsche911_ti.pt
工作流

{
"id": "8b9f7796-0873-4025-be3c-0f997f67f866",
"revision": 0,
"last_node_id": 10,
"last_link_id": 10,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1209,
188
],
"size": [
210,
46
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 7
},
{
"name": "vae",
"type": "VAE",
"link": 10
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
9
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 9,
"type": "SaveImage",
"pos": [
1451,
189
],
"size": [
354.2876035004722,
433.23967321788405
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 9
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
416.1970166015625,
392.37848510742185
],
"size": [
410.75801513671877,
158.82607910156253
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
6
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text, watermark"
]
},
{
"id": 5,
"type": "EmptyLatentImage",
"pos": [
582.1350317382813,
606.5799999999999
],
"size": [
244.81999999999994,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
2
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "EmptyLatentImage"
},
"widgets_values": [
512,
512,
1
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
863,
186
],
"size": [
315,
262
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 1
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 4
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 6
},
{
"name": "latent_image",
"type": "LATENT",
"link": 2
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
7
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1234,
"fixed",
20,
8,
"euler",
"normal",
1
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415,
186
],
"size": [
411.95503173828126,
151.0030493164063
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 3
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
4
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"embedding:porsche911_ti"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 10,
"type": "VAELoader",
"pos": [
896.9256198347109,
73.49066823348679
],
"size": [
281.0743801652891,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
10
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"vae-ft-mse-840000-ema-pruned.safetensors"
]
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": [
38.43636363636362,
363.0864500000007
],
"size": [
315,
98
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"slot_index": 0,
"links": [
1
]
},
{
"name": "CLIP",
"type": "CLIP",
"slot_index": 1,
"links": [
3,
5
]
},
{
"name": "VAE",
"type": "VAE",
"slot_index": 2,
"links": []
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.33",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"v1-5-pruned-emaonly-fp16.safetensors"
]
}
],
"links": [
[
1,
4,
0,
3,
0,
"MODEL"
],
[
2,
5,
0,
3,
3,
"LATENT"
],
[
3,
4,
1,
6,
0,
"CLIP"
],
[
4,
6,
0,
3,
1,
"CONDITIONING"
],
[
5,
4,
1,
7,
0,
"CLIP"
],
[
6,
7,
0,
3,
2,
"CONDITIONING"
],
[
7,
3,
0,
8,
0,
"LATENT"
],
[
9,
8,
0,
9,
0,
"IMAGE"
],
[
10,
10,
0,
8,
1,
"VAE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.8264462809917354,
"offset": [
120.85363636363637,
103.94933176651321
]
},
"frontendVersion": "1.34.6",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
- 在 CLIP Text Encode 中写入
embedding:文件名这样来调用 embedding。- e.g.
embedding:porsche911_ti
- e.g.