What is Textual Inversion?

It is common to know what kind of image you want to generate but be unable to describe it well in words.

Textual Inversion is a technique that lets the model learn such "looks or concepts that are hard to put into words" as new words.

  • First, prepare a single dummy word such as <my_keyword>.
  • Show the model anywhere from a few to a few dozen images that share the desired look, together with that word.
  • The model learns the features common to those images and packs that information into the single word <my_keyword> (a rough code sketch of this idea follows below).
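
To make the mechanism concrete, here is a minimal sketch in Python (PyTorch + Hugging Face transformers). It is only an illustration of the idea: the text encoder name is the one SD1.5 uses, but the loss is a stand-in, since real Textual Inversion backpropagates the diffusion model's noise-prediction loss on your reference images. The point is that only the new token's vector is ever updated.

import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# 1. Register the dummy word as a brand-new token and grow the embedding table by one row.
tokenizer.add_tokens("<my_keyword>")
text_encoder.resize_token_embeddings(len(tokenizer))
new_id = tokenizer.convert_tokens_to_ids("<my_keyword>")

# 2. Freeze the whole text encoder; only the embedding table keeps gradients.
for p in text_encoder.parameters():
    p.requires_grad_(False)
embeddings = text_encoder.get_input_embeddings()
embeddings.weight.requires_grad_(True)
optimizer = torch.optim.AdamW([embeddings.weight], lr=5e-3, weight_decay=0.0)

# 3. One training step with a placeholder target (stands in for the diffusion loss
#    computed on your reference images).
ids = tokenizer("a photo of <my_keyword>", return_tensors="pt").input_ids
target = torch.zeros(1, ids.shape[1], text_encoder.config.hidden_size)
loss = torch.nn.functional.mse_loss(text_encoder(ids).last_hidden_state, target)
loss.backward()

# 4. Zero every gradient row except the new token's, so only that one vector moves.
mask = torch.zeros_like(embeddings.weight.grad)
mask[new_id] = 1.0
embeddings.weight.grad.mul_(mask)
optimizer.step()

# 5. Saving just that single vector is what produces the small .pt "embedding" file.
torch.save({"<my_keyword>": embeddings.weight[new_id].detach().clone()}, "my_keyword.pt")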

Famous examples include easynegative and badhandv4: a large number of "failed images" was collected and learned from, so the concept of "failures that often occur in image generation" is packed into a single word, which you then place in the negative prompt.

However, it cannot make the model draw, from scratch, something the base model cannot draw at all. It only works for things the model already knew but you did not know how to ask for.

In such cases, techniques that re-train the model itself, such as LoRA or full fine-tuning, are required.

This one-word piece of data created by Textual Inversion is conventionally called an embedding.


Almost No Longer Used

Although Textual Inversion has the advantage of being lightweight to train, it has now been largely replaced by LoRA.

easynegative and badhandv4 are also basically unnecessary these days, as the models themselves have improved.

As an exception, checkpoint or LoRA authors sometimes distribute dedicated embeddings alongside their models to bring the output closer to what they intend.

To bring out a model's full performance, the user needs to enter the kind of prompt the author intended. However, not every user will do that, and writing detailed prompts every time is a hassle.

So, by preparing an embedding in advance and asking users to include just that one word, the author can guarantee a minimum level of quality.


Applying an Embedding in text2image

Downloading the Embedding

Let's use an embedding called Porsche 911 Turbo.

  • Porsche 911 Turbo
  • Place the downloaded file in the embeddings folder:
    📂ComfyUI/
    └── 📂models/
        └── 📂embeddings/
            └── porsche911_ti.pt

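If you are curious what is actually inside such a file, a few lines of Python will show you. The key layout differs from tool to tool, so this sketch just walks whatever dictionary torch.load returns and prints the shape of every tensor it finds; on recent PyTorch versions you may need to pass weights_only=False for older files.

import torch

# Path from the folder layout above.
data = torch.load("ComfyUI/models/embeddings/porsche911_ti.pt", map_location="cpu")

def show(obj, prefix=""):
    """Recursively print tensor shapes inside nested dicts."""
    if isinstance(obj, torch.Tensor):
        print(f"{prefix}: tensor of shape {tuple(obj.shape)}")
    elif isinstance(obj, dict):
        for key, value in obj.items():
            show(value, f"{prefix}/{key}" if prefix else str(key))

show(data)
# Expect one or a few small vectors -- the entire "learned word" is just this data.
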
Workflow

SD1.5_embedding.json
  • Call the embedding by writing embedding:filename (the file name without its extension) in the prompt of the CLIP Text Encode node.
    • e.g. embedding:porsche911_ti
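
For reference, the same embedding file also works outside ComfyUI. A rough equivalent with the diffusers library looks like this; the checkpoint name is only an example (use whichever SD1.5-based model you prefer), and note that diffusers takes the bare token in the prompt rather than ComfyUI's embedding: syntax.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example SD1.5 checkpoint; swap in your own
    torch_dtype=torch.float16,
).to("cuda")

# Load the same .pt file and register it under a token of our choosing.
pipe.load_textual_inversion(
    "ComfyUI/models/embeddings/porsche911_ti.pt",
    token="porsche911_ti",
)

# diffusers: write the bare token; ComfyUI: write "embedding:porsche911_ti".
image = pipe(
    "a photo of porsche911_ti driving on a coastal road, high quality",
    num_inference_steps=30,
).images[0]
image.save("porsche911_turbo.png")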