image2image

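What is image2image?

image2image is a method of drawing over a reference image, using it as a draft.

That said, if the draft is traced perfectly, the result is just a copy, with no originality.

So we add noise until the original image is only just recognizable, then remove that noise. This keeps a moderate amount of the original's composition and mood while producing a different version of the picture that follows the prompt.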

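How image2image works

Let us briefly review diffusion models and sampling.
In ComfyUI, KSampler first fills an "empty latent" with noise, then generates an image by removing that noise little by little.

In image2image, this "empty latent" is replaced with a latent that encodes the reference image. start_at_step then sets the point in the schedule at which sampling begins, and therefore how much noise is added to the reference.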

Now let us see what happens when start_at_step is changed in a KSampler (Advanced) with steps: 20.

start_at_step: 0

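The latent is filled with noise from the very beginning.
The draft image is completely invisible. The result is almost the same as ordinary text2image.
Note: only Stable Diffusion 1.5 behaves slightly differently.
→ See the section on denoise 1.0 at the end of this page.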

start_at_step: 1

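Sampling starts from a position one step ahead.
The amount of noise added to the draft (= the amount of noise that will be removed afterwards) is therefore slightly smaller.
Even so, the draft image is still barely visible.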

start_at_step: 9

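The amount of noise added to the draft (= the amount of noise that will be removed afterwards) is considerably smaller.
The outlines and composition of the draft remain clearly recognizable.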

start_at_step: 20

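Because sampling is set to start at the last of the 20 steps, this is effectively the same as doing nothing.
In other words, no sampling is actually performed and no noise is added.
The input image is therefore output as is.

Setting start_at_step somewhere between 1 and (steps - 1) in this way runs sampling while preserving the original picture.

This is what we call image2image.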
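The relationship between start_at_step and the amount of noise can be sketched in a few lines of Python. This is only a toy model: the linear schedule and the function names `toy_noise_levels` and `noise_added_to_draft` are assumptions made for illustration, not ComfyUI's actual schedulers or API.

```python
def toy_noise_levels(steps: int) -> list[float]:
    """Toy linear schedule (an assumption, not a real ComfyUI scheduler):
    noise level 1.0 before step 0, falling to 0.0 after the final step."""
    return [1.0 - i / steps for i in range(steps + 1)]


def noise_added_to_draft(steps: int, start_at_step: int) -> float:
    """Noise level applied to the encoded reference image before sampling starts."""
    if not 0 <= start_at_step <= steps:
        raise ValueError("start_at_step must be between 0 and steps")
    return toy_noise_levels(steps)[start_at_step]


if __name__ == "__main__":
    # The four settings walked through above, with steps: 20.
    for start in (0, 1, 9, 20):
        level = noise_added_to_draft(20, start)
        print(f"start_at_step={start:2d} -> noise level {level:.2f}")
```

In this toy model, start_at_step: 0 gives the full noise level and start_at_step: 20 gives none, which matches the two extremes described above.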

KSampler (Advanced) workflow

SD1.5_image2image_KSampler_(Advanced).json

{
  "id": "8b9f7796-0873-4025-be3c-0f997f67f866",
  "revision": 0,
  "last_node_id": 15,
  "last_link_id": 32,
  "nodes": [
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1209,
        186
      ],
      "size": [
        210,
        46
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 28
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 10
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            9
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "VAEDecode"
      },
      "widgets_values": []
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        416.1970166015625,
        392.37848510742185
      ],
      "size": [
        410.75801513671877,
        158.82607910156253
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            12
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "text, watermark"
      ]
    },
    {
      "id": 10,
      "type": "VAELoader",
      "pos": [
        464.1892561983473,
        736.7997591425777
      ],
      "size": [
        210,
        58
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            10,
            30
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "vae-ft-mse-840000-ema-pruned.safetensors"
      ]
    },
    {
      "id": 13,
      "type": "LoadImage",
      "pos": [
        145.97903082644623,
        611.5931484814206
      ],
      "size": [
        272.2618963068182,
        377.6363636363636
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            18
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "vivi (1).png",
        "image"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 9,
      "type": "SaveImage",
      "pos": [
        1451,
        186
      ],
      "size": [
        354.2876035004722,
        433.23967321788405
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 9
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        415,
        186
      ],
      "size": [
        411.95503173828126,
        151.0030493164063
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            11
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "high quality, cute clay figure of a small humanoid character with long pink hair, yellow curved horns, purple boots, simple flat colors, minimal facial features, soft studio lighting, clean background"
      ]
    },
    {
      "id": 12,
      "type": "VAEEncode",
      "pos": [
        685.9517580991734,
        611.5931484814206
      ],
      "size": [
        140,
        46
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 18
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 30
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            32
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "VAEEncode"
      },
      "widgets_values": [],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 11,
      "type": "KSamplerAdvanced",
      "pos": [
        867.0434936363629,
        186
      ],
      "size": [
        306.34804687500014,
        334
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 14
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 11
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 12
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 32
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            28
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "KSamplerAdvanced"
      },
      "widgets_values": [
        "enable",
        123,
        "fixed",
        20,
        8,
        "euler",
        "normal",
        6,
        20,
        "enable"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        38.43636363636362,
        363.0864500000007
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            14
          ]
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 1,
          "links": [
            3,
            5
          ]
        },
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 2,
          "links": []
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "v1-5-pruned-emaonly-fp16.safetensors"
      ]
    }
  ],
  "links": [
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      9,
      8,
      0,
      9,
      0,
      "IMAGE"
    ],
    [
      10,
      10,
      0,
      8,
      1,
      "VAE"
    ],
    [
      11,
      6,
      0,
      11,
      1,
      "CONDITIONING"
    ],
    [
      12,
      7,
      0,
      11,
      2,
      "CONDITIONING"
    ],
    [
      14,
      4,
      0,
      11,
      0,
      "MODEL"
    ],
    [
      18,
      13,
      0,
      12,
      0,
      "IMAGE"
    ],
    [
      28,
      11,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      30,
      10,
      0,
      12,
      1,
      "VAE"
    ],
    [
      32,
      12,
      0,
      11,
      3,
      "LATENT"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.7513148009015777,
      "offset": [
        61.56363636363638,
        -86
      ]
    },
    "frontendVersion": "1.34.5",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

🟩 The VAE Encode node converts the image into a latent.
🟨 Change the value of start_at_step and try out different amounts of the original image to keep.
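If you would rather generate several variants of the workflow than edit the node by hand, something like the following sketch may help. It assumes the widget order seen in the KSamplerAdvanced node of the JSON above, where start_at_step is the eighth entry of `widgets_values`; the helper name `with_start_at_step` is hypothetical, and the `workflow` dictionary is a trimmed-down stand-in for the full file.

```python
import copy

# Widget order as it appears in the KSamplerAdvanced node of the workflow above:
# add_noise, noise_seed, control_after_generate, steps, cfg,
# sampler_name, scheduler, start_at_step, end_at_step, return_with_leftover_noise.
# Assumption: other ComfyUI versions may order these differently.
START_AT_STEP_INDEX = 7


def with_start_at_step(workflow: dict, value: int) -> dict:
    """Return a copy of the workflow with start_at_step replaced
    on every KSamplerAdvanced node. The input is left untouched."""
    updated = copy.deepcopy(workflow)
    for node in updated["nodes"]:
        if node["type"] == "KSamplerAdvanced":
            node["widgets_values"][START_AT_STEP_INDEX] = value
    return updated


# Trimmed-down stand-in for the full workflow JSON (only the fields used here).
workflow = {
    "nodes": [
        {
            "id": 11,
            "type": "KSamplerAdvanced",
            "widgets_values": [
                "enable", 123, "fixed", 20, 8, "euler", "normal", 6, 20, "enable"
            ],
        }
    ]
}

# One workflow copy per start_at_step value to compare.
variants = {start: with_start_at_step(workflow, start) for start in (1, 9, 15)}
```

Each entry in `variants` can then be written out with `json.dump` and loaded into ComfyUI for comparison.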

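KSampler workflow

Of course, image2image also works with the plain KSampler.
However, the knob that determines how much of the original image remains is quite different from the one in KSampler (Advanced).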

SD1.5_image2image_KSampler.json

{
  "id": "8b9f7796-0873-4025-be3c-0f997f67f866",
  "revision": 0,
  "last_node_id": 16,
  "last_link_id": 39,
  "nodes": [
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1209,
        186
      ],
      "size": [
        210,
        46
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 39
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 10
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            9
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "VAEDecode"
      },
      "widgets_values": []
    },
    {
      "id": 10,
      "type": "VAELoader",
      "pos": [
        464.1892561983473,
        736.7997591425777
      ],
      "size": [
        210,
        58
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            10,
            30
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "vae-ft-mse-840000-ema-pruned.safetensors"
      ]
    },
    {
      "id": 13,
      "type": "LoadImage",
      "pos": [
        145.97903082644623,
        611.5931484814206
      ],
      "size": [
        272.2618963068182,
        377.6363636363636
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            18
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "vivi (1).png",
        "image"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 9,
      "type": "SaveImage",
      "pos": [
        1451,
        186
      ],
      "size": [
        354.2876035004722,
        433.23967321788405
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 9
        }
      ],
      "outputs": [],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33"
      },
      "widgets_values": [
        "ComfyUI"
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        415,
        186
      ],
      "size": [
        411.95503173828126,
        151.0030493164063
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            35
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "high quality, cute clay figure of a small humanoid character with long pink hair, yellow curved horns, purple boots, simple flat colors, minimal facial features, soft studio lighting, clean background"
      ]
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        416.1970166015625,
        392.37848510742185
      ],
      "size": [
        410.75801513671877,
        158.82607910156253
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "slot_index": 0,
          "links": [
            36
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "text, watermark"
      ]
    },
    {
      "id": 12,
      "type": "VAEEncode",
      "pos": [
        685.9517580991734,
        611.5931484814206
      ],
      "size": [
        140,
        46
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 18
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 30
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            37
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "VAEEncode"
      },
      "widgets_values": [],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        38.43636363636362,
        363.0864500000007
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "slot_index": 0,
          "links": [
            38
          ]
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "slot_index": 1,
          "links": [
            3,
            5
          ]
        },
        {
          "name": "VAE",
          "type": "VAE",
          "slot_index": 2,
          "links": []
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.33",
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "v1-5-pruned-emaonly-fp16.safetensors"
      ]
    },
    {
      "id": 16,
      "type": "KSampler",
      "pos": [
        871.9451695085444,
        186
      ],
      "size": [
        301.7355371900828,
        262
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 38
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 35
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 36
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 37
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            39
          ]
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.76",
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        123,
        "fixed",
        20,
        8,
        "euler",
        "normal",
        0.7
      ],
      "color": "#323",
      "bgcolor": "#535"
    }
  ],
  "links": [
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      9,
      8,
      0,
      9,
      0,
      "IMAGE"
    ],
    [
      10,
      10,
      0,
      8,
      1,
      "VAE"
    ],
    [
      18,
      13,
      0,
      12,
      0,
      "IMAGE"
    ],
    [
      30,
      10,
      0,
      12,
      1,
      "VAE"
    ],
    [
      35,
      6,
      0,
      16,
      1,
      "CONDITIONING"
    ],
    [
      36,
      7,
      0,
      16,
      2,
      "CONDITIONING"
    ],
    [
      37,
      12,
      0,
      16,
      3,
      "LATENT"
    ],
    [
      38,
      4,
      0,
      16,
      0,
      "MODEL"
    ],
    [
      39,
      16,
      0,
      8,
      0,
      "LATENT"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.9090909090909091,
      "offset": [
        61.56363636363638,
        -86
      ]
    },
    "frontendVersion": "1.34.5",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}

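🟪 Change the denoise value to set how much of the original image is kept.
- At 1.0, the image is completely filled with noise, so the result is the same as text2image.
- At 0.0, no noise is added at all, so the original image is output unchanged.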

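Differences between the plain KSampler and Advanced

Here, let us compare it with KSampler (Advanced).

The goal is the same. Both adjust "how much noise to add to the original image, and how much to remove afterwards".

The knobs are assigned differently, though, which can be a little confusing. Let us look at how each behaves under settings that look as if they should give the same result.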

KSampler (Advanced)

For example, with steps: 20 and start_at_step: 4,
only "step 4 through step 20 of the 20 total steps" are executed.
The number of sampling steps actually run is 20 - 4 = 16.

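Plain KSampler

With the same steps: 20, setting denoise: 0.8 applies an amount of noise similar to the Advanced example above, but the number of sampling steps is still 20.
Even if denoise is changed to 0.5 or 0.1, sampling still runs 20 times.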

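Advanced
- steps is the "overall number of steps", and only the steps from start_at_step onward are executed → the number of executed steps changes
Plain
- steps is the "actual number of executed steps", and denoise only changes the noise strength → the number of executed steps does not change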
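The comparison above can be written down as two small functions. This is a sketch of the behavior described in this section, not code taken from ComfyUI; the function names are hypothetical, the Advanced case assumes end_at_step is left at or above steps, and treating denoise 0.0 as "zero steps" is an assumption based on the original image being output unchanged.

```python
def advanced_executed_steps(steps: int, start_at_step: int) -> int:
    """KSampler (Advanced): only the steps from start_at_step to the end are run.
    Assumes end_at_step is left at or above steps."""
    return max(steps - start_at_step, 0)


def plain_executed_steps(steps: int, denoise: float) -> int:
    """Plain KSampler: steps is the executed count whenever denoise is above 0.
    Assumption: at denoise 0.0 nothing is sampled."""
    return steps if denoise > 0 else 0


if __name__ == "__main__":
    print(advanced_executed_steps(20, 4))   # 20 - 4 = 16
    print(plain_executed_steps(20, 0.8))    # still 20
    print(plain_executed_steps(20, 0.1))    # still 20
```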

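If you want the plain KSampler to apply noise in a way close to what Advanced does, the following formula is a rough guide. (The results will not match exactly.)

steps to set in the plain KSampler ≒ overall steps in Advanced * denoise

For example, Advanced with steps: 20 and start_at_step: 4 corresponds roughly to the plain KSampler with steps: 16 and denoise: 0.8.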
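As a worked version of this rule of thumb, the sketch below converts settings in both directions. The function names are hypothetical, and the use of `round` is a choice made here to sidestep floating-point noise; the actual implementation may round differently, so treat the output as an approximation only.

```python
def plain_settings_for(total_steps: int, start_at_step: int) -> tuple[int, float]:
    """Plain KSampler (steps, denoise) roughly matching an Advanced setup."""
    if not 0 <= start_at_step < total_steps:
        raise ValueError("start_at_step must be in [0, total_steps)")
    executed = total_steps - start_at_step
    # denoise is the fraction of the overall schedule that is actually run.
    return executed, executed / total_steps


def advanced_settings_for(steps: int, denoise: float) -> tuple[int, int]:
    """Advanced (steps, start_at_step) roughly matching a plain KSampler setup."""
    if not 0 < denoise <= 1:
        raise ValueError("denoise must be in (0, 1]")
    # Rearranged rule of thumb: overall steps = plain steps / denoise.
    total_steps = round(steps / denoise)
    return total_steps, total_steps - steps


if __name__ == "__main__":
    print(plain_settings_for(20, 4))
    print(advanced_settings_for(20, 0.5))
```

Under this rule, Advanced with steps: 20 and start_at_step: 4 maps to 16 executed steps at a denoise of 16 / 20, and a plain KSampler with steps: 20 and denoise: 0.5 maps to an overall schedule of 40 steps starting at step 20.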

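No need to worry about it

Despite the detailed explanation, both nodes ultimately just decide "how much noise to add to the original image".

You would need to be careful if you mixed the plain KSampler and Advanced in one workflow, but almost nobody builds workflows like that, so there is no need to worry.

It is enough to know which parameter to change and roughly how much of the original image it preserves.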

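image2image and text2image at denoise 1.0

At denoise: 1.0 the original image is completely filled with noise, so in terms of the mechanism, image2image should be identical to text2image with an Empty Latent Image node.

With Stable Diffusion 1.5, however, the results do not match. (I suspect an implementation difference, but I do not understand it well enough to say for sure.)
Recent models such as Flux, on the other hand, produce exactly the same image.

Treating Stable Diffusion 1.5 as a special case, this site follows the intended design and treats "image2image at denoise 1.0" and text2image as the same thing.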
