image2image is a technique that uses a reference image as a rough draft and has the model paint a new picture over it.
But if the draft is traced exactly, the result is just a copy with no originality.
So we add noise to the reference image, only to the extent that it is still recognizable, and then remove that noise. This produces a different version of the picture that follows the prompt while moderately inheriting the composition and atmosphere of the original.
Mechanism of image2image
First, a quick review of diffusion models and sampling.
In ComfyUI, KSampler first fills an "empty latent" with noise, then generates an image by gradually removing that noise.
In image2image, this "empty latent" is replaced with a latent encoded from the reference image. start_at_step then sets the step at which sampling begins, which determines how much noise is added: the later the start, the less noise is added and the more of the original image remains.
Now, let's see what happens when we change start_at_step with KSampler (Advanced) set to steps: 20.
start_at_step: 0
The latent is completely filled with noise from the start.
The draft image is not visible at all, so the result is almost the same as ordinary text2image.
🟩 Convert the image to a latent with the VAE Encode node.
🟨 Try changing the value of start_at_step to see how much of the original image remains.
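To get a feel for what start_at_step does, here is a minimal Python sketch of the idea. It is not ComfyUI's actual code: it assumes a made-up linear noise schedule and a simple blend between the latent and random noise, and the names noise_fraction and noised_latent are hypothetical. Real samplers use model-specific schedules, so treat the numbers as illustrative only.

```python
import random


def noise_fraction(steps: int, start_at_step: int) -> float:
    """Toy linear schedule (an assumption): 1.0 means pure noise, 0.0 means no noise."""
    if not 0 <= start_at_step <= steps:
        raise ValueError("start_at_step must be between 0 and steps")
    return 1.0 - start_at_step / steps


def noised_latent(latent, steps, start_at_step, seed=0):
    """Blend a latent (a flat list of floats here) with Gaussian noise."""
    rng = random.Random(seed)
    t = noise_fraction(steps, start_at_step)
    # The larger t is, the less of the original latent survives.
    return [(1.0 - t) * x + t * rng.gauss(0.0, 1.0) for x in latent]


# How much noise the toy schedule applies for a few start_at_step values.
for start in (0, 4, 10, 20):
    print(start, noise_fraction(20, start))
```

In this toy model, start_at_step: 0 replaces the latent with pure noise, and a start_at_step equal to steps returns the latent untouched.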
Workflow with KSampler
Of course, you can also do image2image with the standard KSampler.
However, the knob that determines how much of the original image remains is different from the one in KSampler (Advanced).
🟪 Set how much of the original image to keep by changing the value of denoise.
At 1.0, the latent is completely filled with noise. In other words, it is the same as text2image.
At 0.0, no noise is added at all, so the original image is output as it is.
Difference between Standard and Advanced
Now let's compare the standard KSampler with KSampler (Advanced).
The goal is the same: both adjust how much noise is added to the original image and how much is then removed.
However, the knobs are assigned differently, which is a bit confusing. Let's look at how each behaves with settings that seem to produce the same result.
KSampler (Advanced)
For example, if you set steps: 20 and start_at_step: 4,
it executes only the part "from the 4th step to the 20th step" of the 20 total steps.
The actual number of sampling passes is 20 - 4 = 16.
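The same arithmetic as a tiny Python sketch. The helper name advanced_executed_steps is hypothetical, and the 0-based step numbering is an assumption made for illustration.

```python
def advanced_executed_steps(steps: int, start_at_step: int) -> range:
    """Return the step indices that are actually sampled (0-based, assumed)."""
    if not 0 <= start_at_step <= steps:
        raise ValueError("start_at_step must be between 0 and steps")
    return range(start_at_step, steps)


executed = advanced_executed_steps(steps=20, start_at_step=4)
print(len(executed))  # 16
```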
Standard KSampler
Similarly, if you set steps: 20 and denoise: 0.8, the amount of noise applied is close to the Advanced example, but the sampler still runs 20 times.
Even if you change denoise to 0.5 or 0.1, it still samples 20 times.
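One way to picture why the count stays at 20 is the sketch below. It assumes that the standard KSampler stretches the schedule to roughly steps / denoise steps and then runs only the last steps of them. That is an assumption based on the behavior described above, not a copy of ComfyUI's code, and toy_sigmas is a made-up linear schedule standing in for the real one.

```python
def toy_sigmas(n: int) -> list[float]:
    """Hypothetical linear schedule: n + 1 noise levels from 1.0 down to 0.0."""
    return [1.0 - i / n for i in range(n + 1)]


def standard_schedule(steps: int, denoise: float) -> list[float]:
    """Noise levels visited by a standard-KSampler-like sampler (assumed behavior)."""
    if denoise <= 0.0:
        return []  # nothing to sample; the input latent passes through unchanged
    if denoise >= 1.0:
        return toy_sigmas(steps)
    total = int(steps / denoise)  # length of the stretched schedule
    # Keep only the tail: steps + 1 noise levels means `steps` sampling passes.
    return toy_sigmas(total)[-(steps + 1):]


for denoise in (1.0, 0.8, 0.5, 0.1):
    sigmas = standard_schedule(20, denoise)
    print(denoise, len(sigmas) - 1)  # the sampling count is 20 every time
```

In this sketch only the starting noise level moves with denoise, while the number of sampling passes stays fixed.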
Advanced
steps is the total number of steps; only the steps from start_at_step onward are executed → the execution count changes
Standard
steps is the actual execution count; denoise changes only the strength of the noise → the execution count does not change
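If you want the standard KSampler to apply noise in a way close to Advanced, the following formula gives a rough estimate. (It does not match perfectly.)
Steps to set in the standard KSampler ≈ total steps in Advanced × denoise
For example, Advanced with steps: 20 and start_at_step: 4 corresponds roughly to the standard KSampler with denoise: 0.8 and steps: 20 × 0.8 = 16.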
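The rough formula can also be written as a pair of helpers. This is only a sketch: the function names are hypothetical, and like the formula itself the conversion is approximate rather than an exact match between the two nodes.

```python
def advanced_to_standard(total_steps: int, start_at_step: int) -> tuple[int, float]:
    """Estimate standard KSampler (steps, denoise) from Advanced settings."""
    if not 0 <= start_at_step < total_steps:
        raise ValueError("start_at_step must be in [0, total_steps)")
    denoise = (total_steps - start_at_step) / total_steps
    steps = round(total_steps * denoise)  # steps to set ≈ total steps × denoise
    return steps, denoise


def standard_to_advanced(steps: int, denoise: float) -> tuple[int, int]:
    """Estimate Advanced (total steps, start_at_step) from standard settings."""
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    total_steps = round(steps / denoise)
    return total_steps, total_steps - steps


print(advanced_to_standard(20, 4))
print(standard_to_advanced(16, 0.8))
```

Rounding is used because most denoise values do not divide the step count evenly, which is one reason the two nodes never line up perfectly.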
You don't really need to worry about it
After all that explanation, both nodes ultimately decide the same thing: how much noise to add to the original image.
You would need to be careful if you mixed the standard KSampler and KSampler (Advanced) in one workflow, but hardly anyone builds such a workflow, so there is no need to worry.
It is enough to know which parameter controls how much of the original image remains.
image2image and text2image when denoise is 1.0
With denoise: 1.0, the original image is completely filled with noise, so in principle image2image and text2image using the Empty Latent Image node should produce the same result.
But in Stable Diffusion 1.5 they do not. (I suspect it is an implementation difference, but I have not figured out the cause.)
In recent models (Flux and others), on the other hand, the two produce exactly the same image.
We consider Stable Diffusion 1.5 a special case; on this site, we treat image2image with denoise 1.0 and text2image as the same thing, as the design intends.