What is Frame Interpolation?
Video Frame Interpolation (VFI) is a technique that inserts new frames between the existing frames of a video so that motion looks smoother.
It has long been used to smooth out jerky old footage and to make up for the lower effective frame rate of slow-motion playback.
With the advent of video generation AI, Generative Frame Interpolation has also emerged, a technique that goes beyond simply raising the FPS.
Frame Interpolation to Increase FPS (Classical VFI)
Classical VFI takes two frames that are close in time (less than 0.1 seconds apart) and generates one or more intermediate frames between them. Repeating this for every adjacent pair increases the number of frames in the whole video.

{
"last_node_id": 11,
"last_link_id": 17,
"nodes": [
{
"id": 8,
"type": "GMFSS Fortuna VFI",
"pos": [
485,
110
],
"size": [
335.5210876464844,
126
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [
{
"name": "frames",
"type": "IMAGE",
"link": 10
},
{
"name": "optional_interpolation_states",
"type": "INTERPOLATION_STATES",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
16
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "GMFSS Fortuna VFI"
},
"widgets_values": [
"GMFSS_fortuna_union",
10,
2
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 4,
"type": "VHS_VideoCombine",
"pos": [
865,
110
],
"size": [
590,
612
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 16
},
{
"name": "audio",
"type": "VHS_AUDIO",
"link": null
},
{
"name": "batch_manager",
"type": "VHS_BatchManager",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "VHS_VideoCombine"
},
"widgets_values": {
"frame_rate": 24,
"loop_count": 0,
"filename_prefix": "AnimateDiff",
"format": "image/gif",
"pingpong": false,
"save_output": false,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "AnimateDiff_00018.gif",
"subfolder": "",
"type": "temp",
"format": "image/gif"
}
}
}
},
{
"id": 7,
"type": "VHS_LoadVideo",
"pos": [
85,
110
],
"size": [
356.6381284713742,
480.4254189809161
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [
{
"name": "batch_manager",
"type": "VHS_BatchManager",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
10
],
"shape": 3,
"slot_index": 0
},
{
"name": "frame_count",
"type": "INT",
"links": null,
"shape": 3
},
{
"name": "audio",
"type": "VHS_AUDIO",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "VHS_LoadVideo"
},
"widgets_values": {
"video": "94aefb41d8b4b1d032a8457d5811c129.gif",
"force_rate": 0,
"force_size": "Disabled",
"custom_width": 512,
"custom_height": 512,
"frame_load_cap": 0,
"skip_first_frames": 0,
"select_every_nth": 1,
"choose video to upload": "image",
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"frame_load_cap": 0,
"skip_first_frames": 0,
"force_rate": 0,
"filename": "94aefb41d8b4b1d032a8457d5811c129.gif",
"type": "input",
"format": "image/gif",
"select_every_nth": 1
}
}
}
}
],
"links": [
[
10,
7,
0,
8,
0,
"IMAGE"
],
[
16,
8,
0,
4,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"0246.VERSION": [
0,
0,
4
]
},
"version": 0.4
}
Various interpolation methods exist, such as FILM and GMFSS.
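To make the frame arithmetic concrete, here is a small Python sketch. It assumes a scheme in which every gap between adjacent frames receives `multiplier - 1` new frames; the function names are made up for this illustration, and a particular node or model may count or space frames differently.

```python
from typing import List


def interpolated_frame_count(num_frames: int, multiplier: int) -> int:
    """Frame count after adding (multiplier - 1) new frames to every adjacent gap."""
    if num_frames < 2 or multiplier < 1:
        raise ValueError("need at least 2 frames and a multiplier of 1 or more")
    # There are (num_frames - 1) gaps; each gap ends up holding `multiplier` steps.
    return (num_frames - 1) * multiplier + 1


def intermediate_timestamps(t0: float, t1: float, multiplier: int) -> List[float]:
    """Evenly spaced timestamps strictly between t0 and t1 (assumed spacing)."""
    step = (t1 - t0) / multiplier
    return [t0 + step * k for k in range(1, multiplier)]


# A 2-second clip at 12 fps, doubled.
source_frames, source_fps, multiplier = 24, 12, 2
print(interpolated_frame_count(source_frames, multiplier))  # 47
print(source_fps * multiplier)  # 24: play back at this rate to keep the duration
```

The output frame rate has to be multiplied by the same factor. Otherwise the clip just becomes longer instead of smoother.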
Generative interpolation (FLF2V)
Conventional frame interpolation connects adjacent frames that differ very little.
More recently, techniques have appeared that go a step further and use a video generation model to fill the gap between frames that are more than a second apart. This is known as FLF2V (first-last-frame-to-video).

{
"last_node_id": 39,
"last_link_id": 40,
"nodes": [
{
"id": 37,
"type": "LoadImage",
"pos": {
"0": 60,
"1": 940
},
"size": [
315,
314
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
37
],
"shape": 3,
"slot_index": 0
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"0186.png",
"image"
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": {
"0": 680,
"1": 480
},
"size": [
210,
76
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 5
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
2
],
"slot_index": 0,
"shape": 3
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 38,
"type": "VHS_VideoCombine",
"pos": {
"0": 1550,
"1": 330
},
"size": [
676.74560546875,
570.2796020507812
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 36
},
{
"name": "audio",
"type": "AUDIO",
"link": null
},
{
"name": "meta_batch",
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "VHS_VideoCombine"
},
"widgets_values": {
"frame_rate": 8,
"loop_count": 0,
"filename_prefix": "AnimateDiff",
"format": "video/h265-mp4",
"pix_fmt": "yuv420p10le",
"crf": 22,
"save_metadata": true,
"pingpong": false,
"save_output": false,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "AnimateDiff_00006.mp4",
"subfolder": "",
"type": "temp",
"format": "video/h265-mp4",
"frame_rate": 8
},
"muted": false
}
}
},
{
"id": 11,
"type": "DownloadAndLoadDynamiCrafterModel",
"pos": {
"0": 524.5999755859375,
"1": 50
},
"size": {
"0": 365.4000244140625,
"1": 106
},
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "DynCraft_model",
"type": "DCMODEL",
"links": [
6,
13
],
"slot_index": 0,
"shape": 3
}
],
"properties": {
"Node name for S&R": "DownloadAndLoadDynamiCrafterModel"
},
"widgets_values": [
"tooncrafter_512_interp-pruned-fp16.safetensors",
"auto",
true
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 13,
"type": "DownloadAndLoadCLIPVisionModel",
"pos": {
"0": 562.4000244140625,
"1": 220
},
"size": {
"0": 327.5999755859375,
"1": 58
},
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "clip_vision",
"type": "CLIP_VISION",
"links": [
8
],
"slot_index": 0,
"shape": 3
}
],
"properties": {
"Node name for S&R": "DownloadAndLoadCLIPVisionModel"
},
"widgets_values": [
"CLIP-ViT-H-fp16.safetensors"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 10,
"type": "DownloadAndLoadCLIPModel",
"pos": {
"0": 320,
"1": 420
},
"size": [
309.88747670016573,
58
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "clip",
"type": "CLIP",
"links": [
4,
5
],
"slot_index": 0,
"shape": 3
}
],
"properties": {
"Node name for S&R": "DownloadAndLoadCLIPModel"
},
"widgets_values": [
"stable-diffusion-2-1-clip-fp16.safetensors"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 5,
"type": "ToonCrafterInterpolation",
"pos": {
"0": 970,
"1": 330
},
"size": {
"0": 315,
"1": 418
},
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "DCMODEL",
"link": 6
},
{
"name": "clip_vision",
"type": "CLIP_VISION",
"link": 8
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 1
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 2
},
{
"name": "images",
"type": "IMAGE",
"link": 39
},
{
"name": "optional_latents",
"type": "LATENT",
"link": null
},
{
"name": "controlnet",
"type": "DC_CONTROL",
"link": null
}
],
"outputs": [
{
"name": "samples",
"type": "LATENT",
"links": [
12
],
"slot_index": 0,
"shape": 3
}
],
"properties": {
"Node name for S&R": "ToonCrafterInterpolation"
},
"widgets_values": [
20,
7,
1,
16,
1235,
"fixed",
10,
"auto",
1,
0,
1000
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": {
"0": 680,
"1": 350
},
"size": [
210,
76
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 4
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
1
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
""
]
},
{
"id": 16,
"type": "ToonCrafterDecode",
"pos": {
"0": 1306,
"1": 331
},
"size": {
"0": 216.8146514892578,
"1": 102
},
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "DCMODEL",
"link": 13
},
{
"name": "latent",
"type": "LATENT",
"link": 12
}
],
"outputs": [
{
"name": "images",
"type": "IMAGE",
"links": [
36
],
"slot_index": 0,
"shape": 3
}
],
"properties": {
"Node name for S&R": "ToonCrafterDecode"
},
"widgets_values": [
"auto",
false
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 19,
"type": "ImageBatch",
"pos": {
"0": 420,
"1": 820
},
"size": [
140,
46
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "image1",
"type": "IMAGE",
"link": 40
},
{
"name": "image2",
"type": "IMAGE",
"link": 37
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
38
],
"slot_index": 0,
"shape": 3
}
],
"properties": {
"Node name for S&R": "ImageBatch"
},
"color": "#323",
"bgcolor": "#535"
},
{
"id": 15,
"type": "ImageResize",
"pos": {
"0": 580,
"1": 820
},
"size": {
"0": 315,
"1": 246
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 38
},
{
"name": "mask_optional",
"type": "MASK",
"link": null
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
39
],
"slot_index": 0,
"shape": 3
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "ImageResize"
},
"widgets_values": [
"resize only",
0,
512,
0,
"reduce size only",
"4:3",
0.5,
20
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 36,
"type": "LoadImage",
"pos": {
"0": 60,
"1": 570
},
"size": [
315,
314
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
40
],
"shape": 3,
"slot_index": 0
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"0170.png",
"image"
]
}
],
"links": [
[
1,
6,
0,
5,
2,
"CONDITIONING"
],
[
2,
7,
0,
5,
3,
"CONDITIONING"
],
[
4,
10,
0,
6,
0,
"CLIP"
],
[
5,
10,
0,
7,
0,
"CLIP"
],
[
6,
11,
0,
5,
0,
"DCMODEL"
],
[
8,
13,
0,
5,
1,
"CLIP_VISION"
],
[
12,
5,
0,
16,
1,
"LATENT"
],
[
13,
11,
0,
16,
0,
"DCMODEL"
],
[
36,
16,
0,
38,
0,
"IMAGE"
],
[
37,
37,
0,
19,
1,
"IMAGE"
],
[
38,
19,
0,
15,
0,
"IMAGE"
],
[
39,
15,
0,
5,
4,
"IMAGE"
],
[
40,
36,
0,
19,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.601314800901579,
"offset": [
132.14306296953706,
120.78753938381911
]
}
},
"version": 0.4
}
Given two images, the model connects them by creating motion with a story in between.
This is not simple linear interpolation. Because the model also invents, to some extent, what happens in the middle, the result is closer to a very short story than to a morph.
ToonCrafter is an early model in this lineage. Each new video model, however, tends to arrive with an FLF2V model that looks far more natural, so there is little reason to use ToonCrafter today.
Extension
The frame interpolation described so far processes each adjacent pair independently. Even with three or more input frames, it simply repeats two-frame interpolation, as follows.
- Fill between 1st and 2nd frames...
- Fill between 2nd and 3rd frames...
- Fill between 3rd and 4th frames...
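The pair-by-pair loop listed above can be sketched in Python as follows. `blend_pair` is only a stand-in that cross-fades two frames; a real VFI model such as FILM or GMFSS estimates motion instead. The toy frame representation (a flat list of pixel values) and all names are assumptions made for this illustration.

```python
from typing import Callable, List

Frame = List[float]  # toy frame: a flat list of pixel intensities


def blend_pair(a: Frame, b: Frame, t: float) -> Frame:
    """Stand-in for a VFI model: a plain cross-fade at position t in (0, 1)."""
    return [(1.0 - t) * pa + t * pb for pa, pb in zip(a, b)]


def interpolate_pairwise(
    frames: List[Frame],
    multiplier: int,
    model: Callable[[Frame, Frame, float], Frame] = blend_pair,
) -> List[Frame]:
    """Fill every adjacent pair independently; no pair sees the other frames."""
    if not frames:
        return []
    out: List[Frame] = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        # Insert (multiplier - 1) frames between a and b.
        for k in range(1, multiplier):
            out.append(model(a, b, k / multiplier))
    out.append(frames[-1])
    return out


frames = [[0.0], [1.0], [3.0]]
print(interpolate_pairwise(frames, 2))  # [[0.0], [0.5], [1.0], [2.0], [3.0]]
```

Each call to `model` receives only two frames, which is exactly the limitation described here: nothing in the loop can take longer-range motion into account.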
VACE's Extension goes one step further.
Whereas conventional VFI looks only at two adjacent frames at a time, Extension places multiple keyframes across one whole video and lets the generation model connect the entire span.
For example, suppose you generate an 81-frame video and place keyframes at some of those frame positions. The model generates the video so that the keyframes are connected naturally on a single shared timeline.
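As a rough sketch of that layout, the Python below builds an 81-slot timeline in which a few positions hold keyframes and the rest are left for the model. The file names, the function name, and the mask convention (1 = generate, 0 = keep) are assumptions made for this illustration, not VACE's actual input format.

```python
from typing import Dict, List, Optional, Tuple


def build_keyframe_timeline(
    total_frames: int, keyframes: Dict[int, str]
) -> Tuple[List[Optional[str]], List[int]]:
    """Return (slots, mask): slots hold a keyframe or None; mask marks frames to generate."""
    for index in keyframes:
        if not 0 <= index < total_frames:
            raise ValueError(f"keyframe index {index} is outside 0..{total_frames - 1}")
    slots = [keyframes.get(i) for i in range(total_frames)]
    # Assumed convention: 1 = the model generates this frame, 0 = keep the keyframe.
    mask = [0 if slot is not None else 1 for slot in slots]
    return slots, mask


# Hypothetical keyframes at the start, middle, and end of an 81-frame video.
slots, mask = build_keyframe_timeline(
    81, {0: "start.png", 40: "middle.png", 80: "end.png"}
)
print(len(slots), sum(mask))  # 81 78
```

Unlike the pairwise loop, the whole timeline goes to the model in one pass, so the frames generated around one keyframe can take the other keyframes into account.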

Compared with FLF2V, the resulting video is much more natural. Techniques like Extension will probably become mainstream.