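Frame Interpolation

What Is Frame Interpolation?

Frame interpolation (Video Frame Interpolation, VFI) is a technique that inserts new frames between the existing frames of a video so that motion looks smoother.

It has long been used to smooth out choppy older footage and to make up for the drop in fps that slow motion causes.

More recently, the arrival of video generation AI has produced generative frame interpolation, a technique that goes beyond simply raising the fps.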

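Frame interpolation for raising fps (classical VFI)

A typical VFI model takes two temporally close frames (roughly less than 0.1 seconds apart) and generates one or more intermediate frames that fit between them. Repeating this for every adjacent pair increases the frame count of the whole video.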
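
The pairwise loop can be sketched in a few lines of Python. This is only an illustration of how the frame count grows: `interpolate_pair` is a hypothetical stand-in that cross-fades pixel values, whereas learned models such as GMFSS estimate motion instead. The `multiplier` parameter and the flat-list frame representation are also assumptions made for this example.

```python
def interpolate_pair(frame_a, frame_b, t):
    """Hypothetical stand-in for a VFI model: plain linear blend at time t in (0, 1)."""
    return [(1 - t) * a + t * b for a, b in zip(frame_a, frame_b)]


def interpolate_video(frames, multiplier):
    """Insert (multiplier - 1) intermediate frames between every adjacent pair."""
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    if not frames:
        return []
    output = []
    for frame_a, frame_b in zip(frames, frames[1:]):
        output.append(frame_a)  # keep the original frame
        for k in range(1, multiplier):
            output.append(interpolate_pair(frame_a, frame_b, k / multiplier))
    output.append(frames[-1])  # the last frame has no successor
    return output


# Three tiny "frames" of two pixel values each (illustrative data only).
frames = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
doubled = interpolate_video(frames, 2)
print(len(doubled))  # (3 - 1) * 2 + 1 = 5 frames
```

With N input frames and a multiplier of m, the output has (N - 1) * m + 1 frames, so playing it back at m times the original fps keeps roughly the same duration.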

VFI_GMFSS.json

{
  "last_node_id": 11,
  "last_link_id": 17,
  "nodes": [
    {
      "id": 8,
      "type": "GMFSS Fortuna VFI",
      "pos": [
        485,
        110
      ],
      "size": [
        335.5210876464844,
        126
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [
        {
          "name": "frames",
          "type": "IMAGE",
          "link": 10
        },
        {
          "name": "optional_interpolation_states",
          "type": "INTERPOLATION_STATES",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            16
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "GMFSS Fortuna VFI"
      },
      "widgets_values": [
        "GMFSS_fortuna_union",
        10,
        2
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 4,
      "type": "VHS_VideoCombine",
      "pos": [
        865,
        110
      ],
      "size": [
        590,
        612
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 16
        },
        {
          "name": "audio",
          "type": "VHS_AUDIO",
          "link": null
        },
        {
          "name": "batch_manager",
          "type": "VHS_BatchManager",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "Filenames",
          "type": "VHS_FILENAMES",
          "links": null,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "VHS_VideoCombine"
      },
      "widgets_values": {
        "frame_rate": 24,
        "loop_count": 0,
        "filename_prefix": "AnimateDiff",
        "format": "image/gif",
        "pingpong": false,
        "save_output": false,
        "videopreview": {
          "hidden": false,
          "paused": false,
          "params": {
            "filename": "AnimateDiff_00018.gif",
            "subfolder": "",
            "type": "temp",
            "format": "image/gif"
          }
        }
      }
    },
    {
      "id": 7,
      "type": "VHS_LoadVideo",
      "pos": [
        85,
        110
      ],
      "size": [
        356.6381284713742,
        480.4254189809161
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [
        {
          "name": "batch_manager",
          "type": "VHS_BatchManager",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            10
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "frame_count",
          "type": "INT",
          "links": null,
          "shape": 3
        },
        {
          "name": "audio",
          "type": "VHS_AUDIO",
          "links": null,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "VHS_LoadVideo"
      },
      "widgets_values": {
        "video": "94aefb41d8b4b1d032a8457d5811c129.gif",
        "force_rate": 0,
        "force_size": "Disabled",
        "custom_width": 512,
        "custom_height": 512,
        "frame_load_cap": 0,
        "skip_first_frames": 0,
        "select_every_nth": 1,
        "choose video to upload": "image",
        "videopreview": {
          "hidden": false,
          "paused": false,
          "params": {
            "frame_load_cap": 0,
            "skip_first_frames": 0,
            "force_rate": 0,
            "filename": "94aefb41d8b4b1d032a8457d5811c129.gif",
            "type": "input",
            "format": "image/gif",
            "select_every_nth": 1
          }
        }
      }
    }
  ],
  "links": [
    [
      10,
      7,
      0,
      8,
      0,
      "IMAGE"
    ],
    [
      16,
      8,
      0,
      4,
      0,
      "IMAGE"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "0246.VERSION": [
      0,
      0,
      4
    ]
  },
  "version": 0.4
}
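
The workflow above is a three-node chain, and its `links` array can be read mechanically. The sketch below assumes each link entry is laid out as `[link id, source node, source slot, target node, target slot, type]`. That layout is inferred from the workflow itself (link 10 is listed as an output of node 7 and as the input of node 8), not taken from an official specification. `trace_links` is a hypothetical helper name, and the `workflow` dictionary is a trimmed copy of the JSON above.

```python
def trace_links(workflow):
    """Return human-readable edges for a ComfyUI-style workflow dictionary."""
    node_types = {node["id"]: node["type"] for node in workflow["nodes"]}
    edges = []
    # Assumed layout: [link id, source node, source slot, target node, target slot, type]
    for _link_id, src, _src_slot, dst, _dst_slot, data_type in workflow["links"]:
        edges.append(f"{node_types[src]} -> {node_types[dst]} ({data_type})")
    return edges


# Trimmed copy of VFI_GMFSS.json: only the fields the helper reads.
workflow = {
    "nodes": [
        {"id": 8, "type": "GMFSS Fortuna VFI"},
        {"id": 4, "type": "VHS_VideoCombine"},
        {"id": 7, "type": "VHS_LoadVideo"},
    ],
    "links": [
        [10, 7, 0, 8, 0, "IMAGE"],
        [16, 8, 0, 4, 0, "IMAGE"],
    ],
}

for edge in trace_links(workflow):
    print(edge)
```

Read this way, the video is loaded, passed through the GMFSS interpolation node, and then combined back into a video file.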

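Many interpolation methods exist, including FILM and GMFSS.

Generative interpolation (FLF2V)

Conventional frame interpolation connects adjacent frames that barely differ from each other.

Recent techniques go a step further and use a video generation model to fill the gap between frames that are a second or more apart.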

tooncrafter_interp.json

{
  "last_node_id": 39,
  "last_link_id": 40,
  "nodes": [
    {
      "id": 37,
      "type": "LoadImage",
      "pos": {
        "0": 60,
        "1": 940
      },
      "size": [
        315,
        314
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            37
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "0186.png",
        "image"
      ]
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": {
        "0": 680,
        "1": 480
      },
      "size": [
        210,
        76
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            2
          ],
          "slot_index": 0,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 38,
      "type": "VHS_VideoCombine",
      "pos": {
        "0": 1550,
        "1": 330
      },
      "size": [
        676.74560546875,
        570.2796020507812
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 36
        },
        {
          "name": "audio",
          "type": "AUDIO",
          "link": null
        },
        {
          "name": "meta_batch",
          "type": "VHS_BatchManager",
          "link": null
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "Filenames",
          "type": "VHS_FILENAMES",
          "links": null,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "VHS_VideoCombine"
      },
      "widgets_values": {
        "frame_rate": 8,
        "loop_count": 0,
        "filename_prefix": "AnimateDiff",
        "format": "video/h265-mp4",
        "pix_fmt": "yuv420p10le",
        "crf": 22,
        "save_metadata": true,
        "pingpong": false,
        "save_output": false,
        "videopreview": {
          "hidden": false,
          "paused": false,
          "params": {
            "filename": "AnimateDiff_00006.mp4",
            "subfolder": "",
            "type": "temp",
            "format": "video/h265-mp4",
            "frame_rate": 8
          },
          "muted": false
        }
      }
    },
    {
      "id": 11,
      "type": "DownloadAndLoadDynamiCrafterModel",
      "pos": {
        "0": 524.5999755859375,
        "1": 50
      },
      "size": {
        "0": 365.4000244140625,
        "1": 106
      },
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "DynCraft_model",
          "type": "DCMODEL",
          "links": [
            6,
            13
          ],
          "slot_index": 0,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "DownloadAndLoadDynamiCrafterModel"
      },
      "widgets_values": [
        "tooncrafter_512_interp-pruned-fp16.safetensors",
        "auto",
        true
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 13,
      "type": "DownloadAndLoadCLIPVisionModel",
      "pos": {
        "0": 562.4000244140625,
        "1": 220
      },
      "size": {
        "0": 327.5999755859375,
        "1": 58
      },
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "clip_vision",
          "type": "CLIP_VISION",
          "links": [
            8
          ],
          "slot_index": 0,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "DownloadAndLoadCLIPVisionModel"
      },
      "widgets_values": [
        "CLIP-ViT-H-fp16.safetensors"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 10,
      "type": "DownloadAndLoadCLIPModel",
      "pos": {
        "0": 320,
        "1": 420
      },
      "size": [
        309.88747670016573,
        58
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "links": [
            4,
            5
          ],
          "slot_index": 0,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "DownloadAndLoadCLIPModel"
      },
      "widgets_values": [
        "stable-diffusion-2-1-clip-fp16.safetensors"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 5,
      "type": "ToonCrafterInterpolation",
      "pos": {
        "0": 970,
        "1": 330
      },
      "size": {
        "0": 315,
        "1": 418
      },
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "DCMODEL",
          "link": 6
        },
        {
          "name": "clip_vision",
          "type": "CLIP_VISION",
          "link": 8
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 1
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 2
        },
        {
          "name": "images",
          "type": "IMAGE",
          "link": 39
        },
        {
          "name": "optional_latents",
          "type": "LATENT",
          "link": null
        },
        {
          "name": "controlnet",
          "type": "DC_CONTROL",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "links": [
            12
          ],
          "slot_index": 0,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "ToonCrafterInterpolation"
      },
      "widgets_values": [
        20,
        7,
        1,
        16,
        1235,
        "fixed",
        10,
        "auto",
        1,
        0,
        1000
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": {
        "0": 680,
        "1": 350
      },
      "size": [
        210,
        76
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 4
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            1
          ],
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        ""
      ]
    },
    {
      "id": 16,
      "type": "ToonCrafterDecode",
      "pos": {
        "0": 1306,
        "1": 331
      },
      "size": {
        "0": 216.8146514892578,
        "1": 102
      },
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "DCMODEL",
          "link": 13
        },
        {
          "name": "latent",
          "type": "LATENT",
          "link": 12
        }
      ],
      "outputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "links": [
            36
          ],
          "slot_index": 0,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "ToonCrafterDecode"
      },
      "widgets_values": [
        "auto",
        false
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 19,
      "type": "ImageBatch",
      "pos": {
        "0": 420,
        "1": 820
      },
      "size": [
        140,
        46
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "image1",
          "type": "IMAGE",
          "link": 40
        },
        {
          "name": "image2",
          "type": "IMAGE",
          "link": 37
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            38
          ],
          "slot_index": 0,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "ImageBatch"
      },
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 15,
      "type": "ImageResize",
      "pos": {
        "0": 580,
        "1": 820
      },
      "size": {
        "0": 315,
        "1": 246
      },
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 38
        },
        {
          "name": "mask_optional",
          "type": "MASK",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            39
          ],
          "slot_index": 0,
          "shape": 3
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "ImageResize"
      },
      "widgets_values": [
        "resize only",
        0,
        512,
        0,
        "reduce size only",
        "4:3",
        0.5,
        20
      ],
      "color": "#323",
      "bgcolor": "#535"
    },
    {
      "id": 36,
      "type": "LoadImage",
      "pos": {
        "0": 60,
        "1": 570
      },
      "size": [
        315,
        314
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            40
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "0170.png",
        "image"
      ]
    }
  ],
  "links": [
    [
      1,
      6,
      0,
      5,
      2,
      "CONDITIONING"
    ],
    [
      2,
      7,
      0,
      5,
      3,
      "CONDITIONING"
    ],
    [
      4,
      10,
      0,
      6,
      0,
      "CLIP"
    ],
    [
      5,
      10,
      0,
      7,
      0,
      "CLIP"
    ],
    [
      6,
      11,
      0,
      5,
      0,
      "DCMODEL"
    ],
    [
      8,
      13,
      0,
      5,
      1,
      "CLIP_VISION"
    ],
    [
      12,
      5,
      0,
      16,
      1,
      "LATENT"
    ],
    [
      13,
      11,
      0,
      16,
      0,
      "DCMODEL"
    ],
    [
      36,
      16,
      0,
      38,
      0,
      "IMAGE"
    ],
    [
      37,
      37,
      0,
      19,
      1,
      "IMAGE"
    ],
    [
      38,
      19,
      0,
      15,
      0,
      "IMAGE"
    ],
    [
      39,
      15,
      0,
      5,
      4,
      "IMAGE"
    ],
    [
      40,
      36,
      0,
      19,
      0,
      "IMAGE"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.601314800901579,
      "offset": [
        132.14306296953706,
        120.78753938381911
      ]
    }
  },
  "version": 0.4
}

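Given two images, the model connects them while creating motion that carries a small story.

Rather than interpolating along a straight line, the AI also invents, to some extent, what happens along the way, so the result is less like morphing and closer to a short video with a story.

ToonCrafter is an early model in this line, but each new video model tends to bring a far more natural FLF2V model, so there is little reason to use it today.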

Extension

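The frame interpolation described so far processes each adjacent pair independently.
Even with three or more input frames, it simply repeats two-frame interpolation for each pair:

Fill the gap between frames 1 and 2…
Fill the gap between frames 2 and 3…
Fill the gap between frames 3 and 4…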

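VACE's Extension takes this a step further.

Whereas conventional VFI looks only at the gap between two neighboring frames, Extension places multiple keyframes across a single video and lets the generative model connect everything in between.

For example, suppose you generate an 81-frame video. You place keyframes at some of those frames, and the model generates the video so that the keyframes connect naturally within one shared timeline.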
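
One way to picture the 81-frame example is as a list of frame slots plus a mask marking which slots the model has to generate. The sketch below is a hedged illustration of that layout only. The helper name `build_keyframe_layout`, the file names, the chosen keyframe positions, and the mask convention (1 = generate, 0 = keep the keyframe) are all assumptions for this example and are not VACE's actual input format.

```python
def build_keyframe_layout(total_frames, keyframes):
    """Place keyframes on a timeline.

    keyframes maps a frame index to an image label (hypothetical file names here).
    Returns (slots, mask): slots[i] is the label or None, and mask[i] is 1 where
    the model would have to generate the frame (assumed convention).
    """
    for index in keyframes:
        if not 0 <= index < total_frames:
            raise ValueError(f"keyframe index {index} is outside the video")
    slots = [keyframes.get(i) for i in range(total_frames)]
    mask = [0 if slot is not None else 1 for slot in slots]
    return slots, mask


# Three keyframes on an 81-frame timeline (positions chosen only for illustration).
slots, mask = build_keyframe_layout(
    81, {0: "start.png", 40: "middle.png", 80: "end.png"}
)
print(sum(mask))  # number of frames left for the model to generate: 78
```

Unlike the pairwise approach, all 78 missing frames sit on the same timeline, so the model can take every keyframe into account at once.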

The result is far more natural than FLF2V. Techniques like Extension will probably become mainstream.
