What on earth is FreeInit?

The AnimateDiff documentation has a section called "Using FreeInit".

FreeInit is an effective method that improves temporal consistency and overall quality of videos generated using video-diffusion-models without any additional training. It can be applied to AnimateDiff, ModelScope, VideoCrafter and various other video generation models seamlessly at inference time, and works by iteratively refining the latent-initialization noise. More details can be found in the paper.

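Running that through Google Translate: FreeInit raises the temporal consistency and overall quality of videos generated with video diffusion models, and it does so with no additional training. It plugs into AnimateDiff, ModelScope, VideoCrafter and various other video generation models at inference time, and it works by repeatedly adjusting the latent initialization noise. The details are in the paper.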

I see. I don't get it.

Either way, it looks like it should improve the quality of generated videos, so I'm going to try it out.

What is FreeInit?
Enabling FreeInit
Summary

What is FreeInit?

As described above.

It looks like it should improve the quality of AnimateDiff output!
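To make "iteratively refining the latent-initialization noise" a little more concrete, here is a minimal NumPy sketch of the core idea as I understand it from the paper: the low-frequency part of a re-noised latent from the previous pass is kept, and only the high-frequency part is replaced with fresh Gaussian noise. The function names `butterworth_low_pass` and `mix_noise`, the latent shape, and the way the filter is built are my own illustration, not the diffusers implementation.

```python
import numpy as np


def butterworth_low_pass(shape, order=4, cutoff=0.25):
    """Butterworth low-pass filter over (frames, height, width).

    Hypothetical helper for illustration. Frequencies are normalized so
    that the Nyquist frequency of each axis maps to 1.0.
    """
    grids = np.meshgrid(*[np.fft.fftfreq(n) / 0.5 for n in shape], indexing="ij")
    d_square = sum(g ** 2 for g in grids)  # squared distance from the DC component
    return 1.0 / (1.0 + (d_square / cutoff ** 2) ** order)


def mix_noise(diffused_latent, fresh_noise, low_pass):
    """Keep low frequencies of the diffused latent, take high ones from fresh noise."""
    axes = (-3, -2, -1)  # frames, height, width
    latent_freq = np.fft.fftn(diffused_latent, axes=axes)
    noise_freq = np.fft.fftn(fresh_noise, axes=axes)
    mixed = latent_freq * low_pass + noise_freq * (1.0 - low_pass)
    return np.fft.ifftn(mixed, axes=axes).real


# Toy latent shaped (channels, frames, height, width); the sizes are arbitrary.
rng = np.random.default_rng(0)
diffused = rng.standard_normal((4, 16, 8, 8))
fresh = rng.standard_normal((4, 16, 8, 8))
refined = mix_noise(diffused, fresh, butterworth_low_pass((16, 8, 8)))
```

In the real method the mixed noise becomes the starting point of the next sampling pass, and the whole generate, re-noise, mix cycle is repeated a few times.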

FreeInit を有効にしてみる

Loading the models and related setup (MotionAdapter, AnimateDiffPipeline, DDIMScheduler)

import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler

# Load the motion adapter
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2"
).to("cuda")

# Load an SD 1.5-based model with AnimateDiffPipeline
pipe = AnimateDiffPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",
    motion_adapter=adapter,
    torch_dtype=torch.float16
).to("cuda")

# Configure the scheduler
pipe.scheduler = DDIMScheduler.from_config(
    pipe.scheduler.config,
    beta_schedule="linear",
    clip_sample=False,
    timestep_spacing="linspace",
    steps_offset=1
)

# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()

Running the pipeline

import torch

# enable FreeInit
# Refer to the enable_free_init documentation for a full list of configurable parameters
pipe.enable_free_init(method="butterworth", use_fast_sampling=True)

# Run the pipeline
prompt = "a panda playing a guitar, on a boat, in the ocean, high quality"
negative_prompt = "bad quality, worse quality"
frames = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=20,
    generator=torch.Generator("cpu").manual_seed(666),
).frames[0]

# disable FreeInit
pipe.disable_free_init()
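FreeInit runs the whole sampling loop several times, so it is worth estimating how many denoising steps a single call costs. The sketch below is a back-of-the-envelope helper, not library code: `free_init_step_schedule` is a hypothetical name, and it assumes three FreeInit iterations and that `use_fast_sampling=True` ramps the step count up linearly so that only the last iteration uses the full `num_inference_steps`. Check the `enable_free_init` documentation for the actual behavior.

```python
def free_init_step_schedule(num_inference_steps, num_iters=3, use_fast_sampling=True):
    """Estimate denoising steps per FreeInit iteration (illustrative assumption)."""
    if not use_fast_sampling:
        # Every iteration runs the full number of steps.
        return [num_inference_steps] * num_iters
    # Assumed linear ramp: earlier iterations run proportionally fewer steps.
    return [
        max(1, num_inference_steps * (i + 1) // num_iters)
        for i in range(num_iters)
    ]


fast = free_init_step_schedule(20)
full = free_init_step_schedule(20, use_fast_sampling=False)
print(fast, sum(fast))
print(full, sum(full))
```

Under these assumptions, fast sampling trades some refinement in the early iterations for a noticeably shorter total run.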

Result

from datetime import datetime
from zoneinfo import ZoneInfo
from diffusers.utils import export_to_gif

# Get the current time in the Asia/Tokyo time zone, formatted as YYYYMMDDhhmmss
formatted_now = datetime.now(tz=ZoneInfo("Asia/Tokyo")).strftime("%Y%m%d%H%M%S")

# Save the generated frames as a GIF
export_to_gif(frames, f"animation_{formatted_now}.gif")