Tuning the Scheduler for AnimateDiff in diffusers

zako-lab929.hatenablog.com

In the previous article, I wrote the following:

But I'd like to get something more "whoa!!", like back when I wrote "Trying diffusers v0.15.0 Text-to-Video Zero on Google Colab - ジャコ Lab". Is it a scheduler problem? Maybe I should try the GitHub version too?

The reason is that the output had a rather low-quality look.

It has its own charm, but I'd like to produce more saturated results, like the examples in the gallery.

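In this article: I couldn't run the GitHub version because it failed with an error, but looking at its config file I found that the scheduler settings differ slightly, so I'll try tuning the scheduler. As id:touch-sp pointed out in a comment, this also seems to be the right place to look.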

Introduction
- (About the diffusers v0.22.0 settings)
- (About the GitHub version settings)
Tuning the scheduler
- Full script
  - Loading the models, etc. (MotionAdapter, AnimateDiffPipeline, LoRA, DDIMScheduler)
  - Running the pipeline (unchanged)
Results
Summary

Introduction

(About the diffusers v0.22.0 settings)

In the previous article I ran the diffusers v0.22.0 version of the script, so it used the following scheduler:

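from diffusers import DDIMScheduler

# model_id: the SD 1.5-based model used in the previous article
scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    clip_sample=False,
    timestep_spacing="linspace",
    steps_offset=1
)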
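The v0.22.0 setting above pins timestep_spacing="linspace", which the GitHub config does not set. To get a feel for what that parameter controls, here is a plain-Python sketch of how DDIMScheduler.set_timesteps picks the training timesteps for the "linspace" and "leading" options. ddim_timesteps is a hypothetical helper written from my reading of the diffusers source around v0.22.0, not a diffusers API, and other versions may behave differently.

```python
def ddim_timesteps(num_inference_steps, timestep_spacing,
                   num_train_timesteps=1000, steps_offset=1):
    """Hypothetical helper: which training timesteps a DDIM run visits.

    Approximately mirrors the "linspace" and "leading" branches of
    DDIMScheduler.set_timesteps in diffusers around v0.22.0.
    """
    if num_inference_steps < 2:
        raise ValueError("this sketch needs num_inference_steps >= 2")
    if timestep_spacing == "linspace":
        # Evenly spaced over [0, num_train_timesteps - 1], then rounded.
        # round() rounds half to even, like NumPy's .round().
        # steps_offset is NOT applied in this branch.
        step = (num_train_timesteps - 1) / (num_inference_steps - 1)
        timesteps = [round(i * step) for i in range(num_inference_steps)]
    elif timestep_spacing == "leading":
        # Integer stride starting at 0, then shifted by steps_offset.
        step_ratio = num_train_timesteps // num_inference_steps
        timesteps = [i * step_ratio + steps_offset
                     for i in range(num_inference_steps)]
    else:
        raise ValueError(f"unsupported in this sketch: {timestep_spacing}")
    # Denoising runs from the noisiest timestep down to the cleanest.
    return timesteps[::-1]


if __name__ == "__main__":
    for spacing in ("linspace", "leading"):
        ts = ddim_timesteps(25, spacing)
        print(f"{spacing:>8}: first={ts[0]}, last={ts[-1]}, first five={ts[:5]}")
```

With 25 steps, "linspace" starts denoising at t=999 and ends at 0, while "leading" with steps_offset=1 runs from 961 down to 1. The tuned code in this article builds the scheduler from pipe.scheduler.config without overriding timestep_spacing, so there the value comes from that config.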

(About the GitHub version settings)

The GitHub version appears to use the following settings:

noise_scheduler_kwargs:
  beta_start:    0.00085
  beta_end:      0.012
  beta_schedule: "linear"
  steps_offset:  1
  clip_sample:   False

It looks like beta_start, beta_end, timestep_spacing, and beta_schedule differ between the two.
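Of these, beta_schedule changes how noise is distributed across the 1000 training timesteps. The sketch below computes the betas and their cumulative product (alphas_cumprod, the fraction of the original signal left at step t) for "linear" and "scaled_linear" with the same beta_start and beta_end, using the formulas DDIMScheduler uses for these two options. "scaled_linear" is only a comparison point here, assuming it is what a typical SD 1.5 scheduler config specifies; I haven't checked toonyou_beta6's config. make_betas and alphas_cumprod are hypothetical helpers, not diffusers functions.

```python
import math


def make_betas(beta_start, beta_end, schedule, num_train_timesteps=1000):
    """Hypothetical helper: beta schedule like DDIMScheduler builds it."""
    n = num_train_timesteps
    if schedule == "linear":
        # Betas spaced linearly between beta_start and beta_end.
        return [beta_start + (beta_end - beta_start) * i / (n - 1)
                for i in range(n)]
    if schedule == "scaled_linear":
        # sqrt(beta) spaced linearly, then squared (SD-style schedule).
        s0, s1 = math.sqrt(beta_start), math.sqrt(beta_end)
        return [(s0 + (s1 - s0) * i / (n - 1)) ** 2 for i in range(n)]
    raise ValueError(f"unsupported schedule in this sketch: {schedule}")


def alphas_cumprod(betas):
    """Running product of (1 - beta_t): how much signal remains at step t."""
    out, acc = [], 1.0
    for b in betas:
        acc *= 1.0 - b
        out.append(acc)
    return out


if __name__ == "__main__":
    for schedule in ("linear", "scaled_linear"):
        ac = alphas_cumprod(make_betas(0.00085, 0.012, schedule))
        print(f"{schedule:>13}: t=500 -> {ac[500]:.4f}, t=999 -> {ac[999]:.5f}")
```

With the same endpoints, the linear schedule's betas are never smaller than the scaled_linear ones (squaring a linear interpolation of square roots stays below the linear interpolation), so alphas_cumprod is lower at every timestep, i.e. the model sees noisier inputs at each t. Whether that is what produced the more saturated output here is not something this sketch proves.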

Tuning the scheduler

pipe.scheduler = DDIMScheduler.from_config(
    pipe.scheduler.config,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="linear",
    steps_offset=1,
    clip_sample=False
)

Thank you for the comment, id:touch-sp! This is the spot you meant!

Full script


Loading the models, etc. (MotionAdapter, AnimateDiffPipeline, LoRA, DDIMScheduler)

import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler

# Load the motion adapter
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2"
)

# Load an SD 1.5-based model with AnimateDiffPipeline
pipe = AnimateDiffPipeline.from_pretrained(
    "frankjoshua/toonyou_beta6",
    motion_adapter=adapter
)

# Configure the scheduler
pipe.scheduler = DDIMScheduler.from_config(
    pipe.scheduler.config,
    beta_start=0.00085,        # changed from the previous article
    beta_end=0.012,            # changed from the previous article
    beta_schedule="linear",    # changed from the previous article
    steps_offset=1,
    clip_sample=False
)

# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

Running the pipeline (unchanged)

import torch
from diffusers.utils import export_to_gif
from datetime import datetime
from zoneinfo import ZoneInfo

# Run the pipeline
prompt = "masterpiece, best quality, 1girl, solo, cherry blossoms, hanami, pink flower, white flower, spring season, wisteria, petals, flower, plum blossoms, outdoors, falling petals, white hair, black eyes"
negative_prompt="bad quality, worse quality"
frames = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.manual_seed(6520604954829636163),
).frames[0]

# Get the current time in the Asia/Tokyo time zone as YYYYMMDDhhmmss
formattedNow = datetime.now(tz=ZoneInfo("Asia/Tokyo")).strftime("%Y%m%d%H%M%S")

# Save the result as a GIF
export_to_gif(frames, f"animation_{formattedNow}.gif")

Results

It flickers a bit, but the saturation seems higher?

Summary

Changing the scheduler parameters just slightly made a big difference!