Trying out Stable UnCLIP in diffusers v0.15.0

github.com

The diffusers v0.15.0 release includes a lot of interesting-looking features, such as Text-to-Video, audio generation, Stable UnCLIP, and Multi ControlNet.

zako-lab929.hatenablog.com

Compel, which I used for Prompt Weighting in a previous post, apparently also arrived in v0.15.0.

I didn't know that!

In this post, I'll try out Stable UnCLIP from that list.
I picked it because it seems to let you generate variations of an image in an Image-to-Image-like way, so I wanted to give it a go.

What is Stable UnCLIP?
Script
- Results x6
What about Image-to-Image?
- stabilityai/stable-diffusion-2-1
  - Adding quality prompts
- runwayml/stable-diffusion-v1-5
  - Adding quality prompts
Summary

What is Stable UnCLIP?

Stable UnCLIP is the best open-sourced image variation model out there. Pass an initial image and optionally a prompt to generate variations of the image

In other words, it bills itself as the best open-source image variation model: you pass an initial image and, optionally, a prompt, and it generates variations of that image.

So far, what I gathered is: it generates variations, it needs an image, and you can pass a prompt if you want to.

What sets it apart from Image-to-Image is that it apparently "generates images that are semantically close to the original image."

Besides the release notes, there is also documentation at the following page, so I'm linking it here.

huggingface.co
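As I understand it from the stable-diffusion-2-1-unclip model card, Stable UnCLIP conditions generation on a (noisy) CLIP image embedding of the source image rather than starting from the image's pixels, and the pipeline's `noise_level` argument controls how much noise is added to that embedding. The sketch below is a rough, hypothetical illustration of that idea in plain NumPy, not diffusers code: a random vector stands in for the CLIP embedding, it is noised with the standard DDPM forward formula using a squared-cosine schedule over 1000 steps (an assumption; the real pipeline uses whatever its own noising scheduler config specifies), and cosine similarity shows how far the conditioning drifts as the noise level grows.

```python
import math

import numpy as np


def squaredcos_alphas_cumprod(num_train_timesteps: int = 1000, s: float = 0.008) -> np.ndarray:
    """Cumulative product of alphas for a squared-cosine beta schedule (betas capped at 0.999)."""
    def alpha_bar(t: float) -> float:
        return math.cos((t + s) / (1 + s) * math.pi / 2) ** 2

    betas = np.array([
        min(1 - alpha_bar((i + 1) / num_train_timesteps) / alpha_bar(i / num_train_timesteps), 0.999)
        for i in range(num_train_timesteps)
    ])
    return np.cumprod(1.0 - betas)


def noise_embedding(embedding: np.ndarray, noise: np.ndarray, noise_level: int,
                    alphas_cumprod: np.ndarray) -> np.ndarray:
    """DDPM forward noising: sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    a_bar = alphas_cumprod[noise_level]
    return np.sqrt(a_bar) * embedding + np.sqrt(1.0 - a_bar) * noise


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
clip_embedding = rng.standard_normal(768)  # hypothetical stand-in for a CLIP image embedding
noise = rng.standard_normal(768)
alphas_cumprod = squaredcos_alphas_cumprod()

similarities = {}
for level in (0, 100, 500, 900):
    noisy = noise_embedding(clip_embedding, noise, level, alphas_cumprod)
    similarities[level] = cosine_similarity(clip_embedding, noisy)
    print(f"noise_level={level:4d}  similarity to original embedding = {similarities[level]:.3f}")
```

At low noise levels the conditioning stays almost identical to the original embedding, which fits the idea of outputs that are "semantically close" without copying the layout pixel by pixel; raising the noise level loosens that tie.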

Script

The script is very simple.

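import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

# Prepare the source image
init_image_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png"
init_image = load_image(init_image_url)

# Prepare the pipeline
# enable_model_cpu_offload() moves each submodule to the GPU only while it is needed,
# so a separate .to("cuda") call is unnecessary
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-unclip-small", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

# Run the pipeline (the first argument is the source image; a prompt is optional)
image = pipe(init_image).images[0]
image  # in a notebook, this displays the result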

Results x6

The overall feel is definitely similar!

What about Image-to-Image?

To match Stable UnCLIP, I'd like to use Stable Diffusion 2.1 here...

Does that work?

stabilityai/stable-diffusion-2-1

import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid

# Prepare the source image
init_image_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png"
init_image = load_image(init_image_url)

# Prepare the pipeline
# (enable_model_cpu_offload() handles moving modules to the GPU, so no .to("cuda"))
pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

# Run the pipeline
prompt = ""
image = pipe(prompt, image=init_image).images[0]
image

First, with an empty prompt.

It produced something rather strange!
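Image-to-Image works differently from Stable UnCLIP: it encodes the source image into latents, adds noise, and then denoises only the tail end of the schedule. As far as I can tell from diffusers' `StableDiffusionImg2ImgPipeline` (which `AutoPipelineForImage2Image` should resolve to for this model), `strength` defaults to 0.8 and `num_inference_steps` to 50, and the number of steps actually run is derived from them roughly as below. This is a simplified re-implementation for illustration that ignores the scheduler's order and newer options; it is not the library's code.

```python
def img2img_timesteps(num_inference_steps: int = 50, strength: float = 0.8) -> tuple[int, int]:
    """Return (skipped_steps, denoising_steps) derived from strength, img2img style.

    Simplified sketch: ignores scheduler.order and options such as denoising_start.
    """
    # strength decides how far into the noise schedule the source latents are pushed
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # the first t_start steps of the schedule are skipped entirely
    t_start = max(num_inference_steps - init_timestep, 0)
    return t_start, num_inference_steps - t_start


for strength in (0.5, 0.8, 1.0):
    skipped, run = img2img_timesteps(50, strength)
    print(f"strength={strength}: skip {skipped} steps, denoise {run} of 50")
```

With the defaults, 40 of the 50 steps are re-denoised from a heavily noised version of the source, and an empty prompt gives those steps no text direction, which might partly explain the odd result.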

Adding quality prompts

prompt = "High quality, Ultra detailed, best quality, insanely detailed, beautiful, masterpiece"
negative_prompt = "worst quality,ugly,bad anatomy,jpeg artifacts"
image = pipe(prompt, negative_prompt=negative_prompt, image=init_image).images[0]
image
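As far as I understand diffusers' Stable Diffusion pipelines, `negative_prompt` works through classifier-free guidance: at each step the UNet predicts noise once for the prompt and once for the negative prompt (which takes the place of the empty "unconditional" prompt when one is given), and the two predictions are combined as `uncond + guidance_scale * (cond - uncond)`, with `guidance_scale` defaulting to 7.5. The toy NumPy sketch below shows only that combination step; the random arrays are hypothetical stand-ins for the UNet outputs, not real model predictions.

```python
import numpy as np


def classifier_free_guidance(noise_uncond: np.ndarray, noise_cond: np.ndarray,
                             guidance_scale: float = 7.5) -> np.ndarray:
    """Push the prediction away from the unconditional/negative branch toward the prompt branch."""
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)


rng = np.random.default_rng(0)
# Hypothetical stand-ins for the two noise predictions at one denoising step
noise_from_negative_prompt = rng.standard_normal((4, 8, 8))
noise_from_prompt = rng.standard_normal((4, 8, 8))

guided = classifier_free_guidance(noise_from_negative_prompt, noise_from_prompt)

# With prompt="" and no negative_prompt, both branches see the same empty text,
# so the predictions coincide and guidance has nothing to amplify
same = rng.standard_normal((4, 8, 8))
print(np.allclose(classifier_free_guidance(same, same), same))
```

This is also why the quality prompt plus negative prompt changes things noticeably compared with the empty prompt: the two branches now differ, and the scale of 7.5 exaggerates that difference at every step.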