Trying GLIGEN with diffusers (3) - Can a box also specify the style?

zako-lab929.hatenablog.com

In the previous articles, I used StableDiffusionGLIGENPipeline and StableDiffusionGLIGENTextImagePipeline.

The latter, StableDiffusionGLIGENTextImagePipeline, has one more usage example,
but it was a little hard to follow, so I split it out into its own article.

This article looks at that remaining example: using GLIGEN to specify a style as well.

Grounded Generation (2) - (usage example 3 of StableDiffusionGLIGENTextImagePipeline)
Summary

Grounded Generation (2) - (usage example 3 of StableDiffusionGLIGENTextImagePipeline)

Something like a reserved word, placeholder, shows up here, which made this example a bit hard to follow,
but I'll work through it with my own interpretation.

Loading the model (StableDiffusionGLIGENTextImagePipeline)

import torch
from diffusers import StableDiffusionGLIGENTextImagePipeline

# Prepare the pipeline
pipe = StableDiffusionGLIGENTextImagePipeline.from_pretrained(
    "anhnct/Gligen_Text_Image",
    torch_dtype=torch.float16
).to("cuda")

Preparing and configuring the boxes

from diffusers.utils import load_image

# Box settings
boxes = [[0.4, 0.2, 1.0, 0.8], [0.0, 1.0, 0.0, 1.0]]  # Set `[0.0, 1.0, 0.0, 1.0]` for the style
gligen_image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/landscape.png"
gligen_image = load_image(gligen_image_url)
gligen_placeholder_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/landscape.png"
gligen_placeholder = load_image(gligen_placeholder_url)

What we can tell from this is that there are two bounding boxes.
The code also prepares two image variables, gligen_image and gligen_placeholder.

gligen_image and gligen_placeholder are loaded from the same URL, so they hold exactly the same image, but let's set that aside for now.

Judging from the comment, [0.0, 1.0, 0.0, 1.0] appears to be a special setting for the style.
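
As a small aside, here is a minimal sketch of what those numbers mean if each box is read as [xmin, ymin, xmax, ymax] in normalized 0 to 1 coordinates, which is how the diffusers documentation describes gligen_boxes. The helper box_to_pixels and the 512x512 canvas size are my own assumptions for illustration; neither comes from the original example.

```python
# Hypothetical helper (not part of diffusers): convert a normalized
# [xmin, ymin, xmax, ymax] box into integer pixel coordinates.
def box_to_pixels(box, width, height):
    xmin, ymin, xmax, ymax = box
    return (
        round(xmin * width),
        round(ymin * height),
        round(xmax * width),
        round(ymax * height),
    )


boxes = [[0.4, 0.2, 1.0, 0.8], [0.0, 1.0, 0.0, 1.0]]

# Assumed canvas size, for illustration only.
WIDTH, HEIGHT = 512, 512

pixel_boxes = [box_to_pixels(b, WIDTH, HEIGHT) for b in boxes]

# Width and height of each box in pixels.
box_sizes = [(x1 - x0, y1 - y0) for x0, y0, x1, y1 in pixel_boxes]

for px, size in zip(pixel_boxes, box_sizes):
    print(px, "width x height =", size)
```

Under that reading, the second box has zero width and zero height, so it does not mark out any region of the image. That would fit the comment's hint that it is a special value reserved for the style rather than an ordinary bounding box.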

Running the pipeline

from diffusers.utils import make_image_grid

# Run the pipeline
prompt = "a dragon flying on the sky"
image = pipe(
    prompt=prompt,
    gligen_phrases=["dragon", "placeholder"],  # Can use any text instead of `placeholder` token, because we will use mask here
    gligen_images=[gligen_placeholder, gligen_image],  # Can use any image in gligen_placeholder, because we will use mask here
    input_phrases_mask=[1, 0],  # Set 0 for the placeholder token
    input_images_mask=[0, 1],  # Set 0 for the placeholder image
    gligen_boxes=boxes,
    gligen_scheduled_sampling_beta=1,
).images[0]

# Show the result
make_image_grid([gligen_image, gligen_placeholder, image], rows=1, cols=3)

From here on, I'll dig deeper into the parameters, working from my own interpretation.
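
As a starting point, it helps to line up the per-box arguments side by side. The sketch below is plain Python and does not call diffusers. It assumes that the i-th entries of gligen_boxes, gligen_phrases, gligen_images, input_phrases_mask and input_images_mask all describe the same box, and that a mask value of 1 means "use this input" while 0 means "ignore it". Both points are my reading of the comments in the example, not something I have confirmed in the library. describe_grounding is a hypothetical helper, and the images are represented by their variable names so that nothing needs to be downloaded.

```python
# Hypothetical helper (not part of diffusers): pair up the per-box inputs
# and keep only the phrase/image whose mask value is 1.
def describe_grounding(boxes, phrases, image_names, phrases_mask, images_mask):
    lengths = {len(boxes), len(phrases), len(image_names),
               len(phrases_mask), len(images_mask)}
    if len(lengths) != 1:
        raise ValueError("all per-box lists must have the same length")

    rows = []
    for box, phrase, image_name, use_phrase, use_image in zip(
        boxes, phrases, image_names, phrases_mask, images_mask
    ):
        rows.append({
            "box": box,
            # Assumption: mask 0 means the entry is only a placeholder.
            "text": phrase if use_phrase else None,
            "image": image_name if use_image else None,
        })
    return rows


rows = describe_grounding(
    boxes=[[0.4, 0.2, 1.0, 0.8], [0.0, 1.0, 0.0, 1.0]],
    phrases=["dragon", "placeholder"],
    image_names=["gligen_placeholder", "gligen_image"],
    phrases_mask=[1, 0],
    images_mask=[0, 1],
)

for row in rows:
    print(row)
```

Read this way, the first box is grounded by the text "dragon" alone, and the second box (the style box) is grounded by gligen_image alone. The "placeholder" phrase and the gligen_placeholder image only fill the slots that the masks switch off, which would explain why the original comments say any text or image can be used there.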