Stable Diffusion 3 is out, so I tried it

Stability AI Japan: Announcing the open release of Stable Diffusion 3 Medium, our most sophisticated image generation model

Stable Diffusion 3 Medium has been released, so I tried it out right away.

Trying out Stable Diffusion 3
Reading the Stable Diffusion 3 documentation
Trying Stable Diffusion 3 again, this time with offloading
Summary

Trying out Stable Diffusion 3

First, you need to log in to Hugging Face.

!huggingface-cli login

    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): ******************************

Enter your Hugging Face access token at the prompt where the ***** appears (nothing is shown while you type).

For how to create an access token, see the following article.

Check that the login worked:

!huggingface-cli whoami

This prints the account you're logged in as, so you can tell whether the login succeeded.

Let's run it right away.

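import torch
from diffusers import StableDiffusion3Pipeline

# Load SD3 Medium in half precision (fp16) to save memory
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16
)
# Move the whole pipeline (all three text encoders, the transformer, and the VAE) onto the GPU
pipe = pipe.to("cuda")

prompt = "A cat holding a sign that says hello world"
negative_prompt = ""
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=28,  # number of denoising steps
    guidance_scale=7.0,      # classifier-free guidance strength
).images[0]
image  # in a notebook, the last expression gets displayed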

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU

Saw that coming (´;ω;`) *bursts into tears*

Reading the Stable Diffusion 3 documentation

The SD3 pipeline uses three text encoders to generate an image. Model offloading is necessary in order for it to run on most commodity hardware. Please use the torch.float16 data type for additional memory savings.

I see, I see. So model offloading is needed to run it on most commodity hardware.
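As I understand it, model CPU offloading keeps the pipeline's components in CPU memory and moves each one to the GPU only when it's needed, instead of putting everything on the GPU up front. Here's a toy simulation in pure Python (no GPU required) of why that lowers peak VRAM. The component names mirror the attributes of the SD3 pipeline, but the sizes are made-up round numbers, not measured values, and this is only an analogy, not how diffusers or accelerate actually implement offloading.

```python
from dataclasses import dataclass, field

# Hypothetical sizes in GB: illustrative round numbers, NOT measured SD3 values.
COMPONENTS = [
    ("text_encoder", 1.0),
    ("text_encoder_2", 2.0),
    ("text_encoder_3", 10.0),  # stands in for the large T5-XXL encoder
    ("transformer", 4.0),
    ("vae", 1.0),
]


@dataclass
class ToyGPU:
    """Pretend GPU that only tracks which components are resident and the peak total."""
    resident: dict = field(default_factory=dict)
    peak_gb: float = 0.0

    def load(self, name: str, size_gb: float) -> None:
        self.resident[name] = size_gb
        self.peak_gb = max(self.peak_gb, sum(self.resident.values()))

    def unload(self, name: str) -> None:
        self.resident.pop(name, None)


def peak_without_offload(components) -> float:
    """Like pipe.to("cuda"): everything is moved up front and stays on the GPU."""
    gpu = ToyGPU()
    for name, size_gb in components:
        gpu.load(name, size_gb)
    return gpu.peak_gb


def peak_with_offload(components) -> float:
    """Rough analogue of model CPU offload: only the running component is on the GPU."""
    gpu = ToyGPU()
    for name, size_gb in components:
        gpu.load(name, size_gb)  # moved to the GPU right before this stage runs
        # ... the stage would run here ...
        gpu.unload(name)  # moved back to CPU memory before the next one is loaded
    return gpu.peak_gb


print(peak_without_offload(COMPONENTS))  # 18.0
print(peak_with_offload(COMPONENTS))  # 10.0
```

With offloading, the peak is the largest single component rather than the sum of all of them. The price is speed: every move between CPU and GPU costs transfer time.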

And there's more...

SD3 uses three text encoders, one of which is the very large T5-XXL model. This makes it challenging to run the model on GPUs with less than 24GB of VRAM, even when using fp16 precision. The following section outlines a few memory optimizations in Diffusers that make it easier to run SD3 on low resource hardware.

It's hard to run on GPUs with less than 24GB of VRAM even in fp16?!
That rules out common folk like me, doesn't it?
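To get a feel for why fp16 helps but still isn't enough, here's a back-of-the-envelope calculation of the memory needed just to hold the weights (parameters × bytes per parameter). The 8-billion total parameter count is a hypothetical round number for illustration, not an official figure for SD3 Medium, and real usage is higher because activations and CUDA overhead aren't counted.

```python
# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Memory (in GB, 1 GB = 10**9 bytes) needed just to store the weights."""
    return num_params * BYTES_PER_PARAM[dtype] / 10**9


# Hypothetical total across the text encoders, transformer and VAE (not an official number).
TOTAL_PARAMS = 8_000_000_000

for dtype in ("fp32", "fp16"):
    print(dtype, weight_memory_gb(TOTAL_PARAMS, dtype), "GB")
# fp32 32.0 GB
# fp16 16.0 GB
```

Under that assumption, fp16 halves the weight memory, yet the weights alone would still take up most of a typical consumer GPU's VRAM before a single activation is computed.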

Trying Stable Diffusion 3 again, this time with offloading

  import torch
  from diffusers import StableDiffusion3Pipeline

  pipe = StableDiffusion3Pipeline.from_pretrained(
      "stabilityai/stable-diffusion-3-medium-diffusers",
      torch_dtype=torch.float16
  )
- pipe = pipe.to("cuda")
+ pipe.enable_model_cpu_offload()  # keep components on the CPU; each moves to the GPU only when needed

  prompt = "A cat holding a sign that says hello world"
  negative_prompt = ""
  image = pipe(
      prompt,
      negative_prompt=negative_prompt,
      num_inference_steps=28,
      guidance_scale=7.0,
  ).images[0]
  image

It worked! Phew! Great image quality without any tweaking at all!! And the text on the sign came out properly, too!!
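For reference, the call passes guidance_scale=7.0 and an empty negative_prompt. My understanding is that these drive classifier-free guidance: at each denoising step the pipeline predicts noise once with the prompt and once with the negative prompt (acting as the "unconditional" branch), then pushes the result away from the latter. Here's a minimal pure-Python sketch of just that combining formula on plain lists; in the real pipeline these are tensors, and everything else about the sampler is left out. The numbers are made up.

```python
def classifier_free_guidance(noise_uncond, noise_cond, guidance_scale):
    """Elementwise uncond + guidance_scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


# Toy noise predictions (made-up numbers)
uncond = [0.0, 1.0]  # prediction with the negative (here empty) prompt
cond = [1.0, 1.5]    # prediction with the actual prompt
print(classifier_free_guidance(uncond, cond, 7.0))  # [7.0, 4.5]
print(classifier_free_guidance(uncond, cond, 1.0))  # [1.0, 1.5]
```

With guidance_scale=1.0 the result is just the prompt-conditioned prediction; larger values push the output further toward the prompt.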