Generating Zundamon Captions with BLIP-2

Well then,
I have already tried automatic captioning (image captioning, image-to-text) with BLIP and UniDiffuser, and for the past few days I have been looking into BLIP-2.

Today I will use BLIP-2 to generate captions for Zundamon.

Setup
Running image captioning on the Zundamon images
- Results
Summary

Setup

Mount Google Drive and prepare the dataset

from google.colab import drive
drive.mount('/content/drive')

!cp -r /content/drive/MyDrive/trains/dataset .
!rm ./dataset/zundamon/*.txt

I reuse the data from when I trained my own LoRA, as is.

The article about my own LoRA is here.
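The `!rm ./dataset/zundamon/*.txt` line above clears the text files that were copied over with the images. For anyone who would rather do this in Python than in a shell cell, here is a minimal sketch. The helper name `remove_caption_files` is hypothetical, and it assumes that every `.txt` file directly under the directory is a leftover caption file.

```python
from pathlib import Path


def remove_caption_files(dir_path, pattern="*.txt"):
    """Delete files matching pattern directly under dir_path and return their names."""
    removed = []
    # sorted() only makes the returned list deterministic
    for path in sorted(Path(dir_path).glob(pattern)):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

Calling `remove_caption_files("./dataset/zundamon")` would then leave only the non-`.txt` files in place, which is what the captioning loop expects.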

Install the packages

!pip install git+https://github.com/huggingface/diffusers
!pip install -U transformers accelerate peft controlnet_aux onnxruntime-gpu insightface

If all you need is BLIP-2, transformers alone is enough,
but I want to use diffusers.utils, so I install my usual assortment of packages.

Load the BLIP-2 model

from transformers import Blip2Processor, Blip2ForConditionalGeneration
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the processor and the model
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16)
model.to(device)

Running image captioning on the Zundamon images

import os
from diffusers.utils import load_image

dir_path = "./dataset/zundamon"
for f in os.listdir(dir_path):
    file_path = f"{dir_path}/{f}"
    if not os.path.isfile(file_path):
        continue

    image = load_image(file_path).resize((512, 512))
    inputs = processor(images=image, return_tensors="pt").to(device, torch.float16)
    generated_ids = model.generate(**inputs)
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
    print(f"{f}: {generated_text}")
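One thing to keep in mind with the loop above: `os.listdir` returns entries in arbitrary order, and the only filter is `os.path.isfile`. The following is a minimal sketch of a helper that keeps only image files and returns them in numeric-aware order, so that `zundamon (2).png` comes before `zundamon (10).png`. The names `natural_key`, `list_images`, and `IMAGE_EXTENSIONS` are hypothetical, and the extension set is an assumption about the dataset.

```python
import re
from pathlib import Path

# Assumed set of image extensions; adjust it to match the dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def natural_key(name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so the odd positions are always the digit runs.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def list_images(dir_path):
    """Return the image files directly under dir_path in natural order."""
    files = [
        p for p in Path(dir_path).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    ]
    return sorted(files, key=lambda p: natural_key(p.name))
```

With this in place, the loop could iterate over `list_images(dir_path)` and pass `str(path)` to `load_image`, which makes the printed output easier to compare against the file numbering.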

Results

Training data	Generated caption
zundamon (1).png	a cartoon girl with green hair and a green dress
zundamon (2).png	a girl with green hair and a green hoodie
zundamon (3).png	a cartoon character with green hair and green eyes
zundamon (4).png	a girl in a green dress holding pom poms
zundamon (5).png	a cartoon girl with green hair and green overalls
zundamon (6).png	a cartoon character with green ears and green hair
zundamon (7).png	a cartoon character with green hair and green eyes
zundamon (8).png	a cartoon character with green hair and green eyes
zundamon (9).png	a cartoon character with green hair and green eyes
zundamon (10).png	a cartoon character with green ears and a green dress
zundamon (11).png	a cartoon girl with green ears and a green dress
zundamon (12).png	a cartoon girl with green hair and green overalls
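If these captions are to be fed back into LoRA training, one option is to save each one as a text file next to its image. The sketch below assumes the training tool reads a `.txt` file with the same base name as the image. That convention is an assumption here, not something this post confirms, and `write_caption_file` is a hypothetical helper name.

```python
from pathlib import Path


def write_caption_file(image_path, caption, overwrite=False):
    """Write caption to a .txt file that shares the image's base name."""
    # Assumption: the trainer looks for "<image stem>.txt" beside the image.
    caption_path = Path(image_path).with_suffix(".txt")
    if caption_path.exists() and not overwrite:
        raise FileExistsError(f"{caption_path} already exists")
    caption_path.write_text(caption.strip() + "\n", encoding="utf-8")
    return caption_path
```

Inside the captioning loop this would be called as `write_caption_file(file_path, generated_text)`. Refusing to overwrite by default is a deliberate choice, so that hand-edited captions are not silently replaced on a second run.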