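Trying Out Automatic Captioning with BLIP on Google Colab

kohya-ss/sd-scripts apparently includes a script that can generate captions automatically using something called BLIP, so I'm going to try it out.

Tracing it back to its origin, it seems to lead to salesforce/BLIP.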

Script file
- Usage
Trying it on Google Colab
Other notes
Summary

Script file

The script appears to be ${Repository}/finetune/make_captions.py.

Usage

python finetune/make_captions.py --batch_size <batch size> <training data folder>

Okay, I see.

Trying it on Google Colab

For the training data, I'll use the zundamon images from the article below.
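If you would rather launch the script from Python than type the command by hand, the usage line above can be sketched roughly as follows. The helper name `build_caption_command` is hypothetical. Only the script path and the `--batch_size` option come from the usage line, and the default batch size of 8 is just the value used later in this article.

```python
import sys


def build_caption_command(train_data_dir, batch_size=8,
                          script="finetune/make_captions.py"):
    """Build the argument list for the usage line above.

    Hypothetical helper: it only assembles the arguments and does not
    run anything. The script path is relative to the sd-scripts root.
    """
    return [
        sys.executable,          # the Python interpreter currently running
        script,
        "--batch_size", str(batch_size),
        str(train_data_dir),
    ]
```

From the repository root, the list can then be passed to `subprocess.run(cmd, check=True)`.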

zako-lab929.hatenablog.com

Preparing the training data on Google Colab

Mount Google Drive

from google.colab import drive
drive.mount('/content/drive')

Copy the training data from Google Drive into the current directory

!cp -r /content/drive/MyDrive/trains/dataset .
!rm ./dataset/zundamon/*.txt

The dataset I prepared last time already contains caption files, so I delete them here.
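The same cleanup can also be done in Python, which makes it easy to see which files were actually removed. This is a minimal sketch: the function name `remove_caption_files` is hypothetical, and it assumes the old captions are the `.txt` files sitting directly in the folder, as in the `rm` command above.

```python
from pathlib import Path


def remove_caption_files(folder, extension=".txt"):
    """Delete files with the given extension directly under `folder`.

    Returns the names of the removed files so the result can be checked.
    Subfolders are not searched.
    """
    removed = []
    for path in sorted(Path(folder).glob(f"*{extension}")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

For example, `remove_caption_files("./dataset/zundamon")` would delete the old captions and leave the images untouched.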

Preparing kohya-ss/sd-scripts on Google Colab

Clone kohya-ss/sd-scripts

!git clone https://github.com/kohya-ss/sd-scripts.git

Change the working directory to the repository root

%cd sd-scripts

Install the required modules

!pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
!pip install -U -r requirements.txt
!pip install xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu121

The following modules were also needed

!pip install timm fairscale

Without them, the script complained that these modules were missing.

Okay, time to run it!
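To find out about missing modules before starting the script, a quick check like the one below can help. It is only a sketch: `missing_modules` is a hypothetical helper, and the names to pass in (`timm`, `fairscale`) are simply the two that were missing in my environment.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules(["timm", "fairscale"])` returns an empty list once both packages are installed.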

!python finetune/make_captions.py --batch_size 8 /content/dataset/zundamon

Training image	Generated caption
zundamon (1).png	an anime female dressed in green is flying
zundamon (2).png	this anime character is about to get into the action
zundamon (3).png	a drawing of a female character holding a green frisbee
zundamon (4).png	a cartoon character holding some poms in her hand
zundamon (5).png	a anime female dressed as an apple holding a leaf
zundamon (6).png	the cartoon girl is in the image has green hair
zundamon (7).png	a cartoon girl is wearing green outfit and holding a green object
zundamon (8).png	a girl with green hair standing up
zundamon (9).png	a girl in green is staring over her shoulder
zundamon (10).png	a cartoon anime character is holdi leaf
zundamon (11).png	a young woman with green hair holding a green frisbee
zundamon (12).png	a cute cartoon character is wearing a green suit
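A table like the one above can be put together by reading the generated caption files back in. The sketch below rests on two assumptions: each caption is written next to its image under the same base name, and the caption extension is `.caption`. If your files use a different extension, pass it in. The function name `collect_captions` and the list of image extensions are hypothetical.

```python
from pathlib import Path

# Image types to look for (an assumption; adjust to your dataset)
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def collect_captions(folder, caption_extension=".caption"):
    """Pair each image in `folder` with the text of its caption file.

    Returns a list of (image file name, caption or None), sorted by name.
    None means no caption file was found for that image.
    """
    pairs = []
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        # Same base name as the image, with the caption extension instead
        caption_file = image.with_suffix(caption_extension)
        if caption_file.exists():
            text = caption_file.read_text(encoding="utf-8").strip()
        else:
            text = None
        pairs.append((image.name, text))
    return pairs
```

Printing each pair as `f"{name}\t{caption}"` gives tab-separated rows in the same layout as the table.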