Once I got Stable Diffusion working, I wanted to see what the AI would do on its own: let it describe a given picture, then re-create the picture from that description. How detailed can or should a description be?
First, install "clip-interrogator":
python3 -m pip install --upgrade pip setuptools wheel
sudo apt install -y rustc cargo
pip install clip-interrogator
And here’s the sample Python file. I got an error when I didn’t reassign/clear "pipe" before loading the interrogator.
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline
from PIL import Image
from clip_interrogator import Config, Interrogator
print("### Starting Stable Diffusion Pipeline")
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.to("cuda")
prompt = "a dark environment, two warriors standing on a chess field with swords drawn"
# prompt = "a laptop sitting on top of a wooden table"
steps = 50
width = 512
height = 512
print("### Creating images with Stable Diffusion")
with autocast("cuda"):
    for i in range(1):
        output = pipe(prompt, width=width, height=height, num_inference_steps=steps)
        image = output["images"][0]
        # Build the file name from the prompt: spaces become underscores, commas are dropped
        file = prompt.replace(" ", "_").replace(",", "")
        image.save(f"{file}-{i}.png")
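# Drop the reference to the pipeline so its GPU memory can be freed
# before the interrogator is loaded
pipe = None
torch.cuda.empty_cache()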
print("### Initializing Interrogator")
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
file = prompt.replace(" ", "_").replace(",", "")
for i in range(1):
    print("Loading file ", i)
    image = Image.open(f"{file}-{i}.png").convert('RGB')
    print(ci.interrogate(image))
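
The script above stops at printing the description. The remaining step from the introduction, feeding that description back into Stable Diffusion, could be sketched roughly like this. The function name describe_and_recreate and its parameters are hypothetical, made up for illustration. It only assumes an object with an interrogate(image) method (like "ci" above) and a callable that behaves like the StableDiffusionPipeline, returning an output with an images list.

```python
def describe_and_recreate(image, interrogator, pipe, steps=50, size=512):
    """Describe `image`, then render a new image from that description.

    Hypothetical helper, not part of diffusers or clip-interrogator.
    `interrogator` must offer `interrogate(image)`; `pipe` must be callable
    like a StableDiffusionPipeline and return an object with `.images`.
    Returns a tuple of (description, recreated image).
    """
    # Step 1: let the interrogator turn the picture into a text prompt
    description = interrogator.interrogate(image)
    # Step 2: use that text as the prompt for a fresh generation
    output = pipe(description, width=size, height=size, num_inference_steps=steps)
    return description, output.images[0]


# Usage sketch (assumes `ci` exists and `pipe` has been loaded again,
# since the script above cleared it before starting the interrogator):
# description, recreated = describe_and_recreate(image, ci, pipe)
# print(description)
# recreated.save("recreated.png")
```

Because the script clears "pipe" before the interrogator starts, the pipeline has to be loaded again before calling this. Whether both models can stay on the GPU at the same time depends on the available memory, so reloading is the safer assumption.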