Connecting Comic Books to Generative AI

"As a reminder, these typically fall into two categories:

- cbr: a RAR file of scanned images
- cbz: a zip file of scanned images

This week I was wondering - given that GenAI tools are pretty good at understanding images - how well could a GenAI system take a set of images, in order, and understand the context of the story behind them. I decided to give it a shot and honestly, I'm pretty impressed by the results."
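Both formats are plain archives, so pulling out the pages mostly means listing the image entries and sorting them into reading order. The sketch below assumes page order follows the file names and uses a "natural" sort so that page2 comes before page10. That ordering rule and the names `extract_pages`, `IMAGE_EXTS`, and `_natural_key` are our assumptions, not something taken from Camden's script. The .cbr branch relies on the third-party rarfile package, which in turn needs an external unrar-compatible tool to extract files.

```python
import re
import zipfile
from pathlib import Path

# Assumed set of page-image extensions; adjust for your collection.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def _natural_key(name):
    # re.split with a capture group alternates text and digit runs,
    # so odd positions are always digits: "page10.jpg" -> ["page", 10, ".jpg"].
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def extract_pages(path):
    """Return (name, bytes) for every image in a .cbz/.cbr, in page order."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".cbz":
        archive = zipfile.ZipFile(path)
    elif suffix == ".cbr":
        # Third-party package with a zipfile-like API (namelist/read);
        # imported lazily so .cbz handling works without it.
        import rarfile
        archive = rarfile.RarFile(path)
    else:
        raise ValueError(f"not a comic archive: {path}")
    with archive:
        # Keep only image entries; directories and text files are dropped.
        names = [n for n in archive.namelist() if Path(n).suffix.lower() in IMAGE_EXTS]
        names.sort(key=_natural_key)
        return [(n, archive.read(n)) for n in names]
```

Calling `extract_pages("issue1.cbz")` returns a list of `(name, bytes)` tuples in reading order; non-image entries such as text files are skipped.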

"from google import genai
import os
import io
import zipfile
import rarfile
import sys

client = genai.Client()

prompt = """
You analyze a set of images from a comic book in order to write a summary of the comic in question. You will be given a set of images, in order, representing each page of the comic book. For each page, you will attempt to determine if it's an ad, and if so, ignore it."
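The excerpt stops after creating the client and the prompt. A minimal sketch of the next step, assuming the google-genai SDK's `client.files.upload` and `client.models.generate_content` calls: each page is uploaded in order, then the prompt and the uploaded file handles go out in a single request. The name `summarize_pages`, the prompt-first ordering, and the JPEG fallback MIME type are assumptions, and the model name is left to the caller because the excerpt doesn't show which one is used.

```python
import io
import mimetypes


def summarize_pages(client, prompt, pages, model):
    """Upload page images in order and ask Gemini for a summary.

    client: a google.genai.Client, e.g. genai.Client() as in the excerpt.
    pages:  list of (file_name, image_bytes) tuples, already in page order.
    model:  Gemini model name (not shown in the excerpt; caller's choice).
    """
    uploaded = []
    for name, data in pages:
        # A bare byte stream carries no file name, so pass the MIME type explicitly.
        mime, _ = mimetypes.guess_type(name)
        uploaded.append(
            client.files.upload(
                file=io.BytesIO(data),
                config={"mime_type": mime or "image/jpeg"},  # fallback is an assumption
            )
        )
    # Prompt first, then the uploaded pages in reading order (assumed layout).
    response = client.models.generate_content(model=model, contents=[prompt, *uploaded])
    return response.text
```

With the excerpt's objects, a call looks like `summarize_pages(client, prompt, pages, model=...)`. Uploading through the Files API, as the summary describes, keeps the image data out of the request body; for short comics, sending bytes inline with `types.Part.from_bytes` is an alternative.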

The pipeline scans a directory for .cbr and .cbz comic archives and skips any comic that already has a .txt summary. For each remaining archive, it extracts the page images from the RAR or ZIP file, uploads each page to the Google Gemini Files API for temporary storage, and sends the ordered list of images together with a prompt that tells the model to ignore ads and write a one-paragraph summary. The script relies on genai.Client from the google-genai SDK plus a few Python libraries (zipfile, rarfile, io, os). Each summary is written back to the filesystem as a .txt file, which is also what lets later runs skip comics that have already been processed.
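The scan-and-skip loop is plain file handling. A minimal sketch, assuming each summary is stored next to its archive under the same base name (issue1.cbz → issue1.txt) and that only the top level of the directory is scanned. The names `comics_needing_summaries` and `process_directory` are hypothetical, and the `summarize` callable stands in for the extract, upload, and generate steps.

```python
from pathlib import Path

COMIC_EXTS = {".cbr", ".cbz"}


def comics_needing_summaries(root):
    """Yield .cbr/.cbz files in `root` that have no matching .txt summary yet."""
    # sorted() materializes the listing before any .txt files are written.
    for path in sorted(Path(root).iterdir()):
        if (
            path.is_file()
            and path.suffix.lower() in COMIC_EXTS
            and not path.with_suffix(".txt").exists()
        ):
            yield path


def process_directory(root, summarize):
    """Summarize each unsummarized comic and write the result beside it.

    `summarize` is any callable that takes an archive path and returns text
    (hypothetical hook for the extract/upload/generate steps).
    """
    written = []
    for comic in comics_needing_summaries(root):
        summary = summarize(comic)
        comic.with_suffix(".txt").write_text(summary, encoding="utf-8")
        written.append(comic)
    return written
```

Swapping `iterdir()` for `rglob("*")` would walk subfolders too. Because the .txt file doubles as the "done" marker, rerunning the script only summarizes comics added since the last run.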

#genai #comics #image-understanding #gemini-api

Read at Raymondcamden
