Connecting Comic Books to Generative AI
Briefly

A pipeline scans a directory for .cbr and .cbz comic archives, skipping any comic that already has a .txt summary. For archives without summaries, the pipeline extracts images from RAR/ZIP files, uploads each page image to Google Gemini's Files API for temporary storage, and sends an ordered image list plus a prompt that instructs the model to ignore ads and produce a one‑paragraph summary. The system uses genai.Client and common Python libraries (zipfile, rarfile, io, os). Generated summaries are written back to the filesystem as .txt files, producing consistent story-context summaries from page images.
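The "skip anything already summarized" step can be sketched as below. This is a minimal illustration, not the author's code: the function name `find_unsummarized` is hypothetical, and it assumes the summary sits next to the archive with the same base name and a .txt extension, which the text implies but does not spell out.

```python
import os

COMIC_EXTENSIONS = (".cbr", ".cbz")


def find_unsummarized(directory):
    """Return paths of comic archives that have no sibling .txt summary."""
    pending = []
    for name in sorted(os.listdir(directory)):
        base, ext = os.path.splitext(name)
        if ext.lower() not in COMIC_EXTENSIONS:
            continue  # not a comic archive
        # Assumption: the summary for "issue.cbz" is stored as "issue.txt".
        if os.path.exists(os.path.join(directory, base + ".txt")):
            continue  # already summarized, skip it
        pending.append(os.path.join(directory, name))
    return pending
```

Checking for the .txt file before doing any extraction or upload means reruns over the same directory only pay for new comics.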
As a reminder, these typically fall into two categories:
cbr - A RAR file of scanned images
cbz - A ZIP file of scanned images
This week I was wondering: given that GenAI tools are pretty good at understanding images, how well could a GenAI system take a set of images, in order, and understand the context of the story behind them? I decided to give it a shot and honestly, I'm pretty impressed by the results.
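Since a cbz is just a ZIP of page scans, pulling the pages out in order might look roughly like this. The helper name `read_cbz_pages` and the extension list are made up for illustration, and it assumes page order follows sorted file names (typical for zero-padded scans, but not guaranteed by the format).

```python
import io
import os
import zipfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def read_cbz_pages(source):
    """Return [(name, image_bytes), ...] for the images in a .cbz, sorted by name.

    source may be a file path or a file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        # Keep only image entries; archives often carry stray text or metadata files.
        names = sorted(
            n for n in archive.namelist()
            if os.path.splitext(n)[1].lower() in IMAGE_EXTENSIONS
        )
        # Assumption: sorted names match reading order.
        return [(n, archive.read(n)) for n in names]
```

The rarfile package models its API on zipfile, so a cbr version would probably differ mostly in which class opens the archive.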
from google import genai
import os
import io
import zipfile
import rarfile
import sys

client = genai.Client()

prompt = """
You analyze a set of images from a comic book in order to write a summary of the comic in question.
You will be given a set of images, in order, representing each page of the comic book.
For each page, you will attempt to determine if it's an ad, and if so, ignore it.
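The excerpt stops before the upload and generation step, so here is a hedged sketch of how that part could go with the google-genai client. The function name `summarize_pages` is hypothetical, the model name is left to the caller because the excerpt does not show which one is used, and putting the prompt ahead of the images is an assumption.

```python
import io


def summarize_pages(client, model, prompt, pages):
    """Upload each page image, then ask the model for a summary.

    client: a google.genai Client, e.g. genai.Client()
    model:  model name string, chosen by the caller
    pages:  ordered list of (mime_type, image_bytes) tuples
    """
    uploaded = []
    for mime_type, data in pages:
        # Files API upload; in-memory data needs an explicit mime_type.
        uploaded.append(
            client.files.upload(
                file=io.BytesIO(data),
                config={"mime_type": mime_type},
            )
        )
    # Assumption: prompt first, then the pages in reading order.
    response = client.models.generate_content(
        model=model,
        contents=[prompt, *uploaded],
    )
    return response.text
```

Passing the client in, rather than creating it inside the function, keeps the sketch easy to try against a stub before spending any API calls.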
Read at Raymondcamden