
"OpenAI's new GPT Image 1.5 is an AI image synthesis model that reportedly generates images up to four times faster than its predecessor and costs about 20 percent less through the API. The model rolled out to all ChatGPT users on Tuesday and represents another step toward making photorealistic image manipulation a casual process that requires no particular visual skills."
"GPT Image 1.5 is notable because it's a 'native multimodal' image model, meaning image generation happens inside the same neural network that processes language prompts. (In contrast, DALL-E 3, an earlier OpenAI image generator previously built into ChatGPT, used a different technique called diffusion to generate images.) This newer type of model, which we covered in more detail in March, treats images and text as the same kind of thing: chunks of data called 'tokens' to be predicted, patterns to be completed."
For most of photography's roughly 200-year history, convincingly altering a photo required a darkroom, Photoshop expertise, or manual collage techniques. Google released a public prototype called the Nano Banana image model earlier in the year, prompting OpenAI to accelerate its own work. OpenAI's GPT Image 1.5 reportedly generates images up to four times faster than its predecessor, costs about 20 percent less through the API, and has rolled out to all ChatGPT users. The model is natively multimodal: it processes images and text as tokens in a single neural network, which enables coherent pixel-level edits such as changing a subject's pose, position, or viewpoint.
Read at Ars Technica