Next-word pretraining creates statistical pressure toward hallucination, even with idealized error-free data. Facts lacking repeated support in training data yield unavoidable errors, while recurring regularities do not.
On a clear night I set up my telescope in the yard and let the mount hum along while the camera gathers light from something distant and patient. The workflow is a ritual. Focus by eye until the airy disk tightens. Shoot test frames and watch the histogram. Capture darks, flats, and bias frames so the quirks of the sensor can be cleaned away later. That discipline is not fussy.
What happens under the hood? How is the search engine able to take that simple query, look for images in the billions, trillions of images that are available online? How is it able to find this one or similar photos from all that? Usually, there is an embedding model that is doing this work behind the hood.
Fifty-four seconds. That's how long it took Raphael Wimmer to write up an experiment that he did not actually perform, using a new artificial-intelligence tool called Prism, released by OpenAI last month. "Writing a paper has never been easier. Clogging the scientific publishing pipeline has never been easier," wrote Wimmer, a researcher in human-computer action at the University of Regensburg in Germany, on Bluesky. Large language models (LLMs) can suggest hypotheses, write code and draft papers, and AI agents are automating parts of the research process.