

Historically, each new generation of OpenAI’s models has delivered incremental improvements in factual accuracy, with hallucination rates falling as the technology matured. However, internal testing and third-party evaluations now show that o3 and o4-mini, both classified as “reasoning models,” hallucinate more often than earlier reasoning models…