OpenAI's GPT-5.2 Arrives: Setting New Benchmarks While Raising Questions on 3D Reasoning and Speed
OpenAI has introduced GPT-5.2, a much-anticipated update to its large language model series, and early-access users are praising its enhanced capabilities across a range of domains. The model shows significant gains on economically valuable tasks, scoring 70.9% on GDPval in its ‘Thinking’ variant (74.1% for Pro) versus GPT-5’s 38.8%. It also sets new state-of-the-art records on software engineering benchmarks: 55.6% on SWE-Bench Pro for Thinking (56% with extra-high reasoning) and 80% on SWE-bench Verified. Notably, GPT-5.2 Pro with extra-high reasoning scored 90.5% on ARC-AGI-1, representing a 390x efficiency improvement in a year, and 54.2% on the more challenging ARC-AGI-2. Further advances include 100% on AIME math problems without tools, reduced hallucinations, 98% accuracy at 256k tokens in long-context needle-in-a-haystack tests, and improved vision and tool-calling accuracy.
Despite these strides, GPT-5.2 presents a performance paradox. A custom skateboard-trick benchmark, ‘Skatebench,’ designed to test three-dimensional reasoning, showed a severe regression: GPT-5.2 scored 79% with extra-high reasoning against GPT-5’s 97% at default settings, while consuming three times as many tokens. This suggests a trade-off in spatial understanding, with the model possibly over-indexed on 2D reasoning.

The model also carries higher pricing: $1.75 per million input tokens and $14 per million output tokens for GPT-5.2 Thinking, escalating to $21 input and $168 output for the Pro version. While OpenAI claims improved token efficiency makes a given quality level cheaper to reach, peak performance often entails higher costs and significantly longer response times. Early testers report that while GPT models excel at precise instruction following, the new 5.2 series, particularly its Pro and extra-high reasoning modes, is notably slower, with some tasks taking 30–50 minutes. Nonetheless, the consensus among power users like Matt Schumer is that GPT-5.2 Pro, despite its speed limitations, is “undoubtedly the world’s best model” for deep reasoning, research, and complex coding tasks, offering an “uncanny ability to infer missing context.”
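To make the pricing gap concrete, here is a minimal sketch of the per-request cost arithmetic using the per-million-token rates quoted above. The model names and request sizes below are hypothetical illustrations, not values from OpenAI's API.

```python
# Per-million-token rates in USD, as quoted in the article.
# (Keys are illustrative labels, not official API model identifiers.)
PRICING = {
    "gpt-5.2-thinking": {"input": 1.75, "output": 14.00},
    "gpt-5.2-pro":      {"input": 21.00, "output": 168.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request under the quoted rates."""
    rates = PRICING[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

# Hypothetical request: a 10k-token prompt with a 5k-token response.
thinking = request_cost("gpt-5.2-thinking", 10_000, 5_000)  # $0.0175 + $0.07  = $0.0875
pro      = request_cost("gpt-5.2-pro", 10_000, 5_000)       # $0.21   + $0.84  = $1.05
print(f"Thinking: ${thinking:.4f}  Pro: ${pro:.4f}")
```

At these rates the same request costs about 12x more on Pro, which is why the token-efficiency claim matters: if Pro reaches a target quality with far fewer output tokens, the effective gap narrows.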