GPT-5.2 Faces Scrutiny: New Model's Real-World Performance Questioned Despite Benchmark Wins
Initial assessments of GPT-5.2 reveal significant discrepancies between its impressive benchmark scores and its practical usability, including critical failures in basic reasoning and regressions in key areas. Independent testing suggests a potential 'benchmark-maxing' issue, in which a model is tuned to score well on benchmarks rather than to perform well in practice, and calls into question the conventional metrics for judging LLM superiority.