OpenAI Uncovers 'Ghosts in the Codex Machine' Behind Reported Performance Degradation
OpenAI has published an extensive report, titled “Ghosts in the Codex Machine,” addressing widespread user concerns about a perceived decline in the performance of GPT-5-Codex since its launch on September 15th. In response to growing public reports, OpenAI assigned a dedicated team of engineers to investigate full-time, a move that underscores its commitment to user feedback in light of past industry incidents. The investigation included upgrading the feedback mechanisms in the CLI, dogfooding with setups that mirror those of external users, auditing infrastructure and feature flags, and substantially increasing the number of evaluation runs to surface anomalies.
The investigation found that a combination of factors contributed to the observed performance shifts. Key findings included a slight performance degradation on older hardware, which has since been removed from the fleet, and opportunities to improve load balancing. The team also identified problems with how often compaction ran in long-running sessions: repeatedly summarizing earlier summaries reduced accuracy, prompting improvements to compaction and new warnings for users. Bugs in the apply patch tool sometimes caused the model to resort to deleting files, with mitigations planned for both the current model and future versions. A subtle bug in constrained sampling affected a small percentage of sessions, causing anomalous language shifts. Although overall latency had improved, the team found that escalating timeouts for persistent tasks were inefficient. In addition, lower-than-expected authentication cache hit rates added minor latency, and increasingly complex user setups with multiple MCP (Model Context Protocol) tools were found to degrade performance.

OpenAI says it has already deployed a series of improvements, with more fixes underway, and has established a permanently staffed team to continuously monitor and improve Codex performance. The company also refunded Codex credit usage and reset rate limits for affected users, addressing an overcharging bug in cloud tasks.