{"id":4476,"date":"2025-11-20T00:22:48","date_gmt":"2025-11-20T00:22:48","guid":{"rendered":"https:\/\/violethoward.com\/new\/openai-debuts-gpt%e2%80%915-1-codex-max-coding-model-and-it-already-completed-a-24-hour-task-internally\/"},"modified":"2025-11-20T00:22:48","modified_gmt":"2025-11-20T00:22:48","slug":"openai-debuts-gpt%e2%80%915-1-codex-max-coding-model-and-it-already-completed-a-24-hour-task-internally","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/openai-debuts-gpt%e2%80%915-1-codex-max-coding-model-and-it-already-completed-a-24-hour-task-internally\/","title":{"rendered":"OpenAI debuts GPT\u20115.1-Codex-Max coding model and it already completed a 24-hour task internally"},"content":{"rendered":"<p> <br \/>\n<br \/><img decoding=\"async\" src=\"https:\/\/images.ctfassets.net\/jdtwqhzvc2n1\/1EELcUCWai7BHVe2Iti1Ht\/1fef9a3dd13eeed757dfda0bdccc4403\/lsRydt-rLj3bqzrXKMGoN_b9a12412710e4c468f59b1b66cb31312__1_.png?w=300&amp;q=30\" \/><\/p>\n<p>OpenAI has <b>introduced GPT\u20115.1-Codex-Max<\/b>, a new frontier agentic coding model now available in its Codex developer environment. The release marks a significant step forward in AI-assisted software engineering, offering improved long-horizon reasoning, efficiency, and real-time interactive capabilities. GPT\u20115.1-Codex-Max will now replace GPT\u20115.1-Codex as the default model across Codex-integrated surfaces.<\/p>\n<p>The new model is designed to serve as a persistent, high-context software development agent, capable of managing complex refactors, debugging workflows, and project-scale tasks across multiple context windows.<\/p>\n<p>The release comes on the heels of Google debuting its powerful new Gemini 3 Pro model yesterday, yet GPT\u20115.1-Codex-Max still outperforms or matches Google\u2019s model on key coding benchmarks: <\/p>\n<p>On <b>SWE-Bench Verified<\/b>, <b>GPT\u20115.1-Codex-Max achieved 77.9% accuracy<\/b> at extra-high reasoning effort, edging past Gemini 3 Pro\u2019s 76.2%. 
<\/p>\n<p>It also led on <b>Terminal-Bench 2.0, with 58.1% accuracy versus Gemini\u2019s 54.2%<\/b>, and matched Gemini\u2019s score of 2,439 on LiveCodeBench Pro, a competitive coding Elo benchmark.<\/p>\n<p>When measured against Gemini 3 Pro\u2019s most advanced configuration \u2014 its Deep Thinking model \u2014 Codex-Max holds a slight edge in agentic coding benchmarks, as well. <\/p>\n<h3><b>Performance Benchmarks: Incremental Gains Across Key Tasks<\/b><\/h3>\n<p>GPT\u20115.1-Codex-Max demonstrates measurable improvements over GPT\u20115.1-Codex across a range of standard software engineering benchmarks. <\/p>\n<p>On SWE-Lancer IC SWE, it achieved 79.9% accuracy, a significant increase from GPT\u20115.1-Codex\u2019s 66.3%. In SWE-Bench Verified (n=500), it reached 77.9% accuracy at extra-high reasoning effort, outperforming GPT\u20115.1-Codex\u2019s 73.7%.<\/p>\n<p>Performance on Terminal-Bench 2.0 (n=89) showed more modest improvements, with GPT\u20115.1-Codex-Max achieving 58.1% accuracy compared to 52.8% for GPT\u20115.1-Codex. <\/p>\n<p>All evaluations were run with compaction and extra-high reasoning effort enabled.<\/p>\n<p>These results indicate that the new model offers a higher ceiling on both benchmarked correctness and real-world usability under extended reasoning loads.<\/p>\n<h3><b>Technical Architecture: Long-Horizon Reasoning via Compaction<\/b><\/h3>\n<p>A major architectural improvement in GPT\u20115.1-Codex-Max is its ability to reason effectively over extended input-output sessions using a mechanism called <b>compaction<\/b>. 
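<\/p>\n<p>As a rough sketch (not OpenAI\u2019s actual implementation), compaction can be pictured as folding older conversation turns into a short summary once the transcript nears its budget, while keeping the most recent turns verbatim. The summarize() helper and the limits below are hypothetical:<\/p>\n
```python
# Toy sketch of "compaction" (hypothetical; not OpenAI's implementation):
# once a transcript exceeds a budget, older turns are collapsed into a
# single summary entry and only the most recent turns are kept verbatim.

def summarize(turns):
    # Stand-in for a model-written summary of the older turns.
    return f"[summary of {len(turns)} earlier turns]"

def compact(history, limit=8, keep_recent=3):
    """Return history unchanged if it fits; otherwise fold old turns."""
    if len(history) <= limit:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

history = [f"turn {i}" for i in range(10)]
print(compact(history))  # ['[summary of 7 earlier turns]', 'turn 7', 'turn 8', 'turn 9']
```
\n<p>A production agent would generate the summary with the model itself and choose more carefully what to keep; the point is that the working transcript stays bounded while a long-running task continues.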
<\/p>\n<p>This enables the model to retain key contextual information while discarding irrelevant details as it nears its context window limit \u2014 effectively allowing for continuous work across millions of tokens without performance degradation.<\/p>\n<p>The model has been internally observed to complete tasks lasting more than 24 hours, including multi-step refactors, test-driven iteration, and autonomous debugging.<\/p>\n<p>Compaction also improves token efficiency. At medium reasoning effort, GPT\u20115.1-Codex-Max used approximately 30% fewer thinking tokens than GPT\u20115.1-Codex for comparable or better accuracy, which has implications for both cost and latency.<\/p>\n<h3><b>Platform Integration and Use Cases<\/b><\/h3>\n<p>GPT\u20115.1-Codex-Max is currently available across OpenAI\u2019s Codex-based environments, the company\u2019s own integrated tools and interfaces built specifically for code-focused AI agents. These include:<\/p>\n<ul>\n<li>\n<p><b>Codex CLI<\/b>, OpenAI\u2019s official command-line tool (@openai\/codex), where GPT\u20115.1-Codex-Max is already live.<\/p>\n<\/li>\n<li>\n<p><b>IDE extensions<\/b>, likely developed or maintained by OpenAI, though no specific third-party IDE integrations were named.<\/p>\n<\/li>\n<li>\n<p><b>Interactive coding environments<\/b>, such as those used to demonstrate frontend simulation apps like CartPole or Snell\u2019s Law Explorer.<\/p>\n<\/li>\n<li>\n<p><b>Internal code review tooling<\/b>, used by OpenAI\u2019s engineering teams.<\/p>\n<\/li>\n<\/ul>\n<p>GPT\u20115.1-Codex-Max is not yet available via a public API, though OpenAI states this is coming soon. 
Users who wish to work with the model in terminal environments today can do so by installing and using the Codex CLI.<\/p>\n<p>OpenAI has not confirmed whether or how the model will integrate into third-party IDEs beyond tools built on top of the CLI or the forthcoming API.<\/p>\n<p>The model is capable of interacting with live tools and simulations. Examples shown in the release include:<\/p>\n<ul>\n<li>\n<p>An interactive CartPole policy gradient simulator, which visualizes reinforcement learning training and activations.<\/p>\n<\/li>\n<li>\n<p>A Snell\u2019s Law optics explorer, supporting dynamic ray tracing across refractive indices.<\/p>\n<\/li>\n<\/ul>\n<p>These interfaces exemplify the model\u2019s ability to reason in real time while maintaining an interactive development session \u2014 effectively bridging computation, visualization, and implementation within a single loop.<\/p>\n<h3><b>Cybersecurity and Safety Constraints<\/b><\/h3>\n<p>While GPT\u20115.1-Codex-Max does not meet OpenAI\u2019s \u201cHigh\u201d capability threshold for cybersecurity under its Preparedness Framework, it is currently the most capable cybersecurity model OpenAI has deployed. It supports use cases such as automated vulnerability detection and remediation, but with strict sandboxing and network access disabled by default.<\/p>\n<p>OpenAI reports no increase in scaled malicious use but has introduced enhanced monitoring systems, including activity routing and disruption mechanisms for suspicious behavior. Codex remains isolated to a local workspace unless developers opt in to broader access, mitigating risks like prompt injection from untrusted content.<\/p>\n<h3><b>Deployment Context and Developer Usage<\/b><\/h3>\n<p>GPT\u20115.1-Codex-Max is currently available to users on <b>ChatGPT Plus, Pro, Business, Edu, and Enterprise<\/b> plans. 
It will also become the new default in Codex-based environments, replacing GPT\u20115.1-Codex.<\/p>\n<p>OpenAI states that 95% of its internal engineers use Codex weekly, and since adopting it, these engineers have shipped ~70% more pull requests on average \u2014 highlighting the tool\u2019s impact on internal development velocity.<\/p>\n<p>Despite its autonomy and persistence, OpenAI stresses that Codex-Max should be treated as a coding assistant, not a replacement for human review. The model produces terminal logs, test citations, and tool call outputs to support transparency in generated code.<\/p>\n<h3><b>Outlook<\/b><\/h3>\n<p>GPT\u20115.1-Codex-Max represents a significant evolution in OpenAI\u2019s strategy toward agentic development tools, offering greater reasoning depth, token efficiency, and interactive capabilities across software engineering tasks. By extending its context management and compaction strategies, the model is positioned to handle tasks at the scale of full repositories, rather than individual files or snippets.<\/p>\n<p>With continued emphasis on agentic workflows, secure sandboxes, and real-world evaluation metrics, Codex-Max sets the stage for the next generation of AI-assisted programming environments \u2014 while underscoring the importance of oversight in increasingly autonomous systems.<\/p>\n<p><br \/>\n<br \/><a href=\"https:\/\/venturebeat.com\/ai\/openai-debuts-gpt-5-1-codex-max-coding-model-and-it-already-completed-a-24\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI has introduced GPT\u20115.1-Codex-Max, a new frontier agentic coding model now available in its Codex developer environment. The release marks a significant step forward in AI-assisted software engineering, offering improved long-horizon reasoning, efficiency, and real-time interactive capabilities. 
GPT\u20115.1-Codex-Max will now replace GPT\u20115.1-Codex as the default model across Codex-integrated surfaces. The new model is designed to [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":4477,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-4476","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/11\/lsRydt-rLj3bqzrXKMGoN_b9a12412710e4c468f59b1b66cb31312__1_.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/4476","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=4476"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/4476\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/4477"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=4476"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=4476"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=4476"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}