{"id":1248,"date":"2025-04-16T19:00:25","date_gmt":"2025-04-16T19:00:25","guid":{"rendered":"https:\/\/violethoward.com\/new\/openai-launches-o3-and-o4-mini-ai-models-that-think-with-images-and-use-tools-autonomously\/"},"modified":"2025-04-16T19:00:25","modified_gmt":"2025-04-16T19:00:25","slug":"openai-launches-o3-and-o4-mini-ai-models-that-think-with-images-and-use-tools-autonomously","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/openai-launches-o3-and-o4-mini-ai-models-that-think-with-images-and-use-tools-autonomously\/","title":{"rendered":"OpenAI launches o3 and o4-mini, AI models that &#8216;think with images&#8217; and use tools autonomously"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<p>OpenAI launched two groundbreaking AI models today that can reason with images and use tools independently, representing what experts call a step change in artificial intelligence capabilities.<\/p>\n\n\n\n<p>The San Francisco-based company introduced o3 and o4-mini, the latest in its \u201co-series\u201d of reasoning models, which it claims are its most intelligent and capable models to date. These systems can integrate images directly into their reasoning process, search the web, run code, analyze files, and even generate images within a single task flow.<\/p>\n\n\n\n<p>\u201cThere are some models that feel like a qualitative step into the future. GPT-4 was one of those. Today is also going to be one of those days,\u201d said Greg Brockman, OpenAI\u2019s president, during a press conference announcing the release. 
\u201cThese are the first models where top scientists tell us they produce legitimately good and useful novel ideas.\u201d<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><p>\n<iframe loading=\"lazy\" title=\"OpenAI o3 &amp; o4-mini\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/sq8GBPUb3rk?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/p><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-openai-s-new-models-think-with-images-to-transform-visual-problem-solving\">How OpenAI\u2019s new models \u2018think with images\u2019 to transform visual problem-solving<\/h2>\n\n\n\n<p>The most striking feature of these new models is their ability to \u201cthink with images\u201d \u2014 not just see them, but manipulate and reason about them as part of their problem-solving process.<\/p>\n\n\n\n<p>\u201cThey don\u2019t just see an image \u2014 they think with it,\u201d OpenAI said in a statement sent to VentureBeat. \u201cThis unlocks a new class of problem-solving that blends visual and textual reasoning.\u201d<\/p>\n\n\n\n<p>During a demonstration at the press conference, a researcher showed how o3 could analyze a physics poster from a decade-old internship, navigate its complex diagrams independently, and even identify that the final result wasn\u2019t present in the poster itself.<\/p>\n\n\n\n<p>\u201cIt must have just read, you know, at least like 10 different papers in a few seconds for me,\u201d Brandon McKenzie, a researcher at OpenAI working on multimodal reasoning, said during the demo. 
He estimated the task would have taken him \u201cmany days just for me to even like, onboard myself, back to my project, and then a few days more probably, to actually search through the literature.\u201d<\/p>\n\n\n\n<p>The ability for AI to manipulate images in its reasoning process \u2014 zooming in on details, rotating diagrams, or cropping unnecessary elements \u2014 represents a novel approach that industry analysts say could revolutionize fields from scientific research to education.<\/p>\n\n\n\n<blockquote class=\"twitter-tweet\"><p lang=\"en\" dir=\"ltr\">I had early access, o3 is an impressive model, seems very capable. Some fun examples:<br\/>1) Cracked a business case I use in my class<br\/>2) Creating some SVGs (images created by code alone)<br\/>3) Writing a constrained story of two interlocking gyres<br\/>4) Hard science fiction space battle. <a href=\"https:\/\/t.co\/TK4PKvKNoT\">pic.twitter.com\/TK4PKvKNoT<\/a><\/p>\u2014 Ethan Mollick (@emollick) <a href=\"https:\/\/twitter.com\/emollick\/status\/1912552106214502739?ref_src=twsrc%5Etfw\">April 16, 2025<\/a><\/blockquote> \n\n\n\n\n\n\n\n<p>OpenAI executives emphasized that these releases represent more than just improved models \u2014 they\u2019re complete AI systems that can independently use and chain together multiple tools when solving problems.<\/p>\n\n\n\n<p>\u201cWe\u2019ve trained them to use tools through reinforcement learning\u2014teaching them not just how to use tools, but to reason about when to use them,\u201d the company explained in its release.<\/p>\n\n\n\n<p>Greg Brockman highlighted the models\u2019 extensive tool use capabilities: \u201cThey actually use these tools in their chain of thought as they\u2019re trying to solve a hard problem. For example, we\u2019ve seen o3 use like 600 tool calls in a row trying to solve a really hard task.\u201d<\/p>\n\n\n\n<p>This capability allows the models to perform complex, multi-step workflows without constant human direction. 
For instance, if asked about future energy usage patterns in California, the AI can search the web for utility data, write Python code to analyze it, generate visualizations, and produce a comprehensive report \u2014 all as a single fluid process.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-openai-surges-ahead-of-competitors-with-record-breaking-performance-on-key-ai-benchmarks\">OpenAI surges ahead of competitors with record-breaking performance on key AI benchmarks<\/h2>\n\n\n\n<p>OpenAI claims o3 sets new state-of-the-art benchmarks across key measures of AI capability, including Codeforces, SWE-bench, and MMMU. In evaluations by external experts, o3 reportedly makes 20 percent fewer major errors than its predecessor on difficult, real-world tasks.<\/p>\n\n\n\n<p>The smaller o4-mini model is optimized for speed and cost efficiency while maintaining strong reasoning capabilities. On the AIME 2025 mathematics competition, o4-mini scored 99.5 percent when given access to a Python interpreter.<\/p>\n\n\n\n<p>\u201cI really do believe that with this suite of models, o3 and o4-mini, we\u2019re going to see more advances,\u201d Mark Chen, OpenAI\u2019s head of research, said during the press conference.<\/p>\n\n\n\n<p>The timing of this release is significant, coming just two days after OpenAI unveiled its GPT-4.1 model, which excels at coding tasks. The rapid succession of announcements signals an acceleration in the competitive AI landscape, where OpenAI faces increasing pressure from Google\u2019s Gemini models, Anthropic\u2019s Claude, and Elon Musk\u2019s xAI.<\/p>\n\n\n\n<p>Last month, OpenAI closed what amounts to the largest private tech funding round in history, raising $40 billion at a $300 billion valuation. 
The company is also reportedly considering building its own social network, potentially to compete with Elon Musk\u2019s X platform and to secure a proprietary source of training data.<\/p>\n\n\n\n<blockquote class=\"twitter-tweet\"><p lang=\"en\" dir=\"ltr\">o3 and o4-mini are super good at coding, so we are releasing a new product, Codex CLI, to make them easier to use.<\/p><p>this is a coding agent that runs on your computer. it is fully open source and available today; we expect it to rapidly improve.<\/p>\u2014 Sam Altman (@sama) <a href=\"https:\/\/twitter.com\/sama\/status\/1912558495997784441?ref_src=twsrc%5Etfw\">April 16, 2025<\/a><\/blockquote> \n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-openai-s-new-models-transform-software-engineering-with-unprecedented-code-navigation-abilities\">How OpenAI\u2019s new models transform software engineering with unprecedented code navigation abilities<\/h2>\n\n\n\n<p>One area where the new models particularly excel is software engineering. Brockman noted during the press conference that o3 is \u201cactually better than I am at navigating through our OpenAI code base, which is really useful.\u201d<\/p>\n\n\n\n<p>As part of the announcement, OpenAI also introduced Codex CLI, a lightweight coding agent that runs directly in a user\u2019s terminal. The open-source tool allows developers to leverage the models\u2019 reasoning capabilities for coding tasks, with support for screenshots and sketches.<\/p>\n\n\n\n<p>\u201cWe\u2019re also sharing a new experiment: Codex CLI, a lightweight coding agent you can run from your terminal,\u201d the company announced. 
\u201cYou can get the benefits of multimodal reasoning from the command line by passing screenshots or low fidelity sketches to the model, combined with access to your code locally.\u201d<\/p>\n\n\n\n<p>To encourage adoption, OpenAI is launching a $1 million initiative to support projects using Codex CLI and OpenAI models, with grants available in increments of $25,000 in API credits.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-inside-openai-s-enhanced-safety-protocols-how-the-company-protects-against-ai-misuse\">Inside OpenAI\u2019s enhanced safety protocols: How the company protects against AI misuse<\/h2>\n\n\n\n<p>OpenAI reports conducting extensive safety testing on the new models, particularly focused on their ability to refuse harmful requests. The company\u2019s safety measures include completely rebuilding their safety training data and developing system-level mitigations to flag dangerous prompts.<\/p>\n\n\n\n<p>\u201cWe stress tested both models with our most rigorous safety program to date,\u201d the company stated, noting that both o3 and o4-mini remain below OpenAI\u2019s \u201cHigh\u201d threshold for potential risks in biological, cybersecurity, and AI self-improvement capabilities.<\/p>\n\n\n\n<p>During the press conference, OpenAI researchers Wenda and Ananya presented detailed benchmark results, noting that the new models underwent over 10 times the training compute of previous versions to achieve their capabilities.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-when-and-how-you-can-access-o3-and-o4-mini-deployment-timeline-and-commercial-strategy\">When and how you can access o3 and o4-mini: Deployment timeline and commercial strategy<\/h2>\n\n\n\n<p>The new models are immediately available to ChatGPT Plus, Pro, and Team users, with Enterprise and Education customers gaining access next week. 
Free users can sample o4-mini by selecting \u201cThink\u201d in the composer before submitting queries.<\/p>\n\n\n\n<p>Developers can access both models via OpenAI\u2019s Chat Completions API and Responses API, though some organizations will need verification to access them.<\/p>\n\n\n\n<p>The release represents a significant commercial opportunity for OpenAI, as the models appear both more capable and more cost-efficient than their predecessors. \u201cFor example, on the 2025 AIME math competition, the cost-performance frontier for o3 strictly improves over o1, and similarly, o4-mini\u2019s frontier strictly improves over o3-mini,\u201d the company stated.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-future-of-ai-how-openai-is-bridging-reasoning-and-conversation-for-next-generation-systems\">The future of AI: How OpenAI is bridging reasoning and conversation for next-generation systems<\/h2>\n\n\n\n<p>Industry analysts view these releases as part of a broader convergence in AI capabilities, with models increasingly combining specialized reasoning with natural conversation abilities and tool use.<\/p>\n\n\n\n<p>\u201cToday\u2019s updates reflect the direction our models are heading in: we\u2019re converging the specialized reasoning capabilities of the o-series with more of the natural conversational abilities and tool use of the GPT-series,\u201d OpenAI noted in its release.<\/p>\n\n\n\n<p>Ethan Mollick, associate professor at the Wharton School who studies AI adoption, described o3 as \u201ca very strong model, but still a jagged one\u201d in a social media post after the announcement.<\/p>\n\n\n\n<p>As competition in the AI space continues to intensify, with Google, Anthropic, and others releasing increasingly powerful models, OpenAI\u2019s dual focus on both reasoning capabilities and practical tool use suggests a strategy aimed at maintaining its leadership position by delivering both intelligence and utility.<\/p>\n\n\n\n<p>With o3 and o4-mini, 
OpenAI has crossed a threshold where machines begin to perceive images the way humans do\u2014manipulating visual information as an integral part of their thinking process rather than merely analyzing what they see. This shift from passive recognition to active visual reasoning may ultimately prove more significant than any benchmark score, representing the moment when AI began to truly see the world through thinking eyes.<\/p>\n\t\t\t<\/div>\r\n<br>\r\n<br><a href=\"https:\/\/venturebeat.com\/ai\/openai-launches-o3-and-o4-mini-ai-models-that-think-with-images-and-use-tools-autonomously\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. 
Learn More OpenAI launched two groundbreaking AI models today that can reason with images and use tools independently, representing what experts call a step change in artificial intelligence capabilities. The San Francisco-based company introduced o3 and o4-mini, the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1249,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-1248","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/04\/nuneybits_Vector_art_of_robot_who_can_see_very_well_a092bb47-3b17-473e-82a2-69296cc80b4d.webp.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/1248","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=1248"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/1248\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/1249"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=1248"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=1248"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=1248"}],"curies":[{"n
ame":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}