{"id":3056,"date":"2025-08-08T19:07:06","date_gmt":"2025-08-08T19:07:06","guid":{"rendered":"https:\/\/violethoward.com\/new\/openais-gpt-5-rollout-is-not-going-smoothly\/"},"modified":"2025-08-08T19:07:06","modified_gmt":"2025-08-08T19:07:06","slug":"openais-gpt-5-rollout-is-not-going-smoothly","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/openais-gpt-5-rollout-is-not-going-smoothly\/","title":{"rendered":"OpenAI&#8217;s GPT-5 rollout is not going smoothly"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<div id=\"boilerplate_2682874\" class=\"post-boilerplate boilerplate-before\">\n<p><em>Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders.<\/em> <em>Subscribe Now<\/em><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-wide\"\/>\n<\/div><p>The launch of OpenAI\u2019s long-anticipated new model, GPT-5, is <strong>off to a rocky start<\/strong>, to say the least.<\/p>\n\n\n\n<p>Even forgiving the errors in charts and voice demos during yesterday\u2019s livestreamed presentation of the new model (actually four separate models, and a \u2018Thinking\u2019 mode that can be engaged for three of them), a<strong> number of user reports have emerged since GPT-5\u2019s release showing it erring badly <\/strong>when solving relatively simple problems that preceding OpenAI models \u2014 and rivals from competing AI labs \u2014 answer correctly.<\/p>\n\n\n\n<p>For example, data scientist Colin Fraser posted screenshots showing <strong>GPT-5 getting a math proof wrong (whether 8.888 repeating is equal to 9 \u2014 it is, of course, not).<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\"\/>\n\n\n\n<p>It also <strong>failed on a simple algebra problem<\/strong> that elementary schoolers could probably nail: 5.9 = x + 5.11 (the answer is x = 0.79).
<\/p>\n\n\n\n<div id=\"boilerplate_2803147\" class=\"post-boilerplate boilerplate-speedbump\">\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><strong>AI Scaling Hits Its Limits<\/strong><\/p>\n\n\n\n<p>Power caps, rising token costs, and inference delays are reshaping enterprise AI. Join our exclusive salon to discover how top teams are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Turning energy into a strategic advantage<\/li>\n\n\n\n<li>Architecting efficient inference for real throughput gains<\/li>\n\n\n\n<li>Unlocking competitive ROI with sustainable AI systems<\/li>\n<\/ul>\n\n\n\n<p><strong>Secure your spot to stay ahead<\/strong>: https:\/\/bit.ly\/4mwGngO<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n<\/div><figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\"\/>\n\n\n\n<p>Using <strong>GPT-5 to judge OpenAI\u2019s own erroneous presentation charts also did not yield helpful or correct responses<\/strong>.<\/p>\n\n\n\n<p>It also failed on the trickier math word problem below (which, to be fair, stumped this human at first\u2026 <strong>though Elon Musk\u2019s Grok 4 AI answered it correctly<\/strong>. As a hint, note that the flagstones in this case can\u2019t be divided into smaller portions. They must remain intact as 80 separate units, so no halves or quarters).
<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-not-as-good-at-coding-as-benchmarks-indicate\">Not as good at coding as benchmarks indicate<\/h2>\n\n\n\n<p>Even though OpenAI\u2019s internal benchmarks and some third-party external ones have shown GPT-5 to outperform all other models at coding,<strong> in real-world usage, Anthropic\u2019s recently updated Claude Opus 4.1 appears to do a better job at \u201cone-shotting\u201d certain tasks<\/strong>, that is, completing the user\u2019s desired application or software build to their specifications in a single attempt. See an example below from developer Justin Sun, posted to X:<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">Opus 4.1&#8217;s one-shot attempt at &#8220;create a 3d capybara petting zoo&#8221; \u2013 8 minutes total<\/p><p>This was honestly pretty insane, not only are the capybaras way cuter and moving, there are individual pet affinity levels, a day\/night switcher, feeding, and even a screenshot feature <a href=\"https:\/\/t.co\/FiKTO3FKK4\">pic.twitter.com\/FiKTO3FKK4<\/a><\/p>\u2014 justin (@justinsunyt) <a href=\"https:\/\/twitter.com\/justinsunyt\/status\/1953554914338386216?ref_src=twsrc%5Etfw\">August 7, 2025<\/a><\/blockquote>\n<\/div><\/figure>\n\n\n\n<p>Unfortunately,<strong> OpenAI is slowly deprecating its older models \u2014 including the former default GPT-4o and the powerful reasoning model o3<\/strong> \u2014 for users of ChatGPT, though they\u2019ll continue to be available in the application programming interface (API) for developers for the foreseeable future.
<\/p>\n\n\n\n<p>In addition, a report from security firm SPLX found that OpenAI\u2019s internal safety layer left major gaps in areas like business alignment and vulnerability to prompt injection and obfuscated logic attacks.<\/p>\n\n\n\n<p>Though the evidence is anecdotal, a temperature check on how the model is faring with early AI adopters suggests a chilly reception.<\/p>\n\n\n\n<p><strong>AI influencer and former Googler Bilawal Sidhu posted a poll <\/strong>on X asking for a \u201cvibe check\u201d from his followers and the wider userbase, and so far, with 172 votes in, the<strong> overwhelming response is \u201cKinda mid.\u201d<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">Alright, GPT-5 vibe check<\/p>\u2014 Bilawal Sidhu (@bilawalsidhu) <a href=\"https:\/\/twitter.com\/bilawalsidhu\/status\/1953559292713611284?ref_src=twsrc%5Etfw\">August 7, 2025<\/a><\/blockquote>\n<\/div><\/figure>\n\n\n\n<p>And as the pseudonymous AI Leaks and News account wrote, <strong>\u201cThe overwhelming consensus on GPT-5 from both X and the Reddit AMA are overwhelmingly negative.\u201d<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">The overwhelming consensus on GPT-5 from both X and the Reddit AMA are overwhelmingly negative<\/p><p>Most users are disgruntled about the broken model picker and non-pro users not having access to legacy models<\/p><p>What are your initial thoughts on GPT-5?<\/p>\u2014 AI Leaks and News (@AILeaksAndNews) <a href=\"https:\/\/twitter.com\/AILeaksAndNews\/status\/1953809152104395182?ref_src=twsrc%5Etfw\">August 8, 
2025<\/a><\/blockquote>\n<\/div><\/figure>\n\n\n\n<p>Tibor Blaho, lead engineer at AIPRM and a popular AI leaks and news poster on X, summarized the many problems with the GPT-5 rollout in an excellent post, highlighting that one of the new marquee features <strong>\u2014 an automatic \u201crouter\u201d in ChatGPT that chooses a thinking or non-thinking mode for the underlying GPT-5 model depending on the difficulty of the query \u2014 has become one of the chief complaints,<\/strong> given that the model seemed to default to non-thinking mode for many users.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">A bit sad how the GPT-5 launch is going so far, especially after the long wait and high expectations<\/p><p>\u2013 The automatic switching between models (the router) seems partly broken\/unreliable<\/p><p>\u2013 It&#8217;s unclear exactly which model you&#8217;re actually interacting with (standard or mini,\u2026<\/p>\u2014 Tibor Blaho (@btibor91) <a href=\"https:\/\/twitter.com\/btibor91\/status\/1953787629151170800?ref_src=twsrc%5Etfw\">August 8, 2025<\/a><\/blockquote>\n<\/div><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-competition-waiting-in-the-wings\">Competition waiting in the wings<\/h2>\n\n\n\n<p>Thus, the <strong>sentiment toward GPT-5 is far from universally positive, highlighting a major problem for OpenAI<\/strong> as it faces increasing competition from major U.S. rivals like Google and Anthropic, and a growing list of free, open source and powerful Chinese LLMs offering features that many U.S. models lack.
<\/p>\n\n\n\n<p>Take the <strong>Alibaba Qwen Team of AI researchers, <\/strong>who just today updated their highly performant Qwen 3 model to support a 1-million-token context \u2014 <strong>giving users the ability to exchange nearly 4x as much information with the model in a single back\/forth interaction as GPT-5 offers.<\/strong><\/p>\n\n\n\n<p>Given that OpenAI\u2019s other big release this week \u2014 that of new open source gpt-oss models \u2014 also received a mixed reception from early users, things are not looking up for the number one dedicated AI company by users right now (700 million weekly active users of ChatGPT as of this month).<\/p>\n\n\n\n<p>Indeed, following the release of GPT-5, users of the betting marketplace Polymarket overwhelmingly wagered that<strong> Google would likely have the best AI model by the end of this month, August 2025.<\/strong><\/p>\n\n\n\n<p>Other power users like Otherside AI co-founder and CEO Matt Shumer, who received early access to GPT-5 and blogged about it favorably in a review, <strong>opined that views would shift as more people figured out the best ways to use the new model and adjusted their integration approaches<\/strong>:<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">A lot of folks who are having a bad experience are using GPT-5 in agent harnesses that aren&#8217;t yet optimized for it.<\/p><p>For every new model release, there&#8217;s a time lag between release + when companies that integrate the model have it truly working well.<\/p><p>Agent companies rush to\u2026<\/p>\u2014 Matt Shumer (@mattshumer_) <a href=\"https:\/\/twitter.com\/mattshumer_\/status\/1953869225715499366?ref_src=twsrc%5Etfw\">August 8, 2025<\/a><\/blockquote>\n<\/div><\/figure>\n\n\n\n<p>While it\u2019s still early 
days for GPT-5 \u2014 and the sentiment could change dramatically as more users get their hands on it and try it for different tasks \u2014 the <strong>early indications suggest this is not a \u201chome run\u201d release for OpenAI <\/strong>in the way that prior releases such as GPT-4, or even the newer 4o and o3, were. That\u2019s a concerning sign for a company that just raised yet another funding round yet remains unprofitable because of its high research and development costs.<\/p>\n<div id=\"boilerplate_2660155\" class=\"post-boilerplate boilerplate-after\"><div class=\"Boilerplate__newsletter-container vb\">\n<div class=\"Boilerplate__newsletter-main\">\n<p><strong>Daily insights on business use cases with VB Daily<\/strong><\/p>\n<p class=\"copy\">If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.<\/p>\n<p class=\"Form__newsletter-legal\">Read our Privacy Policy<\/p>\n<p class=\"Form__success\" id=\"boilerplateNewsletterConfirmation\">\n\t\t\t\t\tThanks for subscribing. Check out more VB newsletters here.\n\t\t\t\t<\/p>\n<p class=\"Form__error\">An error occurred.<\/p>\n<\/p><\/div>\n<div class=\"image-container\">\n\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/venturebeat.com\/wp-content\/themes\/vb-news\/brand\/img\/vb-daily-phone.png\" alt=\"\"\/>\n\t\t\t\t<\/div>\n<\/p><\/div>\n<\/div>\t\t\t<\/div>\r\n<br>\r\n<br><a href=\"https:\/\/venturebeat.com\/ai\/openais-gpt-5-rollout-is-not-going-smoothly\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. 
Subscribe Now The launch of OpenAI\u2019s long anticipated new model, GPT-5, is off to a rocky start to say the least. Even forgiving errors in charts and voice demoes during yesterday\u2019s [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":3057,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-3056","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/08\/ChatGPT-Image-Aug-8-2025-01_24_39-PM.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/3056","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=3056"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/3056\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/3057"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=3056"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=3056"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=3056"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}<!-- This website is optimized by Airlift. 
Learn more: https://airlift.net. Template: 69e302c146fa5c92dc28ac12. Config Timestamp: 2026-04-18 04:04:16 UTC, Cached Timestamp: 2026-04-29 18:45:18 UTC -->