{"id":229,"date":"2025-02-28T07:54:37","date_gmt":"2025-02-28T07:54:37","guid":{"rendered":"https:\/\/violethoward.com\/new\/industry-observers-say-gpt-4-5-is-an-odd-model-question-its-price\/"},"modified":"2025-02-28T07:54:37","modified_gmt":"2025-02-28T07:54:37","slug":"industry-observers-say-gpt-4-5-is-an-odd-model-question-its-price","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/industry-observers-say-gpt-4-5-is-an-odd-model-question-its-price\/","title":{"rendered":"Industry observers say GPT-4.5 is an &#8220;odd&#8221; model, question its price"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<p>OpenAI has announced the release of GPT-4.5, which CEO Sam Altman previously said would be the last non-chain-of-thought (CoT) model.\u00a0<\/p>\n\n\n\n<p>The company said the new model \u201cis not a frontier model\u201d but is still its biggest large language model (LLM), with more computational efficiency. 
Altman said that, even though GPT-4.5 does not reason the same way as OpenAI\u2019s other new offerings o1 or o3-mini, this new model still offers more human-like thoughtfulness.\u00a0<\/p>\n\n\n\n<p>Industry observers, many of whom had early access to the new model, have found GPT-4.5 to be an interesting move from OpenAI, tempering their expectations of what the model should be able to achieve.\u00a0<\/p>\n\n\n\n<p>Wharton professor and AI commentator Ethan Mollick posted on social media that GPT-4.5 is a \u201cvery odd and interesting model,\u201d noting it can get \u201coddly lazy on complex projects\u201d despite being a strong writer.\u00a0<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-bluesky-embed wp-block-embed-bluesky-embed\"\/>\n\n\n\n<p>OpenAI co-founder and former Tesla AI head Andrej Karpathy noted that GPT-4.5 made him remember when GPT-4 came out and he saw the model\u2019s potential. In a post to X, Karpathy said that, while using GPT-4.5, \u201ceverything is a little bit better, and it\u2019s awesome, but also not exactly in ways that are trivial to point to.\u201d<\/p>\n\n\n\n<p>Karpathy, however, warned that people shouldn\u2019t expect revolutionary impact from the model as it \u201cdoes not push forward model capability in cases where reasoning is critical (math, code, etc.).\u201d<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-industry-thoughts-in-detail\">Industry thoughts in detail<\/h2>\n\n\n\n<p>Here\u2019s what Karpathy had to say about the latest GPT iteration in a lengthy post on X: <\/p>\n\n\n\n<p>\u201c<em>Today marks the release of GPT4.5 by OpenAI. I\u2019ve been looking forward to this for ~2 years, ever since GPT4 was released, because this release offers a qualitative measurement of the slope of improvement you get out of scaling pretraining compute (i.e. simply training a bigger model). Each 0.5 in the version is roughly 10X pretraining compute. Now, recall that GPT1 barely generates coherent text. 
GPT2 was a confused toy. GPT2.5 was \u201cskipped\u201d straight into GPT3, which was even more interesting. GPT3.5 crossed the threshold where it was enough to actually ship as a product and sparked OpenAI\u2019s \u201cChatGPT moment\u201d. And GPT4 in turn also felt better, but I\u2019ll say that it definitely felt subtle. <\/em><\/p>\n\n\n\n<p><em>I remember being a part of a hackathon trying to find concrete prompts where GPT4 outperformed 3.5. They definitely existed, but clear and concrete \u201cslam dunk\u201d examples were difficult to find. It\u2019s that \u2026 everything was just a little bit better but in a diffuse way. The word choice was a bit more creative. Understanding of nuance in the prompt was improved. Analogies made a bit more sense. The model was a little bit funnier. World knowledge and understanding was improved at the edges of rare domains. Hallucinations were a bit less frequent. The vibes were just a bit better. It felt like the water that rises all boats, where everything gets slightly improved by 20%. So it is with that expectation that I went into testing GPT4.5, which I had access to for a few days, and which saw 10X more pretraining compute than GPT4. And I feel like, once again, I\u2019m in the same hackathon 2 years ago. Everything is a little bit better and it\u2019s awesome, but also not exactly in ways that are trivial to point to. Still, it is incredible interesting and exciting as another qualitative measurement of a certain slope of capability that comes \u201cfor free\u201d from just pretraining a bigger model.<\/em><\/p>\n\n\n\n<p><em>Keep in mind that that GPT4.5 was only trained with pretraining, supervised finetuning and RLHF, so this is not yet a reasoning model. Therefore, this model release does not push forward model capability in cases where reasoning is critical (math, code, etc.). 
In these cases, training with RL and gaining thinking is incredibly important and works better, even if it is on top of an older base model (e.g. GPT4ish capability or so). The state of the art here remains the full o1. Presumably, OpenAI will now be looking to further train with reinforcement learning on top of GPT4.5 to allow it to think and push model capability in these domains.<\/em><\/p>\n\n\n\n<p><em>HOWEVER. We do actually expect to see an improvement in tasks that are not reasoning heavy, and I would say those are tasks that are more EQ (as opposed to IQ) related and bottlenecked by e.g. world knowledge, creativity, analogy making, general understanding, humor, etc. So these are the tasks that I was most interested in during my vibe checks.<\/em><\/p>\n\n\n\n<p><em>So below, I thought it would be fun to highlight 5 funny\/amusing prompts that test these capabilities, and to organize them into an interactive \u201cLM Arena Lite\u201d right here on X, using a combination of images and polls in a thread. Sadly X does not allow you to include both an image and a poll in a single post, so I have to alternate posts that give the image (showing the prompt, and two responses one from 4 and one from 4.5), and the poll, where people can vote which one is better. After 8 hours, I\u2019ll reveal the identities of which model is which. Let\u2019s see what happens \ud83d\ude42<\/em>\u201d<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-box-ceo-s-thoughts-on-gpt-4-5\">Box CEO\u2019s thoughts on GPT-4.5<\/h2>\n\n\n\n<p>Other early users also saw potential in GPT-4.5. Box CEO Aaron Levie said on X that his company used GPT-4.5 to help extract structured data and metadata from complex enterprise content.\u00a0<\/p>\n\n\n\n<p>\u201c<em>The AI breakthroughs just keep coming. 
OpenAI just announced GPT-4.5, and we\u2019ll be making it available to Box customers later today in the Box AI Studio.<\/em><\/p>\n\n\n\n<p><em>We\u2019ve been testing GPT4.5 in early access mode with Box AI for advanced enterprise unstructured data use-cases, and have seen strong results. With the Box AI enterprise eval, we test models against a variety of different scenarios, like Q&amp;A accuracy, reasoning capabilities and more. In particular, to explore the capabilities of GPT-4.5, we focused on a key area with significant potential for enterprise impact: The extraction of structured data, or metadata extraction, from complex enterprise content.\u00a0<\/em><\/p>\n\n\n\n<p><em>At Box, we rigorously evaluate data extraction models using multiple enterprise-grade datasets. One key dataset we leverage is CUAD, which consists of over 510 commercial legal contracts. Within this dataset, Box has identified 17,000 fields that can be extracted from unstructured content and evaluated the model based on single shot extraction for these fields (this is our hardest test, where the model only has once chance to extract all the metadata in a single pass vs. taking multiple attempts). In our tests, GPT-4.5 correctly extracted 19 percentage points more fields accurately compared to GPT-4o, highlighting its improved ability to handle nuanced contract data.<\/em><\/p>\n\n\n\n<p><em>Next, to ensure GPT-4.5 could handle the demands of real-world enterprise content, we evaluated its performance against a more rigorous set of documents, Box\u2019s own challenge set. We selected a subset of complex legal contracts \u2013 those with multi-modal content, high-density information and lengths exceeding 200 pages \u2013 to represent some of the most difficult scenarios our customers face. 
On this challenge set, GPT-4.5 also consistently outperformed GPT-4o in extracting key fields with higher accuracy, demonstrating its superior ability to handle intricate and nuanced legal documents.<\/em><\/p>\n\n\n\n<p><em>Overall, we\u2019re seeing strong results with GPT-4.5 for complex enterprise data, which will unlock even more use-cases in the enterprise.<\/em>\u201d<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-questions-on-price-and-its-importance\">Questions on price and its importance<\/h2>\n\n\n\n<p>Even as early users found GPT-4.5 workable \u2014 albeit a bit lazy \u2014 they questioned its release.\u00a0<\/p>\n\n\n\n<p>For instance, prominent OpenAI critic Gary Marcus called GPT-4.5 a \u201cnothingburger\u201d on Bluesky.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-bluesky-embed wp-block-embed-bluesky-embed\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"bluesky-embed\" data-bluesky-uri=\"at:\/\/did:plc:yddiyagux5ossyt5342u52fb\/app.bsky.feed.post\/3lj6r3mvuzk2v\" data-bluesky-cid=\"bafyreiaueqp5hezjxokocuugr4dg7g5ov7uhnuj2ktrcsl3ewxfves646m\"><p lang=\"en\">Hot take: GPT 4.5 is a nothingburger; GPT-5 still fantasy.\u2022\u00a0Scaling data is not a physical law; pretty much everything I told you was true.\u2022\u00a0All the BS about GPT-5 we listened to for last few years: not so true.\u2022\u00a0Fanboys like Cowen will blame users, but results just aren\u2019t what they had hoped.<\/p>\u2014 Gary Marcus (@garymarcus.bsky.social) 2025-02-27T20:44:55.115Z<\/blockquote>\n<\/div><\/figure>\n\n\n\n<p>Hugging Face CEO Clement Delangue commented that GPT-4.5\u2019s closed-source provenance makes it \u201cmeh.\u201d\u00a0<\/p>\n\n\n\n<p>However, many noted that their concerns about GPT-4.5 had nothing to do with its performance. 
Instead, people questioned why OpenAI would release a model so expensive that it is almost prohibitive to use but is not as powerful as its other models.\u00a0<\/p>\n\n\n\n<p>One user commented on X: \u201c<em>So you\u2019re telling me GPT-4.5 is worth more than o1 yet it doesn\u2019t perform as well on benchmarks\u2026. Make it make sense<\/em>.\u201d<\/p>\n\n\n\n<p>Other X users posited theories that the high token cost could be to deter competitors like DeepSeek \u201cto distill the 4.5 model.\u201d<\/p>\n\n\n\n<p>DeepSeek became a big competitor against OpenAI in January, with industry leaders finding DeepSeek-R1 reasoning to be as capable as OpenAI\u2019s \u2014 but more affordable.\u00a0<\/p>\n\t\t\t<\/div>\r\n<br>\r\n<br><a href=\"https:\/\/venturebeat.com\/ai\/industry-observers-say-gpt-4-5-is-an-odd-model-question-its-price\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>OpenAI has announced the release of GPT-4.5, which CEO Sam Altman previously said would be the last non-chain-of-thought (CoT) model.\u00a0 The company said the new model \u201cis not a frontier model\u201d but is still its biggest [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":230,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-229","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/02\/openai-robot.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/229","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=229"}],"version-history":[{"count":0,"href":"https:\/\/
violethoward.com\/new\/wp-json\/wp\/v2\/posts\/229\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/230"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=229"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=229"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=229"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}