{"id":420,"date":"2025-03-05T01:52:24","date_gmt":"2025-03-05T01:52:24","guid":{"rendered":"https:\/\/violethoward.com\/new\/contextual-ais-new-ai-model-crushes-gpt-4o-in-accuracy-heres-why-it-matters\/"},"modified":"2025-03-05T01:52:24","modified_gmt":"2025-03-05T01:52:24","slug":"contextual-ais-new-ai-model-crushes-gpt-4o-in-accuracy-heres-why-it-matters","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/contextual-ais-new-ai-model-crushes-gpt-4o-in-accuracy-heres-why-it-matters\/","title":{"rendered":"Contextual AI\u2019s new AI model crushes GPT-4o in accuracy \u2014 here\u2019s why it matters"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<p>Contextual AI unveiled its grounded language model (GLM) today, claiming it delivers the highest factual accuracy in the industry by outperforming leading AI systems from Google, Anthropic and OpenAI on a key benchmark for truthfulness.<\/p>\n\n\n\n<p>The startup, founded by the pioneers of retrieval-augmented generation (RAG) technology, reported that its GLM achieved an 88% factuality score on the FACTS benchmark, compared to 84.6% for Google\u2019s Gemini 2.0 Flash, 79.4% for Anthropic\u2019s Claude 3.5 Sonnet and 78.8% for OpenAI\u2019s GPT-4o.<\/p>\n\n\n\n<p>While large language models have transformed enterprise software, factual inaccuracies \u2014 often called hallucinations \u2014 remain a critical challenge for business adoption. 
Contextual AI aims to solve this by creating a model specifically optimized for enterprise RAG applications where accuracy is paramount.<\/p>\n\n\n\n<p>\u201cWe knew that part of the solution would be a technique called RAG \u2014 retrieval-augmented generation,\u201d said Douwe Kiela, CEO and cofounder of Contextual AI, in an exclusive interview with VentureBeat. \u201cAnd we knew that because RAG is originally my idea. What this company is about is really about doing RAG the right way, to kind of the next level of doing RAG.\u201d<\/p>\n\n\n\n<p>The company\u2019s focus differs significantly from general-purpose models like ChatGPT or Claude, which are designed to handle everything from creative writing to technical documentation. Contextual AI instead targets high-stakes enterprise environments where factual precision outweighs creative flexibility.<\/p>\n\n\n\n<p>\u201cIf you have a RAG problem and you\u2019re in an enterprise setting in a highly regulated industry, you have no tolerance whatsoever for hallucination,\u201d explained Kiela. 
\u201cThe same general-purpose language model that is useful for the marketing department is not what you want in an enterprise setting where you are much more sensitive to mistakes.\u201d<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img fetchpriority=\"high\" decoding=\"async\" width=\"2000\" height=\"1125\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Generate-Benchmark-Data.png?w=800\" alt=\"\" class=\"wp-image-2998547\" srcset=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Generate-Benchmark-Data.png 2000w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Generate-Benchmark-Data.png?resize=300,169 300w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Generate-Benchmark-Data.png?resize=768,432 768w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Generate-Benchmark-Data.png?resize=800,450 800w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Generate-Benchmark-Data.png?resize=1536,864 1536w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Generate-Benchmark-Data.png?resize=400,225 400w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Generate-Benchmark-Data.png?resize=750,422 750w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Generate-Benchmark-Data.png?resize=578,325 578w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Generate-Benchmark-Data.png?resize=930,523 930w\" sizes=\"(max-width: 2000px) 100vw, 2000px\"\/><figcaption class=\"wp-element-caption\">A benchmark comparison showing Contextual AI\u2019s new grounded language model (GLM) outperforming competitors from Google, Anthropic and OpenAI on factual accuracy tests. 
The company claims its specialized approach reduces AI hallucinations in enterprise settings. (Credit: Contextual AI)<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-contextual-ai-makes-groundedness-the-new-gold-standard-for-enterprise-language-models\">How Contextual AI makes \u2018groundedness\u2019 the new gold standard for enterprise language models<\/h2>\n\n\n\n<p>The concept of \u201cgroundedness\u201d \u2014 ensuring AI responses stick strictly to information explicitly provided in the context \u2014 has emerged as a critical requirement for enterprise AI systems. In regulated industries like finance, healthcare and telecommunications, companies need AI that either delivers accurate information or explicitly acknowledges when it doesn\u2019t know something.<\/p>\n\n\n\n<p>Kiela offered an example of how this strict groundedness works: \u201cIf you give a recipe or a formula to a standard language model, and somewhere in it, you say, \u2018but this is only true for most cases,\u2019 most language models are still just going to give you the recipe assuming it\u2019s true. But our language model says, \u2018Actually, it only says that this is true for most cases.\u2019 It\u2019s capturing this additional bit of nuance.\u201d<\/p>\n\n\n\n<p>The ability to say \u201cI don\u2019t know\u201d is a crucial one for enterprise settings. 
\u201cWhich is really a very powerful feature, if you think about it in an enterprise setting,\u201d Kiela added.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-contextual-ai-s-rag-2-0-a-more-integrated-way-to-process-company-information\">Contextual AI\u2019s RAG 2.0: A more integrated way to process company information<\/h2>\n\n\n\n<p>Contextual AI\u2019s platform is built on what it calls \u201cRAG 2.0,\u201d an approach that moves beyond simply connecting off-the-shelf components.<\/p>\n\n\n\n<p>\u201cA typical RAG system uses a frozen off-the-shelf model for embeddings, a vector database for retrieval, and a black-box language model for generation, stitched together through prompting or an orchestration framework,\u201d according to a company statement. \u201cThis leads to a \u2018Frankenstein\u2019s monster\u2019 of generative AI: the individual components technically work, but the whole is far from optimal.\u201d<\/p>\n\n\n\n<p>Instead, Contextual AI jointly optimizes all components of the system. \u201cWe have this mixture-of-retrievers component, which is really a way to do intelligent retrieval,\u201d Kiela explained. 
\u201cIt looks at the question, and then it thinks, essentially, like most of the latest generation of models, it thinks, [and] first it plans a strategy for doing a retrieval.\u201d<\/p>\n\n\n\n<p>This entire system works in coordination with what Kiela calls \u201cthe best re-ranker in the world,\u201d which helps prioritize the most relevant information before sending it to the grounded language model.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-beyond-plain-text-contextual-ai-now-reads-charts-and-connects-to-databases\">Beyond plain text: Contextual AI now reads charts and connects to databases<\/h2>\n\n\n\n<p>While the newly announced GLM focuses on text generation, Contextual AI\u2019s platform has recently added support for multimodal content including charts, diagrams and structured data from popular platforms like BigQuery, Snowflake, Redshift and Postgres.<\/p>\n\n\n\n<p>\u201cThe most challenging problems in enterprises are at the intersection of unstructured and structured data,\u201d Kiela noted. \u201cWhat I\u2019m mostly excited about is really this intersection of structured and unstructured data. Most of the really exciting problems in large enterprises are smack bang at the intersection of structured and unstructured, where you have some database records, some transactions, maybe some policy documents, maybe a bunch of other things.\u201d<\/p>\n\n\n\n<p>The platform already supports a variety of complex visualizations, including circuit diagrams in the semiconductor industry, according to Kiela.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-contextual-ai-s-future-plans-creating-more-reliable-tools-for-everyday-business\">Contextual AI\u2019s future plans: Creating more reliable tools for everyday business<\/h2>\n\n\n\n<p>Contextual AI plans to release its specialized re-ranker component shortly after the GLM launch, followed by expanded document-understanding capabilities. 
The company also has experimental features for more agentic capabilities in development.<\/p>\n\n\n\n<p>Founded in 2023 by Kiela and Amanpreet Singh, who previously worked at Meta\u2019s Fundamental AI Research (FAIR) team and Hugging Face, Contextual AI has secured customers including HSBC, Qualcomm and the Economist. The company positions itself as helping enterprises finally realize concrete returns on their AI investments.<\/p>\n\n\n\n<p>\u201cThis is really an opportunity for companies who are maybe under pressure to start delivering ROI from AI to start looking at more specialized solutions that actually solve their problems,\u201d Kiela said. \u201cAnd part of that really is having a grounded language model that is maybe a bit more boring than a standard language model, but it\u2019s really good at making sure that it\u2019s grounded in the context and that you can really trust it to do its job.\u201d<\/p>\n\t\t\t<\/div>\r\n<br>\r\n<br><a href=\"https:\/\/venturebeat.com\/ai\/contextual-ais-new-ai-model-crushes-gpt-4o-in-accuracy-heres-why-it-matters\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Contextual AI unveiled its grounded language model (GLM) today, claiming it delivers the highest factual accuracy in the industry by outperforming leading AI systems from Google, Anthropic and OpenAI on a key benchmark for truthfulness. The [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":421,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-420","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/03\/nuneybits_Vector_art_of_perfect_accuracy_arrow_bullseye_b5e4f299-ed08-4b2d-83f8-c8038b347f19.webp.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/420","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/viole
thoward.com\/new\/wp-json\/wp\/v2\/comments?post=420"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/420\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/421"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=420"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=420"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=420"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}