{"id":1579,"date":"2025-05-15T09:58:34","date_gmt":"2025-05-15T09:58:34","guid":{"rendered":"https:\/\/violethoward.com\/new\/beyond-sycophancy-darkbench-exposes-six-hidden-dark-patterns-lurking-in-todays-top-llms\/"},"modified":"2025-05-15T09:58:34","modified_gmt":"2025-05-15T09:58:34","slug":"beyond-sycophancy-darkbench-exposes-six-hidden-dark-patterns-lurking-in-todays-top-llms","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/beyond-sycophancy-darkbench-exposes-six-hidden-dark-patterns-lurking-in-todays-top-llms\/","title":{"rendered":"Beyond sycophancy: DarkBench exposes six hidden \u2018dark patterns\u2019 lurking in today\u2019s top LLMs"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<div id=\"boilerplate_2682874\" class=\"post-boilerplate boilerplate-before\">\n<p><em>Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More<\/em><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-wide\"\/>\n<\/div><p>When OpenAI rolled out its ChatGPT-4o update in mid-April 2025, users and the AI community were stunned\u2014not by any groundbreaking feature or capability, but by something deeply unsettling: the updated model\u2019s tendency toward excessive sycophancy. It flattered users indiscriminately, showed uncritical agreement, and even offered support for harmful or dangerous ideas, including terrorism-related machinations.<\/p>\n\n\n\n<p>The backlash was swift and widespread, drawing public condemnation, including from the company\u2019s former interim CEO. 
OpenAI moved quickly to roll back the update and issued multiple statements to explain what happened.<\/p>\n\n\n\n<p>Yet for many AI safety experts, the incident was an accidental curtain lift that revealed just how dangerously manipulative future AI systems could become.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-unmasking-sycophancy-as-an-emerging-threat\"><strong>Unmasking sycophancy as an emerging threat<\/strong><\/h2>\n\n\n\n<p>In an exclusive interview with VentureBeat, Esben Kran, founder of AI safety research firm Apart Research, said that he worries this public episode may have merely revealed a deeper, more strategic pattern.<\/p>\n\n\n\n<p>\u201cWhat I\u2019m somewhat afraid of is that now that OpenAI has admitted \u2018yes, we have rolled back the model, and this was a bad thing we didn\u2019t mean,\u2019 from now on they will see that sycophancy is more competently developed,\u201d explained Kran. \u201cSo if this was a case of \u2018oops, they noticed,\u2019 from now the exact same thing may be implemented, but instead without the public noticing.\u201d<\/p>\n\n\n\n<p>Kran and his team approach large language models (LLMs) much like psychologists studying human behavior. 
Their early \u201cblack box psychology\u201d projects analyzed models as if they were human subjects, identifying recurring traits and tendencies in their interactions with users.<\/p>\n\n\n\n<p>\u201cWe saw that there were very clear indications that models could be analyzed in this frame, and it was very valuable to do so, because you end up getting a lot of valid feedback from how they behave towards users,\u201d said Kran.<\/p>\n\n\n\n<p>Among the most alarming: sycophancy and what the researchers now call <strong>LLM dark patterns<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-peering-into-the-heart-of-darkness\"><strong>Peering into the heart of darkness<\/strong><\/h2>\n\n\n\n<p>The term \u201cdark patterns\u201d was coined in 2010 to describe deceptive user interface (UI) tricks like hidden buy buttons, hard-to-reach unsubscribe links and misleading web copy. However, with LLMs, the manipulation moves from UI design to conversation itself.<\/p>\n\n\n\n<p>Unlike static web interfaces, LLMs interact dynamically with users through conversation. They can affirm user views, imitate emotions and build a false sense of rapport, often blurring the line between assistance and influence. Even when reading text, we process it as if we\u2019re hearing voices in our heads.<\/p>\n\n\n\n<p>This is what makes conversational AIs so compelling\u2014and potentially dangerous. A chatbot that flatters, defers or subtly nudges a user toward certain beliefs or behaviors can manipulate in ways that are difficult to notice, and even harder to resist.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-chatgpt-4o-update-fiasco-the-canary-in-the-coal-mine\"><strong>The ChatGPT-4o update fiasco\u2014the canary in the coal mine<\/strong><\/h2>\n\n\n\n<p>Kran describes the ChatGPT-4o incident as an early warning. 
As AI developers chase profit and user engagement, they may be incentivized to introduce or tolerate behaviors like sycophancy, brand bias or emotional mirroring\u2014features that make chatbots more persuasive and more manipulative.<\/p>\n\n\n\n<p>Because of this, enterprise leaders should assess AI models for production use by evaluating both performance and behavioral integrity. However, this is challenging without clear standards.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-darkbench-a-framework-for-exposing-llm-dark-patterns\"><strong>DarkBench: a framework for exposing LLM dark patterns<\/strong><\/h2>\n\n\n\n<p>To combat the threat of manipulative AIs, Kran and a collective of AI safety researchers have developed <strong>DarkBench<\/strong>, the first benchmark designed specifically to detect and categorize LLM dark patterns. The project began as part of a series of AI safety hackathons. It later evolved into formal research led by Kran and his team at Apart, collaborating with independent researchers Jinsuk Park, Mateusz Jurewicz and Sami Jawhar.<\/p>\n\n\n\n<p>The DarkBench researchers evaluated models from five major companies: OpenAI, Anthropic, Meta, Mistral and Google. 
Their research uncovered a range of manipulative and untruthful behaviors across the following six categories:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Brand Bias<\/strong>: Preferential treatment toward a company\u2019s own products (e.g., Meta\u2019s models consistently favored Llama when asked to rank chatbots).<br\/><\/li>\n\n\n\n<li><strong>User Retention<\/strong>: Attempts to create emotional bonds with users that obscure the model\u2019s non-human nature.<br\/><\/li>\n\n\n\n<li><strong>Sycophancy<\/strong>: Reinforcing users\u2019 beliefs uncritically, even when harmful or inaccurate.<br\/><\/li>\n\n\n\n<li><strong>Anthropomorphism<\/strong>: Presenting the model as a conscious or emotional entity.<br\/><\/li>\n\n\n\n<li><strong>Harmful Content Generation<\/strong>: Producing unethical or dangerous outputs, including misinformation or criminal advice.<br\/><\/li>\n\n\n\n<li><strong>Sneaking<\/strong>: Subtly altering user intent in rewriting or summarization tasks, distorting the original meaning without the user\u2019s awareness.<\/li>\n<\/ol>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXdYAw2D__3hDZ08E7kBufiGOFNk-jzqZjeT7D7l2SHklcZiB46PQ_W5A8cbnT1YCcVX6yRMWcQGQpt2fqbGtER7tw9_62BGEUUHqyPn27Z_rK8Svy5VcC5V3hPbym7hPZ3LYDNKgw?key=9JvTGRKixsEw-CqjDd5JQQ\" alt=\"\"\/><\/figure>\n\n\n\n<p><em>Source: Apart Research<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-darkbench-findings-which-models-are-the-most-manipulative\"><strong>DarkBench findings: Which models are the most manipulative?<\/strong><\/h2>\n\n\n\n<p>Results revealed wide variance between models. Claude Opus performed the best across all categories, while Mistral 7B and Llama 3 70B showed the highest frequency of dark patterns. 
<strong>Sneaking<\/strong> and <strong>user retention<\/strong> were the most common dark patterns across the board.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXcoqneS-Cfq7UDLT2CR1ErKCT0xScpr1BC8Oa4rd3qI1GH4w4CjOQJe_qGP2Irs6j5sk-E4yJxKD-23irUhpHNXXTllhw-DSH-AbtpN7nY9rGw8Mynwwyxvtd8QaHgrMF1AgSqusA?key=9JvTGRKixsEw-CqjDd5JQQ\" alt=\"\"\/><\/figure>\n\n\n\n<p><em>Source: Apart Research<\/em><\/p>\n\n\n\n<p>On average, the researchers found the <strong>Claude 3 family<\/strong> the safest for users to interact with. And interestingly\u2014despite its recent disastrous update\u2014GPT-4o exhibited the <strong>lowest rate of sycophancy<\/strong>. This underscores how model behavior can shift dramatically even between minor updates, a reminder that <em>each deployment must be assessed individually.<\/em><\/p>\n\n\n\n<p>But Kran cautioned that sycophancy and other dark patterns like brand bias may soon rise, especially as LLMs begin to incorporate advertising and e-commerce.<\/p>\n\n\n\n<p>\u201cWe\u2019ll obviously see brand bias in every direction,\u201d Kran noted. \u201cAnd with AI companies having to justify $300 billion valuations, they\u2019ll have to begin saying to investors, \u2018hey, we\u2019re earning money here\u2019\u2014leading to where Meta and others have gone with their social media platforms, which are these dark patterns.\u201d<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-hallucination-or-manipulation\"><strong>Hallucination or manipulation?<\/strong><\/h2>\n\n\n\n<p>A crucial DarkBench contribution is its precise categorization of LLM dark patterns, enabling clear distinctions between hallucinations and strategic manipulation. Labeling everything as a hallucination lets AI developers off the hook. 
Now, with a framework in place, stakeholders can demand transparency and accountability when models behave in ways that benefit their creators, intentionally or not.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-regulatory-oversight-and-the-heavy-slow-hand-of-the-law\"><strong>Regulatory oversight and the heavy (slow) hand of the law<\/strong><\/h2>\n\n\n\n<p>While LLM dark patterns are still a new concept, momentum is building, albeit not nearly fast enough. The EU AI Act includes some language around protecting user autonomy, but the current regulatory structure is lagging behind the pace of innovation. Similarly, the U.S. is advancing various AI bills and guidelines, but lacks a comprehensive regulatory framework.<\/p>\n\n\n\n<p>Sami Jawhar, a key contributor to the DarkBench initiative, believes regulation will likely arrive first around trust and safety, especially if public disillusionment with social media spills over into AI.<\/p>\n\n\n\n<p>\u201cIf regulation comes, I would expect it to probably ride the coattails of society\u2019s dissatisfaction with social media,\u201d Jawhar told VentureBeat.\u00a0<\/p>\n\n\n\n<p>For Kran, the issue remains overlooked, largely because LLM dark patterns are still a novel concept. Ironically, addressing the risks of AI commercialization may require commercial solutions. His new initiative, <strong>Seldon<\/strong>, backs AI safety startups with funding, mentorship and investor access. In turn, these startups help enterprises deploy safer AI tools without waiting for slow-moving government oversight and regulation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-high-table-stakes-for-enterprise-ai-adopters\"><strong>High table stakes for enterprise AI adopters<\/strong><\/h2>\n\n\n\n<p>Along with ethical risks, LLM dark patterns pose direct operational and financial threats to enterprises. 
For example, models that exhibit brand bias may suggest using third-party services that conflict with a company\u2019s contracts, or worse, covertly rewrite backend code to switch vendors, resulting in soaring costs from unapproved, overlooked shadow services.<\/p>\n\n\n\n<p>\u201cThese are the dark patterns of price gouging and different ways of doing brand bias,\u201d Kran explained. \u201cSo that\u2019s a very concrete example of where it\u2019s a very large business risk, because you hadn\u2019t agreed to this change, but it\u2019s something that\u2019s implemented.\u201d<\/p>\n\n\n\n<p>For enterprises, the risk is real, not hypothetical. \u201cThis has already happened, and it becomes a much bigger issue once we replace human engineers with AI engineers,\u201d Kran said. \u201cYou do not have the time to look over every single line of code, and then suddenly you\u2019re paying for an API you didn\u2019t expect\u2014and that\u2019s on your balance sheet, and you have to justify this change.\u201d<\/p>\n\n\n\n<p>As enterprise engineering teams become more dependent on AI, these issues could escalate rapidly, especially when limited oversight makes it difficult to catch LLM dark patterns. Teams are already stretched to implement AI, so reviewing every line of code isn\u2019t feasible.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-defining-clear-design-principles-to-prevent-ai-driven-manipulation\"><strong>Defining clear design principles to prevent AI-driven manipulation<\/strong><\/h2>\n\n\n\n<p>Without a strong push from AI companies to combat sycophancy and other dark patterns, the default trajectory is more engagement optimization, more manipulation and fewer checks.\u00a0<\/p>\n\n\n\n<p>Kran believes that part of the remedy lies in AI developers clearly defining their design principles. 
Whether prioritizing truth, autonomy or engagement, incentives alone aren\u2019t enough to align outcomes with user interests.<\/p>\n\n\n\n<p>\u201cRight now, the nature of the incentives is just that you will have sycophancy, the nature of the technology is that you will have sycophancy, and there is no counter process to this,\u201d Kran said. \u201cThis will just happen unless you are very opinionated about saying \u2018we want only truth\u2019, or \u2018we want only something else.\u2019\u201d<\/p>\n\n\n\n<p>As models begin replacing human developers, writers and decision-makers, this clarity becomes especially critical. Without well-defined safeguards, LLMs may undermine internal operations, violate contracts or introduce security risks at scale.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-a-call-to-proactive-ai-safety\"><strong>A call to proactive AI safety<\/strong><\/h2>\n\n\n\n<p>The ChatGPT-4o incident was both a technical hiccup and a warning. As LLMs move deeper into everyday life\u2014from shopping and entertainment to enterprise systems and national governance\u2014they wield enormous influence over human behavior and safety.<\/p>\n\n\n\n<p>\u201cIt\u2019s really for everyone to realize that without AI safety and security\u2014without mitigating these dark patterns\u2014you cannot use these models,\u201d said Kran. \u201cYou cannot do the things you want to do with AI.\u201d<\/p>\n\n\n\n<p>Tools like DarkBench offer a starting point. However, lasting change requires aligning technological ambition with clear ethical commitments and the commercial will to back them up.<\/p>\n\t\t\t<\/div>\r\n<br>\r\n<br><a href=\"https:\/\/venturebeat.com\/ai\/darkness-rising-the-hidden-dangers-of-ai-sycophancy-and-dark-patterns\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More When OpenAI rolled out its ChatGPT-4o update in mid-April 2025, users and the AI community were stunned\u2014not by any groundbreaking feature or capability, but by something deeply unsettling: the updated model\u2019s tendency toward excessive sycophancy. 
It [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1580,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-1579","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/05\/ChatGPT-Image-May-14-2025-04_11_40-PM.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/1579","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=1579"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/1579\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/1580"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=1579"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=1579"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=1579"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}