{"id":2765,"date":"2025-07-25T17:09:22","date_gmt":"2025-07-25T17:09:22","guid":{"rendered":"https:\/\/violethoward.com\/new\/its-qwens-summer-qwen3-235b-a22b-thinking-2507-tops-charts\/"},"modified":"2025-07-25T17:09:22","modified_gmt":"2025-07-25T17:09:22","slug":"its-qwens-summer-qwen3-235b-a22b-thinking-2507-tops-charts","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/its-qwens-summer-qwen3-235b-a22b-thinking-2507-tops-charts\/","title":{"rendered":"It&#8217;s Qwen&#8217;s summer: Qwen3-235B-A22B-Thinking-2507 tops charts"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<p>If the AI industry had an equivalent to the recording industry\u2019s \u201csong of the summer\u201d \u2014 a hit that catches on in the warmer months here in the Northern Hemisphere and is heard playing everywhere \u2014 the title would clearly go to Alibaba\u2019s Qwen Team.<\/p>\n\n\n\n<p>Over just the past week, the frontier model AI research division of the Chinese e-commerce behemoth has released not one, not two, not three, but four new open-source generative AI models that post record-setting benchmark results, besting even some leading proprietary options.<\/p>\n\n\n\n<p>Last night, Qwen Team capped it off with the release of <strong>Qwen3-235B-A22B-Thinking-2507<\/strong>, its updated reasoning large language model (LLM), which takes longer to respond than a non-reasoning or \u201cinstruct\u201d LLM, engaging in \u201cchain-of-thought\u201d self-reflection and self-checking that should result in more correct and comprehensive responses on more difficult tasks. 
<\/p>\n\n\n\n<p>Indeed, the new Qwen3-Thinking-2507, as we\u2019ll call it for short, now leads or closely trails top-performing models across several major benchmarks. <\/p>\n\n\n\n<p>As AI influencer and news aggregator Andrew Curran wrote on X: \u201cQwen\u2019s strongest reasoning model has arrived, and it is at the frontier.\u201d<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img fetchpriority=\"high\" decoding=\"async\" height=\"450\" width=\"800\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwshKhhagAA7pbb-1.jpg?w=800\" alt=\"\" class=\"wp-image-3014682\" style=\"width:840px;height:auto\" srcset=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwshKhhagAA7pbb-1.jpg 1920w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwshKhhagAA7pbb-1.jpg?resize=300,169 300w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwshKhhagAA7pbb-1.jpg?resize=768,432 768w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwshKhhagAA7pbb-1.jpg?resize=800,450 800w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwshKhhagAA7pbb-1.jpg?resize=1536,864 1536w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwshKhhagAA7pbb-1.jpg?resize=400,225 400w, 
https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwshKhhagAA7pbb-1.jpg?resize=750,422 750w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwshKhhagAA7pbb-1.jpg?resize=578,325 578w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwshKhhagAA7pbb-1.jpg?resize=930,523 930w\" sizes=\"(max-width: 800px) 100vw, 800px\"\/><\/figure>\n\n\n\n<p>In the <strong>AIME25<\/strong> benchmark \u2014 designed to evaluate problem-solving ability in mathematical and logical contexts \u2014 <strong>Qwen3-Thinking-2507 scores 92.3<\/strong>, narrowly trailing OpenAI\u2019s o4-mini (<strong>92.7<\/strong>) while comfortably outscoring Gemini-2.5 Pro (<strong>88.0<\/strong>). <\/p>\n\n\n\n<p>The model also shows a commanding performance on <strong>LiveCodeBench v6<\/strong>, <strong>scoring 74.1, ahead of Google Gemini-2.5 Pro (72.5) and OpenAI o4-mini (71.8)<\/strong>, and significantly outperforming its earlier version, which posted <strong>55.7<\/strong>.<\/p>\n\n\n\n<p>In <strong>GPQA<\/strong>, a benchmark of graduate-level multiple-choice questions, the model achieves <strong>81.1<\/strong>, nearly matching Deepseek-R1-0528 (<strong>81.0<\/strong>) and trailing Gemini-2.5 Pro\u2019s top mark of <strong>86.4<\/strong>. 
<\/p>\n\n\n\n<p>On <strong>Arena-Hard v2<\/strong>, which evaluates alignment and subjective preference through win rates, Qwen3-Thinking-2507 scores <strong>79.7<\/strong>, placing it ahead of all competitors.<\/p>\n\n\n\n<p>The results show that this model not only surpasses its predecessor in every major category but also sets a new standard for what open-source, reasoning-focused models can achieve.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-a-shift-away-from-hybrid-reasoning\">A shift away from \u2018hybrid reasoning\u2019<\/h2>\n\n\n\n<p>The release of Qwen3-Thinking-2507 reflects a broader strategic shift by Alibaba\u2019s Qwen team: moving away from hybrid reasoning models that required users to manually toggle between \u201cthinking\u201d and \u201cnon-thinking\u201d modes. <\/p>\n\n\n\n<p>Instead, the team is now training separate models for reasoning and instruction tasks. This separation allows each model to be optimized for its intended purpose\u2014resulting in improved consistency, clarity, and benchmark performance. The new Qwen3-Thinking model fully embodies this design philosophy.<\/p>\n\n\n\n<p>Alongside it, Qwen launched <strong>Qwen3-Coder-480B-A35B-Instruct<\/strong>, a 480B-parameter model built for complex coding workflows. It supports 1 million token context windows and outperforms GPT-4.1 and Gemini 2.5 Pro on SWE-bench Verified.<\/p>\n\n\n\n<p>Also announced was <strong>Qwen3-MT<\/strong>, a multilingual translation model trained on trillions of tokens across 92+ languages. 
It supports domain adaptation and terminology control, with inference priced from just $0.50 per million tokens.<\/p>\n\n\n\n<p>Earlier in the week, the team released <strong>Qwen3-235B-A22B-Instruct-2507<\/strong>, a non-reasoning model that surpassed Claude Opus 4 on several benchmarks and introduced a lightweight FP8 variant for more efficient inference on constrained hardware.<\/p>\n\n\n\n<p>All models are licensed under Apache 2.0 and are available through Hugging Face, ModelScope, and the Qwen API.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-licensing-apache-2-0-and-its-enterprise-advantage\">Licensing: Apache 2.0 and its enterprise advantage<\/h2>\n\n\n\n<p>Qwen3-235B-A22B-Thinking-2507 is released under the <strong>Apache 2.0 license<\/strong>, a highly permissive and commercially friendly license that allows enterprises to download, modify, self-host, fine-tune, and integrate the model into proprietary systems without restriction.<\/p>\n\n\n\n<p>This stands in contrast to proprietary models or research-only open releases, which often require API access, impose usage limits, or prohibit commercial deployment. For compliance-conscious organizations and teams looking to control cost, latency, and data privacy, Apache 2.0 licensing enables full flexibility and ownership.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-availability-and-pricing\">Availability and pricing<\/h2>\n\n\n\n<p>Qwen3-235B-A22B-Thinking-2507 is available now for free download on Hugging Face and ModelScope. 
<\/p>\n\n\n\n<p>Enterprises that prefer not to host model inference on their own hardware or virtual private cloud, or that lack the resources to do so, can access the model through Alibaba Cloud\u2019s API, with deployment also supported via vLLM and SGLang. API pricing is as follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Input price:<\/strong> $0.70 per million tokens<\/li>\n\n\n\n<li><strong>Output price:<\/strong> $8.40 per million tokens<\/li>\n\n\n\n<li><strong>Free tier:<\/strong> 1 million tokens, valid for 180 days<\/li>\n<\/ul>\n\n\n\n\n\n\n\n<p>The model is compatible with agentic frameworks via <strong>Qwen-Agent<\/strong>, and supports advanced deployment via OpenAI-compatible APIs. <\/p>\n\n\n\n<p>It can also be run locally using transformer frameworks or integrated into dev stacks through Node.js, CLI tools, or structured prompting interfaces.<\/p>\n\n\n\n<p>Recommended sampling settings for best performance are <strong>temperature=0.6<\/strong>, <strong>top_p=0.95<\/strong>, and a <strong>max output length of 81,920 tokens<\/strong> for complex tasks.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-enterprise-applications-and-future-outlook\">Enterprise applications and future outlook<\/h2>\n\n\n\n<p>With its strong benchmark performance, long-context capability, and permissive licensing, Qwen3-Thinking-2507 is particularly well suited for use in enterprise AI systems involving reasoning, planning, and decision support.<\/p>\n\n\n\n<p>The broader Qwen3 ecosystem \u2014 including coding, instruction, and translation models \u2014 further extends the appeal to technical teams and business units looking to incorporate AI across verticals like engineering, localization, customer support, and research.<\/p>\n\n\n\n<p>The Qwen team\u2019s decision to release specialized models for distinct use cases, backed by technical transparency and community support, signals a deliberate shift toward building <strong>open, performant, and production-ready AI infrastructure<\/strong>.<\/p>\n\n\n\n<p>As more 
enterprises seek alternatives to API-gated, black-box models, Alibaba\u2019s Qwen series increasingly positions itself as a viable open-source foundation for intelligent systems\u2014offering both control and capability at scale.<\/p>\n\t\t\t<\/div>\r\n<br>\r\n<br><a href=\"https:\/\/venturebeat.com\/ai\/its-qwens-summer-new-open-source-qwen3-235b-a22b-thinking-2507-tops-openai-gemini-reasoning-models-on-key-benchmarks\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. 
Subscribe Now If the AI industry had an equivalent to the recording industry\u2019s \u201csong of the summer\u201d \u2014 a hit that catches on in the warmer months here in the Northern [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2766,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-2765","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/07\/GwshKhhagAA7pbb-1.jpg","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/2765","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=2765"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/2765\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/2766"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=2765"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=2765"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=2765"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}<!-- This website is optimized by Airlift. Learn more: https://airlift.net. Template:. 
-->