{"id":2756,"date":"2025-07-25T11:41:06","date_gmt":"2025-07-25T11:41:06","guid":{"rendered":"https:\/\/violethoward.com\/new\/alibabas-new-qwen3-235b-a22b-2507-beats-kimi-2-claude-opus\/"},"modified":"2025-07-25T11:41:06","modified_gmt":"2025-07-25T11:41:06","slug":"alibabas-new-qwen3-235b-a22b-2507-beats-kimi-2-claude-opus","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/alibabas-new-qwen3-235b-a22b-2507-beats-kimi-2-claude-opus\/","title":{"rendered":"Alibaba&#8217;s new Qwen3-235B-A22B-2507 beats Kimi-2, Claude Opus"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<p>Chinese e-commerce giant Alibaba has made waves globally in the tech and business communities with its own family of \u201cQwen\u201d generative AI large language models, from the launch of the original Tongyi Qianwen LLM chatbot in April 2023 through the release of Qwen 3 in April 2025.<\/p>\n\n\n\n<p>Why? <\/p>\n\n\n\n<p>Well, its models are powerful and score highly on third-party benchmark tests covering math, science, reasoning, and writing tasks, and for the most part they\u2019ve been released under permissive open source licensing terms, allowing organizations and enterprises to download them, customize them, run them, and generally use them for a wide variety of purposes, including commercial ones. Think of them as an alternative to DeepSeek. 
<\/p>\n\n\n\n<p>This week, Alibaba\u2019s \u201cQwen Team,\u201d as its AI division is known, released the latest updates to its Qwen family, and they\u2019re already attracting attention once more from AI power users in the West for their top performance, in one case edging out even the new Kimi K2 model from rival Chinese AI startup Moonshot, released in mid-July 2025.<\/p>\n\n\n\n<p>The new Qwen3-235B-A22B-Instruct-2507 model (released on AI code sharing community Hugging Face alongside a \u201cfloating point 8\u201d or FP8 version, which we\u2019ll cover in more depth below) improves on the original Qwen 3 in reasoning tasks, factual accuracy, and multilingual understanding. It also outperforms Claude Opus 4\u2019s \u201cnon-thinking\u201d version. <\/p>\n\n\n\n<p>The new Qwen3 model update also delivers better coding results, alignment with user preferences, and long-context handling, according to its creators. 
But that\u2019s not all\u2026<\/p>\n\n\n\n<p>Read on for what else it offers enterprise users and technical decision-makers.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-fp8-version-lets-enterprises-run-qwen-3-with-far-less-memory-and-far-less-compute\">FP8 version lets enterprises run Qwen 3 with far less memory and far less compute<\/h2>\n\n\n\n<p>In addition to the new Qwen3-235B-A22B-2507 model, the Qwen Team released an \u201cFP8\u201d version, which stands for <strong>8-bit floating point<\/strong>, a format that represents the model\u2019s numbers in 8 bits instead of 16 so that it uses less memory and processing power, without noticeably affecting its performance. <\/p>\n\n\n\n<p>In practice, this means organizations can run a model with Qwen3\u2019s capabilities on smaller, less expensive hardware or more efficiently in the cloud. The result is faster response times, lower energy costs, and the ability to scale deployments without needing massive infrastructure.<\/p>\n\n\n\n<p>This makes the FP8 model especially attractive for production environments with tight latency or cost constraints. Teams can scale Qwen3\u2019s capabilities to single-node GPU instances or local development machines, avoiding the need for massive multi-GPU clusters. It also lowers the barrier to private fine-tuning and on-premises deployments, where infrastructure resources are finite and total cost of ownership matters.<\/p>\n\n\n\n<p>Although the Qwen team didn\u2019t release official calculations, comparisons to similar FP8 quantized deployments suggest the efficiency savings are substantial. 
Here\u2019s a practical illustration (<em><strong>updated and corrected on 07\/23\/2025 at 4:04 pm ET<\/strong>: this piece originally included an inaccurate chart based on a miscalculation. I apologize for the errors and thank readers for contacting me about them.<\/em>):<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><strong>Metric<\/strong><\/th><th><strong>BF16 \/ BF16-equiv build<\/strong><\/th><th><strong>FP8 Quantized build<\/strong><\/th><\/tr><\/thead><tbody><tr><td><strong>GPU memory use<\/strong>*<\/td><td>\u2248 640 GB total (8 \u00d7 H100-80 GB, TP-8)<\/td><td>\u2248 320 GB total on the recommended 4 \u00d7 H100-80 GB, TP-4. Lowest-footprint community run: ~143 GB across 2 \u00d7 H100 with Ollama off-loading<\/td><\/tr><tr><td><strong>Single-query inference speed\u2020<\/strong><\/td><td>~74 tokens \/ s (batch = 1, context = 2 k, 8 \u00d7 H20-96 GB, TP-8)<\/td><td>~72 tokens \/ s (same settings, 4 \u00d7 H20-96 GB, TP-4)<\/td><\/tr><tr><td><strong>Power \/ energy<\/strong><\/td><td>Full node of eight H100s draws ~4\u20134.5 kW under load (550\u2013600 W per card, plus host)\u2021<\/td><td>FP8 needs half the cards and moves half the data; NVIDIA\u2019s Hopper FP8 case-studies report \u2248 35\u201340 % lower TCO and energy at comparable throughput<\/td><\/tr><tr><td><strong>GPUs needed (practical)<\/strong><\/td><td>8 \u00d7 H100-80 GB (TP-8) or 8 \u00d7 A100-80 GB for parity<\/td><td>4 \u00d7 H100-80 GB (TP-4). 
2 \u00d7 H100 is possible with aggressive off-loading, at the cost of latency<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>*<strong>Disk footprint for the checkpoints:<\/strong> BF16 weights are ~500 GB; the FP8 checkpoint is \u201cwell over 200 GB,\u201d so the absolute memory savings on GPU come mostly from needing fewer cards, not from weights alone.<\/em><\/p>\n\n\n\n<p><em>\u2020Speed figures are from the Qwen3 official SGLang benchmarks (batch 1). Throughput scales almost linearly with batch size: Baseten measured ~45 tokens\/s per user at batch 32 and ~1.4 k tokens\/s aggregate on the same four-GPU FP8 setup.<\/em><\/p>\n\n\n\n<p><em>\u2021No vendor supplies exact wall-power numbers for Qwen, so we approximate using H100 board specs and NVIDIA Hopper FP8 energy-saving data.<\/em><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-no-more-hybrid-reasoning-instead-qwen-will-release-separate-reasoning-and-instruct-models\">No more \u2018hybrid reasoning\u2019\u2026instead Qwen will release separate reasoning and instruct models!<\/h2>\n\n\n\n<p>Perhaps most interesting of all, the Qwen Team announced it will no longer pursue a \u201chybrid\u201d reasoning approach, which it introduced with Qwen 3 in April and which seemed to be inspired by an approach pioneered by sovereign AI collective Nous Research. <\/p>\n\n\n\n<p>This allowed users to toggle on a \u201creasoning\u201d mode, letting the AI model engage in its own self-checking and produce \u201cchains-of-thought\u201d before responding. 
<\/p>\n\n\n\n<p>In a way, it was designed to mimic the reasoning capabilities of powerful proprietary models such as OpenAI\u2019s \u201co\u201d series (o1, o3, o4-mini, o4-mini-high), which also produce \u201cchains-of-thought.\u201d<\/p>\n\n\n\n<p>However, unlike those rival models, which always engage in such \u201creasoning\u201d for every prompt, Qwen 3 let the user manually switch the reasoning mode on or off by clicking a \u201cThinking Mode\u201d button on the Qwen website chatbot, or by typing \u201c\/think\u201d before their prompt when running the model locally or privately. <\/p>\n\n\n\n<p>The idea was to give users control to engage the slower and more token-intensive thinking mode for more difficult prompts and tasks, and use a non-thinking mode for simpler prompts. But again, this put the onus on the user to decide. While flexible, it also introduced design complexity and inconsistent behavior in some cases.<\/p>\n\n\n\n<p>As the Qwen team wrote in its announcement post on X: <\/p>\n\n\n\n<p><em>\u201cAfter talking with the community and thinking it through, we decided to stop using hybrid thinking mode. Instead, we\u2019ll train Instruct and Thinking models separately so we can get the best quality possible.\u201d<\/em><\/p>\n\n\n\n<p>With the 2507 update (an instruct, or non-reasoning, model only, for now), Alibaba is no longer straddling both approaches in a single model. Instead, separate model variants will be trained for instruction and reasoning tasks respectively. 
<\/p>\n\n\n\n<p>The result is a model that adheres more closely to user instructions, generates more predictable responses, and, as benchmark data shows, improves significantly across multiple evaluation domains.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-performance-benchmarks-and-use-cases\">Performance benchmarks and use cases<\/h2>\n\n\n\n<p>Compared to its predecessor, the Qwen3-235B-A22B-Instruct-2507 model delivers measurable improvements:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>MMLU-Pro scores rise from 75.2 to 83.0<\/strong>, a notable gain in general knowledge performance.<\/li>\n\n\n\n<li><strong>GPQA and SuperGPQA benchmarks improve by 15\u201320 percentage points<\/strong>, reflecting stronger factual accuracy.<\/li>\n\n\n\n<li><strong>Reasoning tasks<\/strong> such as AIME25 and ARC-AGI show more than double the previous performance.<\/li>\n\n\n\n<li><strong>Code generation improves<\/strong>, with LiveCodeBench scores increasing from 32.9 to 51.8.<\/li>\n\n\n\n<li><strong>Multilingual support expands<\/strong>, aided by improved coverage of long-tail languages and better alignment across dialects.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img fetchpriority=\"high\" decoding=\"async\" height=\"450\" width=\"800\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwZbdvdbEAU2Z4H-1.jpg?w=800\" alt=\"\" class=\"wp-image-3014560\" style=\"width:839px;height:auto\" srcset=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwZbdvdbEAU2Z4H-1.jpg 1920w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwZbdvdbEAU2Z4H-1.jpg?resize=300,169 300w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwZbdvdbEAU2Z4H-1.jpg?resize=768,432 768w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwZbdvdbEAU2Z4H-1.jpg?resize=800,450 800w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwZbdvdbEAU2Z4H-1.jpg?resize=1536,864 1536w, 
https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwZbdvdbEAU2Z4H-1.jpg?resize=400,225 400w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwZbdvdbEAU2Z4H-1.jpg?resize=750,422 750w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwZbdvdbEAU2Z4H-1.jpg?resize=578,325 578w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/GwZbdvdbEAU2Z4H-1.jpg?resize=930,523 930w\" sizes=\"(max-width: 800px) 100vw, 800px\"\/><\/figure>\n\n\n\n<p>The model maintains a mixture-of-experts (MoE) architecture, activating 8 out of 128 experts during inference, with a total of 235 billion parameters, 22 billion of which are active at any time. <\/p>\n\n\n\n<p>As mentioned before, the FP8 version introduces fine-grained quantization for better inference speed and reduced memory usage.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-enterprise-ready-by-design\">Enterprise-ready by design<\/h2>\n\n\n\n<p>Unlike many open-source LLMs, which are often released under restrictive research-only licenses or require API access for commercial use, Qwen3 is squarely aimed at enterprise deployment. <\/p>\n\n\n\n<p>It is released under the permissive <strong>Apache 2.0 license<\/strong>, which means enterprises can use it freely for commercial applications. They may also:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Deploy models locally or through OpenAI-compatible APIs using vLLM and SGLang<\/li>\n\n\n\n<li>Fine-tune models privately using LoRA or QLoRA without exposing proprietary data<\/li>\n\n\n\n<li>Log and inspect all prompts and outputs on-premises for compliance and auditing<\/li>\n\n\n\n<li>Scale from prototype to production using dense variants (from 0.6B to 32B) or MoE checkpoints<\/li>\n<\/ul>\n\n\n\n<p>Alibaba\u2019s team also introduced <strong>Qwen-Agent<\/strong>, a lightweight framework that abstracts tool invocation logic for users building agentic systems. 
<\/p>\n\n\n\n<p>Benchmarks like TAU-Retail and BFCL-v3 suggest the instruction model can competently execute multi-step decision tasks, typically the domain of purpose-built agents.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-community-and-industry-reactions\">Community and industry reactions<\/h2>\n\n\n\n<p>The release has already been well received by AI power users. <\/p>\n\n\n\n<p><strong>Paul Couvert<\/strong>, AI educator and founder of private LLM chatbot host Blue Shell AI, posted a comparison chart on X showing Qwen3-235B-A22B-Instruct-2507 outperforming Claude Opus 4 and Kimi K2 on benchmarks like GPQA, AIME25, and Arena-Hard v2, calling it <em>\u201ceven more powerful than Kimi K2\u2026 and even better than Claude Opus 4.\u201d<\/em><\/p>\n\n\n\n<p>AI influencer <strong>NIK (@ns123abc)<\/strong> commented on its rapid impact: <em>\u201cYou\u2019re laughing. Qwen-3-235B made Kimi K2 irrelevant after only one week despite being one quarter the size and you\u2019re laughing.\u201d<\/em><\/p>\n\n\n\n<p>Meanwhile, <strong>Jeff Boudier<\/strong>, head of product at Hugging Face, highlighted the benchmark results: <em>\u201cQwen silently released a massive improvement to Qwen3\u2026 it tops best open (Kimi K2, a 4x larger model) and closed (Claude Opus 4) LLMs on benchmarks.\u201d<\/em><\/p>\n\n\n\n<p>He also praised the availability of an FP8 checkpoint for faster inference, 1-click deployment on Azure ML, and support for local use via MLX on Mac or INT4 builds from Intel.<\/p>\n\n\n\n<p>The overall tone from developers has been enthusiastic, as the model\u2019s balance of performance, licensing, and deployability appeals to both hobbyists and professionals.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-s-next-for-qwen-team\"><strong>What\u2019s next for Qwen team?<\/strong><\/h2>\n\n\n\n<p>Alibaba is already laying the groundwork for future updates. 
A separate reasoning-focused model is in the pipeline, and the Qwen roadmap points toward increasingly agentic systems capable of long-horizon task planning. <\/p>\n\n\n\n<p>Multimodal support, seen in Qwen2.5-Omni and Qwen-VL models, is also expected to expand further.<\/p>\n\n\n\n<p>And already, rumors and rumblings have started as Qwen team members tease yet another update to their model family incoming, with updates on their web properties revealing URL strings for a new Qwen3-Coder-480B-A35B-Instruct model, likely a 480-billion parameter mixture-of-experts (MoE) with a token context of 1 million.<\/p>\n\n\n\n<p>What Qwen3-235B-A22B-Instruct-2507 ultimately signals is not just another leap in benchmark performance, but a maturation of open models as viable alternatives to proprietary systems. <\/p>\n\n\n\n<p>The flexibility of deployment, strong general performance, and enterprise-friendly licensing give the model a unique edge in a crowded field.<\/p>\n\n\n\n<p>For teams looking to integrate advanced instruction-following models into their AI stack\u2014without the limitations of vendor lock-in or usage-based fees\u2014Qwen3 is a serious contender.<\/p>\n\t\t\t<\/div>\r\n<br>\r\n<br><a href=\"https:\/\/venturebeat.com\/ai\/alibabas-new-open-source-qwen3-235b-a22b-2507-beats-kimi-2-and-offers-low-compute-version\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Chinese e-commerce giant Alibaba has made waves globally in the tech and business communities with its own family of \u201cQwen\u201d generative AI large language models, beginning with the launch [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2757,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-2756","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/07\/GwZbdvdbEAU2Z4H-1.jpg","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/2756","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=2756"}],"version-history":[{
"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/2756\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/2757"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=2756"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=2756"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=2756"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}