{"id":4294,"date":"2025-11-08T07:20:59","date_gmt":"2025-11-08T07:20:59","guid":{"rendered":"https:\/\/violethoward.com\/new\/ship-fast-optimize-later-top-ai-engineers-dont-care-about-cost-theyre-prioritizing-deployment\/"},"modified":"2025-11-08T07:20:59","modified_gmt":"2025-11-08T07:20:59","slug":"ship-fast-optimize-later-top-ai-engineers-dont-care-about-cost-theyre-prioritizing-deployment","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/ship-fast-optimize-later-top-ai-engineers-dont-care-about-cost-theyre-prioritizing-deployment\/","title":{"rendered":"Ship fast, optimize later: top AI engineers don&#039;t care about cost \u2014 they&#039;re prioritizing deployment"},"content":{"rendered":"<p> <br \/>\n<br \/><img decoding=\"async\" src=\"https:\/\/images.ctfassets.net\/jdtwqhzvc2n1\/3GLdSvkbnSK8BCy90z2vow\/fd7435230ee4d83e6f66a184129ba54b\/Recursor_Wonder_transformed.png?w=300&amp;q=30\" \/><\/p>\n<p>Across industries, rising compute expenses are often cited as a barrier to AI adoption \u2014 but leading companies are finding that cost is no longer the real constraint. <\/p>\n<p>The tougher challenges (and the ones top of mind for many tech leaders)? Latency, flexibility and capacity.<\/p>\n<p>At Wonder, for instance, AI adds a mere few cents per order; the food delivery and takeout company is much more concerned with cloud capacity with skyrocketing demands. 
Recursion, for its part, has been focused on balancing small and larger-scale training and deployment via on-premises clusters and the cloud; this has afforded the biotech company flexibility for rapid experimentation.<\/p>\n<p>The companies\u2019 real-world experiences highlight a broader industry trend: For enterprises operating AI at scale, economics aren&#x27;t the deciding factor \u2014 the conversation has shifted from how to pay for AI to how fast it can be deployed and sustained.<\/p>\n<p>AI leaders from the two companies recently sat down with VentureBeat\u2019s CEO and editor-in-chief Matt Marshall as part of VB\u2019s traveling AI Impact Series. Here\u2019s what they shared. <\/p>\n<h3><b>Wonder: Rethink what you assume about capacity<\/b><\/h3>\n<p>Wonder uses AI to power everything from recommendations to logistics \u2014 yet, as of now, reported CTO James Chen, AI adds just a few cents per order. <\/p>\n<p>Chen explained that the technology component of a meal order costs 14 cents; the AI adds 2 to 3 cents, although that\u2019s \u201cgoing up really rapidly\u201d to 5 to 8 cents. Still, that seems almost immaterial compared to total operating costs. <\/p>\n<p>Instead, the 100% cloud-native AI company\u2019s main concern has been capacity as demand grows. Wonder was built with \u201cthe assumption\u201d (which proved to be incorrect) that there would be \u201cunlimited capacity\u201d so they could move \u201csuper fast\u201d and wouldn\u2019t have to worry about managing infrastructure, Chen noted. <\/p>\n<p>But the company has grown quite a bit over the last few years, he said; as a result, about six months ago, \u201cwe started getting little signals from the cloud providers, \u2018Hey, you might need to consider going to region two,\u2019\u201d because they were running out of capacity for CPU or data storage at their facilities as demand grew. 
<\/p>\n<p>It was \u201cvery shocking\u201d that they had to move to plan B earlier than they anticipated. \u201cObviously it&#x27;s good practice to be multi-region, but we were thinking maybe two more years down the road,\u201d said Chen. <\/p>\n<h3><b>What&#x27;s not economically feasible (yet)<\/b><\/h3>\n<p>Wonder built its own model to maximize its conversion rate, Chen noted; the goal is to surface new restaurants to relevant customers as much as possible. These are \u201cisolated scenarios\u201d where models are trained over time to be \u201cvery, very efficient and very fast.\u201d<\/p>\n<p>Currently, the best bet for Wonder\u2019s use case is large models, Chen noted. But in the long term, they\u2019d like to move to small models that are hyper-customized to individuals (via AI agents or concierges) based on their purchase history and even their clickstream. \u201cHaving these micro models is definitely the best, but right now the cost is very expensive,\u201d he said. \u201cIf you try to create one for each person, it&#x27;s just not economically feasible.\u201d<\/p>\n<h3><b>Budgeting is an art, not a science<\/b><\/h3>\n<p>Wonder gives its devs and data scientists as much leeway as possible to experiment, and internal teams review the costs of use to make sure nobody turned on a model and \u201cjacked up massive compute around a huge bill,\u201d said Chen. <\/p>\n<p>The company is trying different things to offload to AI and operate within margins. \u201cBut then it&#x27;s very hard to budget because you have no idea,\u201d he said. One of the challenging things is the pace of development; when a new model comes out, \u201cwe can\u2019t just sit there, right? We have to use it.\u201d<\/p>\n<p>Budgeting for the unknown economics of a token-based system is \u201cdefinitely art versus science.\u201d<\/p>\n<p>A critical component in the software development lifecycle is preserving context when using large language models, he explained. 
When you find something that works, you can add it to your company\u2019s \u201ccorpus of context\u201d that can be sent with every request. That corpus is big, and sending it costs money each time. <\/p>\n<p>\u201cOver 50%, up to 80% of your costs is just resending the same information back into the same engine again on every request,\u201d said Chen. <\/p>\n<p>In theory, the more they use AI, the lower the cost per unit should be. \u201cI know when a transaction happens, I&#x27;ll pay the X cent tax for each one, but I don&#x27;t want to be limited to use the technology for all these other creative ideas.&quot;<\/p>\n<h3><b>The &#x27;vindication moment&#x27; for Recursion<\/b><\/h3>\n<p>Recursion, for its part, has focused on meeting broad-ranging compute needs via a hybrid infrastructure of on-premises clusters and cloud inference. <\/p>\n<p>When initially looking to build out its AI infrastructure, the company had to go with its own setup, as \u201cthe cloud providers didn&#x27;t have very many good offerings,\u201d explained CTO Ben Mabey. \u201cThe vindication moment was that we needed more compute and we looked to the cloud providers and they were like, \u2018Maybe in a year or so.\u2019\u201d<\/p>\n<p>The company\u2019s first cluster in 2017 incorporated Nvidia gaming GPUs (1080s, launched in 2016); they have since added Nvidia H100s and A100s, and use a Kubernetes cluster that they run in the cloud or on-prem. <\/p>\n<p>Addressing the longevity question, Mabey noted: \u201cThese gaming GPUs are actually still being used today, which is crazy, right? The myth that a GPU&#x27;s life span is only three years, that&#x27;s definitely not the case. A100s are still top of the list, they&#x27;re the workhorse of the industry.\u201d <\/p>\n<h3><b>Best use cases on-prem vs cloud; cost differences<\/b><\/h3>\n<p>More recently, Mabey\u2019s team has been training a foundation model on Recursion\u2019s image repository (which consists of petabytes of data and more than 200 pictures). 
This and other types of big training jobs have required a \u201cmassive cluster\u201d and connected, multi-node setups. <\/p>\n<p>\u201cWhen we need that fully-connected network and access to a lot of our data in a high parallel file system, we go on-prem,\u201d he explained. On the other hand, shorter workloads run in the cloud. <\/p>\n<p>Recursion\u2019s method is to \u201cpre-empt\u201d GPUs and Google tensor processing units (TPUs): running jobs can be interrupted to make way for higher-priority ones. \u201cBecause we don&#x27;t care about the speed in some of these inference workloads where we&#x27;re uploading biological data, whether that&#x27;s an image or sequencing data, DNA data,\u201d Mabey explained. \u201cWe can say, \u2018Give this to us in an hour,\u2019 and we&#x27;re fine if it kills the job.\u201d <\/p>\n<p>From a cost perspective, moving large workloads on-prem is \u201cconservatively\u201d 10 times cheaper, Mabey noted; for a five-year TCO, it&#x27;s half the cost. On the other hand, for smaller storage needs, the cloud can be \u201cpretty competitive\u201d cost-wise. <\/p>\n<p>Ultimately, Mabey urged tech leaders to step back and determine whether they\u2019re truly willing to commit to AI; cost-effective solutions typically require multi-year buy-ins. <\/p>\n<p>\u201cFrom a psychological perspective, I&#x27;ve seen peers of ours who will not invest in compute, and as a result they&#x27;re always paying on demand,&quot; said Mabey. &quot;Their teams use far less compute because they don&#x27;t want to run up the cloud bill. 
Innovation really gets hampered by people not wanting to burn money.\u201d <\/p>\n<p><br \/>\n<br \/><a href=\"https:\/\/venturebeat.com\/data-infrastructure\/ship-fast-optimize-later-top-ai-engineers-dont-care-about-cost-theyre\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Across industries, rising compute expenses are often cited as a barrier to AI adoption \u2014 but leading companies are finding that cost is no longer the real constraint. The tougher challenges (and the ones top of mind for many tech leaders)? Latency, flexibility and capacity. At Wonder, for instance, AI adds a mere few cents [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":4295,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-4294","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/11\/Recursor_Wonder_transformed.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/4294","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=4294"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/4294\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/4295"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-
json\/wp\/v2\/media?parent=4294"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=4294"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=4294"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}