Enterprises expanding AI deployments are hitting an invisible performance wall. The culprit? Static speculators that can't keep up with shifting workloads. Speculators are smaller AI models that work alongside large language models during inference. They draft multiple tokens ahead, which the main model then verifies in parallel. This technique (called speculative decoding) has become essential…
Presented by Certinia Every professional services leader knows the feeling: a pipeline full of promising deals, but a bench that’s already stretched thin. That’s because growth has always been tied to a finite supply of consultants with finite availability to work on projects. Even with strong market demand, most firms only capture 10-20% of their…
The friction of having to open a separate chat window to prompt an agent could be a hassle for many enterprises. And AI companies are seeing an opportunity to bring more and more AI services into one platform, even integrating into where employees do their work. OpenAI’s ChatGPT, although still a separate window, is gradually…
OpenAI’s annual developer conference on Monday was a spectacle of ambitious AI product launches, from an app store for ChatGPT to a stunning video-generation API that brought creative concepts to life. But for the enterprises and technical leaders watching closely, the most consequential announcement was the quiet general availability of Codex, the company's AI software…
Echelon, an artificial intelligence startup that automates enterprise software implementations, emerged from stealth mode today with $4.75 million in seed funding led by Bain Capital Ventures, targeting a fundamental shift in how companies deploy and maintain critical business systems. The San Francisco-based company has developed AI agents specifically trained to handle end-to-end ServiceNow implementations —…
In a packed theater at Fort Mason, after a whirlwind keynote of product announcements, OpenAI CEO Sam Altman sat down with Sir Jony Ive, the legendary designer behind Apple's most iconic products. The conversation, held exclusively for the 1,500 developers in attendance and not part of the public livestream, offered the clearest glimpse yet into…
Researchers at the University of Illinois Urbana-Champaign and Google Cloud AI Research have developed a framework that enables large language model (LLM) agents to organize their experiences into a memory bank, helping them get better at complex tasks over time. The framework, called ReasoningBank, distills “generalizable reasoning strategies” from an agent’s successful and failed attempts…
Many organizations would be hesitant to overhaul their tech stack and start from scratch. Not Notion. For the 3.0 version of its productivity software (released in September), the company didn’t hesitate to rebuild from the ground up; they recognized that it was necessary, in fact, to support agentic AI at enterprise scale. Whereas traditional AI-powered…
The latest addition to the small model wave for enterprises comes from AI21 Labs, which is betting that bringing models to devices will free up traffic in data centers. AI21’s Jamba Reasoning 3B, a “tiny” open-source model that can run extended reasoning, code generation and respond based on ground truth. Jamba Reasoning 3B handles more…
A cycle-accurate alternative to speculation — unifying scalar, vector and matrix compute For more than half a century, computing has relied on the Von Neumann or Harvard model. Nearly every modern chip — CPUs, GPUs and even many specialized accelerators — derives from this design. Over time, new architectures like Very Long Instruction Word (VLIW),…