{"id":3918,"date":"2025-10-17T03:40:26","date_gmt":"2025-10-17T03:40:26","guid":{"rendered":"https:\/\/violethoward.com\/new\/researchers-find-adding-this-one-simple-sentence-to-prompts-makes-ai-models-way-more-creative\/"},"modified":"2025-10-17T03:40:26","modified_gmt":"2025-10-17T03:40:26","slug":"researchers-find-adding-this-one-simple-sentence-to-prompts-makes-ai-models-way-more-creative","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/researchers-find-adding-this-one-simple-sentence-to-prompts-makes-ai-models-way-more-creative\/","title":{"rendered":"Researchers find adding this one simple sentence to prompts makes AI models way more creative"},"content":{"rendered":"



One of the coolest things about generative AI models, both large language models (LLMs) and diffusion-based image generators, is that they are "non-deterministic." That is, despite their reputation among some critics as being "fancy autocorrect," generative AI models actually generate their outputs by choosing from a distribution of the most probable next tokens (units of information) to fill out their response.

Asking an LLM "What is the capital of France?" will have it sample its probability distribution for France, capitals, cities, etc. to arrive at the answer "Paris." But that answer could come in the format of "The capital of France is Paris," or simply "Paris," or "Paris, though it was Versailles at one point."
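For intuition, here is a minimal sketch (not from the paper) of how sampling from a distribution of completions produces different phrasings of the same answer; the candidate strings and probabilities are illustrative assumptions, not real model outputs.

import random

# Illustrative (made-up) distribution over completions a model might
# assign to the prompt "What is the capital of France?"
completions = {
    "Paris": 0.62,
    "The capital of France is Paris.": 0.30,
    "Paris, though it was Versailles at one point.": 0.08,
}

# Sampling (rather than always taking the most probable completion) is
# what makes the output non-deterministic: repeated calls can return
# different phrasings of the same fact.
answer = random.choices(
    list(completions.keys()),
    weights=list(completions.values()),
    k=1,
)[0]
print(answer)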

Still, those of us who use these models frequently day-to-day will note that sometimes their answers can feel annoyingly repetitive or similar. A common joke about coffee is recycled across generations of queries. Story prompts generate similar arcs. Even tasks that should yield many plausible answers, like naming U.S. states, tend to collapse into only a few. This phenomenon, known as mode collapse, arises during post-training alignment and limits the usefulness of otherwise powerful models.

Especially when using LLMs to generate new creative works in writing, communications, strategy, or illustrations, we actually want their outputs to be even more varied than they already are.

Now a team of researchers at Northeastern University, Stanford University, and West Virginia University has come up with an ingeniously simple method to get language and image models to generate a wider variety of responses to nearly any user prompt by adding a single, simple sentence: "Generate 5 responses with their corresponding probabilities, sampled from the full distribution."

The method, called Verbalized Sampling (VS), helps models like GPT-4, Claude, and Gemini produce more diverse and human-like outputs without retraining or access to internal parameters. It is described in a paper posted to the open-access preprint server arxiv.org in early October 2025.
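In practice, the technique amounts to prepending that one sentence to an ordinary request. Below is a minimal sketch of what that could look like, assuming the OpenAI Python SDK with an API key in the environment; the model name and the coffee-joke request are illustrative placeholders, not details taken from the paper.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The single sentence proposed by the researchers.
VS_INSTRUCTION = (
    "Generate 5 responses with their corresponding probabilities, "
    "sampled from the full distribution."
)

# An ordinary creative request with the Verbalized Sampling sentence prepended.
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        {
            "role": "user",
            "content": f"{VS_INSTRUCTION}\n\nTell me a joke about coffee.",
        },
    ],
)

print(response.choices[0].message.content)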

When prompted in this way, the model no longer defaults to its safest, most typical output. Instead, it verbalizes its internal distribution over potential completions and samples across a wider spectrum of possibilities. This one-line change leads to substantial gains in output diversity across multiple domains.

As Weiyan Shi, an assistant professor at Northeastern University and co-author of the paper, wrote on X: "LLMs' potentials are not fully unlocked yet! As shown in our paper, prompt optimization can be guided by thinking about how LLMs are trained and aligned, and can be proved theoretically."

Why Models Collapse and How VS Reverses It

According to the research team, the root cause of mode collapse lies not just in algorithms like reinforcement learning from human feedback (RLHF), but in the structure of human preferences. People tend to rate more familiar or typical answers as better, which nudges LLMs toward "safe" choices over diverse ones during fine-tuning.

However, this bias doesn't erase the model's underlying knowledge; it just suppresses it. VS works by bypassing this suppression. Instead of asking for the single most likely output, it invites the model to reveal a set of plausible responses and their relative probabilities. This distribution-level prompting restores access to the richer diversity present in the base pretraining model.
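To make the distribution-level idea concrete, here is a small downstream sketch: once the five verbalized candidates and their probabilities have been parsed out of the reply, one can be drawn in proportion to its stated weight. The candidate jokes, their probabilities, and the normalization step are assumptions for illustration; the paper's own parsing and evaluation details may differ.

import random

# Hypothetical parsed output of a Verbalized Sampling reply: five candidate
# responses plus the probability the model verbalized for each one.
candidates = [
    ("Why did the coffee file a police report? It got mugged.", 0.30),
    ("Decaf: the coffee that gave up on its dreams.", 0.25),
    ("My espresso has a latte problems.", 0.20),
    ("Coffee: because adulting is hard.", 0.15),
    ("I like my coffee like my deadlines: constantly looming.", 0.10),
]

texts, probs = zip(*candidates)

# Verbalized probabilities are free text, so they may not sum exactly to 1;
# normalizing keeps the downstream draw well defined.
total = sum(probs)
weights = [p / total for p in probs]

# Draw one response in proportion to its verbalized probability instead of
# always returning the single most likely candidate.
print(random.choices(texts, weights=weights, k=1)[0])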

Real-World Performance Across Tasks

The research team tested Verbalized Sampling across several common use cases: