Thinking in Shorthand: How Token Assorted Is Making AI Smarter and Faster

Greg Robison
9 min read · 2 days ago

--

“Simplicity is the ultimate sophistication.”

-Leonardo da Vinci

Introduction

What if AI could think faster and reason better at the same time? We’ve talked about how AI systems like ChatGPT and Claude have become pretty good at solving complex problems by walking through their reasoning step-by-step, but this process is surprisingly inefficient. When AI models “think out loud” to help them solve a problem, they generate lengthy explanations where many words simply maintain coherent language rather than advancing the core logic. This inefficiency isn’t just about wasted words — it results in higher computational costs, slower responses, and limitations on what AI can accomplish. A new research paper called “Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning” by Meta and UC Berkeley (go Bears!) offers an elegant solution by teaching AI to use a hybrid reasoning approach that mixes compact “shorthand” tokens with regular text. Like a skilled human who knows when to use shorthand versus detailed notes, these AI systems compress their initial reasoning steps while preserving textual clarity where it matters most. The results are impressive: AI systems that not only consume fewer resources by shortening their reasoning traces by about 17% on average, but also become more accurate at solving mathematical, logical, and planning problems. This work points us towards AI that is more capable, more accessible, and more environmentally sustainable.

The Problem with How AI Thinks Today

Modern AI systems don’t simply jump to answers — they’re trained to “show their work” through a process called chain-of-thought (CoT) reasoning, which has been shown to improve reasoning capabilities. It’s like those math problems where teachers insist you write out every step, even if you could solve it in your head easily. It’s a good approach to make sure you get the right answer, but it’s not efficient. When today’s AI tackles a complex problem, it might write paragraphs detailing its reasoning: “First, I need to calculate this… Next, I should consider that… Now I can apply this formula…” This verbose thinking process helps AI reach correct answers more consistently because it breaks complex tasks into manageable steps. For example, models using CoT are much better at the NY Times game Connections because they can try out different tactics to solve the puzzle. It’s why chatbots can now solve math problems, analyze logical arguments, or plan multi-step processes. But this approach has an inherent inefficiency — AI must process thousands of words that exist primarily to form grammatically correct sentences rather than advance the core reasoning.
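
To make that overhead concrete, here is a small illustrative sketch (the question, wording, and word counts are made up, not drawn from the paper) comparing a direct answer with a verbose chain-of-thought trace for the same problem:

```python
# Illustrative sketch only (not from the paper): comparing a direct answer with
# a verbose chain-of-thought trace for the same question, to show how much of
# the text exists to keep the language fluent rather than to advance the logic.

direct_answer = "The total cost is $42."

chain_of_thought = (
    "First, I need to find the price of one notebook, which is $3. "
    "Next, I should multiply $3 by the 14 notebooks in the order, which gives $42. "
    "Now I can state the final answer: the total cost is $42."
)

# A rough word count makes the overhead visible; real systems count tokens,
# but the ratio tells the same story.
print(len(direct_answer.split()), "words for the direct answer")
print(len(chain_of_thought.split()), "words for the chain-of-thought trace")
```

Most of the extra words (“First, I need to…”, “Now I can state…”) are scaffolding that keeps the prose readable rather than logic that moves the solution forward, and that scaffolding is exactly what Token Assorted targets.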

DeepSeek R1’s wordy “thought process” while reasoning through a puzzle (FYI the answer is 40, which it got right)

This inefficiency creates significant practical challenges. Processing long text sequences demands substantial computational resources, driving up energy consumption in data centers that already consume electricity at the scale of small countries. Users experience these inefficiencies as frustrating delays — waiting minutes for complex responses. The resource requirements also limit where advanced AI can operate, making it difficult to deploy on smartphones, tablets, or other devices with limited processing power. For businesses, these inefficiencies translate directly into higher operational costs, as they must provide more powerful servers to handle the same number of requests. And most concerning is the environmental impact: every unnecessary word processed contributes to AI’s growing carbon footprint. As we rely on AI systems for critical tasks, making them more efficient isn’t just about convenience — it’s about creating sustainable technology that can scale to meet our global needs.

Token Assorted: The Innovative Approach

Token Assorted introduces something interesting in AI reasoning — a hybrid approach that’s like teaching AI to use shorthand for parts of its thinking process. Instead of writing out “I need to calculate the area of a circle using π times radius squared,” imagine if you could use a single symbol that captures that entire concept. That’s essentially what this research accomplishes. The AI learns to compress whole paragraphs worth of initial reasoning steps into compact “latent tokens” that contain all the essential meaning without the linguistic verbosity. It’s similar to how expert mathematicians might skip writing basic steps because they’ve internalized them, focusing their detailed explanations only on the complex parts that truly need elaboration. This approach balances efficiency with clarity — compressing the routine thinking while preserving detailed explanation where it matters most. After all, if it only thought in latent tokens, we wouldn’t be able to understand the shorthand answer.
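
As a rough sketch of the idea (the codebook entries and token names below are hypothetical, not the paper’s actual vocabulary), a hybrid trace might look something like this:

```python
# Illustrative sketch only: a hypothetical shorthand codebook in which each
# latent token stands in for a whole routine reasoning step, while later,
# decision-critical steps stay in plain text.
codebook = {
    "<LAT_17>": "I need to calculate the area of a circle using pi times radius squared.",
}

hybrid_trace = [
    "<LAT_17>",                                      # compressed early step
    "With a radius of 3, the area is about 28.27,",  # detailed text kept where it matters
    "so the final answer is 28.27 square units.",
]

# Expanding the shorthand recovers the full reasoning whenever a reader wants it.
expanded = " ".join(codebook.get(token, token) for token in hybrid_trace)
print(expanded)
```

The shorthand saves space only where the reasoning is routine; the final, answer-bearing steps stay in ordinary language so a human can still follow them.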

The system works through a clever two-part approach. First, a pattern recognition system (technically called a Vector-Quantized Variational Autoencoder) learns to identify common reasoning patterns in AI text and creates shorthand tokens that represent them. Then, the researchers developed a training method that gradually teaches AI to use this shorthand effectively. Instead of forcing the AI to use shorthand for everything at once — which would be like asking someone to learn an entirely new language overnight — they randomly mix shorthand tokens with regular text during training. This creates a smooth learning curve that helps the AI adapt quickly. The key innovation is in finding the right balance: replacing enough text to gain efficiency without sacrificing the clarity that makes AI reasoning valuable in the first place. The final result is an AI that can think more concisely at the beginning of a reasoning chain and then shift to detailed explanations for the crucial final steps, much like how human experts communicate their thinking.
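
A minimal sketch of that mixing step, assuming a tokenized chain-of-thought trace and a stand-in for the trained encoder (the chunk size, codebook size, and function names here are illustrative placeholders, not the paper’s actual settings), might look like this:

```python
import random

# Minimal sketch under simplifying assumptions: encode_chunk_to_latent stands in
# for a trained VQ-VAE encoder, and the chunk size, codebook size, and
# replacement schedule are placeholder values.

CHUNK_SIZE = 16      # hypothetical number of text tokens abstracted per latent code
CODEBOOK_SIZE = 512  # hypothetical number of learned shorthand tokens

def encode_chunk_to_latent(chunk_tokens):
    # Placeholder: a real system embeds the chunk and snaps it to the nearest
    # entry in the learned codebook (vector quantization).
    return f"<LAT_{hash(tuple(chunk_tokens)) % CODEBOOK_SIZE}>"

def mix_latent_and_text(cot_tokens, max_chunks_replaced=4):
    """Randomly compress the leftmost portion of a chain-of-thought trace,
    leaving the remaining steps as ordinary text tokens."""
    chunks = [cot_tokens[i:i + CHUNK_SIZE] for i in range(0, len(cot_tokens), CHUNK_SIZE)]
    k = random.randint(0, min(max_chunks_replaced, len(chunks)))
    latent_part = [encode_chunk_to_latent(chunk) for chunk in chunks[:k]]
    text_part = [token for chunk in chunks[k:] for token in chunk]
    return latent_part + text_part
```

Because different training examples get different amounts of compression, the model sees a gradual on-ramp from fully written-out reasoning to heavily compressed reasoning, which is what makes the transition manageable.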

Why This Matters: Real-World Benefits

The Token Assorted approach delivers immediate, tangible benefits that go beyond the academic. Most importantly, it creates faster AI responses by reducing reasoning trace length by an average of 17% — imagine every AI interaction taking less time while delivering better results. That’s a win-win right there. This efficiency gain results in an improved user experience as complex queries that once took seconds now return more quickly. But perhaps more surprisingly, this compressed reasoning doesn’t just maintain accuracy — it actually improves it. The research demonstrates significant performance improvements across various domains: mathematical reasoning saw accuracy jumps of up to 13.3% on challenging problems, while planning tasks improved by nearly 20% and logical reasoning tasks by up to 18.7%. This contradicts our usual expectation that shortcuts lead to lower quality; here, the AI actually reasons more effectively when using this hybrid approach.

Less compute is required.

These improvements translate into broader benefits for society and technology infrastructure. Energy efficiency is a critical advantage — with AI data centers already consuming massive amounts of electricity, a 17% reduction in processing needs could significantly reduce carbon emissions as these systems scale globally. This efficiency also democratizes access to advanced AI by lowering hardware requirements. Complex reasoning could become accessible on smartphones, tablets, and other devices with limited computational resources, rather than requiring expensive cloud servers. For businesses and developers, this means being able to deploy more capable AI systems at lower costs, enabling new applications in resource-constrained environments like remote healthcare diagnostics, educational tools in developing regions, or emergency response systems. The Token Assorted approach essentially does more with less — creating AI that’s not just smarter, but also more sustainable and accessible.

Practical Applications

The efficiencies gained through Token Assorted could transform educational technology by creating AI tutors that provide personalized, step-by-step guidance for students. Imagine, for example, a math tutor that quickly analyzes where a student is struggling and generates clear, customized explanations focused on precisely what that student needs to understand. Because these systems can solve problems more accurately while explaining their reasoning more efficiently, they could provide better feedback to students across subjects ranging from algebra to geometry to physics. For students who need extra help, these AI tutors wouldn’t just offer generic instructions but could adapt their explanations based on individual learning patterns, providing just the right level of detail. Virtual assistants would also become significantly more capable, handling complex requests like trip planning, recipe adjustments, or budget optimization with both greater speed and accuracy, making them truly useful for day-to-day problem-solving rather than just simple tasks. They could even run locally and privately on our devices without sacrificing smarts.

And more effective!

In critical professional fields, the implications could be even more significant. Healthcare diagnostics could benefit from AI systems that efficiently process patient data and medical literature to suggest possible diagnoses with clearly explained reasoning, helping doctors consider options they might otherwise overlook. Scientific researchers could employ these systems to generate and evaluate hypotheses across vast datasets, potentially accelerating discoveries in fields from materials science to drug development. The financial sector stands to gain from more accurate risk assessments and market analyses that can both process more variables and explain their significance more clearly. What makes these applications particularly valuable is that Token Assorted’s approach preserves transparency even as it improves efficiency — the AI can still show its most critical reasoning steps where needed, enabling us to verify the logic rather than just accepting a black-box answer. This balance of efficiency and explainability makes the technology applicable even in highly regulated domains where understanding the reasoning process is legally or ethically required.

The Future of AI Reasoning

I’ve frequently written about AI reasoning, and Token Assorted represents an important step in its broader evolution, aligning with several significant trends in AI development. First, it addresses the growing tension between model capability and computational efficiency — a concern as AI systems scale in both size and deployment. This research also connects to the increasing focus on making AI reasoning more transparent and interpretable, as it preserves explicit reasoning where it is most needed while compressing routine steps. It’s part of a larger movement away from treating language models as black boxes and toward systems that can explain their decision-making processes in ways humans can understand and trust. We’re likely moving toward AI systems that can dynamically adjust their reasoning style based on task complexity, audience expertise, and available computational resources — much like humans adapt their communication style to different contexts.

Looking further ahead, we may see even more sophisticated approaches to efficient reasoning develop. Future systems could create personalized compression patterns based on user interaction history, compressing familiar reasoning paths while elaborating on new concepts. DeepSeek’s R1 has already shown what reinforcement learning can achieve. We may also see reasoning compression extended to multimodal systems that combine text, images, and other data types into unified reasoning processes. However, there is still a lot of work to be done. How do we ensure that compressed reasoning doesn’t inadvertently encode biases or flawed logic that becomes harder to detect? Can these compressed reasoning patterns transfer effectively across different domains and tasks? And how do we determine the optimal balance between compression and explanation for different applications? Perhaps most importantly, this research might fundamentally change how humans and AI interact — moving from the current pattern where AI either provides complete reasoning or none at all, to a more collaborative model where users can request elaboration only on the specific reasoning steps they find unclear or questionable. This could create a more natural, efficient dialogue that resembles human-to-human expert communication rather than the sometimes tediously verbose AI explanations we experience today.

Conclusion

Token Assorted represents an interesting advancement in artificial intelligence — one that challenges the assumption that AI must choose between being fast and being thorough. By teaching AI to use a hybrid reasoning approach that combines compact “shorthand” tokens with detailed textual explanations that we can understand, researchers have created systems that are both more efficient and more accurate. A 17% reduction in reasoning length might seem incremental, but it represents substantial energy savings, faster response times, and improved accessibility that will compound as AI becomes increasingly embedded in our daily lives and infrastructure. The research elegantly resolves a tension in AI development: balancing the efficiency needed for practical deployment with the clarity required for trust and verification. As AI systems become more powerful and more prevalent in our lives, how they reason — and how efficiently they do it — will directly impact everything from the devices we can use to access them to the environmental sustainability of our digital infrastructure. We should all care about more efficient AI reasoning not just as a technical achievement, but as an essential step toward AI that can serve our needs effectively, equitably, and sustainably. Token Assorted reminds us that sometimes the most profound innovations aren’t about entirely new capabilities, but about making existing ones work better for the real world.

--

Written by Greg Robison

With a Ph.D. in cognitive development and background in neuroscience, I bring a human-centric view to AI, whether theory, tools, or implications.
