Mind Evolution: DeepMind is Teaching AI to Think Deeper Through Natural Selection

Jan 28, 2025

“Unity is strength… when there is teamwork and collaboration, wonderful things can be achieved.”
- Mattie Stepanek

Today’s artificial intelligence has become quite good at tasks like generating text and answering questions, but getting AI to systematically solve complex real-world problems is still challenging. Even advanced language models struggle with tasks like planning multi-city trips or scheduling meetings across multiple time zones: the kind of problems that require keeping track of many moving parts, careful consideration of multiple constraints, and even creative problem-solving. A new approach called Mind Evolution, developed by researchers at Google DeepMind in the paper “Evolving Deeper LLM Thinking”, combines the capabilities of large language models with evolutionary search strategies. Mind Evolution achieves success rates of over 98% on challenging planning tasks, a huge improvement over single-pass LLM baselines, which solved only 5.6% of TravelPlanner tasks and 20.8% of meeting-scheduling tasks. Because the approach does not require problems to be translated into formal mathematical representations first, it could help with a much wider range of the real-world planning and optimization challenges that we face every day.

The Problem with Current Approaches

Our current approaches to AI problem-solving have been caught between two limiting extremes. We have Large Language Models (LLMs) that engage in flexible natural language understanding but often struggle with systematic problem-solving — these models might generate plausible-sounding solutions but frequently miss crucial constraints, make basic logical mistakes, or fail to consider all aspects of a problem. For example, when planning a trip, they might suggest visiting attractions that aren’t open on the proposed dates or create schedules that ignore travel times between locations (a mistake I’ve made too often). The other extreme involves formal solvers — specialized algorithms that can find optimal solutions but require the problems to be precisely formulated in mathematical terms.

Finding the shortest route between two points on a roadmap is easy to express mathematically, while deciding whether a book is appropriate for a specific child requires understanding subtle context and maturity levels. Determining if a fantasy novel’s darker moments might be too scary for a sensitive 10-year-old who loves dragons but gets anxious about conflict requires understanding quite a bit about that particular child and that particular book. Trying to put together a formal algorithm to make this assessment would be super challenging, to say the least.

What’s been missing is an approach that combines the flexibility of natural language understanding with the systematic problem-solving capabilities of formal methods. Planning tasks that we express naturally — like “I need to meet five people tomorrow in different locations across San Francisco” or “Help me plan a 5-day trip to Italy with specific budget constraints” — require both methods. These requests are easy for us to understand but challenging to translate into the kind of formal constraints that traditional solving methods require. We have been looking for more flexible approaches that can work directly with natural language descriptions while still maintaining the rigor needed for reliable problem-solving. Enter DeepMind…

How Mind Evolution Works

Mind Evolution combines the two approaches: the natural language capabilities of LLMs and the systematic optimization of evolutionary search. Think of it like having a team of creative writers who can generate possible solutions, combined with an editor who can evaluate and improve these solutions over multiple rounds. The system maintains a population of candidate solutions, all written in natural language, and uses an LLM to generate new solutions, combine promising ones, and refine them based on feedback. Unlike previous approaches that required problems to be translated into formal mathematical representations, Mind Evolution works directly with natural language descriptions of problems and constraints, making it much more flexible and applicable to real-world scenarios.

Mind Evolution’s approach to solving problems.

The key to Mind Evolution’s success lies in its evaluation and feedback loop. Instead of trying to formally prove a solution is correct, it uses programmatic evaluators that can check if a proposed solution works in practice. When planning a trip, the evaluator might check if all the suggested locations are actually open at the proposed times, if the budget constraints are met, and if the travel times between locations are realistic. This feedback, both positive and negative, is then used to guide the evolution of better solutions. The LLM analyzes what went wrong with previous attempts and proposes improvements, like how we refine our plans through iteration. The system learns from its mistakes and gradually converges on solutions that satisfy all constraints while still being practical.
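To make the evaluator idea concrete, here is a minimal Python sketch of a programmatic check for a one-day plan. It is not the evaluator from the paper: the `Visit` type, the `OPENING_HOURS` table, the fixed one-hour travel time, and the scoring rule (minus one per violation) are all assumptions made up for illustration. The point is only that the evaluator returns both a score and plain-text feedback that an LLM could read on the next round.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    place: str
    start_hour: int  # 24-hour clock
    end_hour: int
    cost: float


# Hypothetical opening hours: place -> (opens, closes)
OPENING_HOURS = {"museum": (10, 17), "market": (8, 14), "tower": (9, 22)}


def evaluate_plan(visits, budget, travel_hours=1):
    """Return (score, feedback).

    The score is 0 when every check passes and drops by 1 for each
    violated constraint. Feedback is a list of plain-text messages.
    """
    feedback = []
    # Check 1: every visit falls inside the place's opening hours.
    for v in visits:
        opens, closes = OPENING_HOURS[v.place]
        if v.start_hour < opens or v.end_hour > closes:
            feedback.append(
                f"{v.place} is closed for part of {v.start_hour}:00-{v.end_hour}:00"
            )
    # Check 2: there is enough time to travel between consecutive visits.
    for prev, nxt in zip(visits, visits[1:]):
        if nxt.start_hour - prev.end_hour < travel_hours:
            feedback.append(
                f"not enough travel time between {prev.place} and {nxt.place}"
            )
    # Check 3: the total cost stays within budget.
    total = sum(v.cost for v in visits)
    if total > budget:
        feedback.append(f"total cost {total:.0f} exceeds budget {budget:.0f}")
    return -len(feedback), feedback


plan = [Visit("market", 13, 15, 10), Visit("museum", 15, 18, 50)]
score, notes = evaluate_plan(plan, budget=50)
for note in notes:
    print(note)
```

In a setup like this, the feedback strings would be pasted back into the prompt so the model knows exactly which constraints to repair.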

Mind Evolution’s Key Components

Mind Evolution uses several mechanisms that work together to find solutions that satisfy the given constraints. At its heart is a population-based search strategy, where multiple potential solutions are maintained simultaneously. It’s like having dozens of different travel itineraries or meeting schedules being considered at once. These solutions then undergo a process inspired by biological evolution, using concepts from genetic algorithms. The best solutions are selected for “reproduction” (like natural selection), combined to create new solutions (crossover), and randomly modified to explore new possibilities (mutation). This approach allows the system to explore many different approaches while continuously improving the most promising ones. Keeping a whole population rather than a single candidate also preserves diversity, both in the solutions themselves and in the ideas behind them.
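The select, crossover, and mutate cycle is easier to see on a toy problem. The sketch below is a plain genetic algorithm that evolves a string toward a target. It is an illustration under stated assumptions, not Mind Evolution itself: in Mind Evolution the candidates are natural language plans and the recombination is done by prompting an LLM, whereas here crossover and mutation are simple string operations. The softmax-weighted selection, the elitism step, and every function name are my own choices.

```python
import math
import random


def fitness(candidate, target):
    """Count positions where the candidate matches the target."""
    return sum(a == b for a, b in zip(candidate, target))


def select(population, target, rng, temperature=1.0):
    """Sample one parent, favoring higher fitness via softmax weights."""
    weights = [math.exp(fitness(c, target) / temperature) for c in population]
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(a, b, rng):
    """Single-point crossover: prefix of one parent, suffix of the other.
    Assumes both parents have the same length, at least 2."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(candidate, alphabet, rng, rate=0.1):
    """Replace each character with a random one with probability `rate`."""
    return "".join(
        rng.choice(alphabet) if rng.random() < rate else ch for ch in candidate
    )


def evolve(target, alphabet, pop_size=30, generations=200, seed=0):
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(alphabet) for _ in target) for _ in range(pop_size)
    ]
    for _ in range(generations):
        best = max(population, key=lambda c: fitness(c, target))
        if best == target:
            break
        children = [
            mutate(
                crossover(
                    select(population, target, rng),
                    select(population, target, rng),
                    rng,
                ),
                alphabet,
                rng,
            )
            for _ in range(pop_size - 1)
        ]
        # Elitism: carry the current best forward so quality never drops.
        population = [best] + children
    return max(population, key=lambda c: fitness(c, target))


print(evolve("plan", "abcdefghijklmnopqrstuvwxyz"))
```

Swapping the string operations for LLM prompts ("combine these two itineraries", "fix the issues listed in this feedback") gives the flavor of the approach described above.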

One interesting part of Mind Evolution is its use of an “island model” combined with a “Critic and Author” refinement process. The island model maintains separate populations of solutions that evolve independently, with occasional “migration” between islands — similar to how species of birds evolve differently on separate islands but occasionally intermix if they’re close enough. These islands help maintain variety in the solutions. At the same time, the Critic and Author process alternates between acting as a critic who analyzes problems with current solutions and an author who proposes improvements. This approach creates a structured way for the LLM to apply its understanding to incrementally improve solutions, like how I critically review and revise my own work. The DeepMind researchers found that this combination of approaches leads to significantly better results than simpler methods like generating many independent solutions or making sequential improvements to a single solution. Evolution FTW!
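Here is a small Python sketch of the migration step in an island model. It assumes a ring layout in which each island clones its top solutions into the next one; the function name, the `n_emigrate` parameter, and the ring order are illustrative assumptions rather than details confirmed by the article. Solutions are plain numbers here so that the scoring function can be the identity.

```python
def migrate(islands, score, n_emigrate=1):
    """Clone the top `n_emigrate` solutions of each island into the next
    island in ring order. Returns new lists; the inputs are not modified."""
    # Pick emigrants from every island before anyone moves, so a solution
    # cannot hop across several islands in a single migration step.
    emigrants = [
        sorted(pop, key=score, reverse=True)[:n_emigrate] for pop in islands
    ]
    new_islands = []
    for i, pop in enumerate(islands):
        incoming = emigrants[i - 1]  # island 0 receives from the last island
        newcomers = [s for s in incoming if s not in pop]
        new_islands.append(list(pop) + newcomers)
    return new_islands


islands = [[1, 5], [2, 3], [9, 4]]
print(migrate(islands, score=lambda s: s))
```

Between migrations, each island would run its own evolution loop, which is what keeps the populations from collapsing into one idea too early.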

Real-World Applications: The Three Test Cases

Travel Planning

Travel planning is one of the best demonstrations of Mind Evolution’s capabilities because it involves juggling multiple real-world constraints that we typically express in natural language. The challenge involves creating detailed multi-day itineraries that must satisfy numerous constraints: staying within budget, coordinating flight times, selecting appropriate accommodations, planning restaurant visits, and ensuring all activities are possible within given time windows. If you’ve spent much time with plain LLMs, you know they’re not very good at these kinds of tasks. On the TravelPlanner benchmark, Mind Evolution achieved a 95.6% success rate using Gemini 1.5 Flash (a smaller, faster model), and reached 100% success when Gemini 1.5 Pro (a larger, smarter version) was brought in for the cases Flash left unsolved. These results are a huge improvement over previous methods: basic LLM approaches achieved only 5.6% success, while even advanced methods like Best-of-N managed just 55.6%. The improvements are even more compelling in complex cases: where previous systems might create plans that ignore restaurant operating hours or suggest impossible flight connections, Mind Evolution consistently produces practical, detailed itineraries that respect all constraints. For example, in one test case, while other systems suggested accommodations that violated minimum stay requirements or exceeded budget constraints, Mind Evolution successfully crafted a three-day journey that coordinated flights, accommodations, and dining options while staying within all specified constraints. I’d be willing to try it out.

TravelPlanner benchmark results show strong performance across scenarios.
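The Flash-then-Pro setup mentioned above is essentially a fallback pattern: run the cheaper model on everything, then send only the failures to the stronger one. A minimal sketch, assuming each solver is a callable that returns a solution or `None` when it fails; the function and parameter names are hypothetical and the lambdas stand in for real model calls.

```python
def solve_two_stage(tasks, fast_solver, strong_solver):
    """Try every task with the fast solver and retry only the failures
    with the strong solver.

    Each solver is a hypothetical callable: task -> solution, or None
    when no valid solution was found.
    Returns (solutions, retried), where `retried` lists escalated tasks.
    """
    solutions, retried = {}, []
    for task in tasks:
        result = fast_solver(task)
        if result is None:
            retried.append(task)
            result = strong_solver(task)
        solutions[task] = result
    return solutions, retried


# Stand-in solvers: the fast one gives up on long task descriptions.
fast = lambda t: t.upper() if len(t) < 4 else None
strong = lambda t: t.upper()

solutions, retried = solve_two_stage(["ab", "abcdef"], fast, strong)
print(solutions, retried)
```

The appeal of this design is cost: the expensive model only sees the small fraction of problems the cheap one could not solve.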

Meeting Scheduling

Meeting scheduling is another tough optimization challenge because it combines temporal, spatial, and interpersonal constraints in ways that can quickly become computationally complex. Each person has their own time windows of availability, preferred meeting durations, and location constraints, while travel times between different locations must be factored into the equation — making it difficult to maximize the number of successful meetings within a given timeframe. Mind Evolution treats it as an evolutionary optimization problem, where potential schedules are continuously refined based on how well they satisfy these complicated, overlapping constraints. Using Gemini 1.5 Flash, the system achieved an 85% success rate on the validation set, and when combined with Gemini 1.5 Pro for challenging cases, reached an impressive 98.4% success rate. Again, Mind Evolution significantly outperformed traditional approaches, with baseline LLM methods achieving only 20.8% success and Best-of-N reaching just 69.4%. Importantly, Mind Evolution maintained its high performance even as the number of people to schedule increased, clearly demonstrating its ability to handle increasingly complex scheduling scenarios that would be challenging even for human planners.

Meeting Planning results are consistently better, achieving some success even with 10 people’s schedules to navigate.
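To show how the availability, duration, and travel-time constraints interact, here is a hedged Python sketch of a schedule checker. It is not the benchmark’s official scorer: the `Person` type, the minute-of-day time format, the travel table, and the rule of stopping at the first infeasible meeting are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    location: str
    available: tuple  # (start_minute, end_minute) of the day
    min_duration: int  # minutes


def count_valid_meetings(schedule, people, travel, start_location, start_time):
    """Count meetings that satisfy every constraint, in visiting order.

    schedule: list of (name, meet_start, meet_end) in minutes of the day.
    travel:   dict mapping (from_location, to_location) -> minutes.
    Stops at the first meeting that cannot happen as scheduled.
    """
    by_name = {p.name: p for p in people}
    here, now, met = start_location, start_time, 0
    for name, begin, end in schedule:
        p = by_name[name]
        hop = 0 if here == p.location else travel[(here, p.location)]
        arrive = now + hop
        ok = (
            begin >= arrive  # we have actually arrived
            and begin >= p.available[0]  # the person is free
            and end <= p.available[1]
            and end - begin >= p.min_duration  # the meeting is long enough
        )
        if not ok:
            break
        here, now, met = p.location, end, met + 1
    return met


people = [
    Person("Ann", "Marina", (540, 660), 30),
    Person("Bo", "Mission", (600, 720), 45),
]
travel = {("Sunset", "Marina"): 20, ("Marina", "Mission"): 25}
schedule = [("Ann", 540, 570), ("Bo", 600, 645)]
print(count_valid_meetings(schedule, people, travel, "Sunset", 500))
```

A count like this makes a natural fitness score: the more valid meetings a candidate schedule contains, the more likely it is to be selected for the next round.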

Trip Planning Across Multiple Cities

Let’s make it even more complicated! The challenge of planning trips across multiple cities adds several layers of complexity beyond simple travel planning: it requires coordinating multiple flights with specific connectivity constraints, balancing time spent in each city, and ensuring that important events (like attending specific shows or meetings) align perfectly with the itinerary. For example, if a traveler needs to attend a wedding in one city and a conference in another, the entire schedule must be built around these fixed points while respecting flight availability between cities. Mind Evolution handles these intricate constraints, achieving a 96.2% success rate on the validation set with Gemini 1.5 Flash, and reaching 100% success when combined with Gemini 1.5 Pro. The system’s performance is particularly strong when handling trips involving 8–10 cities, where it maintained high success rates while other methods’ performance degraded significantly. On this benchmark, Mind Evolution arguably operates at or above human-level planning capability: it can quickly generate valid solutions for complex multi-city itineraries that would take an experienced human travel agent significant time to craft, while keeping many interacting constraints in view at once.

Trip Planning benchmark scores shine over other methods.

The StegPoet Challenge

The StegPoet challenge is an interesting new benchmark that tests AI systems’ ability to blend creative writing with precise constraint satisfaction. This task requires encoding a hidden sequence of numbers within a poem or story, using a cipher where specific words represent different numbers. The challenge isn’t just about embedding these code words — they must appear in the exact sequence specified, maintain a certain minimum distance from each other, and the resulting text must still read naturally and adhere to a specified writing style (such as emulating Shel Silverstein’s poetic style). This task is particularly interesting because it combines creative writing with steganography — the art of hiding messages in plain sight — while requiring strict adherence to multiple constraints.
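A checker for this kind of task can be surprisingly small. The Python sketch below verifies that the code words in a text decode to the hidden number sequence in the right order, with a minimum gap between them. It is a guess at the shape of such a check based on the description above, not the benchmark’s actual evaluator: the cipher, the `min_gap` rule, and the function name are assumptions, and it does not judge style or readability at all.

```python
import re


def check_stego_text(text, cipher, message, min_gap=2):
    """Check a text against a hidden message.

    cipher:  dict mapping number -> code word.
    message: sequence of numbers that should be hidden in the text.
    Valid if the code words found in the text decode to exactly `message`
    (same order, no extras) and at least `min_gap` other words separate
    consecutive code words.
    """
    word_to_num = {w.lower(): n for n, w in cipher.items()}
    words = re.findall(r"[a-z']+", text.lower())
    hits = [(i, word_to_num[w]) for i, w in enumerate(words) if w in word_to_num]
    decoded = [n for _, n in hits]
    if decoded != list(message):
        return False
    positions = [i for i, _ in hits]
    return all(b - a - 1 >= min_gap for a, b in zip(positions, positions[1:]))


cipher = {1: "moon", 2: "river", 3: "kite"}
poem = "The moon hung low and the river ran, while my kite flew high."
print(check_stego_text(poem, cipher, [1, 2, 3]))
```

The hard part, of course, is not the check but writing text that passes it while still sounding like a poem, which is where the LLM comes in.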

Mind Evolution’s performance on StegPoet shows its ability to handle tasks that require both creativity and precision. Using Gemini 1.5 Pro, it achieved an 87.1% success rate on the validation set and 79.2% on the test set — fantastic results for a task that most existing AI systems struggle to approach. StegPoet represents a type of problem that can’t be solved through formal mathematical optimization — it requires understanding natural language, maintaining narrative coherence, and satisfying precise sequential constraints simultaneously. Mind Evolution’s approach could be valuable for other creative tasks that require balancing artistic expression with strict technical requirements, such as generating technical documentation that must be both precise and engaging or creating educational content that needs to cover specific concepts in an entertaining way.

Why Mind Evolution Matters

Until now, there’s been a gap between AI’s ability to engage in natural language conversations and its ability to solve complex problems reliably. Mind Evolution bridges this gap by showing that AI can handle intricate planning and optimization tasks without requiring problems to be translated into formal mathematical terms first. Organizations can use similar systems to handle complex logistics, scheduling, and planning tasks while communicating with them in natural language. For instance, a business could describe their supply chain constraints and goals in plain English, and the system could generate and optimize detailed operational plans while considering hundreds of interconnected factors.

In healthcare, similar systems could help optimize patient scheduling while considering complex medical constraints and resource availability. In education, they could help design personalized learning paths that balance multiple learning objectives with student preferences and available support resources. The system’s ability to handle creative tasks with precise constraints, as we saw in the StegPoet challenge, suggests applications in areas like automated content creation, where material needs to satisfy both creative and technical requirements. Mind Evolution’s success shows we’re on the verge of AI systems that can think more deeply and systematically about problems while remaining accessible to users without technical expertise in formal problem specification.

Conclusion

Mind Evolution advances AI problem-solving capabilities, showing that it’s possible to combine the flexibility of large language models with the systematic rigor of evolutionary search to solve complex real-world problems. The system’s achievement of over 98% success rates on challenging planning tasks, while working directly with natural language descriptions, pushes us toward more practical and accessible AI solutions. Mind Evolution’s success across different challenges — from travel planning to creative writing with constraints — suggests that this approach could be broadly applicable to many domains where complex planning and optimization are needed. The next steps involve scaling this technology to even more complex problems and integrating it with real-world systems where it can provide practical value. Mind Evolution shows that we’re moving closer to AI systems that can truly think deeply about problems in ways that complement and enhance human problem-solving abilities. The future of AI might lie not in choosing between the flexibility of language models and the rigor of formal methods, but in finding innovative ways to combine their strengths.