Unlocking SOTA: How Reasoning Models Guide Tool Execution
In the rapidly evolving landscape of AI, “reasoning models” like DeepSeek R1 have emerged as powerful thinkers. They excel at complex logic, math, and planning. However, a common challenge has been integrating these deep thinkers with practical tool execution.
Recently, we implemented a novel approach in our system that combines the best of both worlds: the planning capability of reasoning models and the robust execution capability of standard chat models.
The Challenge: Thinking vs. Doing
Reasoning models are trained to “think” before they answer. They generate a Chain-of-Thought (CoT) that explores the problem space. While this is fantastic for accuracy, sometimes we want to separate the planning of actions from the execution of actions. We want the “brain” to decide what to do, and the “hands” to do it precisely.
Our Solution: The “Guide, Don’t Touch” Approach
We introduced a mechanism where the reasoning model receives the full list of available tools but is explicitly instructed not to call them. Instead, its role is to analyze the user’s request and guide the subsequent model.
Here is how the flow works:
- Input: The user sends a request (e.g., “Analyze the latest stock trends for TechCorp”) along with available tools (e.g., get_stock_price, get_news).
- Reasoning Phase: The system forwards this request to the Reasoning Model.
- The Constraint: We inject a system prompt that says:
“You are responsible for context understanding, intent reasoning, and knowledge association… Note: Do not call any tools, only suggest which tools the subsequent LLM should use.”
- Guidance: The Reasoning Model produces a “Thinking” block. It analyzes the user’s intent, breaks down the task, and decides: “First, I need to get the stock price using get_stock_price, then search for recent news using get_news.”
- Execution Phase: This rich reasoning context is then passed to the subsequent Chat Model (like GPT-4 or a specialized tool-use model).
- Action: The Chat Model, seeing the expert plan, executes the tools exactly as prescribed.
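Concretely, the reasoning phase of this flow might look like the sketch below, assuming an OpenAI-compatible client. The model name, prompt wording, and tool schemas are illustrative placeholders rather than our exact configuration.

```python
# A minimal sketch of the reasoning phase, assuming an OpenAI-compatible client.
# The model name, prompt wording, and tool schemas are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Fetch the latest price for a ticker symbol.",
            "parameters": {
                "type": "object",
                "properties": {"symbol": {"type": "string"}},
                "required": ["symbol"],
            },
        },
    },
    # get_news would be declared the same way
]

REASONING_PROMPT = (
    "You are responsible for context understanding, intent reasoning, and "
    "knowledge association. Note: Do not call any tools, only suggest which "
    "tools the subsequent LLM should use."
)

user_request = "Analyze the latest stock trends for TechCorp"

# Reasoning phase: the reasoning model receives the full tool list (here via
# the tools parameter, though it could equally be embedded in the prompt),
# while the system prompt forbids calling them, so it only produces guidance.
reasoning = client.chat.completions.create(
    model="deepseek-reasoner",  # placeholder reasoning model
    messages=[
        {"role": "system", "content": REASONING_PROMPT},
        {"role": "user", "content": user_request},
    ],
    tools=TOOLS,
)
plan = reasoning.choices[0].message.content  # the "Thinking" / guidance block
```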
Why This Achieves SOTA Results
This architecture mimics human problem-solving: Plan first, then act.
- Reduced Hallucinations: By forcing a reasoning step, the model is less likely to jump to incorrect tool usage.
- Complex Task Handling: For multi-step tasks, the reasoning model can outline the entire workflow before a single tool is called.
- Separation of Concerns: We can use a model optimized for reasoning (like DeepSeek R1) for the “brain” and a model optimized for instruction following and JSON output for the “hands”.
A Glimpse into the Code
In our messages.yaml, we define the template that enforces this behavior:
```yaml
reasoning_template: |
  You are responsible for context understanding, intent reasoning, and knowledge association...
  Please think logically and cautiously. Your Chain-of-Thought will provide important reference for subsequent AI processing.
  Note: Do not call any tools, only suggest which tools the subsequent LLM should use.
```
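At runtime, this template is loaded and used as the system prompt for the reasoning model. A minimal sketch, assuming PyYAML; the loading code is illustrative rather than our exact handler implementation:

```python
# A minimal sketch of loading the template with PyYAML; the file path and
# loading code are illustrative, not our exact handler implementation.
import yaml

with open("messages.yaml") as f:
    messages_cfg = yaml.safe_load(f)

# Used as the system prompt for the reasoning model in the flow above.
reasoning_system_prompt = messages_cfg["reasoning_template"]
```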
And in our handler logic, we capture this reasoning and inject it into the context of the final execution model, ensuring a seamless hand-off.
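Continuing the earlier sketch, that hand-off might look like the following; the injection format and the execution model name are again placeholders:

```python
# A sketch of the hand-off, continuing the snippet above; the injection format
# and the execution model name are placeholders, not our exact configuration.

# Execution phase: the captured reasoning is injected into the execution
# model's context, and the same tool schemas are attached so it can act.
execution = client.chat.completions.create(
    model="gpt-4o",  # placeholder execution / tool-use model
    messages=[
        {"role": "system", "content": f"An expert planner produced this plan:\n{plan}"},
        {"role": "user", "content": user_request},
    ],
    tools=TOOLS,
)

# The execution model responds with structured tool calls that follow the plan,
# e.g. get_stock_price first, then get_news.
for call in execution.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```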
This simple yet effective change has significantly improved the reliability and accuracy of our agentic workflows, pushing our system closer to State-of-the-Art performance.