Tree-of-thought reasoning with branch scoring
Explore three reasoning paths, score each against criteria, pick the winner — all in one prompt.
Use this for problems with multiple plausible approaches (debugging, design choices, ambiguous puzzles), where the first path the model commits to isn't always the best. Explicit branching forces exploration before commitment.
The prompt
Copy this verbatim. Replace the {{ … }} placeholders with your values.
<instructions>
For the problem in <problem>, explore three distinct approaches before committing.
Output the following tags in order:
<approach_1>
<reasoning>your step-by-step thinking for this approach</reasoning>
<conclusion>what this approach would yield</conclusion>
</approach_1>
<approach_2>...</approach_2>
<approach_3>...</approach_3>
<scoring>
For each approach, score on:
- correctness (1–5): does it actually solve the problem?
- simplicity (1–5): how easy is it to verify / explain?
- robustness (1–5): does it hold up on edge cases?
Show scores as a table.
</scoring>
<choice>
State which approach wins and why, in one paragraph.
</choice>
<final>
Produce the chosen approach's final answer, polished.
</final>
Rules:
- Approaches must be genuinely distinct, not minor variations of one idea.
- Score each approach independently; don't anchor on approach 1.
- If two approaches tie, prefer simplicity over cleverness.
</instructions>
<problem>{{ problem_text }}</problem>
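Filling the placeholder can be scripted. The sketch below is one possible way to do it, not part of the original example: `render_prompt` and `PROMPT_TEMPLATE` are hypothetical names, and the template string is a shortened stand-in for the full prompt above.

```python
import re

# Shortened stand-in for the full prompt above; paste the real text here.
PROMPT_TEMPLATE = (
    "<instructions>...</instructions>\n"
    "<problem>{{ problem_text }}</problem>"
)

# Matches {{ name }} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Replace every {{ name }} placeholder; fail loudly if a value is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


prompt = render_prompt(
    PROMPT_TEMPLATE,
    {"problem_text": "Our service returns 500 errors intermittently under load."},
)
```

Raising on a missing value, rather than leaving the braces in place, keeps an unfilled `{{ … }}` from reaching the model unnoticed.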
Sample input
Problem: Our service returns 500 errors intermittently under load. We have logs, metrics, and traces. What's the most efficient debugging approach?
Expected output
<approach_1>
<reasoning>Start with logs. Grep for 500-correlated stack traces; group by exception type...</reasoning>
<conclusion>Identifies the failing code path but doesn't tell us why it fails only under load.</conclusion>
</approach_1>
<approach_2>
<reasoning>Start with metrics. Plot 500 rate vs. concurrency, GC pause, connection pool size...</reasoning>
<conclusion>Surfaces the systemic trigger (likely connection exhaustion) but not the failing call path.</conclusion>
</approach_2>
<approach_3>
<reasoning>Start with traces. Find a sampled trace of a failing request; walk every span...</reasoning>
<conclusion>Ties the failing call path to the systemic trigger in one artifact.</conclusion>
</approach_3>
<scoring>
| Approach | Correctness | Simplicity | Robustness |
|------------|-------------|------------|------------|
| Logs | 3 | 5 | 3 |
| Metrics | 3 | 4 | 4 |
| Traces | 5 | 3 | 5 |
</scoring>
<choice>
Approach 3 wins: it gives both the where (call path) and the why (systemic conditions) in one
artifact, which the other two only give in isolation. Slightly harder to set up if tracing
isn't already on, but available here.
</choice>
<final>
[full debugging playbook using traces as the primary tool]
</final>
Notes & tuning tips
- The forced "three distinct approaches" constraint is what stops the model from collapsing into one path with three rewordings.
- The score table makes the choice auditable, which is useful when a human reviewer wants to overrule the model.
- Costs roughly 3× a direct-answer prompt in tokens. Reserve it for genuinely ambiguous problems.
- Pair with chain-of-thought inside each approach for harder problems.
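One way a reviewer might audit the model's choice is to recompute a winner from the scores and compare it with what the model picked. The sketch below assumes the three criteria are summed with equal weight, which the prompt itself does not specify; it does apply the prompt's stated tie rule of preferring simplicity. `pick_winner` and the example dictionary are hypothetical.

```python
def pick_winner(scores: dict[str, dict[str, int]]) -> str:
    """Highest total score wins; ties go to the higher simplicity score."""
    return max(
        scores,
        key=lambda name: (sum(scores[name].values()), scores[name]["simplicity"]),
    )


# Scores copied from the sample output's table.
sample_scores = {
    "Logs": {"correctness": 3, "simplicity": 5, "robustness": 3},
    "Metrics": {"correctness": 3, "simplicity": 4, "robustness": 4},
    "Traces": {"correctness": 5, "simplicity": 3, "robustness": 5},
}

winner = pick_winner(sample_scores)  # totals: Logs 11, Metrics 11, Traces 13
```

If the recomputed winner disagrees with the model's stated choice, that is a signal to read the reasoning more closely, not proof that the model is wrong: the model may weigh correctness more heavily than an equal-weight sum does.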
What this example uses
Tags: <instructions>
Patterns: chain of thought
Tree-of-thought reasoning with branch scoring. claudexml.com. https://claudexml.com/examples/tree-of-thought/