# Map-reduce summarization over many chunks — Claude XML example

> Summarize each chunk independently, then synthesize a single coherent summary — two prompts, structured handoff.
>
> Source: https://claudexml.com/examples/map-reduce-summarize/ · Last updated 2026-05-25


Use this for documents that exceed your context budget, or where single-pass summaries lose detail. Map each chunk to a small summary, then reduce the summaries into one.


## The prompt

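The block below holds two prompts: pass 1 (map) and pass 2 (reduce). Send each as its own request, copied verbatim, and replace the `{{ … }}` placeholders with your values.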


```xml
<!-- PASS 1 (map): run this once per chunk, in parallel -->
<instructions>
You are summarizing one chunk of a larger document. Produce JSON inside <chunk_summary> tags:

{
  "key_claims": ["string"],          // up to 5; one fact per item, verbatim where possible
  "entities": ["string"],            // people, orgs, products mentioned
  "open_questions": ["string"],      // things this chunk raises but does not answer
  "chunk_id": "{{ chunk_id }}"
}

Do not generalize beyond this chunk. Do not reference "the document" — you only see one chunk.
</instructions>

<chunk id="{{ chunk_id }}">{{ chunk_text }}</chunk>

Return inside <chunk_summary> tags.

<!-- PASS 2 (reduce): one call, taking all chunk_summary outputs as input -->
<instructions>
You will receive N per-chunk summaries inside <chunk_summary> tags. Synthesize a final summary.

Output three sections, each in its own tag:

<tldr>One paragraph, ≤120 words.</tldr>
<key_findings>Bulleted, deduplicated across chunks. Cite chunk_id in [c:ID] format.</key_findings>
<unanswered>Bulleted list of open questions that no chunk answered.</unanswered>

Rules:
- Deduplicate claims that appear in multiple chunks.
- Surface contradictions explicitly: "[c:3] states X but [c:7] states Y."
- Do not invent facts not present in any chunk_summary.
</instructions>

{{ chunk_summaries_concatenated }}
```
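One way the two passes could be wired together, as a minimal Python sketch. `call_model` is a hypothetical stand-in for whatever function sends a prompt string to your client and returns the model's text; `fill`, `extract_summary`, and `map_reduce` are illustrative names, not part of any library. The sketch assumes the model returns strict JSON (no `//` comments) inside `<chunk_summary>` tags.

```python
import json
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

PLACEHOLDER_RE = re.compile(r"\{\{\s*(\w+)\s*\}\}")
SUMMARY_RE = re.compile(r"<chunk_summary>(.*?)</chunk_summary>", re.DOTALL)


def fill(template: str, **values: str) -> str:
    """Replace every `{{ name }}` placeholder in a single pass.

    A single pass means placeholder-like text inside a chunk is never re-expanded.
    """
    return PLACEHOLDER_RE.sub(lambda m: values[m.group(1)], template)


def extract_summary(response: str) -> dict:
    """Pull the JSON object out of the first <chunk_summary> block."""
    match = SUMMARY_RE.search(response)
    if match is None:
        raise ValueError("no <chunk_summary> block in response")
    return json.loads(match.group(1))


def map_reduce(
    chunks: Dict[str, str],            # chunk_id -> chunk_text
    map_prompt: str,                   # pass 1 template
    reduce_prompt: str,                # pass 2 template
    call_model: Callable[[str], str],  # hypothetical: prompt in, text out
    max_workers: int = 8,
) -> str:
    def map_one(item):
        chunk_id, text = item
        response = call_model(fill(map_prompt, chunk_id=chunk_id, chunk_text=text))
        summary = extract_summary(response)
        summary["chunk_id"] = chunk_id  # keep our own ID rather than the model's echo
        return summary

    # Pass 1: one call per chunk, in parallel. pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        summaries = list(pool.map(map_one, chunks.items()))

    # Pass 2: a single call over the concatenated per-chunk summaries.
    concatenated = "\n".join(
        f"<chunk_summary>{json.dumps(s)}</chunk_summary>" for s in summaries
    )
    return call_model(fill(reduce_prompt, chunk_summaries_concatenated=concatenated))
```

Threads are used here because the per-chunk calls are typically network-bound; an async client would work equally well.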


## Sample input


```text
Pass 1: one 6 KB chunk of a research report.
Pass 2: the 12 chunk_summary blobs produced by pass 1.
```
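To produce chunks of roughly that size, something like the following sketch could work. It packs paragraphs greedily and assigns sequential IDs. The function name, the blank-line paragraph delimiter, and the 6,000-character default (a rough proxy for 6 KB of mostly ASCII text) are all assumptions for illustration.

```python
from typing import Dict, List


def split_into_chunks(text: str, max_chars: int = 6000) -> Dict[str, str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    Chunk IDs are "1", "2", ... in document order. A single paragraph longer
    than max_chars is kept whole and becomes its own oversized chunk.
    """
    chunks: Dict[str, str] = {}
    current: List[str] = []
    size = 0
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Account for the "\n\n" separator when joining to an existing chunk.
        added = len(para) + (2 if current else 0)
        if current and size + added > max_chars:
            chunks[str(len(chunks) + 1)] = "\n\n".join(current)
            current, size = [], 0
            added = len(para)
        current.append(para)
        size += added
    if current:
        chunks[str(len(chunks) + 1)] = "\n\n".join(current)
    return chunks
```

Splitting on paragraph boundaries keeps each chunk self-contained, which matters because pass 1 sees one chunk at a time.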


## Expected output


```xml
<tldr>
The report finds that small fine-tuned retrievers (110M params) match 7B baselines on domain queries
at one-tenth the cost, with most gains coming from query-log diversity rather than model size...
</tldr>
<key_findings>
- Fine-tuned 110M retriever beats 7B baseline by 4 nDCG@10 points on the in-domain test set [c:3, c:8]
- 8× latency reduction with no measurable recall loss [c:4]
- Generalization to long-tail queries is unverified [c:9, c:11]
</key_findings>
<unanswered>
- How does the approach hold up on multilingual queries?
- What is the cost of building the 50k query training set in domains without query logs?
</unanswered>
```
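Downstream code will usually want the three sections separately, plus a check that every citation points at a real chunk. A minimal sketch, assuming the output follows the tag layout above; the function names are illustrative, and the parser accepts both `[c:4]` and grouped `[c:3, c:8]` citations since the sample output uses both.

```python
import re
from typing import Dict, List, Set

CITATION_RE = re.compile(r"\[(c:[^\]]+)\]")


def parse_final_summary(output: str) -> Dict[str, str]:
    """Return the text of the <tldr>, <key_findings>, and <unanswered> sections."""
    sections: Dict[str, str] = {}
    for tag in ("tldr", "key_findings", "unanswered"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", output, re.DOTALL)
        if match is None:
            raise ValueError(f"missing <{tag}> section")
        sections[tag] = match.group(1).strip()
    return sections


def cited_chunk_ids(key_findings: str) -> Set[str]:
    """Collect chunk IDs from citations such as [c:4] or [c:3, c:8]."""
    ids: Set[str] = set()
    for group in CITATION_RE.findall(key_findings):
        for part in group.split(","):
            part = part.strip()
            if part.startswith("c:"):
                ids.add(part[2:].strip())
    return ids


def unknown_citations(key_findings: str, known_ids: Set[str]) -> List[str]:
    """List cited IDs that do not match any chunk sent to pass 1."""
    return sorted(cited_chunk_ids(key_findings) - known_ids)
```

A non-empty result from `unknown_citations` is a cheap signal that the reduce pass cited a chunk that was never supplied.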


## Notes & tuning tips


- Two prompts, not one. Parallelize pass 1 across chunks; pass 2 runs once on the concatenated outputs.
- Per-chunk JSON output is what makes deduplication tractable in pass 2 — free prose doesn't deduplicate cleanly.
- If you can't run two passes, single-pass summarization with all chunks inline is acceptable up to ~50k tokens; beyond that, accuracy degrades sharply.
- Chunk IDs are load-bearing — they're how the final summary cites back to source positions.
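Because pass 2 depends on well-formed JSON and correct chunk IDs, it can help to validate each pass 1 result before concatenating. A minimal sketch, assuming the schema from the map prompt; `validate_chunk_summary` is an illustrative name, and the retry-or-drop policy on failure is left to the caller.

```python
import json

LIST_FIELDS = ("key_claims", "entities", "open_questions")


def validate_chunk_summary(raw_json: str, expected_chunk_id: str, max_claims: int = 5) -> dict:
    """Parse one pass 1 result and check it against the map prompt's schema.

    Raises ValueError (json.JSONDecodeError is a subclass) on any violation.
    """
    summary = json.loads(raw_json)
    if not isinstance(summary, dict):
        raise ValueError("chunk summary must be a JSON object")
    for field in LIST_FIELDS:
        value = summary.get(field)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{field} must be a list of strings")
    if len(summary["key_claims"]) > max_claims:
        raise ValueError(f"key_claims has more than {max_claims} items")
    # The final summary cites by chunk_id, so a wrong ID corrupts every citation.
    if str(summary.get("chunk_id")) != expected_chunk_id:
        raise ValueError("chunk_id does not match the chunk that was sent")
    return summary
```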


## What this example uses

Tags: `<instructions>` `<chunk>` `<chunk_summary>` `<tldr>` `<key_findings>` `<unanswered>`

Patterns: multi-document, structured output






