# Invoice / receipt line-item extractor — Claude XML example

> Convert an OCR'd invoice into structured JSON: vendor, totals, line items.
>
> Source: https://claudexml.com/examples/invoice-extractor/ · Last updated 2026-05-25

Home / Examples / Invoice / receipt line-item extractor
    Extraction · intermediate

# Invoice / receipt line-item extractor

    Convert an OCR'd invoice into structured JSON: vendor, totals, line items.

    AP automation: invoices come in as PDFs, OCR'd to messy text. You need vendor name, dates, currency, totals, and per-line-item details.


## The prompt

    Copy this verbatim. Replace the `{{ … }}` placeholders with your values.


```xml
<instructions>
Extract invoice data from the OCR text in <invoice>. Return JSON inside <result> tags.

Rules:
- If a number isn't clearly stated, use null. Do not estimate.
- Currency must be an ISO-4217 code (USD, EUR, GBP, JPY, ...).
- Dates must be ISO-8601 (YYYY-MM-DD). Convert if the source uses another format.
- Line item quantity defaults to 1 if not stated.
- Subtotal + tax should equal total within rounding; if not, flag with "totals_inconsistent": true at the top level.
</instructions>

<format>
{
  "vendor_name": "string or null",
  "vendor_tax_id": "string or null",
  "invoice_number": "string or null",
  "issue_date": "YYYY-MM-DD or null",
  "due_date": "YYYY-MM-DD or null",
  "currency": "ISO-4217 code",
  "subtotal": 0.0,
  "tax": 0.0,
  "total": 0.0,
  "totals_inconsistent": false,
  "line_items": [
    { "description": "string", "quantity": 1, "unit_price": 0.0, "amount": 0.0 }
  ]
}
</format>

<invoice>{{ invoice_ocr_text }}</invoice>

Return inside <result> tags.
```


## Sample input


```xml
ACME WIDGETS INC
Invoice #INV-2026-0512
Date: May 12, 2026  Due: 30 days net
Widget A   x2   $50.00   $100.00
Shipping            $12.00
Subtotal $112.00  Tax $9.80  Total $121.80
Thank you.
```


## Expected output


```xml
<result>
{
  "vendor_name": "ACME WIDGETS INC",
  "vendor_tax_id": null,
  "invoice_number": "INV-2026-0512",
  "issue_date": "2026-05-12",
  "due_date": "2026-06-11",
  "currency": "USD",
  "subtotal": 112.00,
  "tax": 9.80,
  "total": 121.80,
  "totals_inconsistent": false,
  "line_items": [
    { "description": "Widget A", "quantity": 2, "unit_price": 50.00, "amount": 100.00 },
    { "description": "Shipping", "quantity": 1, "unit_price": 12.00, "amount": 12.00 }
  ]
}
</result>
```


## Notes & tuning tips


- Date conversion is the most common failure — restate the format requirement.
- Add a server-side sanity check: subtotal + tax ≈ total (within $0.05) catches OCR errors.
- For multi-page invoices, send each page as a separate  and ask for a combined extraction.


## What this example uses

    Tags: <instructions> <format>

    Patterns: structured output


## More like this


      extraction
### Structured JSON extractor from email
Pull sender, subject, intent, urgency, and action items from a raw email body.

      extraction
### Résumé → structured JSON
Convert a free-form résumé into a normalized candidate profile.

      extraction
### Email → calendar event
Detect meeting requests in email and emit a structured event with start/end in ISO format.




Cite this page

`Invoice / receipt line-item extractor. claudexml.com. https://claudexml.com/examples/invoice-extractor/`
