Extraction · intermediate
Invoice / receipt line-item extractor
Convert an OCR'd invoice into structured JSON: vendor, totals, line items.
AP automation: invoices come in as PDFs, OCR'd to messy text. You need vendor name, dates, currency, totals, and per-line-item details.
The prompt
Copy this verbatim. Replace the {{ … }} placeholders with your values.
<instructions>
Extract invoice data from the OCR text in <invoice>. Return JSON inside <result> tags.
Rules:
- If a number isn't clearly stated, use null. Do not estimate.
- Currency must be an ISO-4217 code (USD, EUR, GBP, JPY, ...).
- Dates must be ISO-8601 (YYYY-MM-DD). Convert if the source uses another format.
- Line item quantity defaults to 1 if not stated.
- Subtotal + tax should equal total within rounding; if not, flag with "totals_inconsistent": true at the top level.
</instructions>
<format>
{
"vendor_name": "string or null",
"vendor_tax_id": "string or null",
"invoice_number": "string or null",
"issue_date": "YYYY-MM-DD or null",
"due_date": "YYYY-MM-DD or null",
"currency": "ISO-4217 code",
"subtotal": 0.0,
"tax": 0.0,
"total": 0.0,
"totals_inconsistent": false,
"line_items": [
{ "description": "string", "quantity": 1, "unit_price": 0.0, "amount": 0.0 }
]
}
</format>
<invoice>{{ invoice_ocr_text }}</invoice>
Return inside <result> tags.
Sample input
ACME WIDGETS INC
Invoice #INV-2026-0512
Date: May 12, 2026 Due: 30 days net
Widget A x2 $50.00 $100.00
Shipping $12.00
Subtotal $112.00 Tax $9.80 Total $121.80
Thank you.
Expected output
<result>
{
"vendor_name": "ACME WIDGETS INC",
"vendor_tax_id": null,
"invoice_number": "INV-2026-0512",
"issue_date": "2026-05-12",
"due_date": "2026-06-11",
"currency": "USD",
"subtotal": 112.00,
"tax": 9.80,
"total": 121.80,
"totals_inconsistent": false,
"line_items": [
{ "description": "Widget A", "quantity": 2, "unit_price": 50.00, "amount": 100.00 },
{ "description": "Shipping", "quantity": 1, "unit_price": 12.00, "amount": 12.00 }
]
}
</result>
Notes & tuning tips
- Date conversion is the most common failure — restate the format requirement.
- Add a server-side sanity check: subtotal + tax ≈ total (within $0.05) catches OCR errors.
- For multi-page invoices, send each page as a separate
and ask for a combined extraction.
What this example uses
Tags: <instructions> <format>
Patterns: structured output
More like this
extraction
Structured JSON extractor from email
Pull sender, subject, intent, urgency, and action items from a raw email body.
extractionRésumé → structured JSON
Convert a free-form résumé into a normalized candidate profile.
extractionEmail → calendar event
Detect meeting requests in email and emit a structured event with start/end in ISO format.
Cite this page
Invoice / receipt line-item extractor. claudexml.com. https://claudexml.com/examples/invoice-extractor/