Extract → validate → transform pipeline in one call
Four-stage data pipeline: extract raw fields, validate against rules, transform to target shape, emit errors.
Production ETL where you can't afford a downstream parse failure: extract, validate at the point of extraction, transform, and surface validation errors as structured data rather than as malformed output.
The prompt
Copy this verbatim. Replace the {{ … }} placeholders with your values.
<instructions>
Process the input through four stages. Output each in its own tag.
1. <extracted>
Raw fields pulled verbatim from <input>. Use null when a field is absent.
Shape: { "<field_name>": "verbatim value or null" }
2. <validation>
Apply the rules in <rules>. For each rule, output:
{ "rule": "rule_id", "passed": true|false, "field": "field_name", "note": "string or null" }
3. <transformed>
Only if all required-tier rules passed: emit the canonical shape in <target_schema>.
Otherwise: emit null.
4. <errors>
List every failed validation, with rule_id and a human-readable message.
Empty array if validation fully passed.
Output the four tags in order. Do not skip any. Do not include prose outside the tags.
</instructions>
<rules>
{{ validation_rules }}
</rules>
<target_schema>
{{ target_schema }}
</target_schema>
<input>
{{ input_data }}
</input>
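Filling the placeholders can be done with a few lines of code. The following is a minimal sketch, not part of the original template: `render_prompt` and the `PLACEHOLDER` pattern are hypothetical names, and it assumes placeholders are always written as `{{ name }}` with a plain identifier inside.

```python
import re

# Matches {{ name }} with optional spaces around the identifier.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Replace each {{ name }} placeholder with values[name]."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            # Fail loudly rather than send a half-filled prompt.
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


# Shortened template for illustration; the real one is the full prompt above.
template = "<rules>\n{{ validation_rules }}\n</rules>\n<input>\n{{ input_data }}\n</input>"
prompt = render_prompt(
    template,
    {
        "validation_rules": "R2 (required) age is integer 13-120",
        "input_data": "Sarah, age 29",
    },
)
```

Raising on a missing value is a design choice: a prompt that still contains a literal `{{ input_data }}` would otherwise be processed as if it were data.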
Sample input
rules: R1 (required) email present and valid format; R2 (required) age is integer 13–120; R3 (warn) phone E.164 format
target_schema: { user_id (uuid), email (lowercased), age (int), phone (E.164 or null) }
input: 'Sarah, [email protected], age 29, phone 415-555-0142'
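To make the three sample rules concrete, here is a rough sketch of them as deterministic checks, which could also serve as a cross-check on the model's validation stage. The function names are hypothetical, the email pattern is deliberately loose (the sample rule only says "valid format"), and the E.164 pattern assumes the usual form of a plus sign followed by up to 15 digits with a nonzero first digit.

```python
import re

# Loose shape check only: something@something.something, no whitespace.
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"
# E.164: "+", a nonzero leading digit, at most 15 digits in total.
E164_PATTERN = r"\+[1-9]\d{1,14}"


def check_r1(email) -> bool:
    """R1: email present and in a plausible format."""
    return isinstance(email, str) and re.fullmatch(EMAIL_PATTERN, email) is not None


def check_r2(age) -> bool:
    """R2: age is an integer between 13 and 120, given as int or digit string."""
    if isinstance(age, bool) or not isinstance(age, (int, str)):
        return False
    try:
        value = int(age)
    except ValueError:
        return False
    return 13 <= value <= 120


def is_e164(phone: str) -> bool:
    """R3: phone string is in E.164 format."""
    return re.fullmatch(E164_PATTERN, phone) is not None
```

With the sample input, `check_r2("29")` passes and `is_e164("415-555-0142")` fails, which matches the R2 and R3 results in the expected output.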
Expected output
<extracted>
{ "name": "Sarah", "email": "[email protected]", "age": "29", "phone": "415-555-0142" }
</extracted>
<validation>
[
{ "rule": "R1", "passed": true, "field": "email", "note": null },
{ "rule": "R2", "passed": true, "field": "age", "note": "converted from string" },
{ "rule": "R3", "passed": false, "field": "phone", "note": "not E.164; missing +1 prefix" }
]
</validation>
<transformed>
{ "user_id": "(server-generated)", "email": "[email protected]", "age": 29, "phone": null }
</transformed>
<errors>
[ { "rule": "R3", "field": "phone", "message": "Phone '415-555-0142' is not E.164. Stored as null." } ]
</errors>
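On the consuming side, the four tags can be split apart and JSON-decoded before anything reaches downstream code. This is a sketch under assumptions: `parse_stages` is a hypothetical helper, each tag is assumed to appear exactly once and to contain valid JSON, and the sample string is a trimmed version of the expected output above with the email field left out.

```python
import json
import re

STAGES = ("extracted", "validation", "transformed", "errors")


def parse_stages(output: str) -> dict:
    """Pull the four tagged JSON payloads out of a model response."""
    parsed = {}
    for tag in STAGES:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", output, re.DOTALL)
        if match is None:
            # A missing stage means the response cannot be trusted.
            raise ValueError(f"missing <{tag}> block")
        parsed[tag] = json.loads(match.group(1))
    return parsed


sample = """
<extracted>
{ "age": "29", "phone": "415-555-0142" }
</extracted>
<validation>
[
{ "rule": "R2", "passed": true, "field": "age", "note": "converted from string" },
{ "rule": "R3", "passed": false, "field": "phone", "note": "not E.164; missing +1 prefix" }
]
</validation>
<transformed>
{ "user_id": "(server-generated)", "age": 29, "phone": null }
</transformed>
<errors>
[ { "rule": "R3", "field": "phone", "message": "Phone '415-555-0142' is not E.164. Stored as null." } ]
</errors>
"""

stages = parse_stages(sample)
failed_rules = [v["rule"] for v in stages["validation"] if not v["passed"]]
```

A `null` inside `<transformed>` decodes to `None`, so a single `if stages["transformed"] is None` check is enough to route the record to an error queue instead of the target store.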
Notes & tuning tips
- Putting validation between extract and transform lets the model self-gate: if required rules fail, <transformed> is null and your downstream code never sees garbage.
- Supplying the rules and target schema in <rules> / <target_schema> tags makes the prompt reusable across record types; only swap the parameters.
- Don't ask the model to be the source of truth for IDs or timestamps (such as the uuid in user_id); generate those server-side after a successful transform.
- For very high-volume pipelines, this is ~3× the cost of bare extraction. Worth it when the alternative is silent data corruption.
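The server-side ID step from the notes above might look like the following. It is a sketch only: `finalize` is a hypothetical name, and it assumes the transformed record is a dict with a `user_id` key that the model filled with a stand-in value, as in the expected output.

```python
import uuid


def finalize(transformed):
    """Attach a server-generated user_id, or pass None through unchanged."""
    if transformed is None:
        # Required rules failed upstream; there is nothing to store.
        return None
    record = dict(transformed)  # copy so the parsed model output stays untouched
    record["user_id"] = str(uuid.uuid4())  # overwrite whatever the model emitted
    return record


record = finalize({"user_id": "(server-generated)", "age": 29, "phone": None})
rejected = finalize(None)
```

Overwriting `user_id` unconditionally means a model-invented identifier can never reach storage, even if the model ignores the instruction to leave it as a stand-in.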
What this example uses
Tags: <instructions> <format>
Patterns: structured output
Extract → validate → transform pipeline in one call. claudexml.com. https://claudexml.com/examples/extract-validate-transform/