# Toxicity grader with rubric — Claude XML example

> Score user-generated comments on a 0–3 toxicity scale with reasoning.
>
> Source: https://claudexml.com/examples/toxicity-grader/ · Last updated 2026-05-25

Home / Examples / Toxicity grader with rubric
    Classification · intermediate

# Toxicity grader with rubric

    Score user-generated comments on a 0–3 toxicity scale with reasoning.

    A community moderation queue needs a numeric toxicity score plus a one-line justification an admin can audit.


## The prompt

    Copy this verbatim. Replace the `{{ … }}` placeholders with your values.


```xml
<instructions>
You are a content moderation classifier. Score the comment for toxicity from 0 to 3.

Rubric:
- 0 = benign — substantive disagreement, civil tone
- 1 = mildly impolite — sarcasm, dismissiveness, no slurs or threats
- 2 = harmful — insults, harassment, mockery of an identity
- 3 = severe — threats, slurs, calls for violence, doxxing

Reason inside <thinking>. Return JSON inside <result>:
{ "score": 0, "justification": "string, max 80 chars" }
</instructions>

<comment>{{ comment_text }}</comment>
```


## Sample input


```xml
lol you have no idea what you're talking about, please stop posting
```


## Expected output


```xml
<thinking>
The comment is dismissive and mocking but not threatening or slur-bearing.
Sits at mildly impolite — score 1.
</thinking>
<result>
{ "score": 1, "justification": "Dismissive and mocking but no slurs or threats." }
</result>
```


## Notes & tuning tips


- Always show the rubric: the difference between 1 and 2 is a policy decision the model needs.
-  here is for audit, not user display. Strip it before showing the score.
- Pair with a human review queue for scores ≥ 2.


## What this example uses

    Tags: <instructions> <thinking> <format>

    Patterns: chain of thought structured output


## More like this


      classification
### Sentiment classifier
Classify customer reviews as positive, neutral, or negative with few-shot examples.

      classification
### Support ticket triage with priority and routing
Classify tickets by category, priority, and the team that should handle them — JSON output.

      classification
### Multi-label content tagging
Apply zero-to-many tags from a controlled vocabulary to an article.




Cite this page

`Toxicity grader with rubric. claudexml.com. https://claudexml.com/examples/toxicity-grader/`
