Operational Pipeline · v0.2.0-beta · Multi-model · 2026-04-30 20:01 UTC

Automated political fact-checking with multi-model consensus analysis.

Model panel insights

50 distinct claims, each checked by 4 frontier models. All numbers refresh on every report publish.

Per-model summary

Model      Claims  Dissents  Dissent %  Truthy bias  Lone ↑  Lone ↓
Anthropic  50      21        42%        +0.35        4       1
Google     50      16        32%        +0.06        1       1
OpenAI     50      23        46%        -0.45        1       3
xAI        50      15        30%        +0.03        4       2

Truthy bias

Average signed gap between this model’s truthy-axis score and the panel mean, per claim. Positive = leans toward Truthy; negative = stricter than the panel. A computation sketch follows the values below.

Anthropic  +0.35
Google     +0.06
OpenAI     -0.45
xAI        +0.03
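
A minimal sketch of this computation, assuming a hypothetical scores_by_claim mapping of claim id to each model's truthy-axis score; whether the panel mean includes the model itself is also an assumption here, not something this page states.

```python
from statistics import mean

def truthy_bias(scores_by_claim: dict[str, dict[str, int]], model: str) -> float:
    """Average signed gap between `model`'s truthy-axis score and the
    panel mean, over the claims that model checked."""
    gaps = []
    for panel in scores_by_claim.values():
        if model not in panel:
            continue  # skip claims this model did not check
        # Panel mean includes the model itself here; the real pipeline
        # may exclude it (assumption).
        gaps.append(panel[model] - mean(panel.values()))
    return mean(gaps)

# Example: True (+2) against Mostly True (+1), Mostly True (+1),
# Exaggerated/Misleading (-1) -> a gap of +1.25 on this one claim.
scores = {"claim-001": {"Anthropic": 2, "Google": 1, "OpenAI": -1, "xAI": 1}}
print(truthy_bias(scores, "Anthropic"))  # 1.25
```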

Pairwise agreement

Share of co-checked claims where the two models cast identical fine-label verdicts; a computation sketch follows the matrix.

           Anthropic    Google       OpenAI       xAI
Anthropic  -            34% (n=50)   38% (n=50)   42% (n=50)
Google     34% (n=50)   -            38% (n=50)   54% (n=50)
OpenAI     38% (n=50)   38% (n=50)   -            36% (n=50)
xAI        42% (n=50)   54% (n=50)   36% (n=50)   -
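
A sketch of a single matrix cell under the same assumed data shape; fine_labels_by_claim is hypothetical, and equality is tested on the raw 6-bucket fine label, as the definition above specifies.

```python
def pairwise_agreement(fine_labels_by_claim: dict[str, dict[str, str]],
                       a: str, b: str) -> tuple[float, int]:
    """Share of co-checked claims where models `a` and `b` cast
    identical fine-label verdicts, plus the co-check count n.
    Assumes at least one co-checked claim exists for the pair."""
    pairs = [(labels[a], labels[b])
             for labels in fine_labels_by_claim.values()
             if a in labels and b in labels]
    identical = sum(1 for la, lb in pairs if la == lb)
    return identical / len(pairs), len(pairs)
```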

Top extreme splits

Claims where exactly one model was a lone outlier (Δ ≥ 3 points on the truthy axis), sorted by Δ magnitude; a detection sketch follows.
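
A sketch of the lone-outlier test, assuming Δ is the gap between a model's truthy-axis score and the mean of the other models' scores; the page does not pin down the exact Δ definition, so treat this as one plausible reading.

```python
from statistics import mean

def lone_outlier(panel: dict[str, float], delta: float = 3.0):
    """Return (model, gap) when exactly one model sits `delta` or more
    truthy-axis points from the mean of the rest of the panel."""
    hits = []
    for model, score in panel.items():
        rest = [s for m, s in panel.items() if m != model]
        gap = score - mean(rest)
        if abs(gap) >= delta:
            hits.append((model, gap))
    # Zero or multiple outliers means the claim is not an extreme split.
    return hits[0] if len(hits) == 1 else None

# Example: one model at True (+2) against three at False (-2).
print(lone_outlier({"Anthropic": 2, "Google": -2, "OpenAI": -2, "xAI": -2}))
# ('Anthropic', 4.0)
```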

Method

Truthy-axis scores: True (+2), Mostly True (+1), Unverifiable (0), Exaggerated/Misleading (-1), False (-2). Dissents are counted against the published consensus verdict for each claim. Pairwise agreement uses the full 6-bucket fine label, not the projected 5-bucket Truthy scale, so it is a strict measure of label identity.

The Opus-vs-rest scan is the standalone variant that inspired this page; both share their constants via truthbot.publish.insights.LABEL_SCORE.
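
For reference, a local stand-in for what that shared constant could look like; only the dotted path truthbot.publish.insights.LABEL_SCORE comes from this page, the individual fine-label names are inferred from the 6-vs-5 bucket note above, and the dissent helper is an illustrative assumption that compares fine labels directly.

```python
# Illustrative stand-in for truthbot.publish.insights.LABEL_SCORE:
# six fine labels project onto the 5-point truthy axis, with
# Exaggerated and Misleading both mapping to -1 (label names assumed).
LABEL_SCORE = {
    "True": +2,
    "Mostly True": +1,
    "Unverifiable": 0,
    "Exaggerated": -1,
    "Misleading": -1,
    "False": -2,
}

def dissent_count(model_verdicts: dict[str, str],
                  consensus: dict[str, str]) -> int:
    """Hypothetical helper: count claims where a model's fine label
    differs from the published consensus verdict."""
    return sum(1 for claim, label in model_verdicts.items()
               if label != consensus.get(claim))
```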
