Executive Summary

Status: WARNING
Health score: 64%
Severity: medium

Health score is a composite indicator (task metrics + XAI quality), not raw accuracy.

Key Findings

  • Frequent confusion pairs: 4→9 (145), 7→9 (144), 5→8 (124), 3→8 (86)

Top Actions

  1. Reduce top class confusions: 4, 9, 7, 5, 8, 3

Overall Metrics

  accuracy      81.6%
  f1_macro      79.8%
  loss          0.5875
  Cohen’s κ     0.796
  ECE (top-1)   0.022
  Classes       10
  Samples       10,000

Class Diagnostics

#   Class  Accuracy  Precision  Recall  F1     Cohen’s κ  Support  Pred  Severity
1   1      91.9%     95.8%      91.9%   93.8%  0.930      1135     1089  ok
2   0      95.2%     81.1%      95.2%   87.6%  0.861      980      1151  ok
3   6      82.5%     90.1%      82.5%   86.1%  0.847      958      877   ok
4   7      78.3%     93.1%      78.3%   85.0%  0.835      1028     865   ok
5   2      82.8%     86.0%      82.8%   84.4%  0.827      1032     994   ok
6   4      80.1%     82.8%      80.1%   81.4%  0.794      982      951   ok
7   3      74.1%     86.7%      74.1%   79.9%  0.778      1010     863   ok
8   9      86.4%     67.7%      86.4%   75.9%  0.729      1009     1288  ok
9   8      82.2%     63.3%      82.2%   71.5%  0.680      974      1266  ok
10  5      59.9%     81.4%      59.9%   69.0%  0.665      892      656   ok

Most Confused Pairs

True→Pred  Count
4→9        145×
7→9        144×
5→8        124×
3→8        86×
2→8        81×
1→8        70×
5→3        68×
9→4        60×

Confusion Matrix

True \ Pred     0     1     2     3     4     5     6     7     8     9
0             933     0     1     0     0     3     8     1    34     0
1               0  1043     6     2     1     8     4     0    70     1
2              29     1   855    21    11     3    20     9    81     2
3               6     8    55   748     1    47     3    23    86    33
4               5     1     4     0   787     1    29     1     9   145
5              45     2     6    68    48   534    15     1   124    49
6              54    17    34     0    12    34   790     0    17     0
7               1    11    19     6    14     0     0   805    28   144
8              53     2     9     9    17    24     8     9   801    42
9              25     4     5     9    60     2     0    16    16   872
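
The per-class figures, Cohen’s κ, and the confused-pair ranking in this report can all be derived from the confusion matrix. The sketch below shows one way to do that in plain Python. It assumes that cells left blank in the rendered matrix are zeros, and the function names are illustrative rather than the reporting tool's own API.

```python
# Confusion matrix from this report: rows = true class, columns = predicted class.
# Assumption: cells left blank in the rendered matrix are read as 0.
CM = [
    [933,    0,   1,   0,   0,   3,   8,   1,  34,   0],
    [  0, 1043,   6,   2,   1,   8,   4,   0,  70,   1],
    [ 29,    1, 855,  21,  11,   3,  20,   9,  81,   2],
    [  6,    8,  55, 748,   1,  47,   3,  23,  86,  33],
    [  5,    1,   4,   0, 787,   1,  29,   1,   9, 145],
    [ 45,    2,   6,  68,  48, 534,  15,   1, 124,  49],
    [ 54,   17,  34,   0,  12,  34, 790,   0,  17,   0],
    [  1,   11,  19,   6,  14,   0,   0, 805,  28, 144],
    [ 53,    2,   9,   9,  17,  24,   8,   9, 801,  42],
    [ 25,    4,   5,   9,  60,   2,   0,  16,  16, 872],
]


def per_class_metrics(cm):
    """Return precision, recall, F1, support and predicted count per class."""
    n = len(cm)
    support = [sum(row) for row in cm]                               # true count per class
    predicted = [sum(cm[i][k] for i in range(n)) for k in range(n)]  # predicted count per class
    metrics = {}
    for k in range(n):
        tp = cm[k][k]
        precision = tp / predicted[k] if predicted[k] else 0.0
        recall = tp / support[k] if support[k] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[k] = {"precision": precision, "recall": recall, "f1": f1,
                      "support": support[k], "predicted": predicted[k]}
    return metrics


def cohens_kappa(cm):
    """Multiclass Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[k][k] for k in range(n)) / total
    expected = sum(sum(cm[k]) * sum(row[k] for row in cm) for k in range(n)) / total ** 2
    return (observed - expected) / (1 - expected)


def top_confusions(cm, limit=3):
    """Largest off-diagonal cells as (true, predicted, count), highest count first."""
    n = len(cm)
    pairs = [(i, j, cm[i][j]) for i in range(n) for j in range(n) if i != j]
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:limit]


if __name__ == "__main__":
    m = per_class_metrics(CM)
    print(f"class 5: precision={m[5]['precision']:.1%} "
          f"recall={m[5]['recall']:.1%} f1={m[5]['f1']:.1%}")
    print(f"kappa={cohens_kappa(CM):.3f}")
    print(top_confusions(CM))
```

With this matrix, class 5 comes out at precision 81.4%, recall 59.9%, and F1 69.0%, and κ at 0.796, which agrees with the values reported in this document.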

Findings

  1. Frequent confusion pair [Observed, confused_pair]
     4→9: 145 confusions (#1 pair). 7→9: 144 confusions (#2 pair). 5→8: 124 confusions (#3 pair). 3→8: 86 confusions (#4 pair).
     Classes involved: 4, 9, 7, 5, 8, 3

Confusion Analysis

Class 4 → Class 9: 145 confusions
[Figure: mean saliency, correctly classified as 4]
[Figure: mean saliency, confused 4→9]
When correctly classified as class 4, attention is centered with moderate spread (13% coverage). In confused predictions (predicted as 9), the pattern is unchanged: centered, moderate spread (13% coverage).
[Figure: 8 individual confused samples]

Class 7 → Class 9: 144 confusions
[Figure: mean saliency, correctly classified as 7]
[Figure: mean saliency, confused 7→9]
When correctly classified as class 7, attention is centered with moderate spread (15% coverage). In confused predictions (predicted as 9), attention stays centered with moderate spread, and coverage drops slightly to 13%.
[Figure: 8 individual confused samples]

Class 5 → Class 8: 124 confusions
[Figure: mean saliency, correctly classified as 5]
[Figure: mean saliency, confused 5→8]
When correctly classified as class 5, attention is centered with moderate spread (12% coverage). In confused predictions (predicted as 8), attention stays centered with moderate spread, and coverage rises slightly to 13%.
[Figure: 8 individual confused samples]

XAI Insights

Q = 0.25·Acc + 0.20·Focus + 0.15·Coverage + 0.15·Coherence + 0.10·Edge + 0.15·Consistency
Acc = prediction accuracy on probed samples; Focus = spatial concentration of attention; Coverage = fraction of object covered; Coherence = spatial continuity; Edge = proportion of attention away from borders; Consistency = stability across similar inputs.
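
As a worked example of the formula above, the following sketch computes Q from the six components. The weights are taken directly from the formula. The function name and dictionary keys are illustrative assumptions, and the sample values are the Class 8 components reported in this section.

```python
# Weights from the Q formula in this report; they sum to 1.0.
WEIGHTS = {
    "accuracy": 0.25,
    "focus": 0.20,
    "coverage": 0.15,
    "coherence": 0.15,
    "edge": 0.10,
    "consistency": 0.15,
}


def xai_quality(components):
    """Weighted sum of the six components, each expected in [0, 1]."""
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Class 8 components as displayed in this report (rounded to two decimals).
CLASS_8 = {
    "accuracy": 0.00,
    "focus": 0.72,
    "coverage": 1.00,
    "coherence": 1.00,
    "edge": 0.96,
    "consistency": 0.50,
}

if __name__ == "__main__":
    print(f"Q(class 8) = {xai_quality(CLASS_8):.3f}")
```

From the rounded components this gives 0.615, against a reported 0.616. The small gap is consistent with the report computing Q from unrounded values, though that is an inference rather than something the report states.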

Global XAI Quality: 0.868

Computed on 46 samples. Small probe set — treat as indicative, not definitive.

Class 8: Q = 0.616 (1 sample)
  accuracy 0.00, focus 0.72, coverage 1.00, coherence 1.00, edge 0.96, consistency 0.50
  CORRECT conf=1.00. Attention: centered, moderate spread (20% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (21% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (20% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (13% coverage)
  WRONG pred=0 conf=0.99. Attention: centered, moderate spread (23% coverage)
  WRONG pred=0 conf=0.99. Attention: centered, diffuse (30% coverage)
  WRONG pred=0 conf=0.98. Attention: centered, moderate spread (27% coverage)
  WRONG pred=0 conf=0.98. Attention: centered, moderate spread (24% coverage)

Class 5: Q = 0.779 (5 samples)
  accuracy 0.80, focus 0.75, coverage 1.00, coherence 0.48, edge 0.59, consistency 0.99
  CORRECT conf=0.99. Attention: centered, moderate spread (12% coverage)
  CORRECT conf=0.99. Attention: centered, moderate spread (10% coverage)
  CORRECT conf=0.99. Attention: centered, moderate spread (12% coverage)
  CORRECT conf=0.99. Attention: centered, moderate spread (12% coverage)
  WRONG pred=0 conf=0.98. Attention: centered, moderate spread (14% coverage)
  WRONG pred=0 conf=0.97. Attention: lower region, moderate spread (12% coverage)
  WRONG pred=0 conf=0.97. Attention: lower region, moderate spread (14% coverage)
  WRONG pred=3 conf=0.97. Attention: border-focused (edge=31%), moderate spread (13% coverage)

Class 6: Q = 0.783 (5 samples)
  accuracy 0.40, focus 0.72, coverage 1.00, coherence 1.00, edge 0.91, consistency 0.98
  CORRECT conf=1.00. Attention: centered, moderate spread (16% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (18% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (16% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (16% coverage)
  WRONG pred=4 conf=0.98. Attention: centered, moderate spread (12% coverage)
  WRONG pred=0 conf=0.98. Attention: centered, moderate spread (13% coverage)
  WRONG pred=0 conf=0.98. Attention: centered, moderate spread (17% coverage)
  WRONG pred=0 conf=0.97. Attention: centered, moderate spread (27% coverage)

Class 3: Q = 0.883 (5 samples)
  accuracy 0.80, focus 0.74, coverage 1.00, coherence 1.00, edge 0.88, consistency 0.98
  CORRECT conf=1.00. Attention: centered, moderate spread (16% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (17% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (18% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (16% coverage)
  WRONG pred=7 conf=0.99. Attention: centered, moderate spread (18% coverage)
  WRONG pred=2 conf=0.99. Attention: centered, moderate spread (17% coverage)
  WRONG pred=7 conf=0.96. Attention: centered, moderate spread (15% coverage)
  WRONG pred=7 conf=0.93. Attention: centered, moderate spread (18% coverage)

Class 2: Q = 0.910 (5 samples)
  accuracy 1.00, focus 0.74, coverage 1.00, coherence 0.85, edge 0.87, consistency 0.98
  CORRECT conf=1.00. Attention: centered, moderate spread (17% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (18% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (17% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (17% coverage)
  WRONG pred=0 conf=0.99. Attention: centered, moderate spread (19% coverage)
  WRONG pred=0 conf=0.99. Attention: centered, moderate spread (19% coverage)
  WRONG pred=0 conf=0.98. Attention: centered, diffuse (31% coverage)
  WRONG pred=0 conf=0.98. Attention: centered, moderate spread (21% coverage)

Class 0: Q = 0.929 (5 samples)
  accuracy 1.00, focus 0.69, coverage 1.00, coherence 1.00, edge 0.92, consistency 0.99
  CORRECT conf=1.00. Attention: centered, moderate spread (25% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (25% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (25% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (26% coverage)
  WRONG pred=8 conf=0.92. Attention: centered, moderate spread (20% coverage)
  WRONG pred=8 conf=0.88. Attention: centered, moderate spread (21% coverage)
  WRONG pred=5 conf=0.85. Attention: centered, moderate spread (19% coverage)
  WRONG pred=8 conf=0.81. Attention: centered, moderate spread (20% coverage)

Class 7: Q = 0.933 (5 samples)
  accuracy 1.00, focus 0.76, coverage 1.00, coherence 1.00, edge 0.82, consistency 0.99
  CORRECT conf=1.00. Attention: centered, moderate spread (19% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (17% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (16% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (20% coverage)
  WRONG pred=9 conf=0.97. Attention: centered, moderate spread (15% coverage)
  WRONG pred=9 conf=0.95. Attention: centered, moderate spread (13% coverage)
  WRONG pred=9 conf=0.95. Attention: centered, moderate spread (14% coverage)
  WRONG pred=9 conf=0.95. Attention: centered, moderate spread (12% coverage)

Class 9: Q = 0.940 (5 samples)
  accuracy 1.00, focus 0.76, coverage 1.00, coherence 1.00, edge 0.91, consistency 0.98
  CORRECT conf=0.98. Attention: centered, moderate spread (17% coverage)
  CORRECT conf=0.98. Attention: centered, moderate spread (12% coverage)
  CORRECT conf=0.97. Attention: centered, moderate spread (15% coverage)
  CORRECT conf=0.97. Attention: centered, moderate spread (16% coverage)
  WRONG pred=0 conf=0.99. Attention: centered, moderate spread (20% coverage)
  WRONG pred=0 conf=0.98. Attention: centered, moderate spread (25% coverage)
  WRONG pred=0 conf=0.98. Attention: centered, moderate spread (23% coverage)
  WRONG pred=3 conf=0.98. Attention: centered, moderate spread (23% coverage)

Class 4: Q = 0.948 (5 samples)
  accuracy 1.00, focus 0.77, coverage 1.00, coherence 1.00, edge 0.95, consistency 0.99
  CORRECT conf=1.00. Attention: centered, moderate spread (15% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (14% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (17% coverage)
  CORRECT conf=1.00. Attention: centered, moderate spread (15% coverage)
  WRONG pred=6 conf=0.98. Attention: centered, moderate spread (15% coverage)
  WRONG pred=9 conf=0.96. Attention: centered, moderate spread (10% coverage)
  WRONG pred=9 conf=0.96. Attention: centered, moderate spread (14% coverage)
  WRONG pred=6 conf=0.96. Attention: centered, moderate spread (15% coverage)

Class 1: Q = 0.956 (5 samples)
  accuracy 1.00, focus 0.83, coverage 1.00, coherence 1.00, edge 0.90, consistency 1.00
  CORRECT conf=0.97. Attention: centered, moderate spread (10% coverage)
  CORRECT conf=0.96. Attention: centered, moderate spread (10% coverage)
  CORRECT conf=0.96. Attention: centered, moderate spread (12% coverage)
  CORRECT conf=0.96. Attention: centered, moderate spread (11% coverage)
  WRONG pred=8 conf=0.93. Attention: centered, moderate spread (17% coverage)
  WRONG pred=8 conf=0.91. Attention: centered, moderate spread (16% coverage)
  WRONG pred=3 conf=0.91. Attention: centered, moderate spread (16% coverage)
  WRONG pred=6 conf=0.89. Attention: centered, moderate spread (15% coverage)
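
The report does not state how the per-class scores are combined into the global XAI quality of 0.868. The sketch below compares two plausible aggregations over the per-class scores and sample counts listed in this section. Both helper functions are illustrative assumptions, not the tool's documented method.

```python
# Per-class XAI quality and probe sample count, as listed in this section:
# class -> (score, samples)
CLASS_SCORES = {
    8: (0.616, 1),
    5: (0.779, 5),
    6: (0.783, 5),
    3: (0.883, 5),
    2: (0.910, 5),
    0: (0.929, 5),
    7: (0.933, 5),
    9: (0.940, 5),
    4: (0.948, 5),
    1: (0.956, 5),
}


def macro_mean(scores):
    """Unweighted mean: every class counts equally."""
    return sum(score for score, _ in scores.values()) / len(scores)


def sample_weighted_mean(scores):
    """Mean weighted by the number of probed samples per class."""
    total = sum(count for _, count in scores.values())
    return sum(score * count for score, count in scores.values()) / total


if __name__ == "__main__":
    print(f"probe samples:        {sum(c for _, c in CLASS_SCORES.values())}")
    print(f"macro mean:           {macro_mean(CLASS_SCORES):.3f}")
    print(f"sample-weighted mean: {sample_weighted_mean(CLASS_SCORES):.3f}")
```

The unweighted mean reproduces the reported 0.868, while the sample-weighted mean is about 0.890. This suggests, without confirming, that the global score is a per-class average, in which case the single-sample Class 8 result pulls it down as much as any five-sample class would.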

Dataset Health

  Scanned           5,000
  Duplicate Pairs   1,957,669
  Duplicate Groups  1
  Flagged Images    0

Class Distribution

Class  Samples
1      1,135
2      1,032
7      1,028
3      1,010
9      1,009
4      982
0      980
8      974
6      958
5      892

Near-Duplicate Images

These images were detected as near-identical by perceptual hashing (dHash). They may inflate validation metrics if present in both the training and validation sets.

Duplicate Group (6 images)
near_duplicates: 5,000 images affected
All 5,000 scanned images fall into 1 near-duplicate group. Duplicates can bias training and inflate metrics.
Dataset health uses perceptual hashing (dHash) and statistical heuristics. False positives are possible on small or synthetic datasets.
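
To make the dHash check concrete, here is a minimal sketch of a difference hash in plain Python. It is not the tool's implementation: the nearest-neighbour downsampling, the bit convention (left pixel brighter than right), and the `max_distance=4` threshold are all assumptions chosen for illustration.

```python
def dhash(pixels, hash_size=8):
    """Difference hash of a grayscale image given as a 2D list of numbers.

    The image is reduced to hash_size rows by (hash_size + 1) columns, and each
    bit records whether a pixel is brighter than its right-hand neighbour.
    """
    height, width = len(pixels), len(pixels[0])
    rows, cols = hash_size, hash_size + 1
    # Assumption: nearest-neighbour sampling; real tools usually use a smoothing resize.
    small = [
        [pixels[r * height // rows][c * width // cols] for c in range(cols)]
        for r in range(rows)
    ]
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def near_duplicate(img_a, img_b, max_distance=4):
    """Flag two images as near-duplicates when their hashes differ in few bits.

    max_distance is a hypothetical threshold, not a value taken from this report.
    """
    return hamming(dhash(img_a), dhash(img_b)) <= max_distance


if __name__ == "__main__":
    gradient = [[c * 9 for c in range(28)] for _ in range(28)]
    brighter = [[v + 10 for v in row] for row in gradient]
    mirrored = [row[::-1] for row in gradient]
    print(near_duplicate(gradient, brighter))  # brightness shift keeps the hash
    print(near_duplicate(gradient, mirrored))  # mirrored gradient flips every bit
```

Because each bit depends only on the sign of a neighbour difference, images dominated by a uniform background produce long runs of identical bits. This is one way small, low-detail images can collapse into a single near-duplicate group.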

Recommendations

  1. Reduce top class confusions: 4, 9, 7, 5, 8, 3 [Observed]
     From this run: label 4 recall=80%; label 9 recall=86%; label 7 recall=78%; 4→9 count=145; 7→9 count=144.
     → Inspect the XAI overlays for both classes of each pair (see the Confusion Analysis section). Pair-specific augmentation or metric learning (ArcFace; Deng et al., CVPR 2019) can increase inter-class separation.
     Literature context (not verified on this run): targeted data collection or metric learning often reduces specific confusions; quantify the effect with your confusion matrix after changes.
     Reference: Deng et al., ArcFace, CVPR 2019
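
For readers unfamiliar with the metric-learning option mentioned above, the following sketch shows the core idea of an additive angular margin in the style of ArcFace: the target-class logit is computed from cos(θ + m) instead of cos(θ), so the model must separate classes by a larger angle to reach the same score. This is an illustrative, stdlib-only sketch and not a training recipe. The function names are assumptions, the defaults `scale=64.0` and `margin=0.5` are commonly quoted settings, and the edge case θ + m > π that practical implementations handle is omitted.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def arcface_logits(embedding, class_weights, target, scale=64.0, margin=0.5):
    """Logits with an additive angular margin applied to the target class only.

    embedding:     feature vector for one sample (assumed non-zero)
    class_weights: one weight vector per class (assumed non-zero)
    target:        index of the true class
    """
    x = l2_normalize(embedding)
    logits = []
    for k, weights in enumerate(class_weights):
        w = l2_normalize(weights)
        cos_theta = sum(a * b for a, b in zip(x, w))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding drift
        if k == target:
            # Penalise the true class by widening its angle by the margin.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0]]
    print(arcface_logits([1.0, 0.0], weights, target=0))
    print(arcface_logits([1.0, 0.0], weights, target=0, margin=0.0))
```

These logits would then be passed to a standard softmax cross-entropy loss. Whether this reduces the 4→9, 7→9, or 5→8 confusions in this model has to be checked against a fresh confusion matrix.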