Confidence levels:
Observed = measured from data
Likely = heuristic signal
Suspected = needs verification
WARNING: health score 64%. Severity: medium. The health score is a composite indicator (task metrics + XAI quality), not raw accuracy.
Top Actions: reduce the top class confusions (classes 4, 9, 7, 5, 8, 3).
#   Class  Accuracy  Precision  Recall  F1     Cohen’s κ  Support  Pred  Severity
1   1      91.9%     95.8%      91.9%   93.8%  0.930      1135     1089  ok
2   0      95.2%     81.1%      95.2%   87.6%  0.861      980      1151  ok
3   6      82.5%     90.1%      82.5%   86.1%  0.847      958      877   ok
4   7      78.3%     93.1%      78.3%   85.0%  0.835      1028     865   ok
5   2      82.8%     86.0%      82.8%   84.4%  0.827      1032     994   ok
6   4      80.1%     82.8%      80.1%   81.4%  0.794      982      951   ok
7   3      74.1%     86.7%      74.1%   79.9%  0.778      1010     863   ok
8   9      86.4%     67.7%      86.4%   75.9%  0.729      1009     1288  ok
9   8      82.2%     63.3%      82.2%   71.5%  0.680      974      1266  ok
10  5      59.9%     81.4%      59.9%   69.0%  0.665      892      656   ok
True \ Pred     0     1     2     3     4     5     6     7     8     9
0             933     0     1     0     0     3     8     1    34     0
1               0  1043     6     2     1     8     4     0    70     1
2              29     1   855    21    11     3    20     9    81     2
3               6     8    55   748     1    47     3    23    86    33
4               5     1     4     0   787     1    29     1     9   145
5              45     2     6    68    48   534    15     1   124    49
6              54    17    34     0    12    34   790     0    17     0
7               1    11    19     6    14     0     0   805    28   144
8              53     2     9     9    17    24     8     9   801    42
9              25     4     5     9    60     2     0    16    16   872
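The per-class table can be cross-checked against the confusion matrix. The sketch below is a minimal, hedged example, not the tool's own code: it assumes that cells left blank in the report are zero, and that Cohen's κ is computed one-vs-rest per class (an assumption that happens to reproduce the reported values). The name class_metrics is illustrative.

```python
# Confusion matrix from this report: rows = true class, columns = predicted class.
# Assumption: cells that are blank in the report are zero.
CM = [
    [933, 0, 1, 0, 0, 3, 8, 1, 34, 0],
    [0, 1043, 6, 2, 1, 8, 4, 0, 70, 1],
    [29, 1, 855, 21, 11, 3, 20, 9, 81, 2],
    [6, 8, 55, 748, 1, 47, 3, 23, 86, 33],
    [5, 1, 4, 0, 787, 1, 29, 1, 9, 145],
    [45, 2, 6, 68, 48, 534, 15, 1, 124, 49],
    [54, 17, 34, 0, 12, 34, 790, 0, 17, 0],
    [1, 11, 19, 6, 14, 0, 0, 805, 28, 144],
    [53, 2, 9, 9, 17, 24, 8, 9, 801, 42],
    [25, 4, 5, 9, 60, 2, 0, 16, 16, 872],
]


def class_metrics(cm, k):
    """One-vs-rest metrics for class k (hypothetical helper, not the tool's API)."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    support = sum(cm[k])                     # row sum: samples whose true class is k
    predicted = sum(row[k] for row in cm)    # column sum: samples predicted as k
    fn = support - tp
    fp = predicted - tp
    tn = total - tp - fn - fp

    precision = tp / predicted if predicted else 0.0
    recall = tp / support if support else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Cohen's kappa on the binary "is class k" vs "is not class k" problem.
    p_observed = (tp + tn) / total
    p_chance = (support * predicted
                + (total - support) * (total - predicted)) / total ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {"precision": precision, "recall": recall, "f1": f1,
            "kappa": kappa, "support": support, "pred": predicted}


if __name__ == "__main__":
    for k in range(len(CM)):
        m = class_metrics(CM, k)
        print(f"class {k}: P={m['precision']:.1%} R={m['recall']:.1%} "
              f"F1={m['f1']:.1%} kappa={m['kappa']:.3f}")
```

Under these assumptions the "Accuracy" column of the table coincides with per-class recall, which is why the two columns are identical in every row.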
4→9: 145 confusions (#1 pair). 7→9: 144 confusions (#2 pair). 5→8: 124 confusions (#3 pair). ...and 1 more.
[Figure: confusion graph over classes 4, 9, 7, 5, 8, 3. Edges: 4→9 (count=145), 7→9 (count=144), 5→8 (count=124), 3→8 (count=86).]
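The ranked confusion pairs are the largest off-diagonal cells of the confusion matrix. A minimal sketch of that ranking follows; it assumes blank cells in the report's matrix are zero, and the helper name top_confusions is illustrative rather than part of the tool.

```python
# Confusion matrix from this report: rows = true class, columns = predicted class.
# Assumption: cells that are blank in the report are zero.
CM = [
    [933, 0, 1, 0, 0, 3, 8, 1, 34, 0],
    [0, 1043, 6, 2, 1, 8, 4, 0, 70, 1],
    [29, 1, 855, 21, 11, 3, 20, 9, 81, 2],
    [6, 8, 55, 748, 1, 47, 3, 23, 86, 33],
    [5, 1, 4, 0, 787, 1, 29, 1, 9, 145],
    [45, 2, 6, 68, 48, 534, 15, 1, 124, 49],
    [54, 17, 34, 0, 12, 34, 790, 0, 17, 0],
    [1, 11, 19, 6, 14, 0, 0, 805, 28, 144],
    [53, 2, 9, 9, 17, 24, 8, 9, 801, 42],
    [25, 4, 5, 9, 60, 2, 0, 16, 16, 872],
]


def top_confusions(cm, k=4):
    """Return the k largest off-diagonal cells as (true, predicted, count)."""
    n = len(cm)
    pairs = [(t, p, cm[t][p])
             for t in range(n) for p in range(n)
             if t != p and cm[t][p] > 0]
    # Sort by count, largest first; ties keep row-major order (stable sort).
    pairs.sort(key=lambda item: item[2], reverse=True)
    return pairs[:k]


if __name__ == "__main__":
    for rank, (t, p, count) in enumerate(top_confusions(CM), start=1):
        print(f"#{rank}: {t}→{p} ({count} confusions)")
```

With k=4 this returns 4→9, 7→9, 5→8 and 3→8, matching the counts 145, 144, 124 and 86 shown above.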
Correctly classified as 4 (mean saliency)
Confused 4→9 (mean saliency)
When correctly classified as class 4, attention is centered with moderate spread (13% coverage). In confused predictions (predicted as 9), the summary is unchanged: centered, moderate spread (13% coverage).
Individual confused samples (8)
Correctly classified as 7 (mean saliency)
Confused 7→9 (mean saliency)
When correctly classified as class 7, attention is centered with moderate spread (15% coverage). In confused predictions (predicted as 9), attention stays centered with moderate spread, and coverage drops slightly to 13%.
Individual confused samples (8)
Correctly classified as 5 (mean saliency)
Confused 5→8 (mean saliency)
When correctly classified as class 5, attention is centered with moderate spread (12% coverage). In confused predictions (predicted as 8), attention stays centered with moderate spread, and coverage rises slightly to 13%.
Individual confused samples (8)
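The report does not state how "coverage" or the "edge" percentage are derived from a saliency map. The sketch below shows one plausible way to compute such descriptors, purely as an assumption: coverage is taken as the fraction of pixels whose saliency is at least half of the map's peak, and edge share as the fraction of total saliency mass inside a border band. The threshold, the band width, and the function name describe_saliency are all hypothetical.

```python
def describe_saliency(sal, threshold=0.5, border=2):
    """Summarise a saliency map given as a 2D list of non-negative floats.

    Assumed definitions (not taken from the report):
      coverage   = share of pixels with saliency >= threshold * peak
      edge_share = share of total saliency mass within `border` pixels of the frame
    """
    height, width = len(sal), len(sal[0])
    peak = max(max(row) for row in sal)
    total = sum(sum(row) for row in sal)
    if peak <= 0 or total <= 0:
        return {"coverage": 0.0, "edge_share": 0.0}

    active = sum(1 for row in sal for v in row if v >= threshold * peak)

    edge_mass = 0.0
    for y, row in enumerate(sal):
        for x, v in enumerate(row):
            on_border = (y < border or y >= height - border
                         or x < border or x >= width - border)
            if on_border:
                edge_mass += v

    return {"coverage": active / (height * width),
            "edge_share": edge_mass / total}


if __name__ == "__main__":
    # Toy 10x10 map with a 4x4 hot block in the middle.
    toy = [[1.0 if 3 <= y <= 6 and 3 <= x <= 6 else 0.0 for x in range(10)]
           for y in range(10)]
    print(describe_saliency(toy))
```

Two maps can share the same coarse summary (for example "centered, 13% coverage") while differing pixel by pixel, so identical descriptors for correct and confused samples do not by themselves show that the model attends to the same strokes.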
Global XAI Quality: 0.868. Computed on 46 samples. Small probe set: treat as indicative, not definitive.
CORRECT conf=1.00 Attention: centered, moderate spread (20% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (21% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (20% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (13% coverage)
WRONG pred=0 conf=0.99 Attention: centered, moderate spread (23% coverage)
WRONG pred=0 conf=0.99 Attention: centered, diffuse (30% coverage)
WRONG pred=0 conf=0.98 Attention: centered, moderate spread (27% coverage)
WRONG pred=0 conf=0.98 Attention: centered, moderate spread (24% coverage)
CORRECT conf=0.99 Attention: centered, moderate spread (12% coverage)
CORRECT conf=0.99 Attention: centered, moderate spread (10% coverage)
CORRECT conf=0.99 Attention: centered, moderate spread (12% coverage)
CORRECT conf=0.99 Attention: centered, moderate spread (12% coverage)
WRONG pred=0 conf=0.98 Attention: centered, moderate spread (14% coverage)
WRONG pred=0 conf=0.97 Attention: lower region, moderate spread (12% coverage)
WRONG pred=0 conf=0.97 Attention: lower region, moderate spread (14% coverage)
WRONG pred=3 conf=0.97 Attention: border-focused (edge=31%), moderate spread (13% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (16% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (18% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (16% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (16% coverage)
WRONG pred=4 conf=0.98 Attention: centered, moderate spread (12% coverage)
WRONG pred=0 conf=0.98 Attention: centered, moderate spread (13% coverage)
WRONG pred=0 conf=0.98 Attention: centered, moderate spread (17% coverage)
WRONG pred=0 conf=0.97 Attention: centered, moderate spread (27% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (16% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (17% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (18% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (16% coverage)
WRONG pred=7 conf=0.99 Attention: centered, moderate spread (18% coverage)
WRONG pred=2 conf=0.99 Attention: centered, moderate spread (17% coverage)
WRONG pred=7 conf=0.96 Attention: centered, moderate spread (15% coverage)
WRONG pred=7 conf=0.93 Attention: centered, moderate spread (18% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (17% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (18% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (17% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (17% coverage)
WRONG pred=0 conf=0.99 Attention: centered, moderate spread (19% coverage)
WRONG pred=0 conf=0.99 Attention: centered, moderate spread (19% coverage)
WRONG pred=0 conf=0.98 Attention: centered, diffuse (31% coverage)
WRONG pred=0 conf=0.98 Attention: centered, moderate spread (21% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (25% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (25% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (25% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (26% coverage)
WRONG pred=8 conf=0.92 Attention: centered, moderate spread (20% coverage)
WRONG pred=8 conf=0.88 Attention: centered, moderate spread (21% coverage)
WRONG pred=5 conf=0.85 Attention: centered, moderate spread (19% coverage)
WRONG pred=8 conf=0.81 Attention: centered, moderate spread (20% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (19% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (17% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (16% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (20% coverage)
WRONG pred=9 conf=0.97 Attention: centered, moderate spread (15% coverage)
WRONG pred=9 conf=0.95 Attention: centered, moderate spread (13% coverage)
WRONG pred=9 conf=0.95 Attention: centered, moderate spread (14% coverage)
WRONG pred=9 conf=0.95 Attention: centered, moderate spread (12% coverage)
CORRECT conf=0.98 Attention: centered, moderate spread (17% coverage)
CORRECT conf=0.98 Attention: centered, moderate spread (12% coverage)
CORRECT conf=0.97 Attention: centered, moderate spread (15% coverage)
CORRECT conf=0.97 Attention: centered, moderate spread (16% coverage)
WRONG pred=0 conf=0.99 Attention: centered, moderate spread (20% coverage)
WRONG pred=0 conf=0.98 Attention: centered, moderate spread (25% coverage)
WRONG pred=0 conf=0.98 Attention: centered, moderate spread (23% coverage)
WRONG pred=3 conf=0.98 Attention: centered, moderate spread (23% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (15% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (14% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (17% coverage)
CORRECT conf=1.00 Attention: centered, moderate spread (15% coverage)
WRONG pred=6 conf=0.98 Attention: centered, moderate spread (15% coverage)
WRONG pred=9 conf=0.96 Attention: centered, moderate spread (10% coverage)
WRONG pred=9 conf=0.96 Attention: centered, moderate spread (14% coverage)
WRONG pred=6 conf=0.96 Attention: centered, moderate spread (15% coverage)
CORRECT conf=0.97 Attention: centered, moderate spread (10% coverage)
CORRECT conf=0.96 Attention: centered, moderate spread (10% coverage)
CORRECT conf=0.96 Attention: centered, moderate spread (12% coverage)
CORRECT conf=0.96 Attention: centered, moderate spread (11% coverage)
WRONG pred=8 conf=0.93 Attention: centered, moderate spread (17% coverage)
WRONG pred=8 conf=0.91 Attention: centered, moderate spread (16% coverage)
WRONG pred=3 conf=0.91 Attention: centered, moderate spread (16% coverage)
WRONG pred=6 conf=0.89 Attention: centered, moderate spread (15% coverage)
Near-Duplicate Images
These images were detected as near-identical by perceptual hashing (dHash). They may inflate training metrics if present in both train and validation sets.
Duplicate Group (6 images)
near_duplicates 5000 affected
5000 images in 1 near-duplicate group(s). Duplicates can bias training and inflate metrics.
Dataset health uses perceptual hashing (dHash) and statistical heuristics. False positives possible on small/synthetic datasets.
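For readers unfamiliar with dHash, here is a minimal, dependency-free sketch of the idea: shrink the grayscale image to 8 rows by 9 columns, record whether each pixel is brighter than its right-hand neighbour (64 bits), and compare hashes by Hamming distance. It is not the tool's implementation. The block-mean resize, the bit convention (left > right), and every function name are assumptions made for illustration; the report does not state which match threshold the tool uses.

```python
def resize_mean(img, out_h, out_w):
    """Downscale a 2D grayscale list by averaging rectangular blocks."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        y0 = i * in_h // out_h
        y1 = max((i + 1) * in_h // out_h, y0 + 1)
        row = []
        for j in range(out_w):
            x0 = j * in_w // out_w
            x1 = max((j + 1) * in_w // out_w, x0 + 1)
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(img, hash_size=8):
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    small = resize_mean(img, hash_size, hash_size + 1)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    gradient = [[x * 9 for x in range(28)] for _ in range(28)]
    brighter = [[v + 1 for v in row] for row in gradient]   # uniform brightness shift
    mirrored = [list(reversed(row)) for row in gradient]    # opposite gradient
    print(hamming(dhash(gradient), dhash(brighter)))
    print(hamming(dhash(gradient), dhash(mirrored)))
```

Because the hash only encodes the sign of local brightness differences, small images dominated by a uniform background can collapse to very similar hashes. That is one way a single group of 5000 "near-duplicates" can arise as a false positive, which is consistent with the caveat above about small or synthetic datasets.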
From this run: label 4 recall=80%; label 9 recall=86%; label 7 recall=78%; 4→9 confusions count=145; 7→9 confusions count=144.
→ Inspect XAI overlays for both classes of each pair (see Confusion Analysis section). Pair-specific augmentation or metric learning (ArcFace; Deng et al., CVPR 2019) can increase inter-class separation.
[Suspected] Literature context (not verified on this run): targeted data collection or metric learning often reduces specific confusions; re-check the confusion matrix after any change to quantify the effect.
📚 Deng et al., ArcFace, CVPR 2019
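To make the ArcFace suggestion concrete, the sketch below shows the additive angular margin applied to logits: for the true class, the angle θ between the normalised embedding and the class weight is increased by a margin m before the cosine is rescaled by s. This is a simplified illustration, not a training recipe. The defaults s=64 and m=0.5 follow the values reported in the ArcFace paper and would likely need tuning for this dataset; the function name is illustrative, and production implementations add safeguards for the case θ + m > π that are omitted here.

```python
import math


def arcface_logits(cosines, target, s=64.0, m=0.5):
    """Apply an additive angular margin to the target-class logit.

    cosines: cos(theta_j) between the L2-normalised embedding and each
             L2-normalised class weight vector, one value per class.
    target:  index of the true class.
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))        # guard acos against rounding error
        if j == target:
            theta = math.acos(c)
            logits.append(s * math.cos(theta + m))   # penalised target logit
        else:
            logits.append(s * c)                     # other classes unchanged
    return logits


if __name__ == "__main__":
    # Embedding slightly closer to class 0 (the true class) than to class 1.
    print(arcface_logits([0.8, 0.7], target=0))
```

In this toy case the margin pushes the true-class logit below the competing one even though the raw cosine was higher, so a cross-entropy loss on these logits keeps pulling the embedding toward its class centre until the angular gap exceeds the margin.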