BNNR Analysis Report

v0.6.3 · Classification · 10 classes · 10,000 samples

Confidence levels: Observed = measured from data; Likely = heuristic signal; Suspected = needs verification.

Executive Summary

Health score: 64% (WARNING)

Severity: medium

Health score is a composite indicator (task metrics + XAI quality), not raw accuracy.

Key Findings

Frequent confusion pair

Top Actions

Reduce top class confusions: 4, 9, 7, 5, 8, 3

Metric	Value
accuracy	81.6%
f1_macro	79.8%
loss	0.5875
Cohen’s κ	0.796
ECE (top-1)	0.022
Classes	10
Samples	10,000
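
The overall Cohen’s κ can be cross-checked from numbers that appear elsewhere in this report. The sketch below is a minimal illustration, not BNNR's implementation: it assumes the diagonal counts in the Confusion Matrix section are listed in class order 0 to 9, and it takes Support and Pred from the Class Diagnostics table. The function name `cohens_kappa` is hypothetical.

```python
# Diagonal (correct) counts per true class 0..9, read from the Confusion Matrix
# section. ASSUMPTION: the surviving diagonal values are in class order 0..9.
correct = [933, 1043, 855, 748, 787, 534, 790, 805, 801, 872]
# Support (true count) and Pred (predicted count) per class 0..9,
# from the Class Diagnostics table.
support = [980, 1135, 1032, 1010, 982, 892, 958, 1028, 974, 1009]
pred = [1151, 1089, 994, 863, 951, 656, 877, 865, 1266, 1288]


def cohens_kappa(correct, support, pred):
    """Multi-class Cohen's kappa from diagonal counts and row/column totals."""
    n = sum(support)
    p_observed = sum(correct) / n  # observed agreement
    # Chance agreement: sum over classes of P(true = c) * P(pred = c).
    p_chance = sum(s * p for s, p in zip(support, pred)) / (n * n)
    return (p_observed - p_chance) / (1.0 - p_chance)


kappa = cohens_kappa(correct, support, pred)
print(f"kappa = {kappa:.3f}")  # prints "kappa = 0.796"
```

Under these assumptions the result agrees with the κ value reported above.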

Class Diagnostics

#	Class	Accuracy	Precision	Recall	F1	Cohen’s κ	Support	Pred	Severity
1	1	91.9%	95.8%	91.9%	93.8%	0.930	1135	1089	ok
2	0	95.2%	81.1%	95.2%	87.6%	0.861	980	1151	ok
3	6	82.5%	90.1%	82.5%	86.1%	0.847	958	877	ok
4	7	78.3%	93.1%	78.3%	85.0%	0.835	1028	865	ok
5	2	82.8%	86.0%	82.8%	84.4%	0.827	1032	994	ok
6	4	80.1%	82.8%	80.1%	81.4%	0.794	982	951	ok
7	3	74.1%	86.7%	74.1%	79.9%	0.778	1010	863	ok
8	9	86.4%	67.7%	86.4%	75.9%	0.729	1009	1288	ok
9	8	82.2%	63.3%	82.2%	71.5%	0.680	974	1266	ok
10	5	59.9%	81.4%	59.9%	69.0%	0.665	892	656	ok
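
The per-class columns above are consistent with a one-vs-rest reading of the confusion matrix, and the Accuracy column coincides with Recall in every row. The sketch below is an illustration under that assumption, not BNNR's own code; `one_vs_rest_metrics` is a hypothetical helper, and the true-positive count for class 1 (1043) is taken from the Confusion Matrix section.

```python
def one_vs_rest_metrics(tp, support, pred, n):
    """Precision, recall, F1 and Cohen's kappa for one class against all others.

    tp      -- samples of this class predicted as this class
    support -- samples whose true label is this class
    pred    -- samples predicted as this class
    n       -- total number of samples
    """
    fp = pred - tp
    fn = support - tp
    tn = n - tp - fp - fn
    precision = tp / pred
    recall = tp / support
    f1 = 2 * tp / (pred + support)  # same as 2PR / (P + R)
    p_observed = (tp + tn) / n
    p_chance = (support * pred + (n - support) * (n - pred)) / (n * n)
    kappa = (p_observed - p_chance) / (1.0 - p_chance)
    return precision, recall, f1, kappa


# Class 1: Support 1135, Pred 1089, 1043 correct.
precision, recall, f1, kappa = one_vs_rest_metrics(tp=1043, support=1135, pred=1089, n=10_000)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} kappa={kappa:.3f}")
# prints "precision=0.958 recall=0.919 f1=0.938 kappa=0.930"
```

These values match the class 1 row, which suggests the per-class κ is computed one-vs-rest.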

Most Confused Pairs

True→Pred	Count
4→9	145
7→9	144
5→8	124
3→8	86
2→8	81
1→8	70
5→3	68
9→4	60
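
A ranking like the one above can be derived from any confusion matrix by sorting its off-diagonal cells. The sketch below shows one way to do this; the helper name `top_confusions` and the 3×3 matrix are hypothetical toy values for illustration, not data from this run.

```python
def top_confusions(matrix, k=3):
    """Return the k largest off-diagonal cells as (true, pred, count) tuples.

    matrix[t][p] is the number of samples with true class t predicted as p.
    """
    pairs = [
        (count, true, pred)
        for true, row in enumerate(matrix)
        for pred, count in enumerate(row)
        if true != pred and count > 0
    ]
    # Largest count first; ties broken by class indices for a stable order.
    pairs.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(true, pred, count) for count, true, pred in pairs[:k]]


# Toy 3-class matrix (hypothetical values).
toy = [
    [50, 2, 8],
    [1, 40, 9],
    [6, 0, 44],
]
ranked = top_confusions(toy, k=3)
for true, pred, count in ranked:
    print(f"{true}→{pred}: {count}×")
```

Note that the ranking is directional: 4→9 and 9→4 are counted as separate pairs, as in the list above.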

Confusion Matrix

True class	Correct (diagonal)	Off-diagonal cells shown
0	933	
1	1043	
2	855	
3	748	
4	787	4→9: 145
5	534	5→8: 124
6	790	
7	805	7→9: 144
8	801	
9	872	

Findings

Frequent confusion pair [Observed] (confused_pair)

4→9: 145 confusions (#1 pair). 7→9: 144 confusions (#2 pair). 5→8: 124 confusions (#3 pair). ...and 1 more.

class 4, class 9, class 7, class 5, class 8, class 3

count=145, count=144, count=124, count=86

Confusion Analysis

Class 4 → Class 9: 145 confusions

Correctly classified as 4 (mean saliency)

Confused 4→9 (mean saliency)

When correctly classified as class 4, attention is centered with moderate spread (13% coverage). In confused predictions (predicted as 9), attention is also centered with moderate spread (13% coverage); the saliency summary shows no shift between the two cases.

Individual confused samples (8)

Class 7 → Class 9: 144 confusions

Correctly classified as 7 (mean saliency)

Confused 7→9 (mean saliency)

When correctly classified as class 7, attention is centered with moderate spread (15% coverage). In confused predictions (predicted as 9), attention remains centered with moderate spread, with slightly lower coverage (13%).

Individual confused samples (8)

Class 5 → Class 8: 124 confusions

Correctly classified as 5 (mean saliency)

Confused 5→8 (mean saliency)

When correctly classified as class 5, attention is centered with moderate spread (12% coverage). In confused predictions (predicted as 8), attention remains centered with moderate spread, with slightly higher coverage (13%).

Individual confused samples (8)

XAI Insights

Q = 0.25·Acc + 0.20·Focus + 0.15·Coverage + 0.15·Coherence + 0.10·Edge + 0.15·Consistency

Acc = prediction accuracy on probed samples; Focus = spatial concentration of attention; Coverage = fraction of object covered; Coherence = spatial continuity; Edge = proportion of attention away from borders; Consistency = stability across similar inputs.
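
The weighted sum above can be evaluated directly. The sketch below is a minimal illustration of the formula as written, not BNNR's implementation; the names `WEIGHTS` and `xai_quality` are hypothetical, and the example components are the class 5 values listed later in this section.

```python
# Weights copied from the formula above; they sum to 1.
WEIGHTS = {
    "accuracy": 0.25,
    "focus": 0.20,
    "coverage": 0.15,
    "coherence": 0.15,
    "edge": 0.10,
    "consistency": 0.15,
}


def xai_quality(components):
    """Weighted sum Q of the six XAI components, each expected in [0, 1]."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Class 5 components as displayed in this report.
class_5 = {
    "accuracy": 0.80,
    "focus": 0.75,
    "coverage": 1.00,
    "coherence": 0.48,
    "edge": 0.59,
    "consistency": 0.99,
}
q = xai_quality(class_5)
print(f"Q = {q:.4f}")
```

Because the displayed components are rounded to two decimals, the recomputed Q can differ from the reported per-class score in the third decimal place.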

Global XAI Quality: 0.868

Computed on 46 samples. Small probe set — treat as indicative, not definitive.
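
The global score appears consistent with an unweighted mean of the ten per-class scores listed below, rather than a mean weighted by probe sample count. This is an observation from the reported numbers, not a documented rule; the sketch below simply compares the two aggregates.

```python
# (per-class XAI quality, probe sample count) as listed in this section.
class_scores = {
    8: (0.616, 1),
    5: (0.779, 5),
    6: (0.783, 5),
    3: (0.883, 5),
    2: (0.910, 5),
    0: (0.929, 5),
    7: (0.933, 5),
    9: (0.940, 5),
    4: (0.948, 5),
    1: (0.956, 5),
}

total_samples = sum(n for _, n in class_scores.values())
# Every class counts equally, regardless of how many samples were probed.
unweighted = sum(q for q, _ in class_scores.values()) / len(class_scores)
# Every probed sample counts equally.
weighted = sum(q * n for q, n in class_scores.values()) / total_samples

print(f"samples={total_samples} unweighted={unweighted:.3f} weighted={weighted:.3f}")
# prints "samples=46 unweighted=0.868 weighted=0.890"
```

With an unweighted mean, class 8 (one probed sample, score 0.616) pulls the global score down as much as any five-sample class would.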

Class 8: 0.616 (1 sample)

accuracy	focus	coverage	coherence	edge	consistency
0.00	0.72	1.00	1.00	0.96	0.50

CORRECT conf=1.00
Attention: centered, moderate spread (20% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (21% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (20% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (13% coverage)

WRONG pred=0 conf=0.99
Attention: centered, moderate spread (23% coverage)

WRONG pred=0 conf=0.99
Attention: centered, diffuse (30% coverage)

WRONG pred=0 conf=0.98
Attention: centered, moderate spread (27% coverage)

WRONG pred=0 conf=0.98
Attention: centered, moderate spread (24% coverage)

Class 5: 0.779 (5 samples)

accuracy	focus	coverage	coherence	edge	consistency
0.80	0.75	1.00	0.48	0.59	0.99

CORRECT conf=0.99
Attention: centered, moderate spread (12% coverage)

CORRECT conf=0.99
Attention: centered, moderate spread (10% coverage)

CORRECT conf=0.99
Attention: centered, moderate spread (12% coverage)

WRONG pred=0 conf=0.98
Attention: centered, moderate spread (14% coverage)

WRONG pred=0 conf=0.97
Attention: lower region, moderate spread (12% coverage)

WRONG pred=0 conf=0.97
Attention: lower region, moderate spread (14% coverage)

WRONG pred=3 conf=0.97
Attention: border-focused (edge=31%), moderate spread (13% coverage)

Class 6: 0.783 (5 samples)

accuracy	focus	coverage	coherence	edge	consistency
0.40	0.72	1.00	1.00	0.91	0.98

CORRECT conf=1.00
Attention: centered, moderate spread (16% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (18% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (16% coverage)

WRONG pred=4 conf=0.98
Attention: centered, moderate spread (12% coverage)

WRONG pred=0 conf=0.98
Attention: centered, moderate spread (13% coverage)

WRONG pred=0 conf=0.98
Attention: centered, moderate spread (17% coverage)

WRONG pred=0 conf=0.97
Attention: centered, moderate spread (27% coverage)

Class 3: 0.883 (5 samples)

accuracy	focus	coverage	coherence	edge	consistency
0.80	0.74	1.00	1.00	0.88	0.98

CORRECT conf=1.00
Attention: centered, moderate spread (16% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (17% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (18% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (16% coverage)

WRONG pred=7 conf=0.99
Attention: centered, moderate spread (18% coverage)

WRONG pred=2 conf=0.99
Attention: centered, moderate spread (17% coverage)

WRONG pred=7 conf=0.96
Attention: centered, moderate spread (15% coverage)

WRONG pred=7 conf=0.93
Attention: centered, moderate spread (18% coverage)

Class 2: 0.910 (5 samples)

accuracy	focus	coverage	coherence	edge	consistency
1.00	0.74	1.00	0.85	0.87	0.98

CORRECT conf=1.00
Attention: centered, moderate spread (17% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (18% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (17% coverage)

WRONG pred=0 conf=0.99
Attention: centered, moderate spread (19% coverage)

WRONG pred=0 conf=0.98
Attention: centered, diffuse (31% coverage)

WRONG pred=0 conf=0.98
Attention: centered, moderate spread (21% coverage)

Class 0: 0.929 (5 samples)

accuracy	focus	coverage	coherence	edge	consistency
1.00	0.69	1.00	1.00	0.92	0.99

CORRECT conf=1.00
Attention: centered, moderate spread (25% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (26% coverage)

WRONG pred=8 conf=0.92
Attention: centered, moderate spread (20% coverage)

WRONG pred=8 conf=0.88
Attention: centered, moderate spread (21% coverage)

WRONG pred=5 conf=0.85
Attention: centered, moderate spread (19% coverage)

WRONG pred=8 conf=0.81
Attention: centered, moderate spread (20% coverage)

Class 7: 0.933 (5 samples)

accuracy	focus	coverage	coherence	edge	consistency
1.00	0.76	1.00	1.00	0.82	0.99

CORRECT conf=1.00
Attention: centered, moderate spread (19% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (17% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (16% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (20% coverage)

WRONG pred=9 conf=0.97
Attention: centered, moderate spread (15% coverage)

WRONG pred=9 conf=0.95
Attention: centered, moderate spread (13% coverage)

WRONG pred=9 conf=0.95
Attention: centered, moderate spread (14% coverage)

WRONG pred=9 conf=0.95
Attention: centered, moderate spread (12% coverage)

Class 9: 0.940 (5 samples)

accuracy	focus	coverage	coherence	edge	consistency
1.00	0.76	1.00	1.00	0.91	0.98

CORRECT conf=0.98
Attention: centered, moderate spread (17% coverage)

CORRECT conf=0.98
Attention: centered, moderate spread (12% coverage)

CORRECT conf=0.97
Attention: centered, moderate spread (15% coverage)

CORRECT conf=0.97
Attention: centered, moderate spread (16% coverage)

WRONG pred=0 conf=0.99
Attention: centered, moderate spread (20% coverage)

WRONG pred=0 conf=0.98
Attention: centered, moderate spread (25% coverage)

WRONG pred=0 conf=0.98
Attention: centered, moderate spread (23% coverage)

WRONG pred=3 conf=0.98
Attention: centered, moderate spread (23% coverage)

Class 4: 0.948 (5 samples)

accuracy	focus	coverage	coherence	edge	consistency
1.00	0.77	1.00	1.00	0.95	0.99

CORRECT conf=1.00
Attention: centered, moderate spread (15% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (14% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (17% coverage)

CORRECT conf=1.00
Attention: centered, moderate spread (15% coverage)

WRONG pred=6 conf=0.98
Attention: centered, moderate spread (15% coverage)

WRONG pred=9 conf=0.96
Attention: centered, moderate spread (10% coverage)

WRONG pred=9 conf=0.96
Attention: centered, moderate spread (14% coverage)

WRONG pred=6 conf=0.96
Attention: centered, moderate spread (15% coverage)

Class 1: 0.956 (5 samples)

accuracy	focus	coverage	coherence	edge	consistency
1.00	0.83	1.00	1.00	0.90	1.00

CORRECT conf=0.97
Attention: centered, moderate spread (10% coverage)

CORRECT conf=0.96
Attention: centered, moderate spread (10% coverage)

CORRECT conf=0.96
Attention: centered, moderate spread (12% coverage)

CORRECT conf=0.96
Attention: centered, moderate spread (11% coverage)

WRONG pred=8 conf=0.93
Attention: centered, moderate spread (17% coverage)

WRONG pred=8 conf=0.91
Attention: centered, moderate spread (16% coverage)

WRONG pred=3 conf=0.91
Attention: centered, moderate spread (16% coverage)

WRONG pred=6 conf=0.89
Attention: centered, moderate spread (15% coverage)

Dataset Health

5,000

Scanned

1957669

Duplicate Pairs

Duplicate Groups

Flagged Images

Class Distribution

Class	Samples
1	1,135
2	1,032
7	1,028
3	1,010
9	1,009
4	982
0	980
8	974
6	958
5	892

Near-Duplicate Images

These images were detected as near-identical by perceptual hashing (dHash). They may inflate training metrics if present in both train and validation sets.

Duplicate Group (6 images)

near_duplicates: 5000 affected

5000 images in 1 near-duplicate group(s). Duplicates can bias training and inflate metrics.

Dataset health uses perceptual hashing (dHash) and statistical heuristics. False positives possible on small/synthetic datasets.
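
For readers unfamiliar with dHash, the sketch below shows the general idea in pure Python: shrink the image to a 9×8 grayscale grid, record whether each pixel is brighter than its right-hand neighbour, and compare the resulting 64-bit hashes by Hamming distance. It is an illustration only, not BNNR's implementation. The nearest-neighbour resize, the helper names, and the synthetic gradient images are all assumptions made for this example; real pipelines typically use an image library with a smoothing resize and choose a distance threshold empirically.

```python
def resize_nearest(pixels, width, height):
    """Nearest-neighbour resize of a 2D grayscale grid (list of rows)."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


def dhash(pixels, hash_size=8):
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    small = resize_nearest(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Synthetic 28x28 images (hypothetical): a left-to-right gradient, a slightly
# brighter copy of it, and its mirror image.
base = [[c * 9 for c in range(28)] for _ in range(28)]
brighter = [[min(255, v + 3) for v in row] for row in base]
mirrored = [row[::-1] for row in base]

print(hamming(dhash(base), dhash(brighter)))  # 0: uniform brightening keeps every comparison
print(hamming(dhash(base), dhash(mirrored)))  # 64: every comparison flips
```

Because the hash only encodes local brightness ordering, images with little texture can collide easily, which is one way false positives of the kind mentioned above can arise.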

Recommendations

1. Reduce top class confusions: 4, 9, 7, 5, 8, 3 [Observed]

From this run: label 4 recall=80%; label 9 recall=86%; label 7 recall=78%; 4→9 count=145; 7→9 count=144.

→ Inspect XAI overlays for both classes of each confused pair (see Confusion Analysis section). Pair-specific augmentation or metric learning (ArcFace; Deng et al., CVPR 2019) can increase inter-class separation.

[Observed] Literature context (not verified on this run): Targeted data collection or metric learning often reduces specific confusions; quantify with your confusion matrix after changes.

Deng, J., Guo, J., Xue, N., Zafeiriou, S. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. CVPR 2019.
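
To make the metric-learning suggestion concrete, the sketch below illustrates the additive angular margin idea behind ArcFace in pure Python: the target-class cosine is replaced by cos(θ + m) before scaling, so the model must separate classes by an extra angular margin to reach the same loss. It is a simplified illustration, not a training recipe and not part of BNNR. The function names and the example cosines are hypothetical, the scale and margin defaults (64 and 0.5) follow common ArcFace settings, and practical implementations add special handling when θ + m exceeds π.

```python
import math


def arcface_logits(cosines, target, scale=64.0, margin=0.5):
    """Apply an additive angular margin to the target-class cosine, then scale.

    cosines -- cosine similarity between the embedding and each class centre
    target  -- index of the true class
    """
    logits = []
    for index, cosine in enumerate(cosines):
        cosine = max(-1.0, min(1.0, cosine))  # guard acos against rounding
        if index == target:
            cosine = math.cos(math.acos(cosine) + margin)
        logits.append(scale * cosine)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single sample."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


# Hypothetical cosines for a sample of class index 0 that is fairly close to
# a second class (index 1), as with a frequently confused pair.
cosines = [0.8, 0.6, 0.1]
loss_plain = cross_entropy(arcface_logits(cosines, target=0, margin=0.0), 0)
loss_margin = cross_entropy(arcface_logits(cosines, target=0, margin=0.5), 0)
print(f"loss without margin: {loss_plain:.4f}")
print(f"loss with margin:    {loss_margin:.4f}")
```

In this toy case the margin turns a nearly zero loss into a large one, which is the pressure that pushes embeddings of similar classes further apart during training. Whether it helps on this dataset should be checked against the confusion matrix after retraining.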