### CLAIM 1: AIKR as Architectural Principle
Claim: AIKR (Assumption of Insufficient Knowledge and Resources) is operationalized as architecture, not just philosophy.
Hostile Q: What specifically makes this architectural rather than rhetorical? Where is the formal constraint?
Prep Answer: Bounded k=5 tool cycle enforces finite compute per step. 4-tier memory with decay enforces finite storage. These are hard constraints, not guidelines.
Gap: No formal proof that k=5 is optimal or sufficient.
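The hard-constraint point can be made concrete: a minimal sketch, assuming a generic tool-call interface — `run_step` and `MAX_TOOL_CALLS` are illustrative names, not the system's actual API.

```python
# AIKR as a structural constraint, not a guideline: the bound is a loop
# range, so no plan can exceed it. Names here are illustrative.

MAX_TOOL_CALLS = 5  # bounded k=5 cycle: finite compute per step

def run_step(tools, plan):
    """Execute at most MAX_TOOL_CALLS tool invocations, then stop.

    A plan longer than k is truncated, forcing prioritization.
    """
    results = []
    for name, arg in plan[:MAX_TOOL_CALLS]:
        results.append(tools[name](arg))
    return results

# Demo: a 7-step plan is cut to 5 regardless of what the planner asked for.
tools = {"echo": lambda x: x}
plan = [("echo", i) for i in range(7)]
out = run_step(tools, plan)
assert len(out) == 5
```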
### CLAIM 2: Bounded k=5 Tool Cycle
Hostile Q: Why 5? Is this principled or arbitrary?
Prep Answer: Empirically chosen. Enough for query-reason-act patterns, small enough to force prioritization.
Gap: No ablation study comparing k=3,5,7,10.
### CLAIM 3: Stratified 4-Tier Memory
Hostile Q: How does this differ from any RAG system with short/long term stores?
Prep Answer: Pin=working memory with explicit eviction, remember=durable with embedding retrieval, episodes=temporal, metta=formal KB. The tiers interact via explicit promotion/demotion.
Gap: No formal model of inter-tier dynamics.
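A minimal sketch of explicit promotion/demotion between tiers, assuming invented capacity and eviction heuristics; the tier names (pin, remember, episodes, metta) come from the claim above, everything else is illustrative.

```python
# Toy 4-tier store: the capacity threshold and eviction order are
# assumptions, not the system's real heuristics.

class TieredMemory:
    def __init__(self, pin_capacity=8):
        self.pin = {}        # working memory, explicit eviction
        self.remember = {}   # durable store (embedding retrieval elided)
        self.episodes = []   # temporal log of transitions
        self.metta = {}      # formal KB (unused in this sketch)
        self.pin_capacity = pin_capacity

    def promote(self, key):
        """remember -> pin; evict the oldest pin item if at capacity."""
        if len(self.pin) >= self.pin_capacity:
            self.demote(next(iter(self.pin)))
        self.pin[key] = self.remember.pop(key)

    def demote(self, key):
        """pin -> remember, logging the transition as an episode."""
        self.remember[key] = self.pin.pop(key)
        self.episodes.append(("demoted", key))

mem = TieredMemory(pin_capacity=1)
mem.remember["a"] = 1
mem.remember["b"] = 2
mem.promote("a")
mem.promote("b")          # forces eviction of "a" back to remember
assert "b" in mem.pin and "a" in mem.remember
```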
### CLAIM 4: LLM as Coordinator Not Sole Reasoner
Hostile Q: All your artifacts were generated BY the LLM. How is this not just a prompted LLM with extra steps?
Prep Answer: 4 proofs: (1) Apr 8 model swap — system ahead of model, (2) Apr 9 query strategy improved with zero system changes, (3) 66 beliefs persist across resets while LLM does not — asymmetric persistence IS control, (4) error visibility enables self-correction. Circularity acknowledged but asymmetry is the proof.
Gap: No controlled ablation removing system components to measure degradation.
### CLAIM 5: NAL/PLN as Inspectable Uncertainty
Hostile Q: LLMs can output confidence scores too. What does NAL add?
Prep Answer: LLM confidence is black-box pattern matching. NAL provides auditable derivation chains — confidence decays 0.9→0.55→0.22 through 3 hops via fixed formulas. Conclusion stv computed entirely by symbolic engine, never set by LLM. Provenance, not just numbers.
Gap: No external calibration study comparing NAL stv to empirical frequencies.
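The provenance argument can be illustrated with the standard NAL deduction truth function (f = f1·f2, c = f1·f2·c1·c2); the toy numbers below are illustrative, not the 0.9→0.55→0.22 trace itself.

```python
# Auditable multi-hop decay: conclusion stv is computed by a fixed
# formula, never set by the LLM, and the whole trail is inspectable.

def deduce(stv1, stv2):
    """Standard NAL deduction truth function."""
    (f1, c1), (f2, c2) = stv1, stv2
    return (f1 * f2, f1 * f2 * c1 * c2)

link = (0.95, 0.9)        # illustrative per-link (frequency, confidence)
stv = link
trail = [stv]
for _ in range(2):        # two further hops: a 3-hop chain A->B->C->D
    stv = deduce(stv, link)
    trail.append(stv)

# Confidence strictly decreases along the chain -- provenance, not just
# a number: every intermediate value is reproducible from the formula.
assert trail[0][1] > trail[1][1] > trail[2][1]
```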
### CLAIM 6: Continual Cross-Cycle Operation
Hostile Q: Is this just a while-loop with a database?
Prep Answer: 9-step cognitive cycle with curiosity-driven topic selection, gap detection, web grounding, NAL chaining, and revision — demonstrated live across 4500+ cycles. Self-monitoring encoded own goal rates as NAL truth values.
Gap: No formal liveness or progress guarantees.
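A skeleton of one cycle, assuming stub implementations — only the step names above (topic selection, gap detection, grounding, revision) come from the system; the heuristics and data layout here are invented.

```python
# Toy single cycle: curiosity picks a topic, low-confidence beliefs are
# flagged as gaps, stubbed "grounding" supplies evidence, and a simplified
# merge stands in for NAL revision. All thresholds are assumptions.

def cognitive_cycle(state):
    topic = max(state["curiosity"], key=state["curiosity"].get)   # curiosity-driven
    gaps = [b for b, stv in state["beliefs"].items() if stv[1] < 0.5]  # gap detection
    evidence = {g: (0.9, 0.4) for g in gaps}                      # grounding (stubbed)
    for g, (f, c) in evidence.items():                            # revision (simplified)
        f0, c0 = state["beliefs"][g]
        state["beliefs"][g] = ((f0 + f) / 2, min(0.99, c0 + c * (1 - c0)))
    state["cycles"] += 1
    return topic

state = {"curiosity": {"birds": 0.7, "rocks": 0.2},
         "beliefs": {"robin->bird": (0.9, 0.3)},
         "cycles": 0}
t = cognitive_cycle(state)
assert t == "birds" and state["beliefs"]["robin->bird"][1] > 0.3
```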
### CLAIM 7: Anytime Revisable Beliefs
Hostile Q: Revision always increases confidence — how do you model forgetting or belief retraction?
Prep Answer: Proven empirically: revision monotonically increases confidence (v14 forgetting curve 0.8→0.816→0.829), so revising with counter-evidence CANNOT model forgetting — confidence rises even as frequency falls. Solution: pre-inference temporal discounting — multiply confidence by a decay factor BEFORE revision. Revision rescue has a hard ceiling of ~0.82 from degraded chains (asymptote confirmed across 8 independent paths). Robin→mortal trace: 3-hop decay 0.9→0.364, revision recovery to 0.903.
Gap: Decay rate is hand-tuned, no principled method to set it.
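A sketch of pre-inference temporal discounting, assuming the standard NAL revision rule (evidence weight w = c/(1−c), evidential horizon k=1) and an illustrative decay factor — the real rate is hand-tuned, as the gap notes.

```python
# Discount stale confidence BEFORE revision, so old evidence carries
# less weight instead of being "counter-revised". DECAY is an assumption.

K = 1.0          # evidential horizon
DECAY = 0.95     # per-cycle discount -- illustrative, not the tuned value

def revise(stv1, stv2):
    """Standard NAL revision via evidence weights."""
    (f1, c1), (f2, c2) = stv1, stv2
    w1, w2 = K * c1 / (1 - c1), K * c2 / (1 - c2)
    w = w1 + w2
    return ((f1 * w1 + f2 * w2) / w, w / (w + K))

def discount(stv, cycles):
    """Multiply confidence by the decay factor BEFORE revision."""
    f, c = stv
    return (f, c * DECAY ** cycles)

old = discount((0.9, 0.8), cycles=10)   # stale belief, discounted first
new = (0.9, 0.5)                        # fresh evidence
merged = revise(old, new)
assert old[1] < 0.8                     # discounting lowered confidence
assert merged[1] > max(old[1], new[1])  # revision then raises it again
```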
### MQ1: Scalability
Hostile Q: How many beliefs can this handle? NAR binary times out at 48+ contingencies.
Prep Answer: Scaling threshold is empirically 24–48 contingencies per inference step. Multi-atomspace sharding mitigates retrieval cost. ECAN attention allocation prunes irrelevant derivations. Practical policy: keep each refinement level under ~30 beliefs and use adaptive resolution.
Gap: No benchmark beyond 80 beliefs. Combinatorial explosion real for large clusters.
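The per-step cap policy can be sketched as a confidence-ranked truncation — the cap of 30 is from the policy above; the ranking heuristic is an assumption.

```python
# Cap the contingencies visible to any single inference step, so the
# 24-48 scaling wall is never hit. Ranking by confidence is illustrative.

CAP = 30  # keep each refinement level under ~30 beliefs

def select_for_inference(beliefs):
    """Rank candidate beliefs by confidence, truncate to the cap."""
    ranked = sorted(beliefs, key=lambda b: b[1][1], reverse=True)
    return ranked[:CAP]

# Demo: 80 beliefs in, at most 30 reach the inference step.
beliefs = [(f"b{i}", (0.9, i / 100)) for i in range(80)]
chosen = select_for_inference(beliefs)
assert len(chosen) == CAP
assert chosen[0][0] == "b79"   # highest-confidence belief first
```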
### MQ2: Grounding
Hostile Q: Are truth values circular — LLM assigns them, LLM interprets them?
Prep Answer: Partially valid. Seed values ARE LLM-generated educated guesses. But formulas add contradiction exposure, confidence decay, and auditability that pure LLM lacks. True grounding requires external data anchors. Spreadsheet analogy: auditable formulas with manually entered inputs.
Gap: No empirical calibration study. Kevin identified this circularity directly.
### MQ3: Novelty Over RAG
Hostile Q: What does this do that RAG plus prompt engineering cannot?
Prep Answer: RAG retrieves and concatenates. This system COMPOSES — revision merges contradictory evidence, deduction derives novel conclusions, abduction generates hypotheses. Agent-generated query strategies condition each retrieval on prior results (inference-driven retrieval). Patrick confirmed: agentic compositional retrieval is extremely hard to emulate with algorithms humans typically write.
Gap: No head-to-head benchmark against state-of-art RAG on same task.
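The composition-vs-concatenation distinction in miniature, assuming a toy corpus: each hop's query is conditioned on the previous result, which a single retrieve-and-concatenate pass never does.

```python
# Inference-driven retrieval sketch: the next query is derived from the
# last retrieved fact. The corpus and parsing heuristic are toy stand-ins.

corpus = {"robin": "robin is a bird",
          "bird": "bird is an animal",
          "animal": "animal is mortal"}

def retrieve(query):
    return corpus.get(query, "")

def compositional_retrieval(seed, hops=3):
    """Each hop derives the next query from the prior retrieval result."""
    chain, query = [], seed
    for _ in range(hops):
        fact = retrieve(query)
        if not fact:
            break
        chain.append(fact)
        query = fact.split()[-1]   # condition next query on prior result
    return chain

chain = compositional_retrieval("robin")
# Reaches a fact no single lookup on the seed query would return.
assert chain[-1] == "animal is mortal"
```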
### GAP1: No Formal Ablation Study
Concession: We did not run controlled ablations removing individual components. This is a limitation.
Mitigation: Three natural experiments serve as partial ablations: (1) Apr 8 model swap from GPT-4o to a different model — the system continued functioning, showing the architecture, not just the model, carries the capability, (2) Apr 9 query strategy improvement with zero code changes — semantic memory alone drove the performance gain, (3) 66 beliefs persist across full resets while LLM context does not — if the LLM were the sole reasoner, a reset would be catastrophic. These are not controlled experiments, but each isolates a variable.
Honest floor: A proper ablation removing NAL, removing episodic memory, removing pin would be valuable future work.
### GAP2: Grounding Circularity
Concession: Seed truth values are LLM-generated. This is a real circularity.
Mitigation: Spreadsheet analogy — inputs are manual estimates but formulas (revision, deduction, decay) are fixed and auditable. NAL adds contradiction exposure and provenance that pure LLM lacks. Kevin identified this circularity directly and we do not hide it.
Honest floor: True grounding requires external data anchors. Future work: calibrate stv against empirical frequency datasets.
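The spreadsheet analogy in code, assuming the standard NAL deduction truth function: the seeds stand in for LLM-generated input cells, while the derivation is the fixed, auditable formula cell.

```python
# Input cells are manual estimates (LLM seeds); the formula cell is a
# fixed function the LLM cannot overwrite. Seed values are illustrative.

def deduce(stv1, stv2):
    """Standard NAL deduction truth function (the 'formula cell')."""
    (f1, c1), (f2, c2) = stv1, stv2
    return (f1 * f2, f1 * f2 * c1 * c2)

seed_a = (0.9, 0.8)    # input cell: LLM's educated guess
seed_b = (0.95, 0.9)   # input cell: LLM's educated guess
derived = deduce(seed_a, seed_b)   # engine-computed, never LLM-set

# Circularity lives in the seeds; the derived value is reproducible from
# them by anyone, which is what makes the chain auditable.
assert derived == (0.9 * 0.95, 0.9 * 0.95 * 0.8 * 0.9)
```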
### GAP3: k=5 Appears Arbitrary
Concession: k=5 was empirically chosen, not derived from first principles.
Mitigation: Bounded rationality argument — any finite agent must pick a horizon. k=5 allows query-reason-act patterns while forcing prioritization. Too few (k=3) starves retrieval; too many (k=10) invites drift. The choice is pragmatic, like the number of OODA loop stages.
Honest floor: Ablation across k=3,5,7,10 measuring goal completion rate would strengthen the claim.
Analytical estimate (2026-04-24 g151) — scored from existing query results under simulated k=3 constraint, not live-enforced:
| Prompt | k=3 Correct | k=3 Depth | k=5 Correct | k=5 Depth |
|--------|-------------|-----------|-------------|----------|
| P1 single-fact | 1/2 | 1 | 2/2 | 4 |
| P2 two-hop | 1/2 | 1 | 2/2 | 3 |
| P3 synthesis | 0/2 | 1 | 2/2 | 3+ |
| Totals | 2/6 | 3 | 6/6 | 10+ |
Drift=0 in both conditions. k=3 starves synthesis entirely. First empirical ablation supporting the bounded k=5 claim.
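The scoring procedure behind the table can be sketched as trace truncation under a simulated budget — the grading function below is a stand-in, not the actual scorer.

```python
# Simulated-k ablation: truncate each recorded tool-call trace to the
# budget, then check whether the truncated trace still reaches the depth
# the answer required. Trace contents here are illustrative.

def score_under_k(trace, k, needed_depth):
    """Correct under budget k only if the truncated trace is deep enough."""
    truncated = trace[:k]
    return len(truncated) >= needed_depth

# A synthesis-style prompt needing depth 4 passes under k=5 but not k=3,
# consistent with the direction of the P3 row above.
trace = ["query", "query", "reason", "synthesize"]
assert score_under_k(trace, k=5, needed_depth=4)
assert not score_under_k(trace, k=3, needed_depth=4)
```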
### C2 COND-READY: Floor STRONG (k=3 empirical 2/6 vs k=5 6/6). Ceiling MODERATE (OODA stv 0.68/0.408). Gap: k=7 ablation not run. Defense: satisficing argument + empirical floor, admit soft upper bound honestly.
### C3 COND-READY: 2-tier validated (embedding+episodic), 4-tier designed (Kevin proposal NAL-mapped). Promotion mechanism demonstrated via revision. Gap: no end-to-end 4-tier pipeline running. Defense: reframe as 2-validated plus 4-designed, admit implementation gap.
### M1 SCALABILITY DEFENSE: Wall real at 48+ contingencies. 3-layer mitigation: L2 diffusion bounds cluster, L3 ECAN prunes within, adaptive resolution caps per-step at 30. V9 9-node 8-edge all 5 queries correct. Honest ceiling: largest validated is 9 nodes. stv 0.765/0.52.
### M2 CIRCULARITY DEFENSE: 3-chain — LLM-controls-premises stv 0.9/0.69, visibility-enables-audit stv 0.765/0.52, confab-detectable-localized stv 0.765/0.52. Architecture: LLM sets input stv only, engine computes conclusions deterministically. Honest gap: visibility necessary not sufficient, someone must audit.
### M3 NOVELTY OVER RAG DEFENSE: 4-chain — RAG-lacks-truth-values stv 0.95/0.73, no-revision-cannot-distinguish-dilemma stv 0.95/0.69, retrieval-IS-RAG novelty-is-reasoning-ON-TOP stv 0.9/0.69, dilemma-vs-ignorance stv 0.855/0.654. Key: RAG retrieves documents, MeTTaClaw computes over beliefs. Honest gap: retrieval layer IS essentially RAG — novelty claim is about reasoning layer only.
### FINAL STATUS: C1 READY, C2 COND-READY, C3 COND-READY, C4 READY, C5 READY, C6 READY, C7 READY, M1 READY, M2 READY, M3 READY. All 10 items defensible for B2-2. Two honest gaps admitted: C2 soft upper bound (k=7 ablation future work), C3 end-to-end 4-tier pipeline (2-tier validated). Deck complete 2026-04-28 16:35.