System design through vocabulary alignment and operational substrate.

Cognitive Alignment in Multi-Agent Systems

Making system abstractions naturally legible by aligning them with human mental models, tested through reproducible evaluation.

Core Question

How do you make multi-agent systems where each agent's role, memory, and decision-making process are inherently legible to humans?

Most systems treat cognitive frameworks as cosmetic UI decoration. This research makes cognition operational: vocabulary, memory semantics, and routing logic all sit on the same substrate, so that what agents do matches what humans expect.

The System: Three Operational Layers

The runtime name Fraga comes from the Swedish verb fråga, as in fråga någon vuxen ("ask a grown-up"). The model is simple: if an agent is not the right grown-up for a question, it should pass the work to someone who is.

SOUL: Semantic Identity

Each agent is bound to a vocabulary register: cognitive, technical, legal, care, generational, or metadata precision. The register is not decoration; it governs how the agent parses problems and routes work.

Example: the word "boundary" has four semantic frames: personal limit, spatial edge, scope of authority, and psychological defense. A SOUL agent uses the frame that fits its own register and context.
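The "boundary" example can be sketched as a lookup from register to frame. This is a minimal illustration, not the project's implementation: the `FRAMES` table, the `SoulAgent` class, the agent names, and the pairing of each frame with a particular register are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical frame table. The four senses of "boundary" come from the text;
# which register owns which sense is an assumption for illustration only.
FRAMES = {
    "boundary": {
        "care": "personal limit",
        "technical": "spatial edge",
        "legal": "scope of authority",
        "cognitive": "psychological defense",
    }
}


@dataclass(frozen=True)
class SoulAgent:
    """An agent bound to exactly one vocabulary register (hypothetical type)."""
    name: str
    register: str

    def frame_for(self, word: str) -> Optional[str]:
        """Return the sense of `word` in this agent's register, or None."""
        return FRAMES.get(word.lower(), {}).get(self.register)


legal_agent = SoulAgent(name="counsel", register="legal")
care_agent = SoulAgent(name="support", register="care")

print(legal_agent.frame_for("boundary"))
print(care_agent.frame_for("boundary"))
```

Returning `None` when a register has no frame for a word gives a caller an explicit signal that the agent may not be the right one for the question.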

Aishna: Narrative Memory

Append-only narrative log indexed for recall. Agents don't just have state; they inherit documented context from prior moves, making decisions auditable.

Result: with the index in place, memory hit rate improved from 75% to 100%, and mean relevance score rose by 0.737 (indexed vs. unindexed).
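An append-only log with a recall index might look roughly like the sketch below. It assumes a simple token-overlap index; the actual indexing and relevance scoring used by Aishna are not described here, so `NarrativeLog`, `Entry`, and the ranking rule are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Entry:
    seq: int      # position in the log; never reused or rewritten
    agent: str
    text: str


class NarrativeLog:
    """Hypothetical append-only log with a token index for recall."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []
        self._index: Dict[str, List[int]] = defaultdict(list)

    def append(self, agent: str, text: str) -> Entry:
        # Entries are only ever added; there is no update or delete method.
        entry = Entry(seq=len(self._entries), agent=agent, text=text)
        self._entries.append(entry)
        for token in set(text.lower().split()):
            self._index[token].append(entry.seq)
        return entry

    def recall(self, query: str, limit: int = 3) -> List[Entry]:
        # Score each entry by how many distinct query tokens it shares.
        scores: Dict[int, int] = defaultdict(int)
        for token in set(query.lower().split()):
            for seq in self._index.get(token, ()):
                scores[seq] += 1
        # Highest overlap first; ties go to the more recent entry.
        ranked = sorted(scores, key=lambda seq: (-scores[seq], -seq))
        return [self._entries[seq] for seq in ranked[:limit]]


log = NarrativeLog()
log.append("billing", "refund approved for duplicate invoice")
log.append("legal", "consent form reviewed and signed")
log.append("billing", "invoice dispute escalated")

hits = log.recall("invoice refund")
print([entry.seq for entry in hits])
```

Because entries are frozen and the log exposes no mutation besides `append`, every recalled entry can be traced back to the move that wrote it, which is the auditability property the text describes.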

Fraga: Runtime Coordination

Routes problems to the right agent (or conferences multiple agents) based on semantic register alignment. It behaves like responsible delegation: "I can handle this," "I should ask a different grown-up," or "we need several grown-ups in conference."

Result: 8/8 routing scenarios pass with precision/recall/F1 = 1.00; integrity checks validate.
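The three delegation outcomes can be sketched as a small routing function. Keyword overlap stands in for semantic register alignment here, which is a deliberate simplification. The `Agent` type, the keyword sets, and the extra "unrouted" outcome for questions nobody matches are assumptions, not part of Fraga as described.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Agent:
    name: str
    keywords: FrozenSet[str]   # hypothetical stand-in for a vocabulary register


def route(question: str, current: str, agents: List[Agent]) -> Tuple[str, List[str]]:
    """Return (decision, agent names) for a question received by `current`."""
    tokens = set(re.findall(r"[a-z]+", question.lower()))
    matched = [agent for agent in agents if tokens & agent.keywords]
    if not matched:
        return ("unrouted", [])                 # assumption: no register fits
    if len(matched) > 1:
        return ("conference", [agent.name for agent in matched])
    if matched[0].name == current:
        return ("handle", [current])            # "I can handle this"
    return ("delegate", [matched[0].name])      # "ask a different grown-up"


agents = [
    Agent("billing", frozenset({"invoice", "refund", "payment"})),
    Agent("legal", frozenset({"consent", "contract", "liability"})),
    Agent("clinical", frozenset({"dosage", "symptom", "diagnosis"})),
]

print(route("Can I get a refund?", "billing", agents))
print(route("What dosage is safe?", "billing", agents))
print(route("Does the consent form cover the new dosage?", "billing", agents))
```

The function is pure: the same question, current agent, and agent list always yield the same decision, which is what makes a fixed set of routing scenarios checkable run after run.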

Reproducible Evaluation

This work is tested, not speculative. Reproducible means another person can run the same scenarios, with the same rules, and get the same class of outcomes.

Think of it like a driving test route, not a one-time road trip story. The route is fixed, the scoring criteria are explicit, and the result can be checked repeatedly.

In plain language: this section is not saying "the system feels smart." It is saying "when given known question types, it consistently picks the right grown-up, recalls the right context, and produces outputs that pass integrity checks."

Routing Accuracy

8/8 scenarios

Precision / Recall / F1 = 1.00

Problems correctly routed to semantic-aligned agents.
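For readers who want to see how precision, recall, and F1 relate to routing scenarios, here is a minimal sketch. It assumes micro-averaging over the set of agents chosen per scenario; the averaging method actually used in the evaluation is not stated, and the scenario data below is invented purely to exercise the function.

```python
from typing import Iterable, Set, Tuple


def micro_prf(expected: Iterable[Set[str]],
              predicted: Iterable[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-scenario agent sets."""
    tp = fp = fn = 0
    for exp, pred in zip(expected, predicted):
        tp += len(exp & pred)    # agents correctly selected
        fp += len(pred - exp)    # agents selected but not expected
        fn += len(exp - pred)    # agents expected but not selected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Illustrative scenarios only; these are not the evaluation's scenarios.
expected = [{"billing"}, {"legal"}, {"legal", "clinical"}]
all_correct = [{"billing"}, {"legal"}, {"legal", "clinical"}]
one_misroute = [{"billing"}, {"billing"}, {"legal", "clinical"}]

print(micro_prf(expected, all_correct))
print(micro_prf(expected, one_misroute))
```

A score of 1.00 on all three metrics requires every scenario to select exactly the expected agents, with no extras and no omissions; a single misroute lowers all three.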

Memory Recall Quality

+0.737

Mean relevance improvement (indexed vs. unindexed)

Hit rate: 75% → 100%

Synthesis Integrity

5/5 scenarios

Multi-agent conference outputs pass integrity checks

Deterministic replay validation confirmed.
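One common way to check deterministic replay is to run a recorded scenario twice and compare a digest of the outputs. The sketch below assumes that approach; the `synthesize` step, the scenario format, and the use of SHA-256 over canonical JSON are illustrative choices, not a description of the project's validation code.

```python
import hashlib
import json
from typing import Callable, Dict, List


def digest(outputs: List[Dict]) -> str:
    """Hash outputs in a canonical JSON form so equal data gives equal digests."""
    payload = json.dumps(outputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def replay(scenario: List[Dict], step_fn: Callable[[Dict], Dict]) -> List[Dict]:
    """Re-run every recorded step in order and collect the outputs."""
    return [step_fn(step) for step in scenario]


def synthesize(step: Dict) -> Dict:
    # Hypothetical stand-in for a conference step. It uses no clock,
    # randomness, or I/O, so its output depends only on its input.
    return {"question": step["question"], "agents": sorted(step["agents"])}


scenario = [
    {"question": "consent and dosage", "agents": ["legal", "clinical"]},
    {"question": "refund", "agents": ["billing"]},
]

first = digest(replay(scenario, synthesize))
second = digest(replay(scenario, synthesize))
print(first == second)
```

Sorting keys and agent lists before hashing removes ordering noise, so a digest mismatch points to a real behavioral difference rather than a serialization artifact.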

Operational Durability

Release Gate Pass

Contract validation, failure handling, deterministic replay all verified.

System production-ready for scoped use.

Real-World Analogy

Imagine a clinic front desk. A patient arrives with a question. The first staff member should not pretend to be every specialist. Good routing means sending billing questions to billing, legal consent issues to legal/compliance, and medical decisions to clinicians.

Fraga applies that same discipline to agent systems: route first, remember context, then respond with the right role. The evaluation checks whether that discipline holds up run after run.

What These Results Prove in Practice

  • Routing Accuracy: In the clinic analogy, the front desk sends the person to the right specialist. In Fraga, the runtime sends the question to the right agent role instead of guessing.
  • Memory Recall Quality: The specialist receives the right chart, not random notes. In Fraga, the selected agent gets relevant prior context, so responses are grounded instead of drifting.
  • Synthesis Integrity: When several specialists are involved, the final plan stays coherent. In Fraga, multi-agent conference outputs remain consistent and pass integrity checks.
  • Operational Durability: The process works reliably every shift, not just once. In Fraga, validation, replay, and failure handling confirm the system can repeat behavior under the same constraints.

Together, these results prove disciplined coordination: route correctly, carry context correctly, combine outputs coherently, and repeat the process reliably.

They do not prove universal intelligence or broad real-world generalization yet. They prove that this operational pattern works on scoped, reproducible tests.

Important Scope

This research is honest about its limits. The evaluation is intentionally curated and local.

The claim is modest: cognitive alignment improves legibility and measurable coordination in multi-agent systems at this scope. Not a universal theory; a concrete proof.

Why This Matters for B1C3

B1C3 is built around cognitive clarity and auditable systems. Cognitive alignment is operational proof that you can make system design serve human mental models without sacrificing precision.

This research validates that it's possible to build systems where what the machine does matches what humans expect to happen—not because of UX tricks, but because the substrate itself is cognitively aligned.

Explore the Work

The research is published and reproducible:

Read the Paper

Explore the Repository