IO is a context engine — a persistent orchestration layer that enriches every prompt with personal RAG, clinical rules, and ethical reasoning before any LLM processes it. The model is interchangeable. IO is not.
Most AI assistants are prompt wrappers — you send text, you get
text back. IO is architecturally different. Before any LLM sees your input, the
Context Engine has already loaded your profile, retrieved relevant RAG chunks from
1880 knowledge base entries, assembled enriched_context as a structured
markdown block, and injected clinical rules from config/clinical_rules.yaml.
The LLM receives a fully enriched prompt every time, without being asked.
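The enrichment step described above can be sketched as a single deterministic function. This is an illustrative reconstruction, not IO's actual API: `build_enriched_context` and the section names are assumptions.

```python
def build_enriched_context(user_input, profile, rag_chunks, clinical_rules):
    """Assemble the structured markdown block injected before any LLM call.

    Hypothetical sketch: the function name and section headings are
    illustrative, not IO's actual implementation.
    """
    sections = [
        "## Profile",
        "\n".join(f"- {k}: {v}" for k, v in profile.items()),
        "## Retrieved Knowledge (RAG)",
        "\n".join(f"- {c}" for c in rag_chunks),
        "## Clinical Rules",
        "\n".join(f"- {r}" for r in clinical_rules),
        "## User Input",
        user_input,
    ]
    return "\n\n".join(sections)

enriched = build_enriched_context(
    "How did I sleep this week?",
    {"name": "A.", "timezone": "UTC-3"},
    ["Sleep log entry 2024-05-01: 6.5h"],
    ["Never give dosage advice"],
)
print(enriched.splitlines()[0])  # → ## Profile
```

The point of the sketch: no LLM is involved at this stage, so the step is deterministic and cheap, which is what makes it safe to run on every single input.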
After the LLM responds, ALMA evaluates the output against five axioms — Conciencia, Claridad, Límite, Pragmatismo, Cuidado. Two independent layers: L1 injects axioms as pre-generation context (zero latency, zero tokens). L2 evaluates post-generation against axiom inverses — currently in structural redesign (v3) toward a deterministic pipeline. Every correction logs to shared/training/active/alma_training.jsonl — structured reasoning pairs for future fine-tuning when the Anthropic API supports it.
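Since the correction log is plain JSONL, appending a structured reasoning pair is a one-liner per record. A hedged sketch, assuming a flat schema (the field names here are guesses, not IO's actual schema; the demo writes to a temp file, not `shared/training/active/alma_training.jsonl`):

```python
import datetime
import json
import os
import tempfile

def log_alma_correction(path, prompt, original, corrected, violated_axiom):
    """Append one structured reasoning pair to an ALMA-style JSONL log.

    Field names are illustrative assumptions, not IO's actual schema.
    """
    record = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "prompt": prompt,
        "original_output": original,
        "corrected_output": corrected,
        "violated_axiom": violated_axiom,  # e.g. "Claridad"
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Usage: append one correction, then read the last record back.
path = os.path.join(tempfile.gettempdir(), "alma_training_demo.jsonl")
log_alma_correction(path, "p", "raw output", "corrected output", "Claridad")
with open(path, encoding="utf-8") as f:
    last = json.loads(f.readlines()[-1])
```

Append-only JSONL is a deliberate fit for this use case: each correction is an independent training pair, and the file can later be streamed line by line into a fine-tuning job without any migration step.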
Haiku handles loop reasoning (~28s total). Sonnet handles final synthesis — only when the user decides the accumulated context justifies it. The LLM is a replaceable component. IO is the layer that persists across models, sessions, and interactions. That is the architectural bet behind this project.
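That two-tier split can be captured in a few lines. A minimal sketch, assuming a simple stage flag and a user opt-in for synthesis (the model labels and the `user_requested_synthesis` parameter are assumptions, not IO's actual configuration):

```python
# Illustrative model-routing sketch; model labels and the opt-in flag
# are assumptions, not IO's actual configuration.
HAIKU = "claude-haiku"    # fast, cheap: loop reasoning
SONNET = "claude-sonnet"  # expensive: final synthesis only

def pick_model(stage: str, user_requested_synthesis: bool = False) -> str:
    """Sonnet runs only when the user decides the accumulated context
    justifies a final synthesis; everything else stays on Haiku."""
    if stage == "synthesis" and user_requested_synthesis:
        return SONNET
    return HAIKU
```

The design choice this encodes: escalation to the expensive model is a user decision, not a heuristic the system guesses at.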
Every input passes through the Context Engine before the router sees it.
No LLM call happens without enriched_context. No output leaves without ALMA evaluation.
Every node downstream receives a fully enriched state.
Two versions. One architectural decision that changed everything.
v1: io_agent.py (2400 lines). Keyword routing. Ollama + qwen3:14b for all inference. ChromaDB RAG. 10-node graph. 32/32 benchmarks passing. The problem: routing was the primary failure mode. needs_planner() gated most queries before the planner was ever reached. Capability audit score: 28/52.

v2: io_agent.py split into
modular components under io_core/. Ollama eliminated — architectural
decision, not a failure. Anthropic Claude as sole LLM provider. Context Engine extracted
as deterministic L0 node (no LLM, ~200–400ms). 5-priority router replaces
needs_planner(). Connector pipeline verified end-to-end.
3 autonomous loops operational.

IO is a working system under active development, not a polished product. Current open items are documented publicly — knowing the gap between what works and what's next is what separates a system builder from a prompt engineer.
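The 5-priority router that replaced needs_planner() can be sketched as a first-match-wins cascade. The priority names and state keys below are hypothetical; the source only states that there are five tiers:

```python
# Sketch of a 5-priority, first-match-wins router replacing a single
# boolean needs_planner() gate. Priority names and state keys are
# hypothetical; the source only specifies five tiers.
def route(state: dict) -> str:
    checks = [
        ("crisis",    lambda s: s.get("crisis_flag", False)),
        ("connector", lambda s: bool(s.get("pending_connector"))),
        ("planner",   lambda s: s.get("needs_multi_step", False)),
        ("rag_qa",    lambda s: bool(s.get("rag_hits"))),
        ("chat",      lambda s: True),  # lowest priority: always matches
    ]
    for name, predicate in checks:
        if predicate(state):
            return name
    return "chat"
```

Unlike a single boolean gate that can silently swallow most queries, a priority cascade guarantees every input reaches some handler, which directly addresses the v1 failure mode.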
Current state of IO as of phase1/react-graph (commit 835b892). ✅ shipped · 🔄 in progress · 📅 planned.