Kinetica AI · IO Agent · Architecture

IO is not a chatbot.

IO is a context engine — a persistent orchestration layer that enriches every prompt with personal RAG, clinical rules, and ethical reasoning before any LLM processes it. The model is interchangeable. IO is not.

LangGraph · Anthropic API · Mac Mini M4
FASE 0 active · Anthropic-only since 2026-03-30
Haiku (loop) · Sonnet (synthesis) · Opus (planned)
Embeddings: intfloat/multilingual-e5-large · 1024 dims
RAG: ChromaDB · 1880 chunks · cosine similarity
✓ 5/5 benchmark queries passing · ~28s loop
~28s loop latency · 1880 ChromaDB chunks · 9 graph nodes · 5/5 benchmarks passing · Anthropic-only provider · Haiku loop model
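The retrieval numbers above rest on one operation: cosine similarity between a query embedding and each chunk embedding. A minimal pure-Python sketch (no ChromaDB dependency; the 3-dim toy vectors stand in for real 1024-dim e5 embeddings):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 3) -> list[str]:
    """Rank chunk ids by cosine similarity to the query embedding."""
    ranked = sorted(chunks.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]

# Toy 3-dim vectors stand in for the real 1024-dim e5 embeddings.
chunks = {"a": [1.0, 0.0, 0.0], "b": [0.9, 0.1, 0.0], "c": [0.0, 1.0, 0.0]}
print(top_k([1.0, 0.05, 0.0], chunks, k=2))  # → ['a', 'b']
```

In production the vector store does this ranking internally; the sketch only shows the math the similarity metric names.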
Design philosophy

Most AI assistants are prompt wrappers — you send text, you get text back. IO is architecturally different. Before any LLM sees your input, the Context Engine has already loaded your profile, retrieved relevant RAG chunks from 1880 knowledge base entries, assembled enriched_context as a structured markdown block, and injected clinical rules from config/clinical_rules.yaml. The LLM receives a fully enriched prompt every time, without being asked.
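The assembly step described above can be sketched as a single deterministic function. Everything here is illustrative: the function name, profile fields, and section headers are assumptions, not the real Context Engine schema or the actual clinical_rules.yaml format:

```python
def build_enriched_context(profile: dict, rag_chunks: list[str],
                           clinical_rules: list[str]) -> str:
    """Assemble the structured markdown block injected before any LLM call.
    Section names and field layout are hypothetical stand-ins."""
    lines = ["## User profile"]
    lines += [f"- {key}: {value}" for key, value in profile.items()]
    lines.append("\n## Retrieved knowledge")
    lines += [f"> {chunk}" for chunk in rag_chunks]
    lines.append("\n## Clinical rules")
    lines += [f"- {rule}" for rule in clinical_rules]
    return "\n".join(lines)

context = build_enriched_context(
    profile={"name": "Ada", "timezone": "UTC+1"},
    rag_chunks=["Chunk retrieved from ChromaDB by cosine similarity."],
    clinical_rules=["Never suggest medication changes."],
)
print(context)
```

The point of the sketch is the architecture, not the formatting: context construction is plain string assembly, so it is deterministic and auditable before a single token is generated.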

After the LLM responds, ALMA evaluates the output against five axioms — Conciencia, Claridad, Límite, Pragmatismo, Cuidado. Two independent layers: L1 injects axioms as pre-generation context (zero latency, zero tokens). L2 evaluates post-generation against axiom inverses — currently in structural redesign (v3) toward a deterministic pipeline. Every correction logs to shared/training/active/alma_training.jsonl — structured reasoning pairs for future fine-tuning when the Anthropic API supports it.
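The correction log can be sketched as JSON Lines, one structured reasoning pair per line. The field names below are assumptions; the real schema in alma_training.jsonl may differ:

```python
import json
from pathlib import Path

def log_correction(path: Path, prompt: str, original: str, corrected: str,
                   axiom: str, reasoning: str) -> None:
    """Append one ALMA correction as a JSONL record (hypothetical schema)."""
    record = {
        "prompt": prompt,
        "original_response": original,
        "corrected_response": corrected,
        "axiom": axiom,          # one of the five axioms, e.g. "Cuidado"
        "reasoning": reasoning,  # why the correction was made
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Local file stands in for shared/training/active/alma_training.jsonl.
log_path = Path("alma_training.jsonl")
log_correction(log_path, "How hard should I train today?",
               "Push as hard as you can.",
               "Scale effort to yesterday's recovery data.",
               "Cuidado", "Original response ignored load limits.")
```

Append-only JSONL keeps each pair independently parseable, which is what makes the log usable as fine-tuning data later.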

Haiku handles loop reasoning (~28s total). Sonnet handles final synthesis — only when the user decides the accumulated context justifies it. The LLM is a replaceable component. IO is the layer that persists across models, sessions, and interactions. That is the architectural bet behind this project.
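That tiering can be expressed as a small dispatch table. The model identifiers below are placeholders, not the project's actual configuration:

```python
# Placeholder model ids; the real ones live in the project's config.
MODEL_TIERS = {
    "loop": "claude-haiku",        # fast iterative reasoning, ~28s per loop
    "synthesis": "claude-sonnet",  # final synthesis, triggered by the user
    "planning": "claude-opus",     # planned, not yet wired in
}

def select_model(stage: str) -> str:
    """Pick the model tier for a pipeline stage; the LLM stays swappable."""
    if stage not in MODEL_TIERS:
        raise ValueError(f"unknown stage: {stage}")
    return MODEL_TIERS[stage]

print(select_model("loop"))  # → claude-haiku
```

Keeping the mapping in one table is what makes "the model is interchangeable" a config change rather than a refactor.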

Execution flow

How a single query moves through IO

Every input passes through the Context Engine before the router sees it. No LLM call happens without enriched_context. No output leaves without ALMA evaluation.

Context Engine (L0): Deterministic — no LLM call. Runs ~200–400ms. Loads profile, retrieves RAG chunks, assembles enriched_context, injects clinical rules. Every node downstream receives a fully enriched state.

5-priority router: external_action → autonomous_loop → tool_dispatch → rag_query → direct_response. Priority order replaces the old needs_planner() gate.
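The 5-priority router can be sketched as an ordered predicate list where the first match wins. The predicates below are invented stand-ins; the real routing conditions are not shown on this page:

```python
from typing import Callable

# Ordered (predicate, route) pairs; first match wins. Predicates are
# illustrative stand-ins for the real routing conditions.
PRIORITIES: list[tuple[Callable[[dict], bool], str]] = [
    (lambda s: bool(s.get("pending_action")),   "external_action"),
    (lambda s: bool(s.get("loop_requested")),   "autonomous_loop"),
    (lambda s: bool(s.get("tool_calls")),       "tool_dispatch"),
    (lambda s: bool(s.get("needs_retrieval")),  "rag_query"),
    (lambda s: True,                            "direct_response"),  # fallback
]

def route(state: dict) -> str:
    """Return the first route whose predicate matches the enriched state."""
    for predicate, destination in PRIORITIES:
        if predicate(state):
            return destination
    return "direct_response"

print(route({"needs_retrieval": True}))  # → rag_query
print(route({}))                         # → direct_response
```

Unlike a boolean gate such as needs_planner(), an ordered priority list always produces a route, so a misfired predicate degrades to a lower priority instead of blocking the query.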
Evolution

How IO evolved

Two versions. One architectural decision that changed everything.

v1 (cleanup/minimal-rebuild, dd1d863): Monolithic io_agent.py (2400 lines). Keyword routing. Ollama + qwen3:14b for all inference. ChromaDB RAG. 10-node graph. 32/32 benchmarks passing. The problem: routing failure was the primary failure mode. needs_planner() gated most queries before the planner was ever reached. Capability audit score: 28/52.

v2 (phase1/react-graph, 835b892): io_agent.py split into modular components under io_core/. Ollama eliminated — architectural decision, not a failure. Anthropic Claude as sole LLM provider. Context Engine extracted as deterministic L0 node (no LLM, ~200–400ms). 5-priority router replaces needs_planner(). Connector pipeline verified end-to-end. 3 autonomous loops operational.

What changed and why: The LLM was never the persistent layer — IO is. Separating context construction (deterministic, auditable) from generation (stochastic, interchangeable) is the architectural bet this project is built on.
Known issues · Honest engineering

IO is a working system under active development, not a polished product. Current open items are documented publicly — knowing the gap between what works and what's next is what separates a system builder from a prompt engineer.

Roadmap

Current state of IO as of phase1/react-graph (commit 835b892). ✅ shipped · 🔄 in progress · 📅 planned.