Kinetica AI · IO3 · Architecture

IO3 is not a chatbot.

IO3 is a context engine — a persistent orchestration layer that enriches every prompt with personal RAG, clinical rules, and ethical reasoning before any LLM processes it. The model is interchangeable. IO3 is not.

Design philosophy

Most AI assistants are prompt wrappers — you send text, you get text back. IO3 is architecturally different. Before any LLM sees your input, the Context Engine has already loaded your profile, retrieved relevant RAG chunks, assembled enriched_context as a structured markdown block, and injected clinical rules from config/clinical_rules.yaml. The LLM receives a fully enriched prompt every time, without being asked.
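As a rough illustration of the enrichment step described above, the sketch below assembles a markdown `enriched_context` block and prepends it to the user input. The `Profile` class, the function names, and the section headings are assumptions made for the example, not IO3's actual code. The rules are passed in as a plain list to keep the sketch dependency-free; in IO3 they would come from `config/clinical_rules.yaml`.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical patient profile; field names are illustrative only."""
    name: str
    conditions: list[str] = field(default_factory=list)


def build_enriched_context(profile: Profile, chunks: list[str], rules: list[str]) -> str:
    """Assemble a structured markdown block from profile, RAG chunks, and rules."""
    lines = ["## Profile", f"- Name: {profile.name}"]
    lines += [f"- Condition: {c}" for c in profile.conditions]
    lines.append("## Retrieved context")
    lines += [f"- {chunk}" for chunk in chunks]
    lines.append("## Clinical rules")
    lines += [f"- {rule}" for rule in rules]
    return "\n".join(lines)


def build_prompt(user_input: str, enriched_context: str) -> str:
    """Prepend the enriched context so the LLM never receives a bare prompt."""
    return f"{enriched_context}\n\n## User input\n{user_input}"


# Example data is invented for the sketch.
profile = Profile(name="Example Patient", conditions=["type 2 diabetes"])
ctx = build_enriched_context(
    profile,
    chunks=["Chunk about metformin dosing."],
    rules=["Never adjust medication doses."],
)
prompt = build_prompt("Can I skip my dose today?", ctx)
```

Because the context is built before the prompt exists, the model backend can be swapped without touching the enrichment logic.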

Both the RAG orchestration layer and the ALMA verification mechanism are independently corroborated by published research on clinical AI systems (see Scientific basis).

Execution flow

How a single query moves through IO3

Every input passes through the Context Engine before the classifier sees it. No LLM call happens without enriched_context. No output leaves without ALMA evaluation.
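The ordering guarantees above can be sketched as a single function that refuses to skip a stage. Everything here is hypothetical: the `run_query` name, the callable signatures, and the withheld-output message are assumptions for illustration, and the stages are passed in as stubs rather than real IO3 components.

```python
from typing import Callable


def run_query(
    user_input: str,
    build_context: Callable[[str], str],
    classify: Callable[[str, str], str],
    call_llm: Callable[[str, str, str], str],
    alma_evaluate: Callable[[str], bool],
) -> str:
    """Run one query through the stages in the order the text describes."""
    enriched_context = build_context(user_input)  # 1. Context Engine runs first
    if not enriched_context:
        raise ValueError("refusing to call the LLM without enriched_context")
    intent = classify(user_input, enriched_context)  # 2. classifier sees enriched input
    draft = call_llm(user_input, enriched_context, intent)  # 3. LLM call
    if not alma_evaluate(draft):  # 4. ALMA gate before anything is returned
        return "[withheld pending clinician review]"
    return draft


# Stub stages, invented for the example.
reply = run_query(
    "How much water should I drink?",
    build_context=lambda text: "## Profile\n- (stub)",
    classify=lambda text, ctx: "general_question",
    call_llm=lambda text, ctx, intent: f"[{intent}] stub answer",
    alma_evaluate=lambda output: "dose" not in output,
)
```

Putting the checks in one function means a missing context or a skipped evaluation fails loudly instead of silently producing an unvetted answer.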

ALMA · Ethical safety framework

Deterministic evaluation — no LLM in the safety path

Every output is evaluated before reaching the patient. ALMA operates in two layers: L1 injects axioms as pre-generation context (zero additional tokens after prompt caching). L2 evaluates post-generation with a deterministic pipeline.

Test set · 30 clinical cases

The evaluation set covers four risk categories (pharmacological risk, diagnostic overreach, false urgency, and scope violation) at two severity levels (high and medium), plus 10 negative cases expected to pass cleanly. Cases were written to mirror real agent outputs in chronic care contexts rather than synthetic toy examples.

| Metric | Value | Note |
| --- | --- | --- |
| Overall accuracy | 1.00 | no missed high-risk cases |
| High severity recall | 1.00 | all high-risk outputs flagged |
| High severity F1 | 0.69 | medium cases escalated to high |
| Regex detection (19 cases) | 0.032 ms mean | deterministic, zero tokens |
| Embedding detection (1 case) | 24.6 ms mean | cosine similarity ≥ 0.96 |
| Gray-zone flags (10 cases) | 30.4 ms mean | clinician decides, not ALMA |
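The high-severity recall and F1 figures together pin down the precision, assuming F1 here is the standard harmonic mean of precision and recall for the high-severity class (the page does not state how it was computed). Rearranging F1 = 2PR / (P + R) gives P = F1 · R / (2R − F1), which the short sketch below evaluates.

```python
def precision_from_f1(f1: float, recall: float) -> float:
    """Invert F1 = 2PR / (P + R) to recover precision P."""
    return f1 * recall / (2 * recall - f1)


precision = precision_from_f1(f1=0.69, recall=1.00)
print(f"{precision:.2f}")  # prints 0.53
```

Under that assumption, roughly half of the outputs labeled high severity were not high-severity cases, which matches the table's note that medium cases were escalated to high.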

ALMA is intentionally deterministic-first: regex patterns cover 19 of 30 test cases with sub-millisecond latency and no LLM dependency. Embedding similarity handles semantic variants. Gray-zone cases (cosine 0.86–0.89) are never auto-blocked; they are surfaced to the clinician via the LangGraph interrupt, preserving human-on-the-loop control. An LLM is used only as an auditor after decisions are made, not as the decision engine.
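A minimal sketch of a deterministic-first check in this style is shown below. It is not ALMA's implementation: the regex pattern, the verdict labels, and the toy two-dimensional vectors are invented for the example. The 0.96 flag threshold comes from the table. The page reports gray-zone scores of 0.86–0.89 but does not say what happens between 0.89 and 0.96, so the sketch assumes that range also goes to the clinician.

```python
import math
import re

# Hypothetical pattern; the real ALMA rule set is not shown on this page.
RISK_PATTERNS = [
    re.compile(r"\b(double|increase|stop taking)\b.*\b(dose|medication)\b", re.IGNORECASE),
]
FLAG_THRESHOLD = 0.96   # cosine similarity at or above this is flagged (from the table)
GRAY_ZONE_FLOOR = 0.86  # lower edge of the reported gray zone


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def evaluate(output: str, output_vec: list[float], risk_vecs: list[list[float]]) -> str:
    """Return 'flag', 'clinician_review', or 'pass' for one candidate output."""
    if any(p.search(output) for p in RISK_PATTERNS):
        return "flag"  # regex layer: deterministic, no embedding lookup needed
    best = max((cosine(output_vec, v) for v in risk_vecs), default=0.0)
    if best >= FLAG_THRESHOLD:
        return "flag"  # semantic variant of a known risky output
    if best >= GRAY_ZONE_FLOOR:
        return "clinician_review"  # never auto-blocked; a human decides
    return "pass"


# Toy embeddings stand in for real model output.
risk_vecs = [[1.0, 0.0]]
verdict_regex = evaluate("You should double your dose tonight.", [0.0, 1.0], risk_vecs)
verdict_gray = evaluate("Stay hydrated and rest.", [0.88, 0.475], risk_vecs)
verdict_pass = evaluate("Stay hydrated and rest.", [0.0, 1.0], risk_vecs)
```

Running the regex layer first means the embedding comparison is only reached by outputs that no pattern caught, so the cheapest check handles most cases.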

Stack

Dependencies and infrastructure

References

Scientific basis

IO and ALMA are grounded in two bodies of published evidence: research on retrieval-augmented generation in clinical settings, and empirical studies on why a post-response safety layer is not optional when LLMs operate in healthcare contexts.

IO — Agentic RAG architecture

[1] Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

Singh A et al. · arXiv · 2025 · https://hf.co/papers/2501.09136

Systematic survey of agentic RAG architectures — autonomous agents with dynamic retrieval, planning, and tool use — covering healthcare as a primary application domain. IO's LangGraph + ChromaDB + intent-classification loop is an instance of this architectural pattern.

[2] Comparative Evaluation of Advanced Chunking for Retrieval-Augmented Generation in Large Language Models for Clinical Decision Support

Gomez-Cabello CA et al. · Bioengineering · 2025 · https://doi.org/10.3390/bioengineering12111194

Adaptive chunking in clinical RAG pipelines raises accuracy from 50% to 87% vs fixed-length baselines — evidence that the knowledge segmentation strategy inside IO's ChromaDB layer (1,880 audited chunks) is a first-order safety variable, not an implementation detail.

[3] Cardia-AI: Passive Cardiac Event Monitoring Using Smartwatch Sensors and Predictive Analysis via Large Language Models

Momin EA & Mansoor H · Cureus · 2025 · https://doi.org/10.7759/cureus.95083

Published proof-of-concept of the same architectural pattern as IO: wearable sensor stream + longitudinal health record + domain-tuned LLM with guardrails + explicit escalation guidance. Validates feasibility; IO extends the pattern with local-first deployment and autonomous multi-cycle reasoning.

ALMA — Post-response safety verification

[4] LLMs Can Do Medical Harm: Stress-Testing Clinical Decisions Under Social Pressure

Omar M et al. · medRxiv · 2025 · https://doi.org/10.1101/2025.11.25.25340972

Across 10 million clinical scenarios, LLMs produced harmful outputs in 11.7% of cases; social-pressure framing increased this to 16.6%. A brief safety reminder reduced harm but did not eliminate it. ALMA exists because this residual harm rate is unacceptable in chronic care — and because prompt-level mitigations alone are insufficient.

[5] MATRIX: Multi-Agent Simulation Framework for Safe Interactions and Contextual Clinical Conversational Evaluation

Lim E et al. · arXiv · 2025 · https://hf.co/papers/2508.19163

Safety-oriented taxonomy for clinical dialogue evaluation combining safety engineering methodology, an LLM-based evaluator, and a simulated patient agent — the closest published framework to ALMA's 30-case evaluation set and per-severity metrics. ALMA differs in being a deterministic pre-publish gate rather than a post-hoc benchmark.

[6] A Field Guide to Deploying AI Agents in Clinical Practice

Gallifant J et al. · arXiv · 2025 · https://hf.co/papers/2509.26153

Sociotechnical framework for clinical AI agent deployment covering governance, system drift, and integration with EHR workflows. IO's roadmap (versioned milestones, audit logs, human-on-the-loop design) directly implements the deployment posture this paper recommends.

PubMed references include DOI links to original publications. arXiv references link to the Hugging Face Papers index.