Kinetica AI · Clinical Knowledge & RAG · Research

Clinical Knowledge & RAG evaluation

A curated, audited knowledge base backing the IO3 clinical agent. Sources span peer-reviewed literature, physiological data summaries and clinical profile context, all structured for ChromaDB retrieval and evaluated with a dedicated RAG test set.

1,880 total chunks
2 collections
0.85 retrieval accuracy
20 test questions
7 domains covered
0.75 target (passed)
Knowledge base · Chroma audit

Two collections, 1,880 audited chunks

The knowledge base lives in ChromaDB and is split into two collections. io3_kb holds 347 chunks of curated clinical literature and profile context, hand-audited for source quality. io3_deltas holds 1,533 chunks from session deltas, summaries and incremental memory updates. It is the longitudinal layer that grows with each clinical interaction.

Every chunk in io3_kb carries structured metadata: chunk index, source file path and domain category. No chunk is missing metadata. The audit catalogued 121 exact duplicates and 29,699 near-duplicate cross-pairs (cosine similarity of 0.95 or higher) but did not remove them, because it treats storage redundancy as a separate question from retrieval quality.
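The three audit checks above (metadata completeness, exact duplicates, near duplicates) can be sketched in plain Python. This is an illustration under stated assumptions, not the project's audit script: the `Chunk` record, the metadata key names and the toy embeddings are hypothetical, and only the 0.95 cosine threshold comes from the text.

```python
import hashlib
import math
from dataclasses import dataclass, field
from itertools import combinations

# Assumed key names; the text lists chunk index, source file path and domain.
REQUIRED_KEYS = ("chunk_index", "source", "domain")


@dataclass
class Chunk:
    text: str
    embedding: list
    metadata: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def audit(chunks, threshold=0.95):
    # Chunks lacking any required metadata key.
    missing = sum(1 for c in chunks if any(k not in c.metadata for k in REQUIRED_KEYS))

    # Exact duplicates: every repeat of an already-seen text hash counts once.
    seen, exact = set(), 0
    for c in chunks:
        digest = hashlib.sha256(c.text.strip().encode("utf-8")).hexdigest()
        if digest in seen:
            exact += 1
        else:
            seen.add(digest)

    # Near duplicates: pairs with different text whose embeddings meet the threshold.
    near = sum(
        1
        for a, b in combinations(chunks, 2)
        if a.text.strip() != b.text.strip()
        and cosine(a.embedding, b.embedding) >= threshold
    )
    return {"missing_metadata": missing, "exact_duplicates": exact, "near_duplicate_pairs": near}


# Toy data, invented for illustration only.
full = {"chunk_index": 0, "source": "hrv.md", "domain": "HRV"}
toy = [
    Chunk("HRV protocol", [1.0, 0.0], dict(full)),
    Chunk("HRV protocol", [1.0, 0.0], dict(full)),
    Chunk("HRV morning protocol", [0.99, 0.1], {"chunk_index": 1, "source": "hrv.md"}),
    Chunk("OMT overview", [0.0, 1.0], dict(full)),
]
report = audit(toy)
```

The pairwise loop is quadratic, which is acceptable at this scale: 1,880 chunks give roughly 1.8 million comparisons. The audit reports its duplicates without deleting anything, and the sketch does the same.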

HRV & Autonomic
~16 chunks · 6 sources
Morning HRV protocol, autonomic dysfunction in ME/CFS, HRV as fatigue biomarker, multidimensional ANS markers, wearables in athletes.
PEM & ME/CFS
~15 chunks · 5 sources
PEM clinical definition (IOM/ICC/NICE criteria), autonomic sympathetic activity in CFS, specialist care outcomes, Norwegian cohort study.
Osteopathy / OMT
~12 chunks · 4 sources
OMT in chronic pain (biopsychosocial model), neck/low-back vs sham RCT 2024, emergency/acute applications, systematic review overview 2022.
Neurodynamics
~16 chunks · 6 sources
Neural mobilisation systematic review + meta-analysis (40 studies), mirror neuron meta-analysis (13 RCTs), action observation in motor learning, AI in musculoskeletal rehab 2025.
Polar & Clinical Data
~79 chunks · 9 sources
Predictor summaries, Spearman correlations, case study narrative, HRV feature descriptions and longitudinal Polar export documentation.
Profile & System
~98 chunks · 11 sources
Clinical and professional profile, AI certifications, IO3 identity and constructor files, working principles and project context.
RAG evaluation · 20-question benchmark

Retrieval accuracy — 0.85 overall, target 0.75 passed

A hand-curated test set of 20 questions covers seven clinical and system domains. Each question has an expected set of answer keywords, and a question counts as a hit when the top retrieved chunk contains those keywords. The pipeline retrieves by cosine similarity through ChromaDB and returns the top 5 chunks per question.
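A minimal sketch of the hit rule follows. It assumes case-insensitive substring matching and that all expected keywords must appear in a single chunk; the function name `is_hit`, the parameter `k` and the sample strings are hypothetical. With `k=1` only the top chunk is checked, while `k=5` accepts a match anywhere in the retrieved set.

```python
def is_hit(ranked_chunks, expected_keywords, k=1):
    """Return True if one of the top-k chunks contains every expected keyword."""
    keywords = [kw.lower() for kw in expected_keywords]
    for chunk in ranked_chunks[:k]:
        text = chunk.lower()
        if all(kw in text for kw in keywords):
            return True
    return False


# Invented example: the keywords only appear in the second-ranked chunk.
ranked = [
    "Autonomic markers vary across the day.",
    "Morning HRV is measured supine after waking.",
]
top1 = is_hit(ranked, ["morning", "HRV"], k=1)
top5 = is_hit(ranked, ["morning", "HRV"], k=5)
```

The choice of `k` matters for the reported number: a strict top-1 rule penalises answers that sit in the second or third chunk, which a top-5 rule would accept.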

Domain                  Questions   Hits   Accuracy
Autonomic (HRV, PEM)    8           8      1.00
Musculoskeletal         1           1      1.00
Neurological            2           2      1.00
Profile                 5           4      0.80
Content (case study)    1           1      1.00
System                  2           1      0.50
Goals                   1           0      0.00
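The headline figure can be checked directly from the table. The short script below pools hits over questions and compares the result with the 0.75 target; the variable names are illustrative, and the counts are copied from the rows above.

```python
# (questions, hits) per domain, copied from the table.
results = {
    "Autonomic (HRV, PEM)": (8, 8),
    "Musculoskeletal": (1, 1),
    "Neurological": (2, 2),
    "Profile": (5, 4),
    "Content (case study)": (1, 1),
    "System": (2, 1),
    "Goals": (1, 0),
}

TARGET = 0.75

per_domain = {name: hits / questions for name, (questions, hits) in results.items()}
total_questions = sum(q for q, _ in results.values())
total_hits = sum(h for _, h in results.values())
overall = total_hits / total_questions  # pooled over all questions
passed = overall >= TARGET
```

The pooled accuracy is 17 hits out of 20 questions, which gives 0.85. Several domains have only one or two questions, so a single miss moves that domain's accuracy by 0.5 or 1.0; the pooled figure is the more stable number.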

The 3 misses share a common pattern: the top-5 chunks surface the right source file, but the expected keyword strings (academic background, emigration goals, IO agent capabilities) are not present verbatim in those chunks. Similarity scores are high (0.90–0.93), so the chunks are relevant, but the keyword-level check fails. This points to a chunking granularity issue, not a retrieval quality issue.
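The distinction drawn here, between a chunk that is relevant but lacks the keywords and a retrieval that went to the wrong place, can be made explicit with a small triage function. This is a sketch only: the function name, the labels and the 0.90 relevance floor are assumptions, with the floor taken from the lower end of the reported score range.

```python
def triage(keyword_hit, source_matched, top_similarity, relevance_floor=0.90):
    """Label one benchmark question as a hit, a keyword miss or a retrieval miss."""
    if keyword_hit:
        return "hit"
    # Right source and high similarity, but the expected strings are absent:
    # the pattern the text attributes to chunking granularity.
    if source_matched and top_similarity >= relevance_floor:
        return "keyword_miss"
    # Anything else is treated as the retriever surfacing the wrong material.
    return "retrieval_miss"


label = triage(keyword_hit=False, source_matched=True, top_similarity=0.91)
```

Under these assumptions, all three misses described above would be labelled `keyword_miss`, since each has the right source file and a score of 0.90 or higher.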

Design notes

Audited, not just indexed

Every document in the clinical literature collection was manually selected from PubMed, PMC or peer-reviewed sources, with source URL, journal and year recorded in the chunk metadata. Nothing was web-scraped without review, and no synthetic summaries were added. The knowledge base is a clinical tool, not a search index: it is expected to return precise, verifiable passages rather than approximate answers.

The io3_deltas collection follows a different contract: it is a living memory layer that accumulates session context, is never used as the primary clinical evidence source, and is periodically audited for drift. The two-collection design keeps stable clinical knowledge separate from dynamic session memory.
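One way to honour this contract at query time is to keep the two result sets in separate slots, so that session memory can never be mistaken for clinical evidence. The sketch below is an assumption about how that could look, not the IO3 implementation: `build_context`, the `Retriever` signature, the default values of `k_evidence` and `k_memory`, and the stub retriever are all hypothetical. Only the collection names come from the text.

```python
from typing import Callable, Dict, List

# (collection name, query, k) -> list of chunk texts; a stand-in for a real ChromaDB query.
Retriever = Callable[[str, str, int], List[str]]


def build_context(query: str, retrieve: Retriever,
                  k_evidence: int = 5, k_memory: int = 3) -> Dict[str, List[str]]:
    """Fetch clinical evidence and session memory separately and keep them labelled."""
    return {
        # Primary clinical evidence comes only from the audited collection.
        "evidence": retrieve("io3_kb", query, k_evidence),
        # Session memory is supplementary and stays in its own slot.
        "session_memory": retrieve("io3_deltas", query, k_memory),
    }


def stub_retrieve(collection: str, query: str, k: int) -> List[str]:
    """Toy retriever used only to make the example runnable."""
    return [f"{collection} result {i} for '{query}'" for i in range(k)]


context = build_context("morning HRV protocol", stub_retrieve)
```

Downstream prompt assembly can then cite only the `evidence` entries and treat `session_memory` as background.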