Kinetica AI · Clinical Knowledge & RAG · Research

Clinical Knowledge & RAG evaluation

A curated, audited knowledge base backing the IO3 clinical agent. Sources span peer-reviewed literature, physiological data summaries and clinical profile context, all structured for ChromaDB retrieval and evaluated with a dedicated RAG test set.

1,880 total chunks

2 collections

0.85 retrieval accuracy

20 test questions

7 domains covered

0.75 target (passed)

Knowledge base · Chroma audit

Two collections, 1,880 audited chunks

The knowledge base lives in ChromaDB and is split into two collections. io3_kb holds 347 chunks of curated clinical literature and profile context, hand-audited for source quality. io3_deltas holds 1,533 chunks from session deltas, summaries and incremental memory updates — the longitudinal layer that grows with each clinical interaction.

Every chunk in io3_kb carries structured metadata: chunk index, source file path and domain category. No chunk is missing metadata. The audit catalogued 121 exact duplicates and 29,699 near-duplicate cross-pairs (cosine similarity of at least 0.95) but removed none, because it treats storage redundancy as a separate question from retrieval quality.
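
The three audit checks (metadata completeness, exact duplicates, near-duplicates at a 0.95 cosine threshold) could be sketched roughly as follows. This is a minimal illustration, not the audit script itself: the `Chunk` structure, the metadata key names, the file names and the toy two-dimensional embeddings are all assumptions, and a real audit would read texts, metadata and embeddings out of ChromaDB.

```python
import hashlib
import math
from dataclasses import dataclass, field
from itertools import combinations

# Hypothetical key names, modelled on the metadata fields described above.
REQUIRED_KEYS = ("chunk_index", "source", "domain")


@dataclass
class Chunk:
    id: str
    text: str
    embedding: list
    metadata: dict = field(default_factory=dict)


def missing_metadata(chunks):
    """Return ids of chunks that lack any required metadata key."""
    return [c.id for c in chunks
            if any(c.metadata.get(k) in (None, "") for k in REQUIRED_KEYS)]


def exact_duplicates(chunks):
    """Group chunk ids whose whitespace-normalised text is identical."""
    groups = {}
    for c in chunks:
        normalised = " ".join(c.text.split())
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        groups.setdefault(digest, []).append(c.id)
    return [ids for ids in groups.values() if len(ids) > 1]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def near_duplicates(chunks, threshold=0.95):
    """Return id pairs whose embeddings meet the cosine threshold.

    Brute force over all pairs, so quadratic in the number of chunks.
    Exact duplicates also appear here, since their similarity is 1.0.
    """
    return [(a.id, b.id) for a, b in combinations(chunks, 2)
            if cosine(a.embedding, b.embedding) >= threshold]


# Toy data for illustration only.
chunks = [
    Chunk("a", "Morning HRV protocol", [1.0, 0.0],
          {"chunk_index": 0, "source": "hrv.md", "domain": "hrv"}),
    Chunk("b", "Morning  HRV protocol", [1.0, 0.0],
          {"chunk_index": 1, "source": "hrv.md", "domain": "hrv"}),
    Chunk("c", "PEM clinical definition", [0.0, 1.0],
          {"chunk_index": 0, "source": "pem.md"}),
]

print(missing_metadata(chunks))   # ['c']
print(exact_duplicates(chunks))   # [['a', 'b']]
print(near_duplicates(chunks))    # [('a', 'b')]
```

Reporting the three lists separately mirrors the audit's stance: duplicates are catalogued as a storage property and left in place, while retrieval quality is measured independently.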

HRV & Autonomic

~16 chunks · 6 sources

Morning HRV protocol, autonomic dysfunction in ME/CFS, HRV as fatigue biomarker, multidimensional ANS markers, wearables in athletes.

PEM & ME/CFS

~15 chunks · 5 sources

PEM clinical definition (IOM/ICC/NICE criteria), autonomic sympathetic activity in CFS, specialist care outcomes, Norwegian cohort study.

Osteopathy / OMT

~12 chunks · 4 sources

OMT in chronic pain (biopsychosocial model), neck/low-back vs sham RCT 2024, emergency/acute applications, systematic review overview 2022.

Neurodynamics

~16 chunks · 6 sources

Neural mobilisation systematic review + meta-analysis (40 studies), mirror neuron meta-analysis (13 RCTs), action observation in motor learning, AI in musculoskeletal rehab 2025.

Polar & Clinical Data

~79 chunks · 9 sources

Predictor summaries, Spearman correlations, case study narrative, HRV feature descriptions and longitudinal Polar export documentation.

Profile & System

~98 chunks · 11 sources

Clinical and professional profile, AI certifications, IO3 identity and constructor files, working principles and project context.

RAG evaluation · 20-question benchmark

Retrieval accuracy — 0.85 overall, target 0.75 passed

A hand-curated test set of 20 questions covers seven clinical and system domains. Each question has an expected keyword set, and a question counts as a hit when the expected keywords appear in the top-5 retrieved chunks. Retrieval uses cosine similarity in ChromaDB and returns the five closest chunks per question.
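
As a rough illustration of the hit criterion, the check can be written as a function over the retrieved chunk texts. This is a sketch under assumptions: the case-insensitive substring match, the rule that each keyword may appear in any of the top-k chunks, and the example strings are hypothetical, since the page does not specify the matching rule at that level of detail.

```python
def is_hit(retrieved_texts, expected_keywords, k=5):
    """True when every expected keyword appears in at least one top-k chunk.

    Assumption: matching is a case-insensitive substring test, and different
    keywords may be satisfied by different chunks within the top k.
    """
    top = [text.lower() for text in retrieved_texts[:k]]
    return all(any(kw.lower() in text for text in top)
               for kw in expected_keywords)


# Hypothetical retrieved chunks, already ordered by similarity.
retrieved = [
    "Morning HRV is measured supine after waking.",
    "RMSSD reflects parasympathetic activity.",
]

print(is_hit(retrieved, ["hrv", "rmssd"]))        # True
print(is_hit(retrieved, ["emigration goals"]))    # False
print(is_hit(retrieved, ["rmssd"], k=1))          # False
```

Passing `k=1` gives the stricter reading in which only the single top chunk is checked.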

Domain	Questions	Hits	Accuracy
Autonomic (HRV, PEM)	8	8	1.00
Musculoskeletal	1	1	1.00
Neurological	2	2	1.00
Profile	5	4	0.80
Content (case study)	1	1	1.00
System	2	1	0.50
Goals	1	0	0.00
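
The overall figure follows directly from the table. A short sketch of the aggregation, using the per-domain counts above; the variable names are illustrative.

```python
# (questions, hits) per domain, copied from the table above.
results = {
    "Autonomic (HRV, PEM)": (8, 8),
    "Musculoskeletal": (1, 1),
    "Neurological": (2, 2),
    "Profile": (5, 4),
    "Content (case study)": (1, 1),
    "System": (2, 1),
    "Goals": (1, 0),
}
TARGET = 0.75

total_questions = sum(q for q, _ in results.values())
total_hits = sum(h for _, h in results.values())
overall = total_hits / total_questions

for domain, (q, h) in results.items():
    print(f"{domain:<22} {h}/{q}  {h / q:.2f}")

# Prints: Overall: 17/20 = 0.85, target met: True
print(f"Overall: {total_hits}/{total_questions} = {overall:.2f}, "
      f"target met: {overall >= TARGET}")
```

The 0.85 is an average over questions. Averaging the seven per-domain accuracies instead would weight the single-question domains more heavily and give roughly 0.76.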

The 3 misses share a common pattern: the top-5 chunks surface the right source file, but the expected keyword strings (academic background, emigration goals, IO agent capabilities) are not present verbatim in those chunks. Similarity scores are high (0.90–0.93), so the retrieved chunks are relevant, yet the keyword-level check fails. This points to a chunking granularity issue, not a retrieval quality issue.
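
One way to make that distinction explicit is to label each test question by what went wrong. The sketch below is hypothetical: the `Retrieved` fields, the 0.90 similarity floor (taken from the low end of the range reported above), the label names and the sample file path are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Retrieved:
    source: str        # source file path from the chunk metadata
    text: str
    similarity: float  # cosine similarity to the query


def classify(top_k, expected_source, expected_keywords, min_similarity=0.90):
    """Label one test question as 'hit', 'keyword_miss' or 'retrieval_miss'."""
    texts = [r.text.lower() for r in top_k]
    if all(any(kw.lower() in t for t in texts) for kw in expected_keywords):
        return "hit"
    right_source = any(
        r.source == expected_source and r.similarity >= min_similarity
        for r in top_k
    )
    # Right file, high similarity, keywords absent from the retrieved chunks:
    # treated here as a chunking problem rather than a retrieval problem.
    return "keyword_miss" if right_source else "retrieval_miss"


# Hypothetical example: the right file is retrieved, the keyword is not in it.
top_k = [Retrieved("profile/background.md",
                   "Summary of clinical and professional experience.", 0.92)]

print(classify(top_k, "profile/background.md", ["academic background"]))
# keyword_miss
print(classify(top_k, "goals/plans.md", ["emigration goals"]))
# retrieval_miss
```

Under this labelling, the three misses described above would all fall into the `keyword_miss` bucket.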

Design notes

Audited, not just indexed

Every document in the clinical literature collection was manually selected from PubMed, PMC or peer-reviewed sources, with source URL, journal and year recorded in the chunk metadata. No web scrape without review, no synthetic summaries. The knowledge base is a clinical tool, not a search index — it is expected to return precise, verifiable passages, not approximate answers.

The io3_deltas collection follows a different contract: it is a living memory layer that accumulates session context, is never used as the primary clinical evidence source, and is periodically audited for drift. The two-collection design keeps stable clinical knowledge separate from dynamic session memory.
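
The separation between the two contracts can be illustrated with a small sketch that keeps passages from each collection in distinct buckets before they reach the agent. The collection names come from the text above; the `collection` and `text` keys, the bucket names and the sample passages are assumptions, not the IO3 implementation.

```python
EVIDENCE_COLLECTION = "io3_kb"     # stable, audited clinical knowledge
MEMORY_COLLECTION = "io3_deltas"   # dynamic session memory


def split_by_contract(hits):
    """Separate retrieved passages into clinical evidence and session context.

    Only io3_kb passages are placed in the evidence bucket; io3_deltas
    passages are kept as supporting context and never promoted to evidence.
    """
    return {
        "evidence": [h for h in hits if h["collection"] == EVIDENCE_COLLECTION],
        "session_context": [h for h in hits if h["collection"] == MEMORY_COLLECTION],
    }


# Hypothetical retrieved passages from both collections.
hits = [
    {"collection": "io3_kb", "text": "PEM clinical definition"},
    {"collection": "io3_deltas", "text": "Session summary"},
]

buckets = split_by_contract(hits)
print(len(buckets["evidence"]), len(buckets["session_context"]))  # 1 1
```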
