A curated, audited knowledge base backing the IO3 clinical agent. Sources span peer-reviewed literature, physiological data summaries and clinical profile context, all structured for ChromaDB retrieval and evaluated with a dedicated RAG test set.
The knowledge base lives in ChromaDB and is split into two collections.
io3_kb holds 347 chunks of curated clinical literature and profile context,
hand-audited for source quality. io3_deltas holds 1,533 chunks from session
deltas, summaries and incremental memory updates — the longitudinal layer that grows with
each clinical interaction.
Every source in io3_kb carries structured metadata: chunk index, source file
path and domain category. Zero chunks are missing metadata. Exact duplicates (121) and
near-semantic duplicates at 0.95 cosine threshold (29,699 cross-pairs) were catalogued
but not removed — the audit distinguishes storage redundancy from retrieval quality.
A hand-curated test set of 20 questions covers seven clinical and system domains. Each question has an expected answer keyword set; a question is a hit when the top retrieved chunk contains at least the expected keywords. The pipeline uses cosine similarity retrieval via ChromaDB with top-5 chunk recall.
| Domain | Questions | Hits | Accuracy |
|---|---|---|---|
| Autonomic (HRV, PEM) | 8 | 8 | 1.00 |
| Musculoskeletal | 1 | 1 | 1.00 |
| Neurological | 2 | 2 | 1.00 |
| Profile | 5 | 4 | 0.80 |
| Content (case study) | 1 | 1 | 1.00 |
| System | 2 | 1 | 0.50 |
| Goals | 1 | 0 | 0.00 |
The 3 misses share a common pattern: the top-5 chunks surface the right source file but the expected keyword strings (academic background, emigration goals, IO agent capabilities) are not present verbatim in those chunks. Semantic score is high (0.90–0.93) — the chunks are relevant but keyword-level recall fails. This points to a chunking granularity issue, not a retrieval quality issue.
Every document in the clinical literature collection was manually selected from PubMed, PMC or peer-reviewed sources, with source URL, journal and year recorded in the chunk metadata. No web scrape without review, no synthetic summaries. The knowledge base is a clinical tool, not a search index — it is expected to return precise, verifiable passages, not approximate answers.
The io3_deltas collection follows a different contract: it is a living
memory layer that accumulates session context, is never used as the primary clinical
evidence source, and is periodically audited for drift. The two-collection design keeps
stable clinical knowledge separate from dynamic session memory.