Kinetica’s end-to-end pipeline ingests heterogeneous Polar exports, validates schemas, applies physiological quality filters, computes derived HRV and activity features, aligns them on a daily timeline and prepares them for prediction, interpretation and public rendering.
Raw data lives outside the repository. Polar GDPR exports contain personal cardiac data and are never committed. The L0 archive holds 1,025 JSON files (1.1 GB) from eleven distinct data sources, covering the full observation window from August 2025 to April 2026.
Eight specialized parsers in pipeline/l1_extract/ convert these heterogeneous
JSON schemas into typed pandas DataFrames with consistent date indexing and Pydantic
schema validation. Every dropped row is logged with its reason. Output: a set of
dated parquets written to data/processed/L1/.
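The validate-and-log pattern described above can be sketched as follows. This is a minimal illustration, not the project's parser: the NightlyHrvRecord model, its field names, the 0 to 300 ms plausibility bounds and the parse_records helper are all assumptions introduced here, since the text does not give the real schemas.

```python
import datetime as dt
import logging

import pandas as pd
from pydantic import BaseModel, Field, ValidationError

log = logging.getLogger("l1_extract")


class NightlyHrvRecord(BaseModel):
    """Hypothetical schema; real field names and bounds are not given in the text."""

    day: dt.date
    rmssd_ms: float = Field(gt=0, lt=300)  # assumed plausibility range


def parse_records(raw_records):
    """Validate raw dicts; return (date-indexed DataFrame, list of dropped rows)."""
    kept, dropped = [], []
    for i, raw in enumerate(raw_records):
        try:
            rec = NightlyHrvRecord(**raw)
        except ValidationError as exc:
            # Build a readable reason such as "rmssd_ms: <message>".
            reason = "; ".join(
                f"{'.'.join(str(p) for p in err['loc'])}: {err['msg']}"
                for err in exc.errors()
            )
            log.warning("dropped row %d: %s", i, reason)
            dropped.append({"row": i, "reason": reason})
            continue
        kept.append({"date": rec.day, "rmssd_ms": rec.rmssd_ms})

    frame = pd.DataFrame(kept, columns=["date", "rmssd_ms"])
    frame["date"] = pd.to_datetime(frame["date"])
    return frame.set_index("date").sort_index(), dropped
```

Returning the dropped rows alongside the frame keeps the drop log testable; writing the frame out with DataFrame.to_parquet would be the natural next step.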
Three independent feature computers in pipeline/l2_features/ transform
the L1 typed DataFrames into interpretable physiological signals. Each module is
deterministic given the same input and writes its own dated parquet to
data/processed/L2/.
rest. The stratum column
propagates through L3 and serves as a stratifier in L4 lag-feature analysis.
l3_unified.py outer-merges all L1 and L2 parquets on calendar date.
The result is a unified DataFrame with one row per day across the full observation window.
Every date present in any source is preserved; NaN marks absence from a
given source — orthostatic tests appear on only 8 days, fitness tests on 11.
Sparse sources do not compress the window.
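A minimal sketch of the outer merge on calendar date, assuming every frame carries a date column and that value column names do not collide across sources. The unify_on_date helper and the example column names are hypothetical.

```python
from functools import reduce

import pandas as pd


def unify_on_date(frames):
    """Outer-merge frames on 'date' so every date seen in any source is kept."""
    merged = reduce(
        lambda left, right: pd.merge(left, right, on="date", how="outer"),
        frames,
    )
    return merged.sort_values("date").reset_index(drop=True)


# A dense source (nightly HRV) and a sparse one (orthostatic test).
hrv = pd.DataFrame({
    "date": pd.to_datetime(["2025-08-14", "2025-08-15", "2025-08-16"]),
    "rmssd_ms": [42.0, 38.5, 45.1],
})
ortho = pd.DataFrame({
    "date": pd.to_datetime(["2025-08-15"]),
    "ortho_hr_delta": [21.0],
})

unified = unify_on_date([hrv, ortho])
# Three rows remain; ortho_hr_delta is NaN on the two days without a test.
```

An outer merge alone does not create a row for a day that is missing from every source. Reindexing against pd.date_range would add such rows, though the text does not say whether that step is applied.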
L4 inner-joins the L3 frame with the subjective symptom diary, reducing it to the paired days where both physiological data and a diary entry exist. Three temporal lags (t−1, t−2, t−3) are added for each of the four primary autonomic columns, expanding the feature space for time-lagged prediction. The L4 artifact is the direct input to the L5 model training pipeline.
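The lag construction and diary join can be sketched as below. The function and column names are hypothetical. The sketch also assumes the lags are computed on the contiguous daily frame before the inner join, so that t−1 means the previous calendar day rather than the previous paired day; the text does not state which ordering the pipeline uses.

```python
import pandas as pd


def add_lags(daily, columns, lags=(1, 2, 3)):
    """Append <col>_lag<k> columns, where lag k is the value k calendar days earlier."""
    # asfreq("D") inserts NaN rows for missing days, so shift(k) moves by days, not rows.
    out = daily.set_index("date").sort_index().asfreq("D")
    for col in columns:
        for k in lags:
            out[f"{col}_lag{k}"] = out[col].shift(k)
    return out.rename_axis("date").reset_index()


def build_paired_frame(daily, diary, columns):
    """Lag the daily frame, then keep only days that also have a diary entry."""
    lagged = add_lags(daily, columns)
    return lagged.merge(diary, on="date", how="inner")
```

Shifting after the join would silently treat the previous diary day as "yesterday", which matters when diary entries are not made every day.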
The L5 predictor layer runs forward feature selection independently per target symptom. Leave-one-out cross-validation is the primary validation strategy: every paired day is held out as the test case exactly once and used for training in all other folds. A bootstrap confidence interval (1,000 resamples) quantifies uncertainty per target. The deployment target is autonomic dysfunction, reaching AUC 0.829 on N = 55 paired days.
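A minimal sketch of leave-one-out evaluation with a bootstrap interval on the pooled out-of-fold scores, using scikit-learn. The standardized logistic regression, the percentile interval and the loo_auc_with_ci helper are assumptions chosen for illustration; the text does not name the estimator or the bootstrap variant, and forward selection is omitted here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def loo_auc_with_ci(X, y, n_boot=1000, seed=0):
    """Return (AUC, (lower, upper)) from leave-one-out out-of-fold probabilities."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    # Each sample is scored by a model that never saw it during fitting.
    scores = cross_val_predict(
        model, X, y, cv=LeaveOneOut(), method="predict_proba"
    )[:, 1]
    auc = roc_auc_score(y, scores)

    rng = np.random.default_rng(seed)
    n = len(y)
    boot = []
    while len(boot) < n_boot:
        idx = rng.integers(0, n, size=n)
        if len(np.unique(y[idx])) < 2:
            continue  # AUC is undefined when a resample contains one class only
        boot.append(roc_auc_score(y[idx], scores[idx]))
    lower, upper = np.percentile(boot, [2.5, 97.5])
    return auc, (lower, upper)
```

If feature selection is part of the evaluated procedure, running it inside each fold rather than once on all paired days avoids leaking the held-out day into the selection step.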
L6 assembles all pipeline outputs into polar_live.json using an atomic
tempfile-rename write strategy. The JSON is consumed at runtime by the React portfolio
at kineticaai.com — SSR fallback is served first; client-side hydration fills confidence
intervals and feature weights after load. A separate pipeline_state.json
artifact documents per-level status, metrics and last-run timestamps.
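The tempfile-rename strategy can be sketched with the standard library alone. The write_json_atomic helper and the payload shape are hypothetical; only the file name polar_live.json comes from the text.

```python
import json
import os
import tempfile


def write_json_atomic(payload, path):
    """Write JSON to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must share a filesystem with the target for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, ensure_ascii=False, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A call such as write_json_atomic({"status": "ok"}, "polar_live.json") means a reader never observes a half-written file, because the final path changes only at the rename.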
This pipeline is the common foundation for multiple independent predictors. Each predictor consumes the L4 feature frame but selects its own feature subset, trains its own model and publishes its own output independently. Adding a new predictor requires no changes to L0–L4: the data infrastructure is shared; the modelling layer is not. It is built as infrastructure, not as a one-off preprocessing script.