Kinetica AI · ANS Predictor · Research

ANS Predictor
Wearable symptom forecasting.

Multi-symptom clinical prediction from nocturnal heart rate variability. N-of-1 longitudinal study using a consumer wristwatch, a daily symptom diary, and open-source machine learning. All data and code public.

Methodology

N-of-1 longitudinal design

A single post-Lyme patient wears a Polar Grit X2 every night. Raw RR intervals are processed into 13 HRV features grouped into 3 families covering the time and frequency domains. Each morning, symptom severity is scored across 5 dimensions using a standardized diary (DSQ-PEM adapted).
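As a rough illustration of how raw RR intervals become HRV features, the sketch below computes four standard time-domain measures with NumPy. The function name `time_domain_hrv` and the choice of measures are assumptions made for illustration; the page does not list which 13 features the project actually uses.

```python
import numpy as np

def time_domain_hrv(rr_ms):
    """Return a few standard time-domain HRV measures from RR intervals in ms."""
    rr = np.asarray(rr_ms, dtype=float)
    diffs = np.diff(rr)  # successive beat-to-beat differences
    return {
        "mean_hr_bpm": 60000.0 / rr.mean(),                  # mean heart rate
        "sdnn_ms": rr.std(ddof=1),                           # overall variability
        "rmssd_ms": float(np.sqrt(np.mean(diffs ** 2))),     # short-term variability
        "pnn50_pct": 100.0 * float(np.mean(np.abs(diffs) > 50.0)),
    }

# Hypothetical five-beat example, not study data.
features = time_domain_hrv([800, 810, 790, 860, 800])
```

A real night contains thousands of beats and would normally need artifact correction before any of these measures is computed.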

The autonomic pattern targeted by this predictor is consistent with dysautonomia documented in post-infectious syndromes (see Scientific basis).

Feature selection: Forward selection is run separately for each target. Each symptom model selects its own best features from the 13 candidates at 3 lag windows (t0, t-1, t-2). No feature set is shared across symptoms.
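The three lag windows can be pictured as stacking each night's features next to those of the two preceding nights. The following is a minimal sketch, assuming one row per night in chronological order with no missing nights; the helper name `add_lags` is hypothetical.

```python
import numpy as np

def add_lags(nightly, lags=(0, 1, 2)):
    """Stack each night's features with those of earlier nights.

    nightly: array of shape (n_nights, n_features), oldest night first.
    Returns (X, first_row): X has n_features * len(lags) columns, and
    first_row is the index of the first night with a complete history.
    """
    nightly = np.asarray(nightly, dtype=float)
    max_lag = max(lags)
    n = nightly.shape[0]
    # For lag k, row i of the block holds the features of night (i + max_lag - k).
    blocks = [nightly[max_lag - lag : n - lag] for lag in lags]
    return np.hstack(blocks), max_lag

# Toy input: a single feature whose value equals the night index.
X_lagged, first_row = add_lags(np.arange(5).reshape(5, 1))
```

The first `max_lag` nights are dropped because they lack a full two-night history, so the matching symptom labels would need to be trimmed in the same way.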

Validation: Leave-one-out cross-validation (LOO-CV) for primary AUC. 1,000× bootstrap for confidence intervals. No fixed train/test split: every data point is held out for validation exactly once and used for training in all other folds.
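A leave-one-out loop of this kind might look like the sketch below, which uses scikit-learn with a standardised logistic regression. The pipeline, default hyperparameters, and synthetic data are assumptions for illustration and are not taken from the project's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def loo_probabilities(X, y):
    """Predict each sample with a model fitted on all the other samples."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    probs = np.empty(len(y))
    for train_idx, test_idx in LeaveOneOut().split(X):
        # The scaler is refitted inside each fold so the held-out day never leaks in.
        model = make_pipeline(StandardScaler(), LogisticRegression())
        model.fit(X[train_idx], y[train_idx])
        probs[test_idx] = model.predict_proba(X[test_idx])[:, 1]
    return probs

# Synthetic stand-in data (not the study's): 61 days, 3 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(61, 3))
y = (X[:, 0] + rng.normal(scale=0.8, size=61) > 0).astype(int)

probs = loo_probabilities(X, y)
auc = roc_auc_score(y, probs)  # one AUC computed over all held-out predictions
```

Pooling the held-out probabilities into a single AUC is the usual way to score LOO-CV, since an AUC cannot be computed on a fold of one sample.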

Models: Logistic regression, random forest, and gradient boosting trained on each target independently. Best model selected by LOO-CV AUC.

Results

Multi-symptom prediction performance

Five independent models, each predicting a different symptom dimension from nocturnal HRV data.

Model comparison

Three algorithms, one winner

All models trained on the severity target with the same feature set. LOO-CV evaluation.

Lag analysis

Temporal structure of autonomic signals

Spearman correlations between nocturnal HRV features and next-day symptom severity at different lag windows. Lag 0 = the night immediately before the diary entry, Lag 2 = two nights earlier.
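A lagged Spearman correlation can be sketched with NumPy alone by ranking both series and taking the Pearson correlation of the ranks. The sketch assumes that index i of `feature` is the night immediately before the diary score at index i of `severity`; all function names are hypothetical.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        ranks[order[i : j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def lagged_spearman(feature, severity, lag):
    """Correlate the feature from `lag` nights earlier with each morning's score."""
    feature, severity = np.asarray(feature), np.asarray(severity)
    if lag == 0:
        return spearman(feature, severity)
    return spearman(feature[:-lag], severity[lag:])

# Toy series in which severity mirrors the feature from two nights earlier.
rho = lagged_spearman([5, 4, 3, 2, 1, 0], [9, 9, 0, 1, 2, 3], lag=2)
```

Constant series are not handled here; they would produce an undefined correlation.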

Data pipeline

Automated nightly collection

Every night at 06:00 UTC, a GitHub Actions workflow pulls fresh data from the Polar AccessLink API. Raw RR intervals are processed into 13 HRV features using NeuroKit2. The resulting JSON is committed to the repository and deployed automatically to this page.

When a new diary entry is pushed, a separate workflow triggers model retraining with the updated dataset. Leave-one-out cross-validation runs on the full history, and the predictor coefficients in polar_live.json are updated.

Stack: Python · scikit-learn · neurokit2 · Polar Grit X2 · GitHub Actions · CSV

ROC curves

How well does each model separate good days from bad days?

A ROC curve shows how well each model separates symptomatic days from asymptomatic ones across every possible decision threshold. The top-left corner is perfect; the diagonal is a coin flip. The dot marks the operating point chosen for clinical use (Youden's J: the best trade-off between catching true cases and avoiding false alarms).
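The operating point described above can be found by scanning every distinct score as a candidate threshold and keeping the one with the largest J = TPR − FPR. This is a minimal NumPy sketch with hypothetical function names; when several thresholds tie it returns the highest one, which is a choice made here and not necessarily the project's.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (thresholds, fpr, tpr) with one point per distinct score."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.unique(scores)[::-1]  # from strictest to most lenient
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    tpr = np.array([(scores[y_true] >= t).sum() / n_pos for t in thresholds])
    fpr = np.array([(scores[~y_true] >= t).sum() / n_neg for t in thresholds])
    return thresholds, fpr, tpr

def youden_threshold(y_true, scores):
    """Threshold maximising J = sensitivity + specificity - 1 (= TPR - FPR)."""
    thresholds, fpr, tpr = roc_points(y_true, scores)
    j = tpr - fpr
    best = int(np.argmax(j))  # first maximum, i.e. the highest tied threshold
    return float(thresholds[best]), float(j[best])

# Toy example: two asymptomatic days followed by two symptomatic days.
threshold, j_stat = youden_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```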

Technical: Curves drawn from leave-one-out cross-validation predicted probabilities. Each vertex represents one threshold. The step-function shape is the statistically correct representation for n=61; smooth interpolation would misrepresent the data. AUC and 95% CI from 1,000 bootstrap resamples (seed = 42). Brain Fog (niebla_mental) has a 55 vs 6 class split, so its AUC estimate is unstable and may be inflated; interpret as suggestive only.
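One plausible way to obtain such an interval is a percentile bootstrap over the (label, predicted probability) pairs, sketched below with NumPy. Resampling the LOO predictions instead of refitting the models, using the percentile method, and skipping single-class resamples are all assumptions made here; the page does not say which variant the project uses.

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

def bootstrap_auc_ci(y_true, scores, n_boot=1000, seed=42, alpha=0.05):
    """Percentile bootstrap CI for AUC, resampling (label, score) pairs."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, len(y_true), size=len(y_true))
        if y_true[idx].min() == y_true[idx].max():
            continue  # AUC is undefined when a resample contains one class only
        aucs.append(auc_score(y_true[idx], scores[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)

# Toy example with perfectly separated scores.
labels = [0] * 10 + [1] * 10
ci_low, ci_high = bootstrap_auc_ci(labels, np.linspace(0.0, 1.0, 20))
```

With a split as uneven as 55 vs 6, a noticeable share of resamples will contain very few minority-class days, which is one reason the resulting interval is wide.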

Forest plot

Side-by-side accuracy across symptoms and models

Each bar shows how well a model discriminates one symptom (AUC), with its uncertainty range (95% confidence interval) drawn as a horizontal line. An AUC of 1.0 means perfect prediction; 0.5 is random chance. The bars let you compare models side by side for each symptom.

Technical: AUC = area under the ROC curve from LOO-CV. Confidence intervals from 1,000 bootstrap resamples (seed = 42). Reference lines at 0.5 (random) and 0.7 (conventional acceptable threshold for clinical classifiers). Feature set selected by forward selection on logistic regression (max 5 features, ΔAUC ≥ 0.01), then applied to all three models for an apples-to-apples comparison.
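The greedy search with these two stopping rules (at most 5 features, ΔAUC ≥ 0.01) could be sketched as follows with scikit-learn. The function names, the standardised logistic regression pipeline, and the chance-level starting AUC of 0.5 are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def loo_auc(X, y):
    """LOO-CV AUC of a standardised logistic regression."""
    model = make_pipeline(StandardScaler(), LogisticRegression())
    probs = cross_val_predict(model, X, y, cv=LeaveOneOut(), method="predict_proba")
    return roc_auc_score(y, probs[:, 1])

def forward_select(X, y, max_features=5, min_gain=0.01):
    """Greedily add the column that most improves LOO-CV AUC."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    selected, best_auc = [], 0.5  # start from chance level (an assumption)
    while len(selected) < max_features:
        candidates = [c for c in range(X.shape[1]) if c not in selected]
        if not candidates:
            break
        scored = [(loo_auc(X[:, selected + [c]], y), c) for c in candidates]
        auc, col = max(scored)
        if auc - best_auc < min_gain:
            break  # the best remaining feature adds too little
        selected.append(col)
        best_auc = auc
    return selected, best_auc

# Synthetic stand-in data: only column 2 carries signal.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 4))
y_demo = (X_demo[:, 2] > 0).astype(int)
chosen, chosen_auc = forward_select(X_demo, y_demo)
```

The returned column indices can then be used to train the random forest and gradient boosting models on the same inputs.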

Confusion matrices

Where exactly does each model make mistakes?

A confusion matrix breaks each model's predictions into four counts. For each symptom and model the table reports: how many truly bad days were correctly flagged (True Positives), how many were missed (False Negatives), how many good days were wrongly flagged (False Positives), and how many good days were correctly left alone (True Negatives).

Technical: Predictions made at each model's Youden-optimal threshold (maximises sensitivity + specificity − 1). Counts from leave-one-out cross-validation — each sample predicted exactly once by a model trained on all the other samples.
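Given the held-out probabilities and a threshold, the four cells can be tallied in a few lines. This is a minimal sketch; flagging a day when its probability is greater than or equal to the threshold is an assumed convention, and the function name is hypothetical.

```python
def confusion_counts(y_true, probs, threshold):
    """Count outcomes when every day with prob >= threshold is flagged."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for truth, prob in zip(y_true, probs):
        flagged = prob >= threshold
        if truth and flagged:
            counts["TP"] += 1  # bad day, correctly flagged
        elif truth:
            counts["FN"] += 1  # bad day, missed
        elif flagged:
            counts["FP"] += 1  # good day, wrongly flagged
        else:
            counts["TN"] += 1  # good day, correctly left alone
    return counts

# Toy example with five days and a threshold of 0.5.
cells = confusion_counts([1, 1, 0, 0, 1], [0.9, 0.2, 0.7, 0.1, 0.6], 0.5)
```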

Model details

What does each model actually use?

This section shows which physiological signals each model relies on most, and — for logistic regression — the direction of each signal's effect (positive bars push toward "symptom present", negative bars push toward "symptom absent"). For the tree-based models we show feature importance: how much each feature contributes to the model's splits, always positive.

Technical: LR coefficients are computed on standardised features from a single fit on the full dataset (not LOO-CV), so they are directly comparable in magnitude. RF and GBM importances are impurity-based (Gini for RF), normalised to sum to 1 across the feature set. The feature set is identical across all three models (forward selection on LR).
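The two kinds of bars could be produced roughly as in the sketch below, which fits a logistic regression on standardised features and a random forest on the full dataset. Default hyperparameters, 200 trees, and the synthetic data are assumptions; a gradient boosting model exposes `feature_importances_` in the same way.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def model_details(X, y, seed=42):
    """Return (signed LR coefficients, RF importances) from full-data fits."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    X_std = StandardScaler().fit_transform(X)  # unit variance makes coefficients comparable
    lr = LogisticRegression().fit(X_std, y)
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    # scikit-learn normalises impurity-based importances to sum to 1.
    return lr.coef_[0], rf.feature_importances_

# Synthetic stand-in data: column 0 drives the label.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
coefs, importances = model_details(X_demo, y_demo)
```

A positive coefficient pushes the prediction toward "symptom present"; importances carry no sign.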

Metric glossary

AUC-ROC: Probability that the model ranks a random symptomatic day above a random asymptomatic one. 1.0 = perfect, 0.5 = coin flip.

Sensitivity (Recall): Of all truly bad days, how many did the model catch? High sensitivity → few missed flares.

Specificity: Of all truly good days, how many did the model correctly leave unflagged? High specificity → few false alarms.

PPV (Precision): Of the days the model flags as bad, how many actually are? Answers: "when the alarm fires, can I trust it?"

NPV: Of the days the model clears as good, how many actually are? Answers: "when no alarm, am I safe?"

F1-score: Harmonic mean of precision and sensitivity (0–1). Standard ML summary statistic when classes are unequal.

MCC: Matthews Correlation Coefficient (−1 to +1). Often regarded as one of the most robust single-number summaries for imbalanced binary classification. +1 = perfect, 0 = random.

Balanced Accuracy: (Sensitivity + Specificity) / 2. More honest than raw accuracy when one class is more frequent.

Youden's J threshold: Decision threshold that maximises Sensitivity + Specificity − 1. Used for all confusion matrices here.

LOO-CV: Leave-one-out cross-validation. Every sample is predicted once by a model that never saw it during training.

Bootstrap CI 95%: Confidence interval estimated by resampling the dataset 1,000 times. Captures uncertainty from the small N.

Forward selection: Greedy feature search: add the single feature that most improves AUC at each step. Stops when gain < 0.01.
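The threshold-based metrics in this glossary all follow from the four confusion-matrix counts. The sketch below computes them with the standard formulas; the function names and example counts are hypothetical, and undefined ratios are returned as NaN.

```python
import math

def _ratio(num, den):
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")

def classification_metrics(tp, fn, fp, tn):
    """Derive the glossary metrics from confusion-matrix counts."""
    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
        "balanced_accuracy": (sens + spec) / 2,
        "youden_j": sens + spec - 1,
    }

# Hypothetical counts for an imbalanced symptom (6 bad days, 55 good days).
metrics = classification_metrics(tp=4, fn=2, fp=5, tn=50)
```

In this made-up example raw accuracy would be 54/61 ≈ 0.89, while balanced accuracy is about 0.79 and MCC about 0.48, which shows why the imbalance-aware metrics are reported alongside it.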

References

Scientific basis

The ANS Predictor is grounded in three lines of published evidence: autonomic dysfunction as a measurable feature of post-infectious syndromes, HRV as a predictive signal for symptom burden, and the statistical case for individual-level modelling over population averages.

[1] Evidence of altered cardiac autonomic regulation in myalgic encephalomyelitis/chronic fatigue syndrome: A systematic review and meta-analysis

Nelson MJ et al. · Medicine · 2019 · https://doi.org/10.1097/MD.0000000000017600

64-study meta-analysis confirming reduced HF-HRV and elevated LF/HF ratio at rest in ME/CFS — the autonomic phenotype this predictor targets.

[2] Similar Patterns of Dysautonomia in Myalgic Encephalomyelitis/Chronic Fatigue and Post-COVID-19 Syndromes

Ryabkova VA et al. · Pathophysiology · 2024 · https://doi.org/10.3390/pathophysiology31010001

HRV-based diagnostic prediction models for ME/CFS developed and validated; close correlation of HRV parameters with fatigue but not with depression/anxiety — consistent with Kinetica's symptom-specific model structure.

[3] Benefit of the N-of-1 Approach Versus Aggregate Analysis in Tracking Individual Trajectories

Behrouzi T et al. · JMIR Formative Research · 2026 · https://doi.org/10.2196/86203

Wearable HRV study showing aggregate models capture only 4.76% of individual HRV inflection patterns — direct methodological justification for Kinetica's idiographic design.

[4] Heart rate variability in cardiovascular disease diagnosis, prognosis and management

Wang BX et al. · Frontiers in Cardiovascular Medicine · 2026 · https://doi.org/10.3389/fcvm.2025.1680783

Current review acknowledging wearable + ML expansion of HRV utility while flagging measurement standardisation as an open challenge — addressed in Kinetica's published Polar pipeline.

All references retrieved from PubMed. DOI links open the original publications.