The ANS Predictor and the Sleep Quality Predictor were trained independently, on different feature candidates, asking different clinical questions. On the 42 days where both have data, they agree 79% of the time — and both independently selected the same feature as their strongest predictor.
In N-of-1 research, replication across independent analyses is the closest substitute for external validation. When two models, trained on different feature sets, optimised for different targets, and validated through separate LOO-CV runs, independently converge on the same physiological signal, that convergence is evidence that the relationship is real rather than an artefact of overfitting.
hrv_rmssd_night_t0 — nocturnal RMSSD from the same night — is the feature independently selected by both models as their primary predictor of next-day fatigue. The ANS predictor selected it from a pool of 13 HRV and sleep variables. The Sleep predictor selected it from a different pool of 20 candidates. Both coefficients are negative and of similar magnitude (−0.923 vs −0.823), pointing to the same physiological direction: higher overnight parasympathetic activity reduces fatigue probability the following day.
This is not a trivial finding. The two models use different predictors for the remaining signal — the ANS predictor adds wake minutes and HF power; the Sleep predictor adds lag-1 RMSSD — which explains their different AUC levels. The convergence on the core signal is what matters.
The highlighted feature (hrv_rmssd_night_t0) appears in both models, independently selected, with the same coefficient sign and a similar magnitude. The remaining features are model-specific and account for the AUC difference.
AUC is evaluated on the 42 days where both models have complete feature data (LOO-CV, 1,000× bootstrap CI). The ANS model's wider feature set gives it a 0.08 AUC advantage on this shared subset, but the two confidence intervals overlap, and neither model has been prospectively validated.
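For readers who want to see what an AUC with a bootstrap confidence interval involves, here is a minimal sketch. It assumes the interval is a percentile bootstrap over the out-of-fold (LOO-CV) predictions, resampling days with replacement; the text does not state the exact procedure, so treat that as an assumption. The function names and the toy labels and scores are illustrative only, not the study data.

```python
import random

def auc(labels, scores):
    """Rank-based AUC: chance that a random positive day outscores a random negative day (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative day")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC (assumed procedure), resampling days with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, so AUC is undefined; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Made-up out-of-fold predictions for 10 days (1 = high-fatigue day).
toy_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
toy_scores = [0.10, 0.25, 0.30, 0.45, 0.60, 0.35, 0.55, 0.70, 0.80, 0.90]
point = auc(toy_labels, toy_scores)
lo, hi = bootstrap_auc_ci(toy_labels, toy_scores)
```

With only a few dozen days the interval is typically wide, which is why overlapping intervals should not be read as evidence that either model is better.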
Each point is one diary day. The x-axis is the Sleep predictor's estimated fatigue probability; the y-axis is the ANS predictor's. The dashed lines mark each model's Youden threshold (~0.42). Points in the top-right and bottom-left quadrants are days where both models agree. Colour shows actual fatigue level (blue = low-fatigue day, red = high-fatigue day).
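The threshold and quadrant logic in this figure can be sketched in a few lines. The sketch assumes the Youden threshold is the cut-off that maximises sensitivity + specificity − 1 over the observed probabilities, and that a day counts as agreement when both models fall on the same side of their own threshold. The function names and example probabilities are hypothetical.

```python
def youden_threshold(labels, probs):
    """Return (threshold, J) for the cut-off maximising Youden's J = sensitivity + specificity - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_j, best_t = -1.0, None
    for t in sorted(set(probs)):
        # A day is called "high fatigue" when its probability is at or above t.
        tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j

def agreement_rate(probs_a, probs_b, thr_a, thr_b):
    """Fraction of days on which both models fall on the same side of their own threshold."""
    same = sum((a >= thr_a) == (b >= thr_b) for a, b in zip(probs_a, probs_b))
    return same / len(probs_a)

# Made-up example: four days, both thresholds set to 0.42 as in the figure.
sleep_probs = [0.2, 0.7, 0.5, 0.1]
ans_probs = [0.3, 0.8, 0.2, 0.6]
rate = agreement_rate(sleep_probs, ans_probs, 0.42, 0.42)
```

In this toy example the first two days land in the agreement quadrants and the last two do not, so `rate` is 0.5.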
The 79% day-level agreement and r=0.66 probability correlation are higher than chance (expected agreement ≈ 57% given class frequencies), but they are not independent validations: both models use the shared feature hrv_rmssd_night_t0, which likely drives much of the correlation.
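To make the comparison with chance concrete, the observed and expected agreement can be combined into Cohen's kappa. This is a sketch under assumptions: the text does not say how the 57% figure was derived, so the `chance_agreement` helper shows one common way to get it (from each model's positive-call rate), and kappa is computed directly from the rounded values quoted here.

```python
def chance_agreement(rate_a, rate_b):
    """Agreement expected if two binary calls were independent, given each model's positive-call rate."""
    return rate_a * rate_b + (1 - rate_a) * (1 - rate_b)

def cohen_kappa(observed, expected):
    """Agreement beyond chance, rescaled so that 0 is chance level and 1 is perfect agreement."""
    return (observed - expected) / (1 - expected)

observed = 33 / 42   # 42 shared days, 9 of them disagreements (figures from the text)
expected = 0.57      # approximate chance-level agreement quoted in the text
kappa = cohen_kappa(observed, expected)
print(f"observed={observed:.3f}  kappa={kappa:.2f}")
```

With these inputs kappa comes out at roughly 0.50, conventionally read as moderate agreement. Because the 57% figure is itself approximate, that value should be read as indicative rather than exact.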
The more meaningful finding is the convergence in feature selection: two greedy forward searches over different feature pools both picked the same feature first. This is the result that would be expected to survive a strict reproducibility test, unlike the day-level agreement, which partly reflects the shared predictor.
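A greedy forward search of the kind described here can be sketched as follows. This is an illustration, not the pipeline used for the two predictors: the stopping rule (`min_gain`), the feature cap, and the scoring function are assumptions. `toy_score` stands in for a cross-validated AUC using made-up per-feature values, and `sleep_efficiency_t0` is a hypothetical feature name added only to make the second pool differ.

```python
def forward_select(candidates, score_fn, max_features=3, min_gain=0.01):
    """Greedy forward search: repeatedly add the candidate that most improves score_fn."""
    selected, best_score = [], float("-inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored)
        if selected and top_score - best_score < min_gain:
            break  # no remaining candidate improves the score enough
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected

# Made-up per-feature contributions, standing in for a cross-validated AUC.
TOY_VALUE = {
    "hrv_rmssd_night_t0": 0.20,
    "hrv_rmssd_night_t1": 0.06,
    "hrv_hf_power_t0": 0.05,
    "sleep_wake_min_t2": 0.04,
}

def toy_score(features):
    return 0.5 + sum(TOY_VALUE.get(f, 0.0) for f in features)

# Two different candidate pools that share only the nocturnal RMSSD feature.
pool_a = ["hrv_hf_power_t0", "sleep_wake_min_t2", "hrv_rmssd_night_t0"]
pool_b = ["hrv_rmssd_night_t1", "hrv_rmssd_night_t0", "sleep_efficiency_t0"]
first_a = forward_select(pool_a, toy_score)[0]
first_b = forward_select(pool_b, toy_score)[0]
```

In the toy setup both searches pick `hrv_rmssd_night_t0` first because it has the largest assigned value. In a real analysis, `score_fn` would refit the model and return its LOO-CV performance for each candidate set.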
The 9 disagreement days (21%) are clinically interesting: these are days where autonomic vigilance patterns (ANS model) and nocturnal recovery patterns (Sleep model) send different signals. Analysing these days in detail would be the natural next step for a prospective study.
A combined model using all four features from both predictors (hrv_rmssd_night_t0, hrv_rmssd_night_t1, sleep_wake_min_t2, hrv_hf_power_t0) would be the natural third-layer hypothesis — testing whether the non-overlapping features provide additive predictive value above the shared nocturnal RMSSD signal. This is the planned next analytical step.