INTRODUCTION

In the USA, approximately 6.2 million older adults are currently living with dementia.1,2 Prior studies estimate that only about half have received a diagnosis.3,4 Individuals who lack a diagnosis may not get the support they need, leaving them vulnerable to potential harms including fragmented medical care, preventable illnesses, safety problems such as car and firearm accidents, and financial abuse.5,6,7,8,9,10,11 Growing awareness of this problem has motivated appeals for large-scale, low-cost strategies to improve dementia detection.4,12,13 To date, the United States Preventive Services Task Force has concluded that there is insufficient evidence on the benefits and harms to recommend universal dementia screening for adults age 65 and older.14,15 A targeted screening approach focused on those at elevated risk of dementia may be more effective and feasible for addressing the problem of undiagnosed dementia.

We previously developed an electronic health record (EHR)–based tool that uses routinely collected clinical data to identify patients with increased risk of having undiagnosed dementia, who could potentially be targeted for assessment.16 The EHR Risk of Alzheimer’s and Dementia Assessment Rule (eRADAR) was developed using data collected from 1994 to 2015 for the Adult Changes in Thought (ACT) study, a prospective cohort study of dementia embedded within Kaiser Permanente Washington (KPWA).17 eRADAR’s predictive performance has not been evaluated outside the original sample. Moreover, the ACT cohort was over 90% white, and diagnosis data used to develop eRADAR came from International Classification of Diseases, version 9 (ICD-9) codes; the USA has since transitioned to version 10 (ICD-10). There is a need for robust external validation to assess the accuracy of eRADAR in diverse populations and contemporary practice.

This study externally validated the eRADAR screening tool in two real-world health systems:18,19,20 KPWA, an integrated health system, and primary care practices at the University of California San Francisco Health system (UCSF), an academic medical center. This validation study addressed four aspects of eRADAR’s performance: (1) generalizability from a research population (ACT) to a real-world clinical population (KPWA); (2) transportability to a non-integrated healthcare system (UCSF) which, like most US healthcare systems, receives little information about care occurring outside of that system; (3) performance over time, including across the 2015 transition from ICD-9 coding to ICD-10 and other temporal changes in clinical practice; and (4) performance across racial/ethnic groups, to ensure that implementation does not exacerbate existing inequities in dementia diagnosis and treatment.1,21,22,23,24,25

METHODS

Setting

External validation of eRADAR was performed in two health systems—KPWA and UCSF—and compared to results from internal validation in the original study population (ACT). KPWA is a not-for-profit integrated health system providing insurance coverage and medical care (including primary and specialty care) to about 700,000 members in Washington, including 98,000 Medicare Advantage beneficiaries. UCSF Health is a not-for-profit academic medical system providing primary and specialty care. Three UCSF primary care practices were included in this study: Division of General Internal Medicine, Women’s Health, and Lakeshore Family Medicine. These practices deliver medical care to about 48,000 patients, 25% with Medicare or Medicare Advantage insurance.

KPWA and UCSF both utilize the Epic EHR system to record clinical information including encounters, diagnoses, procedures, and medication orders. Epic was deployed at KPWA in 2005 and at UCSF in 2011. KPWA also maintains a research virtual data warehouse with nearly complete capture of outside encounters, diagnoses, and medication dispensings, which is possible because of its role as an insurance provider; such data are not available in most other health systems,26 including UCSF, making it especially important to evaluate eRADAR in a more typical setting with data limited to encounters within that health system.

Institutional Review Boards at UCSF and KPWA approved the study procedures and granted waivers of consent and HIPAA authorization.

Study Design

This retrospective validation study was designed to reflect how eRADAR would be implemented in typical healthcare settings. We envision that eRADAR scores would be calculated periodically for patients without a dementia diagnosis, and patients identified as “high risk” would be recommended for cognitive assessment. For this validation analysis, we calculated eRADAR scores annually on January 1 of each study year (2010–2020 at KPWA and 2014–2019 at UCSF) in older patients without dementia based on clinical data in the prior 2 years. We then followed patients to identify incident dementia diagnoses in the subsequent 12 months. People could be included in the sample for multiple years. Predictive performance was evaluated at the person-year level (rather than the person level) to correspond to this design.
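The person-year design described above can be illustrated with a minimal sketch. The function and field names are hypothetical and not taken from the study code; the sketch assumes only what the text states: an index date of January 1, a 2-year lookback for predictors, and 12 months of follow-up for the outcome.

```python
from datetime import date


def person_year_windows(index_year, lookback_years=2):
    """Return (lookback_start, index_date, followup_end) for one index year.

    Predictors would be drawn from [lookback_start, index_date) and the
    outcome sought in [index_date, followup_end). Names are illustrative.
    """
    index_date = date(index_year, 1, 1)
    lookback_start = date(index_year - lookback_years, 1, 1)
    followup_end = date(index_year + 1, 1, 1)
    return lookback_start, index_date, followup_end


def build_person_years(eligible_years_by_patient):
    """Expand {patient_id: [eligible index years]} into one row per person-year."""
    rows = []
    for patient_id, years in eligible_years_by_patient.items():
        for year in sorted(years):
            start, index_date, end = person_year_windows(year)
            rows.append({
                "patient_id": patient_id,
                "index_date": index_date,
                "lookback_start": start,
                "followup_end": end,
            })
    return rows


# Hypothetical input: patient "A" is eligible in two years, "B" in one.
rows = build_person_years({"A": [2014, 2015], "B": [2016]})
print(len(rows))  # 3 person-years from 2 patients
```

Because one patient can contribute several rows, any performance measure computed on such a table is a person-year-level quantity, which is the unit of analysis used here.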

Study Population

Validation analyses included all patients aged 65 years or older who met site-specific criteria for adequate utilization with the health system (to ensure availability of clinical data), did not have a dementia diagnosis or dementia medication in the prior 2 years, and were not on hospice care. In order to study eRADAR’s validity in a population without documented memory problems, we excluded from the analysis patients with diagnoses in the prior 2 years of amnestic disorder/memory loss, mild cognitive impairment (MCI), and post-stroke cognitive impairment. Diagnosis codes, medications, and site-specific utilization criteria for inclusion and exclusion are provided in Supplementary Table S1.

KPWA members participating in the ACT study, whose data may have been used in eRADAR model development,16 were not excluded because patient identifiers were not available under that project’s Institutional Review Board approval. The impact on validation was likely negligible: the original eRADAR study included visits from only 1461 ACT study participants between 2010 and 2015, who comprise 1.1% of members in the KPWA validation sample for this study.

eRADAR Prediction Model

eRADAR was originally developed in a randomly selected training set (70% of the ACT sample) and internally validated in a testing set (remaining 30%). eRADAR includes 31 predictors of undiagnosed dementia such as demographic characteristics and diagnoses, medications, vital signs, and healthcare utilization from the prior 2 years (Table S2 and Table 1). For this external validation study, EHR data were extracted for the 2-year period before January 1 of the index year. Diagnoses of comorbid medical conditions were defined using ICD-9 codes recommended by Elixhauser27 or Charlson28 and ICD-10 conversions recommended by Quan et al.29 Details on diagnosis codes and predictor definitions are provided in Table S2.

Table 1 Summary of Patient Characteristics by Sample

Outcome: Undiagnosed Dementia

eRADAR was developed to predict risk of undiagnosed dementia, which was identified for ACT participants via formal assessment at biennial study visits.17 In real-world clinical practice, cognitive screening assessments are not routinely done, so there is no comparable measure of undiagnosed dementia available. These validation analyses used incident dementia diagnosis within 12 months of January 1 of the index year as a proxy for undiagnosed dementia. (See Table S3 for ICD-9/10 diagnosis codes.) Given that dementia diagnoses usually occur relatively late in the disease process,30 we hypothesized undiagnosed dementia was likely present at the start of the year in which it was diagnosed. For sensitivity analysis, we considered incident dementia diagnosis within 18 months to observe more events and more precisely estimate performance. Additional sensitivity analyses validated eRADAR for a composite outcome of incident dementia or MCI diagnosis within 12 months, since some providers may initially assign an MCI diagnosis when dementia is, in fact, present.31
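As an illustration of the proxy outcome, the following sketch labels a single person-year under the 12-month definition or the 18-month sensitivity definition. The function names, the argument names, and the choice to return `None` for censored observations are assumptions made for this example; the study handled censoring by weighting rather than by this simple rule.

```python
from datetime import date


def add_months(d, months):
    """Shift a first-of-month date forward by a whole number of months."""
    total = d.month - 1 + months
    return date(d.year + total // 12, total % 12 + 1, d.day)


def label_outcome(index_date, first_dementia_dx, censor_date=None, window_months=12):
    """Return 1 if a dementia diagnosis falls in the window, 0 if the window
    was fully observed without one, and None if follow-up was censored first."""
    window_end = add_months(index_date, window_months)
    if first_dementia_dx is not None and index_date <= first_dementia_dx < window_end:
        if censor_date is None or first_dementia_dx <= censor_date:
            return 1
    if censor_date is not None and censor_date < window_end:
        return None
    return 0


index = date(2016, 1, 1)
print(label_outcome(index, date(2017, 3, 1)))                    # 0 under the 12-month window
print(label_outcome(index, date(2017, 3, 1), window_months=18))  # 1 under the 18-month window
```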

Outcome observation was censored for death or, at KPWA, health plan disenrollment. At KPWA, deaths were identified through patient health records, insurance enrollment records, and state mortality records. At UCSF, deaths were identified through patient health records, which only include deaths at UCSF hospitals or for which the healthcare team is notified.

Measuring eRADAR Performance

We examined measures of performance that reflected how health systems would use eRADAR to identify high-risk patients to target for dementia assessment. To select an eRADAR cut point above which patients are classified as “high risk,” health system leaders consider whether that cut point accurately identifies people with undiagnosed dementia (sensitivity) while limiting unneeded evaluations for people without undiagnosed dementia (specificity). At each threshold, the intensity and cost of an intervention should be appropriate for the rate of undiagnosed dementia among those flagged as “high risk” (positive predictive value [PPV]). We considered the 99th, 95th, 90th, 85th, and 75th percentiles of the eRADAR score because these would be realistic, feasible cut points that a healthcare system might use. Performance was also evaluated using area under the curve (AUC), which summarizes sensitivity and specificity across all possible thresholds.32
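To make these measures concrete, here is a minimal, unweighted sketch on toy data. The nearest-rank percentile rule, the function names, and the toy scores are assumptions for illustration only; the study's estimates were additionally adjusted for censoring.

```python
import math


def percentile_cut_point(scores, pct):
    """Score at the given percentile, using the nearest-rank method."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def classification_metrics(scores, outcomes, cut_point):
    """Sensitivity, specificity, and PPV when scores above cut_point are flagged.

    Assumes at least one case, one non-case, and one flagged observation.
    """
    tp = fp = tn = fn = 0
    for score, outcome in zip(scores, outcomes):
        flagged = score > cut_point
        if flagged and outcome:
            tp += 1
        elif flagged:
            fp += 1
        elif outcome:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


def auc(scores, outcomes):
    """Probability that a random case outscores a random non-case (ties count 1/2)."""
    cases = [s for s, y in zip(scores, outcomes) if y]
    controls = [s for s, y in zip(scores, outcomes) if not y]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy data: ten person-years, three with the outcome.
scores = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]
outcomes = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
cut = percentile_cut_point(scores, 80)
metrics = classification_metrics(scores, outcomes, cut)
area = auc(scores, outcomes)
print(cut, metrics["ppv"])  # 0.08 0.5
```

In this toy example, raising the cut point flags fewer person-years, which tends to lower sensitivity and raise specificity; that trade-off is what health system leaders would weigh when choosing a threshold.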

eRADAR was also validated within subgroups defined by race/ethnicity. Race/ethnicity information is collected by both health systems via patient self-report at clinical visits. Neither race nor ethnicity is a predictor in eRADAR. The performance of eRADAR within racial/ethnic subgroups was evaluated for the 18-month incident dementia diagnosis to increase statistical power to detect differences across subgroups. “Fairness” was assessed on the basis of similar AUC, sensitivity (also known as equal opportunity), and PPV (predictive parity) across race/ethnicity.33
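A subgroup comparison of this kind might be sketched as follows. The group labels, the toy records, and the use of one shared cut point for all groups are assumptions for illustration; the example is unweighted and is not the study's analysis code.

```python
from collections import defaultdict


def subgroup_sensitivity_ppv(records, cut_point):
    """records: iterable of (group, score, outcome).

    Returns {group: (sensitivity, ppv)}; a value is None when its
    denominator is zero (no cases, or nobody flagged, in that group).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for group, score, outcome in records:
        c = counts[group]  # registers the group even if it only has true negatives
        flagged = score > cut_point
        if flagged and outcome:
            c["tp"] += 1
        elif flagged:
            c["fp"] += 1
        elif outcome:
            c["fn"] += 1
    result = {}
    for group, c in counts.items():
        cases = c["tp"] + c["fn"]
        flagged_total = c["tp"] + c["fp"]
        sensitivity = c["tp"] / cases if cases else None
        ppv = c["tp"] / flagged_total if flagged_total else None
        result[group] = (sensitivity, ppv)
    return result


# Hypothetical groups "G1" and "G2" with identical performance.
records = [
    ("G1", 0.9, 1), ("G1", 0.8, 0), ("G1", 0.2, 1), ("G1", 0.1, 0),
    ("G2", 0.7, 1), ("G2", 0.6, 0), ("G2", 0.3, 1), ("G2", 0.1, 0),
]
by_group = subgroup_sensitivity_ppv(records, cut_point=0.5)
print(by_group["G1"] == by_group["G2"])  # True: equal sensitivity and equal PPV
```

Similar sensitivity across groups corresponds to equal opportunity, and similar PPV corresponds to predictive parity.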

Performance estimates were adjusted for censoring due to health plan disenrollment or death using inverse probability weighting.34 Analytic details are provided in the online supplement.
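The general idea of inverse probability weighting can be shown with a generic sketch; it does not reproduce the analysis described in the online supplement. It assumes that each fully observed person-year comes with an estimated probability of remaining uncensored (`prob_uncensored`, a hypothetical input that would come from a separate censoring model) and weights that observation by the inverse of this probability.

```python
def ipw_sensitivity(observations, cut_point):
    """Weighted sensitivity among person-years with a fully observed outcome.

    observations: iterable of (score, outcome, prob_uncensored). Each
    uncensored case stands in for 1 / prob_uncensored similar person-years,
    some of which were lost to censoring.
    """
    flagged_cases = 0.0
    all_cases = 0.0
    for score, outcome, prob_uncensored in observations:
        if not outcome:
            continue  # sensitivity is computed among cases only
        weight = 1.0 / prob_uncensored
        all_cases += weight
        if score > cut_point:
            flagged_cases += weight
    return flagged_cases / all_cases


# Toy data: the low-scoring case had only a 50% chance of remaining observed,
# so it counts twice.
observations = [(0.9, 1, 1.0), (0.2, 1, 0.5), (0.8, 0, 1.0)]
weighted = ipw_sensitivity(observations, cut_point=0.5)
print(round(weighted, 3))  # 0.333, versus 0.5 without weights
```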

RESULTS

Analyses included 688,599 person-years among 129,315 patients at KPWA and 47,348 person-years among 13,444 patients at UCSF. Compared to the ACT cohort in which eRADAR was originally developed, external validation samples were younger and more racially/ethnically diverse and had fewer comorbidities and less healthcare utilization (Tables 1 and 2). The UCSF sample had a greater proportion of observations from Asian/Asian American, Black/African American, and Hispanic/Latinx patients than the KPWA sample. Medicare fee-for-service insurance coverage was also more common at UCSF than KPWA.

Table 2 Prevalence of eRADAR Predictors by Sample

eRADAR scores, that is, estimates of the risk of undiagnosed dementia, were generally low for both validation samples (median [interquartile range] = 1.0% [0.6–1.9%] for both) and about half of those observed in the ACT cohort (2.1% [1.2–3.8%]). Characteristics, predictors, and eRADAR scores of external validation samples were similar across study years (Table S4).

Incident dementia diagnoses were recorded for 7631 KPWA patients (a rate of 11.1 events per 1000 person-years) and 216 UCSF patients (4.6 events per 1000 person-years). As expected, given differences in sample characteristics and study design, a higher rate of undiagnosed dementia was observed in the ACT sample: 31 per 1000 biennial visits or 15 per 1000 person-years. Incident dementia rates in the UCSF sample declined from 6.0 per 1000 person-years in 2014 to 3.5 per 1000 person-years in 2019 (Table S5). Rates at KPWA ranged between 9.3 and 14.4 events per 1000 person-years with no distinctive time trend.
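The overall rates can be reproduced as crude (unweighted) rates from the event counts and person-years reported in this section. The helper name is illustrative.

```python
def rate_per_1000(events, person_years):
    """Crude event rate per 1000 person-years."""
    return 1000 * events / person_years


kpwa_rate = rate_per_1000(7631, 688_599)
ucsf_rate = rate_per_1000(216, 47_348)
print(round(kpwa_rate, 1))  # 11.1
print(round(ucsf_rate, 1))  # 4.6
```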

Figure 1 shows receiver operating characteristic curves from the original internal validation sample (ACT) and current external validation samples (KPWA and UCSF). AUC was greater for KPWA (0.84 [95% confidence interval {CI}: 0.84–0.85]) than UCSF (0.79 [0.76–0.82]). AUC 95% CIs for both external validation samples overlapped with that of the ACT internal validation sample (0.81 [0.78–0.84]).

Figure 1 Receiver operating characteristic curves for eRADAR prediction of 12-month incident dementia diagnosis in ACT testing set (solid line), KPWA validation sample (dotted line), and UCSF validation sample (dashed line).

eRADAR sensitivity was highest in the KPWA sample and similar between UCSF and ACT (Table 3). For example, classifying those with eRADAR scores above the 90th percentile as high-risk captures 54% (95% CI: 53–56%) of KPWA person-years with incident dementia diagnosis, 44% (38–51%) of UCSF person-years with dementia diagnosis, and 36% (28–44%) of ACT visits with undiagnosed dementia.

eRADAR PPV was greater in the ACT internal validation sample than the external validation samples (Table 3), as was expected because PPV is strongly influenced by the outcome rate.35 To quantify how eRADAR could improve identification of high-risk individuals for dementia evaluation, we compared outcome rates in the entire sample (which represents the PPV of universal screening) to rates among those designated as high risk at a given cut point (PPV of eRADAR). In the ACT testing set, visits with eRADAR scores above the 90th percentile were 3.7 times more likely to have undiagnosed dementia than the average visit. Similarly, KPWA and UCSF person-years with eRADAR scores above the 90th percentile were, respectively, 5.4 and 4.4 times more likely to receive an incident dementia diagnosis within the next year than the average person-year.
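The ratio of eRADAR's PPV to the overall outcome rate follows from Bayes' rule. The symbols below are introduced only for this illustration: $\mathrm{Se}$ is sensitivity at the cut point, $\pi$ is the outcome rate in the whole sample, and $q$ is the fraction of observations flagged as high risk (about 0.10 for a 90th-percentile cut point).

```latex
\begin{align*}
\mathrm{PPV} &= P(D \mid \text{flagged})
  = \frac{P(\text{flagged} \mid D)\, P(D)}{P(\text{flagged})}
  = \frac{\mathrm{Se} \cdot \pi}{q}, \\[4pt]
\frac{\mathrm{PPV}}{\pi} &= \frac{\mathrm{Se}}{q}.
\end{align*}
```

As a rough check under the assumption that exactly 10% of observations are flagged, the UCSF sensitivity of 44% gives $0.44 / 0.10 = 4.4$, and the ACT sensitivity of 36% gives $0.36 / 0.10 = 3.6$, close to the reported 3.7. Small differences of this size may plausibly reflect rounding and the censoring adjustment.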

Table 3 Classification Accuracy (% [95% CI]) of eRADAR Prediction Model by Validation Sample

eRADAR performance in external validation samples was similar across years (Figure S1) and across the ICD-9/10 transition (Fig. 2). AUC at KPWA was 0.84 during ICD-9 years and 0.85 during ICD-10; AUC at UCSF was 0.79 and 0.78 in ICD-9 and ICD-10 years, respectively (Figure S2). PPV showed small, not clinically meaningful, variability year to year, likely because PPV is sensitive to small changes in outcome rate when outcomes are relatively rare, as seen in these samples.

Figure 2 Performance of eRADAR for predicting 12-month incident dementia diagnosis in KPWA and UCSF external validation sets across the International Classification of Diseases (ICD) version 9 to 10 transition, measured by sensitivity (a and b) and positive predictive value (PPV, c and d) of eRADAR scores above the 85th (solid line) and 95th (dotted line) percentiles. Shaded regions indicate point-wise 95% confidence intervals. ICD-9 years include all observations with an index year of 2014 or earlier (before the ICD-10 transition). ICD bridge years include observations from 2015 to 2017 for which clinical data in the prior 2 years (used to calculate the eRADAR score) may include ICD-9 diagnosis codes; additionally, incident dementia diagnoses for 2015 observations may include ICD-9 or ICD-10 codes, as the transition occurred on October 1, 2015. ICD-10 years include observations from 2018 and later years where eRADAR predictors and outcomes were defined using ICD-10 diagnosis codes exclusively.
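The three coding eras defined in the Figure 2 caption can be expressed as a small classification rule on the index year. The function name and return labels are illustrative.

```python
def icd_era(index_year):
    """Classify a person-year by diagnosis coding era, per the Figure 2 caption."""
    if index_year <= 2014:
        return "ICD-9"       # predictors and outcomes coded in ICD-9 only
    if index_year <= 2017:
        return "ICD bridge"  # 2-year lookback may still contain ICD-9 codes
    return "ICD-10"          # predictors and outcomes coded in ICD-10 only


print([icd_era(y) for y in (2014, 2015, 2017, 2018)])
# ['ICD-9', 'ICD bridge', 'ICD bridge', 'ICD-10']
```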

In sensitivity analyses, changes in outcome definition and ascertainment period did not meaningfully affect performance (Tables S6-S7). Changes in PPV followed the expected pattern (higher PPV with higher outcome incidence) and did not indicate unanticipated variation in performance.

eRADAR also performed similarly across racial/ethnic groups (Fig. 3, Table S8), though there was wider variability in estimates for subgroups at UCSF due to smaller sample sizes.

Figure 3 Performance of eRADAR for predicting 18-month incident dementia diagnosis in racial/ethnic subgroups at KPWA and UCSF, measured by area under the curve (AUC, a) and by sensitivity (b) and positive predictive value (PPV, c) of eRADAR scores above the 85th percentile. Point estimates and 95% confidence intervals (lines) are shown. At UCSF, there were too few observations and events within person-years with American Indian/Alaskan Native, Native Hawaiian/Pacific Islander, or other race indicated or without race/ethnicity recorded to evaluate performance within these groups. Multiple races, or multiracial, is not an option for patient self-report at UCSF.

DISCUSSION

eRADAR uses routinely collected clinical data to predict risk of undiagnosed dementia in older patients, with the goal of identifying high-risk patients who could potentially be targeted for dementia assessment. In this study, eRADAR demonstrated strong external validity in two diverse health systems, including evaluation of temporal trends and racial/ethnic differences.

Our findings affirm the generalizability of eRADAR from a research sample to real-world clinical populations. Volunteers for research studies are typically not representative of general patient populations,36,37,38 and, as shown in our data, ACT participant characteristics do not reflect those of KPWA members overall. We found that eRADAR accurately predicted dementia risk in a wider sample of KPWA members.

This study also demonstrated eRADAR’s transportability to a new setting that better represents health systems in which the prediction model is likely to be implemented.19,20 eRADAR was developed using data from KPWA, an integrated healthcare system that has nearly complete capture of external care, unlike most US healthcare systems. UCSF is more typical in that it has easy access to EHR data only for care provided within the health system, and many patients receive additional care in other settings. UCSF also serves a patient population with more socioeconomic and racial/ethnic diversity than KPWA. Successful external validation of eRADAR at UCSF suggests that eRADAR may accurately predict undiagnosed dementia risk in a variety of healthcare settings and populations.

Our findings highlight the need to match clinical prediction models with appropriate interventions based on their performance characteristics. eRADAR shows high discrimination, sensitivity, and specificity in both healthcare settings examined, but the PPV remains low due to a low incidence of dementia diagnosis in the populations served.39 Because most patients classified as high risk by eRADAR will not have dementia, follow-up should not be overly invasive, expensive, or burdensome.12 Care should be taken to address the potential for stigma, anxiety, and increased suicide risk that may result from patients being identified as high risk.6,7,40,41

One study limitation is that outcomes were derived from dementia diagnoses indicated in the EHR (perhaps without formal cognitive assessment) and, as such, were susceptible to misclassification, including both under- and overdiagnosis. In particular, validation analyses likely underestimated PPV because dementia is underdiagnosed (by as much as 50%) in routine clinical practice.3,4 Original estimation of eRADAR benefited from a “gold standard” assessment of dementia status (biennial cognitive screening in a research study), in which it was likely that very few dementia cases were missed.17 For this large-scale validation study, it was not feasible to conduct universal cognitive screening. As an alternative, we evaluated how well the eRADAR model predicted future dementia diagnoses in the EHR. We are currently planning a randomized pragmatic trial to examine the impact of providing cognitive and functional evaluations to patients with high eRADAR scores. Results from that study will improve estimation of eRADAR’s performance among high-risk patients and identify facilitators of, and barriers to, implementing eRADAR in clinical practice.

Comparing the performance of clinical prediction models across racial and ethnic groups is fundamental to ensuring their use does not exacerbate existing inequities in access to needed healthcare services.33,42,43,44 Our analysis indicated that eRADAR provided equal opportunity for benefit across race/ethnicity, that is, a similar proportion of patients later diagnosed with dementia were correctly identified as high risk. eRADAR also showed predictive parity across racial/ethnic groups, meaning that patients classified as high risk had similar rates of incident dementia diagnoses. A limitation of this analysis is that the precision of performance estimates was lower within subgroups with a small number of incident dementia diagnoses observed, particularly Native Hawaiian and Pacific Islander members at KPWA and all racial/ethnic groups at UCSF except White non-Hispanic and Asian patients (including American Indian and Alaskan Native, Black, Hispanic, and Native Hawaiian and Pacific Islander patients). Additionally, underdiagnosis of dementia is disproportionately common in Black/African American and Hispanic/Latinx patients;21,22,23 the potential for differential outcome misclassification is an additional limitation of our validation analyses within racial/ethnic groups. To address these gaps, our trial will prospectively monitor eRADAR accuracy in racial/ethnic groups.

eRADAR performance was also robust to temporal variability, including the ICD-9/10 transition. Temporal validation is essential for clinical prediction models intended for real-world deployment to establish prospective accuracy.19,20 Performance of models that depend on healthcare utilization (including diagnostic assessments, measuring vital signs, or performing labs and imaging) may be impacted by changes in usual care practices. This study’s evaluation of temporal changes in eRADAR’s performance was unable to capture the potential impact of the COVID-19 pandemic, including interruptions in healthcare and the shift to telemedicine. External validation at KPWA included incident dementia diagnoses observed in 2020, but pandemic effects are not adequately reflected in eRADAR predictors, which are measured over the preceding 2 years, for any observations in this study. Ongoing monitoring of performance is needed for any clinical prediction model to verify that accuracy is maintained and is particularly important given the major impact of the COVID-19 pandemic on healthcare utilization including deferred care and telemedicine.

CONCLUSION

This study validated eRADAR, which predicts risk of undiagnosed dementia using EHR data, in two real-world health systems whose patient populations differ from the original development sample. External validation demonstrated eRADAR’s generalizability to a non-research sample and transportability to a different US health system. eRADAR’s performance was robust to temporal changes and consistent across racial/ethnic groups. eRADAR may be a useful tool to help healthcare providers identify high-risk patients to target for dementia assessment.