
Practical algorithms for amyloid β probability in subjective or mild cognitive impairment.

Nancy Maserejian¹, Shijia Bian², Wenting Wang², Judith Jaeger³,⁴, Jeremy A Syrjanen⁵, Jeremiah Aakre⁵, Clifford R Jack⁶, Michelle M Mielke⁷, Feng Gao².

Abstract

INTRODUCTION: Practical algorithms predicting the probability of amyloid pathology among patients with subjective cognitive decline or mild cognitive impairment may help clinical decisions regarding confirmatory biomarker testing for Alzheimer's disease.
METHODS: Algorithm feature selection was conducted with Alzheimer's Disease Neuroimaging Initiative and Australian Imaging, Biomarkers and Lifestyle Flagship Study of Ageing data. Probability algorithms were developed in Alzheimer's Disease Neuroimaging Initiative using nested cross-validation accompanied by stratified subsampling to obtain 1000 internally validated decision trees. Semi-independent validation was conducted using Australian Imaging, Biomarkers and Lifestyle Flagship Study of Ageing. Independent external validation was conducted in the population-based Mayo Clinic Study of Aging.
RESULTS: Two algorithms were developed using age and normalized immediate recall z-scores, with or without apolipoprotein E ε4 carrier status. Both algorithms had robust performance across data sets and when substituting different recall memory tests.
DISCUSSION: The statistical framework resulted in robust probability estimation. Application of these algorithms may assist in clinical decision-making for further testing to diagnose amyloid pathology.


Keywords: ADNI; AIBL; APOE ε4; Algorithm; Alzheimer's disease; Amyloid; Biomarker; Immediate recall; MCSA

Year: 2019 PMID: 31700988 PMCID: PMC6827360 DOI: 10.1016/j.dadm.2019.09.001

Source DB: PubMed Journal: Alzheimers Dement (Amst) ISSN: 2352-8729

Introduction

Alzheimer's disease (AD) dementia is a chronic neurodegenerative disorder that is both progressive and irreversible [1,2]. Accumulation of brain amyloid beta (Aβ) and tau pathology is a defining characteristic of the AD continuum and begins decades before cognitive symptoms appear [1,3–5]. Early intervention to alter the underlying Aβ or tau pathology is considered a potential approach to prevent or delay AD progression, and such treatments are in development [6–9]. Although biomarkers for Aβ and tau pathology are often used to diagnose AD in research settings, they are not typically used to diagnose AD in routine clinical practice today, primarily owing to resource limitations and costs [1]. If a new therapy targeting AD pathology were to become available, methods to confirm the presence of AD pathology, including positron emission tomography (PET) imaging (the only US Food and Drug Administration–approved biomarker for AD) and cerebrospinal fluid (CSF) measures, are projected to remain inaccessible to many patients [1,8,9]. Practical methods to identify which patients are most likely to benefit from more invasive and costly confirmatory biomarker testing for AD pathology may therefore help prioritize potentially limited resources. The objective of this work was to provide practical algorithms that use currently available inputs to estimate the probability that a patient with cognitive problems possibly due to AD is Aβ positive. Prior research has identified factors associated with Aβ pathology, such as age, cognitive impairment, apolipoprotein E (APOE) genotype, CSF inflammatory or protein biomarkers [10], and certain lifestyle factors [11–14]. However, risk factor models do not directly translate into clinically useful or practical algorithms. 
Many models lack external validation, include inputs with small effect sizes, or include inputs that are burdensome or costly (e.g., extensive neuropsychological testing or imaging) [10,15–17]. Recently, more practical algorithms to estimate the likelihood of Aβ positivity among patients with subjective cognitive decline (SCD) or mild cognitive impairment (MCI) were published [11,18,19]. For example, the Swedish BioFINDER study's “optimal” model used age, APOE genotype, and delayed recall score [18]. Although performance, as measured by area under the curve (AUC), has been acceptable in these reports, the algorithms published to date have limited flexibility because they require APOE genotype and a specified cognitive test. Moreover, their data sets were composed of patients from highly specialized clinics, so it is unknown whether performance would remain robust in a broader population of symptomatic individuals. Our intention was to develop algorithms that would support clinical decision-making regarding future biomarker testing while allowing quick administration and flexible inputs, so that providers could select their preferred cognitive measures and decide whether to use genetic information. We anticipated that algorithms using currently available inputs would not replace Aβ tests but would instead help providers decide more efficiently and confidently which symptomatic patients to send for more invasive and costly Aβ testing, if needed for diagnosis and treatment planning. To cast a broad net and improve power to detect predictors, we first included all nondemented participants across two data sets to identify predictors; in the next phase, deriving probability estimates, we focused on symptomatic patients because, in current clinical practice, symptoms are ascertained before pathology is considered. 
Given that there are scenarios in which genetic testing is not conducted, we designed two versions: one utilizing APOE ε4 data and another without it. To achieve robust and generalizable algorithms, we developed a multistage statistical framework, using a combination of epidemiologic and random forest decision tree modeling methods, with an independent external validation using a community population-based sample.

Methods

We developed a statistical framework and multiphase approach to develop and validate the algorithms using three data sources: the Alzheimer's Disease Neuroimaging Initiative (ADNI), the Australian Imaging, Biomarkers and Lifestyle Flagship Study of Ageing (AIBL), and the Mayo Clinic Study of Aging (MCSA). The analysis phases were (1) initial feature (i.e., variable) selection in ADNI and AIBL, (2) deep development of probability algorithms in ADNI, (3) semi-independent validation in AIBL, and (4) independent external validation in the population-based MCSA. The goal was to develop an algorithm that was not overfit to a particular sample of individuals and then to test it in entirely different samples of individuals with SCD or MCI, aiming for good performance in both clinic-based and community-based research cohorts. The differences among the cohorts are therefore an asset for external validation. Before initiating the analysis, we conducted a literature review of factors associated with Aβ to provide critical context for interpreting the results of the subsequent data-driven approach.

Initial feature selection in ADNI and AIBL

The goal of this phase was to identify the variables that most strongly predicted elevated brain Aβ among nondemented participants with consistency across two different data sets, ADNI and AIBL. These variables, referred to as “features” hereafter, would then be carried forward to the next phase for the deep development of predicted probabilities. ADNI is a multisite longitudinal study launched in 2004, observing the impact of aging via clinical assessment, imaging, and biomarkers in a population largely recruited from memory clinics. AIBL is also a longitudinal study of aging and was launched in 2006 with a focus on cognition, biomarkers, and lifestyle factors in the development of AD. Study designs were previously published for ADNI [20,21] and AIBL [22], and updated methods are available online: http://adni.loni.usc.edu and https://aibl.csiro.au. ADNI and AIBL participants were included if they had a completed Aβ-PET scan and qualified as cognitively unimpaired, SCD or MCI at their first Aβ-PET scan. Feature selection included cognitively unimpaired participants to maximize the power of feature selection and understand the importance of SCD indicators. Selection of candidate features for the algorithm was driven by both quantitative statistical measures and practical considerations (e.g., length and ease of administration, suitability of the instrument to a range of educational levels). A data-driven approach was used to analyze all available demographic, medical history, physical examination, vital signs, genetic, family history and lifestyle factors, and neuropsychological tests including summary scores, domain-level scores, and item-level scores, as potential features for selection; 654 variables from ADNI and 169 variables from AIBL were analyzed. Because the objective was a quick, low-burden, and low-cost algorithm that could be used in a primary care setting, data from magnetic resonance imaging or CSF biomarkers were not considered as candidate variables. 
Internally validated decision trees were used to identify the strongest features in ADNI (n = 760 participants) and AIBL (n = 746 participants). In the primary analysis of predictors of Aβ pathology, trees were trained on 80% of the participants from each data set, repeated 100 × 1000 times. Aβ pathology was classified as present by PET in accordance with the ADNI and AIBL study protocols (Supplementary Material). The importance of each potential feature was evaluated by the frequency with which it appeared in the simulated decision trees and by its average position when present. Once the key features were determined, models with different feature combinations were compared on their AUC distributions to determine which combination provided the strongest overall performance.
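The two importance criteria named above (how often a feature appears across the simulated trees, and its average position when present) can be tallied as in the minimal sketch below. It assumes a hypothetical representation of each fitted tree as nested tuples `(feature, threshold, left, right)` with probability leaves, and it reads "position" as the shallowest split depth (0 = root). Both are illustrative assumptions, not the authors' implementation, and the toy trees are invented.

```python
from collections import defaultdict

def feature_depths(tree, depth=0, found=None):
    """Shallowest depth at which each feature splits in one tree.
    Trees are nested tuples (feature, threshold, left, right); leaves are floats."""
    if found is None:
        found = {}
    if isinstance(tree, tuple):
        feature, _threshold, left, right = tree
        found[feature] = min(depth, found.get(feature, depth))
        feature_depths(left, depth + 1, found)
        feature_depths(right, depth + 1, found)
    return found

def rank_features(trees):
    """Return (feature, share of trees containing it, mean depth when present)."""
    counts = defaultdict(int)
    depth_sums = defaultdict(int)
    for tree in trees:
        for feature, depth in feature_depths(tree).items():
            counts[feature] += 1
            depth_sums[feature] += depth
    ranked = [(f, counts[f] / len(trees), depth_sums[f] / counts[f]) for f in counts]
    # Most frequent first; ties broken by the shallower average position.
    return sorted(ranked, key=lambda item: (-item[1], item[2]))

# Toy forest with hypothetical splits, for illustration only.
toy_trees = [
    ("apoe4", 0.5, ("age", 72.0, 0.2, 0.5), ("recall_z", -1.0, 0.9, 0.6)),
    ("age", 70.0, 0.1, ("apoe4", 0.5, 0.4, 0.8)),
    ("apoe4", 0.5, 0.3, 0.7),
]
for feature, freq, mean_depth in rank_features(toy_trees):
    print(f"{feature}: in {freq:.0%} of trees, mean depth {mean_depth:.2f}")
```

Ranking by frequency first and depth second mirrors the order in which the text lists the two criteria; other weightings are equally plausible.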

Deep development of probability algorithm in ADNI

To derive the estimated probability of Aβ positivity, we developed a new simulation framework that combines nested cross-validation, stratified subsampling of participants with SCD or MCI in ADNI, and decision tree methods. MCI status was ascertained according to the ADNI and AIBL study criteria, which were consistent with accepted clinical methods [23,24] (Supplementary Material). In ADNI, SCD was classified as present among participants who did not qualify as having MCI if either the participant or informant Everyday Cognition Questionnaire score reached its respective threshold (≥1.36 for participants or ≥1.31 for informants) [25–27]. In AIBL, SCD was classified as present by an Informant Questionnaire on Cognitive Decline in the Elderly (IQCODE) 16-item short form score of ≥3.38 [28,29]. The methods used to ascertain SCD are not prescriptive components of the algorithms but rather operationalizations of the measures available in each study, chosen to represent the target patient population reasonably. Two algorithms were developed, either omitting or including APOE ε4 carrier status, which was confirmed to be a strong predictor during the first phase of this analysis (see the Results section). The statistical framework used stratified resampling and a maximum AUC split criterion to obtain internally validated decision trees averaged over 1000 iterations (Fig. 1). The decision tree method estimates probability as the proportion of amyloid-positive samples in each class/category under each tree branch. The procedure was repeated 1000 times, each time drawing a stratified sample of 250 participants, thereby simulating 1000 optimal decision trees trained on the target patient population. Sampling was stratified by 5-year age groups and by MCI:SCD proportion within each age group to resemble the target clinical scenarios. The predicted probability of Aβ positivity was obtained as the average of the 1000 optimal decision trees for each combination of predictors.
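The SCD definitions above reduce to simple threshold checks. In the sketch below the function names are hypothetical; the cut-offs come from the text. Treating a missing questionnaire score as not meeting its threshold, and applying the "not MCI" condition in AIBL as well, are assumptions.

```python
def scd_from_ecog(has_mci, ecog_participant=None, ecog_informant=None):
    """SCD as operationalized in ADNI (and MCSA): not MCI, and the participant
    Everyday Cognition score >= 1.36 or the informant score >= 1.31."""
    if has_mci:
        return False
    participant_flag = ecog_participant is not None and ecog_participant >= 1.36
    informant_flag = ecog_informant is not None and ecog_informant >= 1.31
    return participant_flag or informant_flag

def scd_from_iqcode(has_mci, iqcode_16=None):
    """SCD as operationalized in AIBL: IQCODE 16-item short form >= 3.38.
    Requiring 'not MCI' here is an assumption mirroring the ADNI rule."""
    return (not has_mci) and iqcode_16 is not None and iqcode_16 >= 3.38

print(scd_from_ecog(False, ecog_participant=1.40))  # participant threshold met
print(scd_from_iqcode(False, iqcode_16=3.20))       # below the IQCODE cut-off
```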

Fig. 1

Statistical framework: procedure for one optimal decision tree. One thousand decision trees were used to derive the probability distribution of the algorithm, with resampling without replacement using age-stratified and subjective cognitive decline:mild cognitive impairment–stratified sampling. Abbreviation: AUC, area under the curve.
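The resampling loop behind Fig. 1 can be sketched in plain Python as below. This is a simplified stand-in under stated assumptions, not the authors' code: it omits the inner (nested) cross-validation layer, sizes each (5-year age group, SCD/MCI) stratum by its share of the input data rather than a separately specified clinical target, grows depth-2 trees over a coarse threshold grid, and interprets the "maximum AUC split criterion" as choosing the split whose two leaves, scored by their positive fraction, give the highest AUC. The field names (`age`, `recall_z`, `dx`, `pos`), tree depth, minimum leaf size, and the synthetic cohort are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical fields: "age" (years), "recall_z" (immediate recall z-score),
# "dx" ("SCD" or "MCI"), and "pos" (1 if amyloid positive, else 0).
FEATURES = ("age", "recall_z")

def stratified_subsample(records, n, rng):
    """Draw about n records without replacement, keeping each
    (5-year age group, SCD/MCI) stratum at its share of `records`."""
    strata = defaultdict(list)
    for r in records:
        strata[(int(r["age"] // 5) * 5, r["dx"])].append(r)
    sample = []
    for members in strata.values():
        k = min(len(members), round(n * len(members) / len(records)))
        sample.extend(rng.sample(members, k))
    return sample

def split_auc(left, right):
    """AUC obtained when every case is scored by its leaf's positive fraction."""
    pl = sum(r["pos"] for r in left)
    pr = sum(r["pos"] for r in right)
    nl, nr = len(left) - pl, len(right) - pr
    if pl + pr == 0 or nl + nr == 0:
        return 0.5
    auc = (pl * nr + 0.5 * (pl * nl + pr * nr)) / ((pl + pr) * (nl + nr))
    # auc assumes the left leaf scores higher; 1 - auc is the other orientation,
    # and the larger value matches ranking leaves by their positive fraction.
    return max(auc, 1.0 - auc)

def fit_tree(rows, depth=2, min_leaf=15):
    """Greedy tree whose splits maximize AUC; leaves hold the positive fraction."""
    prob = sum(r["pos"] for r in rows) / len(rows)
    if depth == 0 or len(rows) < 2 * min_leaf:
        return prob
    best = None
    for f in FEATURES:
        values = sorted({r[f] for r in rows})
        for t in values[:: max(1, len(values) // 20)]:  # coarse grid for speed
            left = [r for r in rows if r[f] <= t]
            right = [r for r in rows if r[f] > t]
            if min(len(left), len(right)) < min_leaf:
                continue
            score = split_auc(left, right)
            if best is None or score > best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return prob
    _, f, t, left, right = best
    return (f, t, fit_tree(left, depth - 1, min_leaf), fit_tree(right, depth - 1, min_leaf))

def predict(tree, x):
    """Follow the splits for patient x down to a leaf probability."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def fit_ensemble(records, n_iter=1000, n_sample=250, seed=0):
    """One tree per stratified subsample (the text describes 1000 draws of 250)."""
    rng = random.Random(seed)
    return [fit_tree(stratified_subsample(records, n_sample, rng)) for _ in range(n_iter)]

def ensemble_probability(trees, x):
    """Average the per-tree leaf probabilities for patient x."""
    return sum(predict(tree, x) for tree in trees) / len(trees)

def make_synthetic(n=600, seed=1):
    """Invented cohort in which risk rises with age and falls with recall z-score."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age, z = rng.uniform(55, 90), rng.gauss(-1.0, 1.0)
        risk = min(0.95, max(0.05, 0.3 + 0.015 * (age - 70) - 0.15 * z))
        rows.append({"age": age, "recall_z": z, "dx": "MCI" if z < -1.0 else "SCD",
                     "pos": int(rng.random() < risk)})
    return rows

trees = fit_ensemble(make_synthetic(), n_iter=50)  # 50 rather than 1000 to keep it quick
print(round(ensemble_probability(trees, {"age": 85, "recall_z": -2.0}), 2))
print(round(ensemble_probability(trees, {"age": 60, "recall_z": 1.0}), 2))
```

Averaging leaf proportions across many small, stratified trees smooths the step-shaped output of any single tree; a grid of such averages over age and recall z-score is the kind of surface the heat maps in Fig. 2 display.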

Algorithm validation in AIBL and MCSA

Because AIBL was used in the first phase of feature selection but not to derive probability estimates, the AIBL data set served as a semi-independent validation of the algorithms' probability estimates. For this semi-independent validation, the AIBL data set was analyzed using 1000 iterations sampled by 5-year age groups and SCD:MCI proportion within age groups to resemble the target clinical scenarios, consistent with our ADNI sampling method. A fully independent external validation of the algorithms was conducted using the MCSA, an epidemiologic study of aging and MCI in a population-based cohort. The study design was detailed previously [30]. The primary validation included 711 participants with SCD (n = 490) or MCI (n = 221) at the most recent Aβ measure occurring from November 2006 to August 2017. MCI was defined using consensus agreement and published criteria [24,30]. SCD was considered present if the participant or informant Everyday Cognition Questionnaire score reached its respective threshold (≥1.36 for participants or ≥1.31 for informants) [25–27]. Aβ positivity was defined as a Pittsburgh compound B PET standardized uptake value ratio (SUVR) >1.42 [31]. A secondary validation additionally included cognitively unimpaired participants without SCD (n = 1012) to evaluate the potential for broader clinical application of the algorithms in all nondemented participants (n = 1723). Algorithm performance was evaluated using AUC, specificity, sensitivity, positive predictive value (PPV), negative predictive value (NPV), and positive likelihood ratio. A probability threshold of 0.5 was applied in the primary analysis of performance, and secondary analyses used probability thresholds of 0.4, 0.6, and 0.7 to predict positive Aβ status. Sensitivity analyses were conducted by age subgroup and by MCI or SCD status. 
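All of the threshold-dependent metrics listed above derive from the 2 × 2 table of predicted versus PET-defined Aβ status. The minimal sketch below assumes predicted probabilities and binary labels (1 = Aβ positive, e.g., SUVR >1.42 in MCSA) are already available; the toy values are invented. Its AUC is the rank-based (Mann–Whitney) estimate over the continuous probabilities, which is one common definition and not necessarily how the reported AUCs were computed.

```python
def threshold_metrics(probs, labels, threshold=0.5):
    """2 x 2 metrics when probability >= threshold predicts amyloid positivity.
    labels: 1 = amyloid positive on PET, 0 = negative."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 0)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
        "lr_positive": sens / (1 - spec) if spec < 1 else float("inf"),
    }

def auc(probs, labels):
    """Rank-based AUC: chance a random positive outscores a random negative."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Invented toy data, for illustration only.
probs = [0.82, 0.71, 0.55, 0.48, 0.40, 0.33, 0.62, 0.20]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
for cut in (0.4, 0.5, 0.6, 0.7):
    print(cut, threshold_metrics(probs, labels, cut))
print("AUC", auc(probs, labels))
```

Raising the threshold trades sensitivity for specificity and PPV, which is the trade-off the secondary thresholds (0.4, 0.6, 0.7) are meant to expose.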
Although the algorithm was derived using the normalized immediate recall score on a word list learning task, its performance using other recall measures was also evaluated. Specifically, using the AIBL data, the Rey Auditory Verbal Learning Test (RAVLT) immediate recall z-score was substituted with the z-scores of the California Verbal Learning Test (CVLT) immediate, short-delayed, and long-delayed recall measures; using ADNI and MCSA data, the CVLT z-scores were substituted with the RAVLT short-delayed and long-delayed measures. Z-scores were obtained from published norms that were age-adjusted or age- and sex-adjusted. To estimate the potential impact of the algorithm on referrals to specialists or Aβ confirmation testing, the performance metrics from MCSA validation were applied to the numbers of patients projected to be in the health care system for possible SCD or MCI due to AD on approval of a disease-modifying therapy for AD [8,9].

Results

Feature selection in ADNI and AIBL

Characteristics of the ADNI and AIBL samples are summarized in Table 1. Of the 654 variables in ADNI and 169 in AIBL, output from the 100 × 100 decision trees indicated that APOE ε4 status, age, and cognitive test performance consistently had the highest predictive value (Supplementary Fig. 1). The combination of all three features outperformed combinations of two features or one (e.g., in ADNI, AUC = 78.3% for all three vs 72.9% for age and APOE ε4; Supplementary Fig. 2).

Table 1

Characteristics of the data sets used for development and validation

Characteristic	ADNI			AIBL			MCSA
	Feature selection and probability development			Feature selection and semi-independent validation			Validation
	Aβ+	Aβ−	Total	Aβ+	Aβ−	Total	Aβ+	Aβ−	Total
Participants, n	311	307	618	152	108	260	352	359	711
SCD, n (%)	49 (15.7)	95 (30.9)	144 (23.3)	35 (23.0)	46 (42.6)	81 (31.1)	204 (57.9)	286 (79.7)	490 (68.9)
MCI, n (%)	262 (84.2)	212 (69.1)	474 (76.7)	117 (77.0)	62 (57.4)	179 (68.9)	148 (42.1)	73 (20.3)	221 (31.1)
Female, n (%)	147 (47.3)	143 (46.6)	290 (46.9)	63 (41.4)	44 (40.7)	107 (41.2)	156 (44.3)	141 (39.3)	297 (41.8)
Mean age, y (SD)	72.9 (6.9)	70.1 (7.2)	71.6 (7.2)	76.0 (6.5)	72.5 (7.5)	74.5 (7.1)	79.6 (7.9)	70.6 (10.4)	75.1 (10.3)
Higher education, n (%)∗	201 (64.6)	216 (70.4)	417 (67.5)	44 (28.9)	35 (32.4)	79 (30.4)	132 (37.5)	138 (38.4)	270 (38.0)
APOE ε4 status, n (%)
Noncarrier	113 (36.3)	234 (76.5)	347 (56.1)	62 (40.8)	89 (82.4)	151 (58.1)	181 (51.4)	285 (79.4)	466 (65.5)
Carrier, heterozygous	154 (49.5)	64 (20.9)	218 (35.3)	61 (40.1)	13 (12.0)	74 (28.5)	146 (41.5)	64 (17.8)	210 (28.3)
Carrier, homozygous	43 (13.8)	8 (2.6)	51 (8.3)	15 (9.9)	0	15 (5.8)	18 (5.1)	1 (0.3)	19 (2.7)
Missing APOE ε4 data	1 (0.3)	1 (0.3)	2 (0.3)	14 (9.2)	6 (5.6)	20 (7.7)	7 (2.0)	9 (2.5)	16 (2.3)

Abbreviations: Aβ, amyloid β; ADNI, Alzheimer's Disease Neuroimaging Initiative; AIBL, Australian Imaging, Biomarker and Lifestyle Flagship Study of Ageing; APOE, apolipoprotein E; MCSA, Mayo Clinic Study of Aging; SCD, subjective cognitive decline; MCI, mild cognitive impairment.

∗Higher education was defined as years of education ≥16 in ADNI and MCSA and ≥15 in AIBL.

Iterative comparison of algorithm performance to select an appropriate cognitive assessment indicated that various cognitive assessments performed similarly well, with median AUCs over 1000 iterations ranging from 0.72 to 0.76 in ADNI and 0.70 to 0.72 in AIBL (Supplementary Fig. 3). Recall measures had slightly higher median AUCs (e.g., 0.75 for the RAVLT in ADNI; 0.72 for the CVLT in AIBL) than global measures such as the Mini–Mental State Examination (0.73 in ADNI; 0.70 in AIBL), clock drawing (0.72 in ADNI; 0.71 in AIBL), or most measures of verbal fluency, language, attention, and subjective cognitive decline, although the differences were not statistically significant. The similarity across cognitive tests persisted when the iterative decision trees included APOE status and age (AUCs in ADNI: 0.70 for the Alzheimer's Disease Assessment Scale-Cognitive to 0.72 for the Boston Naming Test; AUCs in AIBL: 0.68 for clock drawing to 0.70 for the CVLT). Given this solid performance across cognitive tests, recall measures were selected for deriving predicted probabilities, based on consistency with prior research [11,32], relative ease of administration, and evidence that performance on recall tests is less affected by education than performance on other cognitive tests [33]. Because the AUC was the same whether the algorithm used immediate or delayed recall (e.g., ADNI AUC = 0.75 for either; AIBL AUC = 0.72 for either), the immediate recall test was selected for time efficiency. 
The available recall measures differed between ADNI and AIBL (e.g., ADNI uses the RAVLT [34], whereas AIBL uses the CVLT [35]). To allow input from different measures, raw scores were transformed to normalized z-scores. The final variables were age, APOE ε4 status, and immediate recall z-score.
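Converting raw scores to z-scores against normative data is what allows different recall tests to feed the same algorithm. The sketch below uses a hypothetical normative table with invented age-band means and SDs; real use would require the published age-adjusted (or age- and sex-adjusted) norms for the specific test being administered.

```python
# HYPOTHETICAL normative values (mean, SD of the raw immediate-recall score)
# by age band [low, high). Invented for illustration; use published norms.
HYPOTHETICAL_NORMS = {
    (60, 70): (45.0, 9.0),
    (70, 80): (40.0, 9.0),
    (80, 90): (35.0, 8.0),
}

def recall_z(raw_score, age, norms=HYPOTHETICAL_NORMS):
    """Age-adjusted z-score: (raw score - normative mean) / normative SD."""
    for (low, high), (mean, sd) in norms.items():
        if low <= age < high:
            return (raw_score - mean) / sd
    raise ValueError(f"no normative band covers age {age}")

print(recall_z(29, 74))  # (29 - 40) / 9
```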

Probability distribution development in ADNI

To derive the predicted probabilities of Aβ positivity for each combination of predictors, 1000 optimal decision trees were run for each algorithm: algorithm 1 used only age and immediate recall z-score (RAVLT), and algorithm 2 also used APOE ε4 carrier status. To display the final probability distributions by age, immediate recall z-score, and, optionally, APOE ε4 status, the algorithms were expressed as heat maps (Fig. 2). For both algorithms, the probability increased with increasing age and decreasing recall z-score. For algorithm 2, probability was also higher in APOE ε4 carriers; carrier status was such a strong predictor that all carriers over age 50 years had probability ≥0.5. With the heat map, an individual can be located by age and recall z-score to obtain the estimated probability; if a certain probability threshold (e.g., ≥0.5) is deemed appropriate given the resources available for referral or Aβ confirmation, that threshold can be overlaid onto the heat map (e.g., dashed red lines in Fig. 2A, representing the ≥0.5 probability threshold). Alternatively, the inputs for a given patient could be entered into a clinical calculator that outputs the predicted probability with CIs. For example, for a patient aged 70 years with a recall z-score of −1.25, the estimated probability is 0.49 (interquartile range 0.35–0.63); if the patient is known to be an APOE ε4 carrier, the estimated probability shifts to 0.70 (interquartile range 0.68–0.78).
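The calculator output described above (a mean probability with an interquartile range) can be produced from the individual tree predictions for one patient. The sketch assumes the calculator summarizes the per-tree outputs this way, which the reported interquartile ranges suggest but the text does not spell out; the seven per-tree values are hypothetical stand-ins for the 1000 a real run would produce.

```python
import statistics

def summarize_tree_predictions(per_tree_probs, threshold=0.5):
    """Mean and interquartile range of one patient's per-tree probabilities,
    plus whether the mean meets a clinic-chosen referral threshold."""
    q1, _median, q3 = statistics.quantiles(per_tree_probs, n=4)
    mean = statistics.fmean(per_tree_probs)
    return {"mean": mean, "iqr": (q1, q3), "meets_threshold": mean >= threshold}

# Hypothetical per-tree outputs for one patient (a real run would have 1000).
per_tree = [0.28, 0.36, 0.44, 0.47, 0.52, 0.58, 0.64]
print(summarize_tree_predictions(per_tree))                 # default 0.5 threshold
print(summarize_tree_predictions(per_tree, threshold=0.4))  # a more inclusive cut
```

Keeping the threshold as a parameter reflects the text's point that the appropriate cut-off depends on local resources rather than being fixed by the algorithm.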

Fig. 2

Average heat maps for predicted Aβ-positive status based on 1000 optimal decision trees. Algorithms were generated using immediate recall test z-score and age without (A) or with (B) consideration of APOE ε4 status. Red indicates higher probability of Aβ positivity; blue indicates higher probability of Aβ negativity. The dashed red line indicates the threshold for a probability of >0.5 to be considered predicted positive; different probability thresholds can be applied as appropriate depending on the clinical context and available resources. Abbreviations: Aβ, amyloid β; APOE, apolipoprotein E.

Validation

Performance metrics of each algorithm were consistent across the ADNI, AIBL, and MCSA populations, indicating robust performance across these settings. Fig. 3 shows the performance metrics when a 0.5 probability was applied as the threshold for predicting Aβ positivity in the two validation data sets: with only age and recall z-score, the algorithm achieved a PPV of 66% in MCSA and 67% in AIBL; with APOE ε4 added, the PPV was 69% in MCSA and 76% in AIBL. The 0.5 probability threshold yielded the best AUC, 71%; however, given the high cost of Aβ testing, a higher PPV might be more helpful. Table 2 shows performance when different probability thresholds were applied to the MCSA validation data set. For example, a probability threshold of 0.70 provided a PPV of 83% with an NPV of 60%. The higher threshold also raised the positive likelihood ratio from 2.3 to 5.0, indicating a larger potential effect of the algorithm on clinical decision-making.
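The single-threshold metrics in Table 2 are tied together by two standard identities, with Se and Sp denoting sensitivity and specificity. The positive likelihood ratio is Se/(1 − Sp), and a classifier dichotomized at one threshold has an ROC curve through (0, 0), (1 − Sp, Se), and (1, 1), whose area is (Se + Sp)/2. Reading Table 2's AUC row as this single-threshold area is an inference (the text does not say so), but it matches every column. The small gaps from the reported 5.01 and 2.25 are consistent with rounding of the tabulated percentages.

```latex
\begin{aligned}
\mathrm{LR}^{+} &= \frac{\mathrm{Se}}{1-\mathrm{Sp}}, &
\mathrm{AUC}_{\text{single threshold}} &= \frac{(1-\mathrm{Sp})\,\mathrm{Se}}{2} + \frac{\mathrm{Sp}\,(\mathrm{Se}+1)}{2} = \frac{\mathrm{Se}+\mathrm{Sp}}{2},\\
{\ge}0.7:\quad \mathrm{LR}^{+} &= \frac{0.35}{1-0.93} = 5.0, &
\mathrm{AUC} &= \frac{0.35+0.93}{2} = 0.64,\\
{\ge}0.5:\quad \mathrm{LR}^{+} &= \frac{0.75}{1-0.67} \approx 2.27, &
\mathrm{AUC} &= \frac{0.75+0.67}{2} = 0.71.
\end{aligned}
```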

Fig. 3

Performance metrics of algorithm 1 (age and immediate recall) and algorithm 2 (age, immediate recall, and apolipoprotein E ε4 status) based on a 0.5 probability threshold for predicting Aβ positivity in the validation data sets. ADNI validation was an internal validation using resampling of n = 250 over 1000 iterations; AIBL validation was a semi-independent validation using resampling of n = 91 over 1000 iterations; and MCSA was a fully independent validation (n = 711 in algorithm 1 validation; n = 695 in algorithm 2 validation). Abbreviations: ADNI, Alzheimer's Disease Neuroimaging Initiative; AIBL, Australian Imaging, Biomarkers and Lifestyle Flagship Study of Ageing; AUC, area under the curve; MCSA, Mayo Clinic Study of Aging; NPV, negative predictive value; PPV, positive predictive value.

Table 2

Impact of varying probability thresholds: MCSA validation data set∗

Probability threshold	≥0.7	≥0.6	≥0.5	≥0.4
PPV	83%	79%	69%	64%
Specificity	93%	89%	67%	52%
NPV	60%	61%	73%	80%
Sensitivity	35%	43%	75%	87%
Likelihood ratio positive (95% CI)	5.01 (3.32–7.56)	3.84 (2.77–5.31)	2.25 (1.92–2.65)	1.81 (1.61–2.04)
AUC	64%	66%	71%	69%

Abbreviations: AUC, area under the curve; MCSA, Mayo Clinic Study of Aging; NPV, negative predictive value; PPV, positive predictive value.

∗Data shown for the algorithm using age, recall test z-score, and apolipoprotein E ε4 carrier status.

The algorithms were further evaluated in subgroups, stratifying the MCSA data by SCD, MCI, or age (50–64.9, 65–74.9, 75–84.9, and ≥85.0 years) (Supplementary Table 1). For both algorithms, PPV was higher among individuals with MCI (71% without and 76% with APOE ε4 status) than among those with SCD (58% and 63%, respectively), although specificity was low in MCI (28% and 42%, respectively). For both algorithms, PPV increased with increasing age. Additional sensitivity analyses stratified by sex or education showed that AUCs were similar for males and females, although PPV was higher in females and NPV was higher in males; algorithm 2 performed more consistently across education levels than algorithm 1 (Supplementary Table 1). 
In secondary analysis that explored the performance among 1723 nondemented (including cognitively unimpaired without SCD) MCSA participants, AUC decreased slightly (algorithm 1: 66% to 64%; algorithm 2: 71% to 67%), NPV increased, and PPV decreased (Supplementary Table 2). In separate analysis of a small sample of MCSA participants with mild dementia (n = 23 for algorithm 1 and n = 22 for algorithm 2), both algorithms performed with 100% sensitivity and high PPV (82%–83%).

Substitution of recall tests

Performance remained robust to substitution of different recall measure z-scores, with the AUC, PPV, and NPV stable within 1%–2% of the original result for algorithm 2 and within 4%–5% for algorithm 1 (Supplementary Fig. 4).

Potential impact on projected health care system constraints

On availability of a new AD-modifying therapy, approximately 14 to 15 million patients with possible MCI due to AD are estimated to be eligible for referral or biomarker testing in the US or select European health care systems [8,9]. Applying the algorithms as observed in the MCSA validation to this projected population could prevent an estimated 1.0 to 2.8 million negative Aβ confirmation tests while helping identify 0.1 to 3.4 million Aβ-positive symptomatic patients, depending on the desired probability threshold (Table 3).

Table 3

Projected impact of applying Aβ probability algorithms for the 14.9 million US patients aged ≥55 years projected to screen positive for MCI∗

Scenario	RAND report projected number (no algorithm)	Applying algorithm†
		≥0.6 probability threshold		≥0.5 probability threshold		≥0.4 probability threshold
		Age, recall	With APOE ε4	Age, recall	With APOE ε4	Age, recall	With APOE ε4
Send to Aβ confirmation	6.7 M	5.1 M	4.0 M	7.7 M	8.0 M	9.9 M	10.0 M
Confirmed (true + sent)	3.0 M	3.6 M	3.1 M	5.0 M	5.5 M	6.1 M	6.4 M
Not confirmed (false + sent)	3.7 M	1.5 M	0.9 M	2.7 M	2.5 M	3.8 M	3.6 M

Abbreviations: Aβ, amyloid β; APOE, apolipoprotein E.

∗Projected numbers obtained from the RAND report for US health care system readiness for an Alzheimer's disease–modifying therapy; projections for five European countries were of similar magnitude, with an estimated 14.3 M patients in those health care systems screening positive for mild cognitive impairment (data not shown) [8,9].

†Algorithm listed as “age, recall” uses age and recall z-scores. Algorithm listed as “with APOE ε4” uses age, recall z-score, and APOE ε4 positive status. Values in the “send to Aβ confirmation” row refer to patients who would be predicted positive with the algorithm for a given probability threshold (as displayed in the table: 0.6, 0.5, and 0.4). Values are derived from the performance of the algorithms in the Mayo Clinic Study of Aging validation data set using the Rey Auditory Verbal Learning Test immediate recall z-score.
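The projected savings quoted in the Results are differences between the algorithm columns and the RAND column of Table 3. The sketch below transcribes those values (in millions) and recomputes them. The 1.0 to 2.8 million negative tests averted correspond to the ≥0.6 and ≥0.5 thresholds; the 0.1 to 3.4 million range matches the additional confirmed Aβ-positive patients across all three thresholds. At ≥0.4, the algorithm scenarios send roughly as many Aβ-negative patients for confirmation as the RAND scenario (3.8 M and 3.6 M vs 3.7 M).

```python
# Values in millions, transcribed from Table 3; keys are (threshold, algorithm).
RAND = {"sent": 6.7, "confirmed": 3.0, "not_confirmed": 3.7}
ALGORITHMS = {
    ("0.6", "age, recall"): {"sent": 5.1, "confirmed": 3.6, "not_confirmed": 1.5},
    ("0.6", "with APOE e4"): {"sent": 4.0, "confirmed": 3.1, "not_confirmed": 0.9},
    ("0.5", "age, recall"): {"sent": 7.7, "confirmed": 5.0, "not_confirmed": 2.7},
    ("0.5", "with APOE e4"): {"sent": 8.0, "confirmed": 5.5, "not_confirmed": 2.5},
    ("0.4", "age, recall"): {"sent": 9.9, "confirmed": 6.1, "not_confirmed": 3.8},
    ("0.4", "with APOE e4"): {"sent": 10.0, "confirmed": 6.4, "not_confirmed": 3.6},
}

def compare_to_rand(rand, scenarios):
    """Negative confirmation tests averted and extra positives found, per scenario."""
    return {
        key: {
            "negatives_averted": rand["not_confirmed"] - s["not_confirmed"],
            "extra_positives": s["confirmed"] - rand["confirmed"],
        }
        for key, s in scenarios.items()
    }

for key, diff in compare_to_rand(RAND, ALGORITHMS).items():
    print(key, {name: round(value, 1) for name, value in diff.items()})
```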


Discussion

We developed and validated two practical algorithms to estimate the probability of Aβ positivity in patients with SCD or MCI, using a rigorous statistical framework for probability estimation in both clinical and population-based data sets. Feature selection was guided by the principle that, to increase the efficiency of biomarker testing, an algorithm should ideally rely on inputs that are quickly administered and readily available while still performing well. Accordingly, algorithm 1 requires only age and an immediate recall test, which can be administered in approximately 5 minutes. Algorithm 2 also considers APOE ε4 carrier status, obtainable from a quick and often easily accessible genetic test. Both algorithms were robust across the clinic-based populations (ADNI, AIBL) and the population-based sample (MCSA). A strength of this study was the rigorous statistical framework underlying the probability estimation. Nested cross-validation with stratified subsampling reduced problems caused by heterogeneity among data sets and tailored the modeling to the specific target population. This framework is designed to limit overfitting and to increase reproducibility and model robustness. Indeed, the algorithms' performance metrics were largely similar across ADNI, AIBL, and MCSA, despite differences in study settings and designs. This statistical structure is generalizable and could be extended to other target populations or biomarkers. Compared with other published practical algorithms for Aβ probability in SCD or MCI, the predictive performance of the current algorithms was similar, with the added advantages of flexible inputs and validation in an epidemiologic data set. 
Although the AUC was slightly lower (at best 0.71 in the validation data set using age, recall z-score, and APOE ε4, compared with 0.75 to 0.82 for other models [11,18,19]), the other models were tested only in specialized clinical sites. The AUCs we observed during feature selection were in the same range as those of other models (e.g., 78% for age, APOE ε4, and cognitive test; Supplementary Fig. 2), and they decreased after nested cross-validation with stratified subsampling over 1000 iterations was applied to derive probabilities. This observation supports the notion that the performance of the algorithms derived here is tempered in exchange for more stable performance across settings. Furthermore, AUC is not necessarily the preferred performance metric when the confirmatory test (e.g., PET) is costly and of limited availability [36]. Rather, PPV and positive likelihood ratio may be most relevant, because a higher PPV more directly reduces the number of Aβ tests returned as negative (reducing unnecessary cost and burden), and a higher positive likelihood ratio conveys a larger shift from the clinician's initial judgment [36,37]. While a 0.5 probability best balances sensitivity and specificity, the probability threshold best suited for a given clinical scenario depends on numerous factors that vary across clinics, such as patient volume and the availability of PET scanners or specialists. With this in mind, our analysis considered alternative probability thresholds that may be relevant in different settings depending on resource availability and provider preferences. These algorithms were developed to maintain flexible inputs for application in clinical practice; unlike previously developed algorithms, they do not require specific cognitive or genetic tests [11,18,19]. Although APOE ε4 status is a strong predictor of Aβ pathology, there may be scenarios in which genetic counseling is problematic or not easily attainable. 
Probability values were derived both with and without APOE ε4 information, resulting in different probability distributions for the two algorithms; that is, APOE ε4 information is not simply an additive component. Another strength is that the algorithms do not specify which recall test must be used, as a variety of recall tests are effective at detecting MCI in clinical settings [38], and episodic memory is the domain most consistently and strongly related to cognitive decline due to AD pathology [[39], [40], [41]]. Recall tests are among the most commonly documented cognitive assessments in current primary care [42], suggesting that these algorithms can fit readily into current clinical practice. The algorithms are also not prescriptive about the assessment used for SCD, in line with the 2017 Gerontological Society of America and 2018 Alzheimer's Association tool kits, which provide flexible guidelines for ascertaining SCD [[43], [44], [45]]. In clinical practice, these algorithms may increase the confidence of primary care providers or specialists in their clinical decision-making and improve efficiency by reducing the number of patients sent for Aβ testing. For patients with MCI, the algorithms could shift the estimated probability of Aβ positivity from a prior probability of 0.45 to 0.50 [8,9,46,47] to approximately 0.65 to 0.75 (Fig. 3). For patients with SCD, the estimated probability may shift from approximately 0.20 to 0.30 [48] to approximately 0.60 (Fig. 3). The accompanying confidence intervals indicate the precision of each estimated probability. Given limited resources and the high cost of confirmatory testing, providers could consider a patient's probability of Aβ positivity and refer for confirmatory testing only those patients above a chosen probability threshold. Patients below the threshold may be appropriate for close monitoring (i.e., “watchful waiting”) and reassessment at follow-up visits.
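The threshold-based referral described above can be sketched as follows. The function names and all inputs (patient identifiers, 1000 patients, 50% prevalence, sensitivity 0.75, specificity 0.5) are hypothetical and are not estimates from this study; the sketch only shows how the number of confirmatory tests ordered, negative tests avoided, and Aβ+ patients missed follow from the chosen threshold's test characteristics, relative to sending every patient for confirmation.

```python
def triage(patients, threshold):
    """Split (patient_id, probability) pairs into those referred for confirmatory
    Aβ testing (probability >= threshold) and those for watchful waiting."""
    refer = [pid for pid, p in patients if p >= threshold]
    monitor = [pid for pid, p in patients if p < threshold]
    return refer, monitor

def expected_counts(n, prevalence, sensitivity, specificity):
    """Expected outcomes when only algorithm-positive patients are tested,
    compared with sending all n patients for confirmation (hypothetical inputs)."""
    amyloid_pos = n * prevalence
    amyloid_neg = n - amyloid_pos
    return {
        "tests_ordered": amyloid_pos * sensitivity + amyloid_neg * (1 - specificity),
        "negative_tests_avoided": amyloid_neg * specificity,  # true negatives not tested
        "positives_missed": amyloid_pos * (1 - sensitivity),  # Aβ+ below threshold
    }

refer, monitor = triage([("A", 0.72), ("B", 0.41), ("C", 0.65)], threshold=0.6)
print(refer, monitor)  # ['A', 'C'] ['B']
print(expected_counts(1000, 0.5, 0.75, 0.5))
# {'tests_ordered': 625.0, 'negative_tests_avoided': 250.0, 'positives_missed': 125.0}
```

Lowering the threshold raises sensitivity (fewer missed Aβ+ patients) at the cost of more negative confirmation tests; the appropriate balance depends on local PET capacity and follow-up resources.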
Such targeted referrals to specialists or Aβ testing may be necessary to reduce burden and to increase access for the patients most likely to benefit [8,9]. Although these algorithms are designed to aid clinical decision-making, they are not perfect predictors of Aβ PET status. That is, although they reduce the number of false positives, some patients with Aβ pathology will inevitably fall below the selected probability threshold. For this reason, the algorithms are best used as an adjunct to other considerations in the decision between specialist referral, confirmatory testing, or watchful waiting. Follow-up assessments to monitor cognitive decline remain important for patient care. The moderately good predictive performance of these algorithms reflects the best that is currently achievable with practical, low-cost inputs, in the absence of validated blood-based biomarkers and other emerging technologies. Should a new therapy be approved for AD intervention, an estimated 14.9 million patients over age 55 years in the US may screen positive for MCI in a single year, with a health care system ill-equipped to confirm pathology in such a large population; other countries face similar problems [8,9]. Applying either algorithm to this projected population could help identify individuals with underlying Aβ pathology while averting an estimated 1 to 2.8 million negative Aβ confirmation tests. A practical algorithm thus has the potential to minimize unnecessary costs and burdens for patients, providers, and the health care system.
Systematic Review: We reviewed the literature on predictive models for cerebral Aβ. Numerous factors, including age, cognitive impairment, APOE genotype, and CSF inflammatory or protein biomarkers, have been associated with Aβ positivity. Available predictive models are limited by a lack of external validation or by requiring inputs that are burdensome or not universally available.
Interpretation: We developed a multistep statistical framework that yields robust probability estimates across clinical and nonclinical settings, developing the algorithms in two data sources and validating them independently in a third, nonclinical population-based cohort. The predictive performance of the current algorithms was similar to that of other published practical algorithms for Aβ probability, with the advantage of flexibility in the choice of recall test and APOE ε4 test.
Future directions: While these algorithms may help identify patients for biomarker testing, a validated blood-based or other low-cost, low-burden biomarker that could replace CSF or PET testing would substantially improve Alzheimer's disease detection and diagnosis.