
Proof of concept demonstration of optimal composite MRI endpoints for clinical trials.

Steven D Edland¹, M Colin Ard², Jaiashre Sridhar³, Derin Cobia⁴, Adam Martersteck⁵, M Marsel Mesulam⁶, Emily J Rogalski³.

Abstract

BACKGROUND: Atrophy measures derived from structural MRI are promising outcome measures for early phase clinical trials, especially for rare diseases such as primary progressive aphasia (PPA), where the small available subject pool limits our ability to perform meaningfully powered trials with traditional cognitive and functional outcome measures.
METHODS: We investigated a composite atrophy index in 26 PPA participants with longitudinal MRIs separated by two years. Rogalski et al. [Neurology 2014;83:1184-1191] previously demonstrated that atrophy of the left perisylvian temporal cortex (PSTC) is a highly sensitive measure of disease progression in this population and a promising endpoint for clinical trials. Using methods described by Ard et al. [Pharmaceutical Statistics 2015;14:418-426], we constructed a composite atrophy index composed of a weighted sum of volumetric measures of 10 regions of interest within the left perisylvian cortex using weights that maximize signal-to-noise and minimize sample size required of trials using the resulting score. Sample size required to detect a fixed percentage slowing in atrophy in a two-year clinical trial with equal allocation of subjects across arms and 90% power was calculated for the PSTC and optimal composite surrogate biomarker endpoints.
RESULTS: The optimal composite endpoint required 38% fewer subjects to detect the same percent slowing in atrophy than required by the left PSTC endpoint.
CONCLUSIONS: Optimal composites can increase the power of clinical trials and increase the probability that smaller trials are informative, an observation especially relevant for PPA, but also for related neurodegenerative disorders including Alzheimer's disease.


Keywords: Alzheimer’s Disease; Clinical Trial; Composite Endpoint; MRI; PPA; Power Calculations; Primary Progressive Aphasia; Region of Interest; Sample Size; Structural Magnetic Resonance Imaging

Year: 2016 PMID: 28345017 PMCID: PMC5363955 DOI: 10.1016/j.trci.2016.05.002

Source DB: PubMed Journal: Alzheimers Dement (N Y) ISSN: 2352-8737

Introduction

Clinical trials of interventions to slow the course of chronic progressive neurodegenerative diseases typically use cognitive neuropsychometric and functional (activities of daily living) outcome measures to demonstrate efficacy. Treatment efficacy is difficult to demonstrate with these endpoints, because decline is subtle during the relatively short span of observation of a clinical trial, and because there is substantial random variability in these measures from person to person and from observation to observation within a person. Volumetric magnetic resonance imaging (MRI) on the other hand has been shown to have good signal-to-noise properties in this context. For example, for amnestic dementia of the Alzheimer type (AD), a substantial literature has consistently demonstrated that volumetric MRI endpoints could reduce required sample size in AD treatment trials and secondary prevention trials of mild cognitive impairment by 75% or more compared to a standard cognitive function outcome [1]. The need for efficient endpoints is especially critical for rare subtypes of disease where the pool of subjects available for recruitment limits our ability to even perform large-scale phase 3 trials using neuropsychometric and functional outcome measures. Primary progressive aphasia (PPA) is a clinical dementia syndrome characterized by an initially isolated and progressive decline in language function and is associated with peak atrophy within the left hemisphere perisylvian language network [2], [3], [4]. Rogalski et al. [5] demonstrated that atrophy of the left perisylvian temporal cortex in particular is a highly sensitive measure of disease progression and a promising endpoint for clinical trials. Using this endpoint, clinical trials as small as 10 participants per arm would have 80% power to detect a 40% slowing of atrophy [5]. Efficiency of trials may be improved beyond these impressive levels by efficient utilization of the richness of data obtained by MRI. 
Xiong et al. [6] proposed that “composite” endpoints calculated as weighted averages of volumetric region of interest (ROI) substructures may outperform simple sums. These methods were operationalized by Ard et al. [7], who derived algorithms for determining optimal composite measures that maximize statistical power when used as an endpoint for clinical trials. In this brief communication, we demonstrate the potential utility of composite atrophy measures for clinical trials of neurodegenerative diseases with a prominent atrophy component.

Methods

Study subjects and imaging techniques have been previously described [5]. Briefly, study subjects included 26 individuals with a root diagnosis of PPA [2], [3], [4] (8 PPA logopenic, 10 PPA agrammatic, 8 PPA semantic). For the purposes of this article, the three clinical subtypes of PPA were combined to ensure sufficient sample size to estimate parameters required for calculating weights. Hence, this analysis is best interpreted as a proof-of-concept demonstration of the potential utility of composite volumetric measures, rather than derivation of an endpoint appropriate for use in a clinical trial. Mean age at baseline was 63.7 years (SD = 6.7), 58% were women, mean Boston Naming Test score was 39.5 (SD = 20.9), and mean Western Aphasia Battery Aphasia Quotient score was 86.8 (SD = 8.0). All subjects received a baseline structural MRI and follow-up MRI approximately 2 years later (mean interval 2.0 years). Structural MRIs were processed using the cross-sectional [8] and longitudinal [9] pipelines from FreeSurfer, version 5.1.0. Ten regions of interest (ROIs) within the left perisylvian temporal cortex region (Fig. 1) taken from the automated Desikan–Killiany cortical parcellation atlas were the components of a composite outcome measure [10]. The composite was calculated as the optimally weighted sum of these ROIs using weights that maximize the signal-to-noise ratio of rate of change on the composite, as previously described [7]. Clinical trial endpoints with a high signal-to-noise ratio, also called the mean to standard deviation ratio (MSDR), are more sensitive to treatment effects and optimize the power of a trial.
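To make the idea of MSDR-maximizing weights concrete, the following is a minimal sketch under a simplifying assumption: if the mean change vector μ and the covariance matrix Σ of the ROI changes were known, the weights maximizing w'μ / sqrt(w'Σw) would be proportional to Σ⁻¹μ. This is a textbook result used here for illustration only; it is not necessarily the exact algorithm of [7], which is derived under a linear mixed effects model. The function names and the two-ROI numbers are hypothetical and are not taken from the study data.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix copy so the inputs are not modified.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def msdr(weights, mean_change, cov):
    """Mean to standard deviation ratio of the weighted composite."""
    n = len(weights)
    mean = sum(w * mu for w, mu in zip(weights, mean_change))
    var = sum(weights[i] * cov[i][j] * weights[j]
              for i in range(n) for j in range(n))
    return mean / math.sqrt(var)


def optimal_weights(mean_change, cov):
    """Weights proportional to inverse(cov) times mean_change.

    The overall scale is arbitrary because MSDR is scale invariant.
    """
    return solve_linear(cov, mean_change)


# Hypothetical two-ROI example: equal mean decline, unequal noise.
mean_change = [1.0, 1.0]
cov = [[1.0, 0.0], [0.0, 4.0]]

w_opt = optimal_weights(mean_change, cov)
msdr_sum = msdr([1.0, 1.0], mean_change, cov)  # simple unweighted sum
msdr_opt = msdr(w_opt, mean_change, cov)       # optimally weighted sum
print(w_opt, round(msdr_sum, 3), round(msdr_opt, 3))
```

In this toy case the noisier ROI is down-weighted, and the weighted composite has a higher MSDR than either ROI alone or their simple sum. In practice μ and Σ must be estimated from data, which is why weights from a small sample should be treated as illustrative.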
We used relative efficiency to compare the performance of different outcome measures, where relative efficiency is defined as the ratio of the sample sizes required for trials using the respective outcomes, calculated using the standard formula for a two-sample t test:

$$N/\text{arm} = \frac{2\sigma^{2}\,(z_{1-\alpha/2} + z_{1-\beta})^{2}}{\Delta^{2}}$$

where $\sigma^{2}$ is the within-group variance of the outcome measure being compared across treatments (in this case, the change from baseline to 2-year follow-up), $\Delta$ is the treatment effect size under the alternative, and $z_{1-\alpha/2}$ and $z_{1-\beta}$ are the usual quantiles of the standard normal distribution, with α equal to the type I error rate of a two-sided test, typically set to 0.05, and (1−β) equal to the power of the trial, typically set to 0.8 or 0.9.
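As a rough illustration of the standard two-sample calculation referred to above, N/arm = 2σ²(z₁₋α/₂ + z₁₋β)² / Δ², the sketch below evaluates the normal-approximation formula in Python. The function name `n_per_arm` and the input values are hypothetical; they are not taken from Table 1, whose projections may rest on a different or refined calculation.

```python
import math
from statistics import NormalDist


def n_per_arm(sigma, delta, alpha=0.05, power=0.90):
    """Per-arm sample size, two-sided two-sample test, normal approximation.

    sigma: within-group SD of the change score
    delta: treatment effect (difference in mean change) to detect
    """
    if sigma <= 0 or delta == 0:
        raise ValueError("sigma must be positive and delta nonzero")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    z_beta = z.inv_cdf(power)           # z_{1 - beta}
    return 2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2


# Hypothetical outcome: mean 2-year decline 100, SD 25 (MSDR = 4).
# Powering for a 25% slowing means delta = 0.25 * 100.
n_exact = n_per_arm(sigma=25.0, delta=0.25 * 100.0)
n_rounded = math.ceil(n_exact)  # round up to whole subjects
print(n_exact, n_rounded)
```

Rounding up to the next whole subject is a common convention; small-sample t-distribution corrections, which this sketch omits, would increase the result slightly.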

Fig. 1

Regions of interest used to examine longitudinal cortical atrophy in PPA. Top: the perisylvian temporal cortex region of interest defined in Rogalski et al. [5]. Bottom: the regions of interest used to create the composite outcome measure.

Indexing two trial outcome measures to be compared as A and B, the relative efficiency of outcome A to outcome B is defined as

$$RE_{A:B} = \frac{N_A}{N_B}$$

Let $\mu_A$ and $\mu_B$ represent the mean change under placebo for outcome measures A and B, respectively. We can express effect sizes as proportional slowing of mean rate of change. For example, to power for a treatment effect that slows atrophy by 25%, we would set $\Delta_A = 0.25 \times \mu_A$ and $\Delta_B = 0.25 \times \mu_B$. Expressing effect sizes in this way, the proportions and the terms involving α and β drop out of the relative efficiency formula, leading to a simple function of the MSDRs of the two instruments being compared:

$$RE_{A:B} = \frac{\sigma_A^{2}/\mu_A^{2}}{\sigma_B^{2}/\mu_B^{2}} = \left(\frac{\text{MSDR}_B}{\text{MSDR}_A}\right)^{2}$$

For example, a relative efficiency of 50% means a trial using instrument A would require half as many subjects as a trial using instrument B to detect the same percent slowing in rate of decline.
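Because the sample size is proportional to σ²/Δ² and Δ is a fixed fraction of μ, the relative efficiency of A to B reduces to (MSDR_B / MSDR_A)². The short sketch below computes this ratio; the function name `relative_efficiency` is an illustrative choice, and the two MSDR inputs are the rounded values reported in Table 1 for the optimal composite (4.93) and the left perisylvian temporal cortex (3.90).

```python
def relative_efficiency(msdr_a, msdr_b):
    """Relative efficiency of outcome A to outcome B, i.e. N_A / N_B.

    Assumes both trials are powered for the same proportional slowing,
    so only the mean to standard deviation ratios (MSDRs) matter.
    """
    if msdr_a <= 0 or msdr_b <= 0:
        raise ValueError("MSDRs must be positive")
    return (msdr_b / msdr_a) ** 2


# A = optimal composite, B = left perisylvian temporal cortex (Table 1).
re_composite = relative_efficiency(4.93, 3.90)
fraction_fewer = 1 - re_composite  # share of subjects saved using A
print(f"RE = {re_composite:.3f}, fewer subjects = {fraction_fewer:.1%}")
```

With these rounded inputs the ratio comes out near 0.63, close to the 0.62 reported in Table 1 and in line with the roughly 38% reduction in sample size; the small difference is presumably rounding of the published MSDRs.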

Results

Mean rate of decline and person-to-person variability in rate of decline are summarized in Table 1. The MSDR for the referent total left perisylvian temporal cortical volume endpoint is 3.90. The MSDRs for the component subregions of the left perisylvian temporal cortex range from 1.58 to 3.36, consistently below the MSDR of the left perisylvian temporal cortex, meaning that the components individually would be less sensitive to atrophy than the full perisylvian ROI. The MSDR of the composite atrophy measure is 4.95, over 25% larger than the left perisylvian temporal cortex MSDR. In terms of relative efficiency, the optimal composite endpoint requires 38% fewer subjects than the total perisylvian temporal cortex volume measure to detect the same percent slowing in atrophy. For example, with 90% power, only 15 subjects per arm would be required to detect a 25% slowing in rate of progression in the composite outcome measure, assuming the distribution of decline observed in our pilot study.

Table 1

Test characteristics and relative efficiency of various potential clinical trial outcome measures

Measure	FreeSurfer region of interest	Mean 2-year decline	Standard deviation of 2-year decline	MSDR	Relative efficiency	N/arm to detect 25% slowing∗
Boston Naming Test (maximum score: 60)		15.7	14.2	1.10	11.08	277/arm
Western Aphasia Battery-Revised, Aphasia Quotient (maximum score: 100)		22.9	18.2	1.26	8.60	215/arm
Total cortical volume (mm³)	rh.cortex + lh.cortex	29,397	10,526	2.79	1.40	35/arm
Left perisylvian temporal cortex (mm³)		7811	2004	3.90	1.00	25/arm
Components, optimal composite
Left superior temporal gyrus (mm³)	lh.superiortemporal	1041	310	3.36
Left middle temporal gyrus (mm³)	lh.middletemporal	1128	403	2.79
Left inferior temporal gyrus (mm³)	lh.inferiortemporal	955	297	3.22
Left banks, sup. temp. sulcus (mm³)	lh.bankssts	214	71	3.00
Left fusiform gyrus (mm³)	lh.fusiform	810	287	2.82
Left transverse temporal gyrus (mm³)	lh.transversetemporal	80	50	1.58
Left temporal pole (mm³)	lh.temporalpole	180	102	1.77
Left inferior frontal gyrus (mm³)	†multiple regions	768	345	2.23
Left inferior parietal gyrus (mm³)	lh.inferiorparietal	1075	417	2.58
Left supramarginal gyrus (mm³)	lh.supramarginal	880	352	2.50
Optimal composite index		252	51	4.93	0.62	15/arm

∗Two-year clinical trial comparing change from baseline to year 2 in treatment versus control, two-sided test, α = 0.05, power = 90%.

†lh.parsopercularis + lh.parstriangularis + lh.parsorbitalis.

For comparison, we also report relative efficiency and sample size projections for clinical trials using total cortical volume or specific neuropsychometric instruments as the primary outcome measure. Relative to the full perisylvian temporal cortex ROI, the total cortical volume endpoint would require 40% more subjects, and the neuropsychometric outcomes would require approximately 9 to 11 times as many subjects per arm (Table 1).

Discussion

We have described a relatively intuitive and accessible volumetric composite index defined simply as the (optimally) weighted sum of ROI volumes. In our example, the resulting outcome measure substantially improved the efficiency of clinical trials in PPA, reducing required sample size relative to the total perisylvian cortical volume outcome by 38%. Other summaries of MRI data for this purpose have been proposed. Less intuitive and accessible perhaps are mathematically derived atrophy indexes, for example, the weighted average of vertices summarizing ventricular morphometry [11]. At the other extreme, using the single ROI most sensitive to disease, for example the perisylvian temporal cortex in PPA [5] or the hippocampus [1] or frontal lobe [11] in AD, is perfectly intuitive and accessible. Determining the relative efficiency of these various approaches, as we have demonstrated here, will be a useful tool for clinicians weighing the tradeoffs of accessibility versus power when selecting endpoints for clinical trials.
There are limitations to this report. The relatively small sample size in this cohort and the lack of information about the underlying neuropathology required pooling of etiologically disparate disease entities for the purpose of demonstrating the optimal compositing methodology. Hence, sample size estimates from this report are for illustrative purposes only. We emphasize that meaningful estimation of optimal weighting parameters will require substantially larger, more representative samples than the one used in this proof-of-concept demonstration. To simplify presentation, we ignored the influence of "normal" age-associated atrophy. Age-associated atrophy may not respond to treatment, and ignoring its influence may lead to underestimation of the detectable effect size and overstatement of power. This is a substantial concern for typical amnestic AD [12], [13], but is less of an issue for PPA, where onset age is typically <65 years and there is a rapid rate of disease-associated atrophy relative to normal aging. Thus, age-associated atrophy is likely to have a negligible effect on the relative efficiency calculations that are the focus of this manuscript. Finally, an implicit assumption of the optimal compositing method is that treatment slows the rate of atrophy proportionally in all ROIs. This is a plausible assumption, but one that cannot be formally tested until an effective treatment is identified.
Biomarker endpoints have clear limitations. There is no guarantee that treatments positively affecting biomarkers will have corresponding effects on cognitive and functional outcomes, and biomarkers will have to be validated as surrogates for clinical endpoints before they can be approved as primary endpoints for phase 3 clinical trials [14]. However, surrogate endpoints including volumetric MRI are currently being used in phase 2 trials to demonstrate target engagement and to guide the choice of compounds to move forward to phase 3 [15]. To this end, volumetric endpoints are certainly suggested for diseases like PPA with a prominent atrophy component.
Clinical trials of chronic progressive disease are prohibitively expensive. In AD research, this has limited our ability to test new treatments and find a cure for the disease. For less common phenotypes such as the subtypes of PPA, the need for more efficient endpoints is even more pressing because the available participant pool for clinical trials is limited. Every participant enrolled in a clinical trial is a precious resource, and methods that make optimal use of all information obtained from participants and increase the probability that effective treatments are identified should be fully investigated.
Optimal weighting maximizes signal-to-noise of endpoints and statistical efficiency of trials. To our knowledge, this is the first meaningful application of optimal weighting to volumetric MRI measures. The real-world context is the need for more cost-effective and informative clinical trials to speed the development of treatments for neurodegenerative diseases. Primary progressive aphasia, a relatively rare disease (limited subject pool) with a prominent atrophy component, is perhaps the perfect laboratory for investigating the performance of alternative surrogate volumetric MRI biomarker endpoints for clinical trials.
Systematic review: Rogalski et al. [5] previously demonstrated that atrophy of the left perisylvian temporal cortex is a highly sensitive measure of disease progression in primary progressive aphasia and a promising endpoint for clinical trials. Using methods described by Ard et al. [7], we constructed a composite atrophy index composed of an optimally weighted linear combination of focal volumetric measures from 10 ROIs within the left perisylvian temporal cortex. Optimal weighting maximizes signal-to-noise and statistical efficiency of clinical trials, and in this application reduced sample size requirements by 38%.
Interpretation: Optimal composite outcome measures show promise as a way to improve the efficiency of trials. More cost-effective and informative clinical trials would speed the development of treatments for neurodegenerative diseases.
Future directions: This proof-of-concept analysis demonstrated the potential utility of composite volumetric measures to improve the efficiency of trials. We will need larger datasets representative of future clinical trials to more definitively establish the utility of these methods.

14 in total

Review 1. Primary progressive aphasia--a language-based dementia.

Authors: M-Marsel Mesulam
Journal: N Engl J Med Date: 2003-10-16 Impact factor: 91.245

2. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest.

Authors: Rahul S Desikan; Florent Ségonne; Bruce Fischl; Brian T Quinn; Bradford C Dickerson; Deborah Blacker; Randy L Buckner; Anders M Dale; R Paul Maguire; Bradley T Hyman; Marilyn S Albert; Ronald J Killiany
Journal: Neuroimage Date: 2006-03-10 Impact factor: 6.556

3. Primary progressive aphasia and kindred disorders.

Authors: Marsel Mesulam; Sandra Weintraub
Journal: Handb Clin Neurol Date: 2008

4. Classification of primary progressive aphasia and its variants.

Authors: M L Gorno-Tempini; A E Hillis; S Weintraub; A Kertesz; M Mendez; S F Cappa; J M Ogar; J D Rohrer; S Black; B F Boeve; F Manes; N F Dronkers; R Vandenberghe; K Rascovsky; K Patterson; B L Miller; D S Knopman; J R Hodges; M M Mesulam; M Grossman
Journal: Neurology Date: 2011-02-16 Impact factor: 9.910

5. Asymmetry of cortical decline in subtypes of primary progressive aphasia.

Authors: Emily Rogalski; Derin Cobia; Adam Martersteck; Alfred Rademaker; Christina Wieneke; Sandra Weintraub; M-Marsel Mesulam
Journal: Neurology Date: 2014-08-27 Impact factor: 9.910

6. Neuroimaging enrichment strategy for secondary prevention trials in Alzheimer disease.

Authors: Linda K McEvoy; Steven D Edland; Dominic Holland; Donald J Hagler; J Cooper Roddey; Christine Fennema-Notestine; David P Salmon; Alain K Koyama; Paul S Aisen; James B Brewer; Anders M Dale
Journal: Alzheimer Dis Assoc Disord Date: 2010 Jul-Sep Impact factor: 2.703

Review 7. Power calculations for clinical trials in Alzheimer's disease.

Authors: M Colin Ard; Steven D Edland
Journal: J Alzheimers Dis Date: 2011 Impact factor: 4.472

8. Combining Multiple Markers to Improve the Longitudinal Rate of Progression-Application to Clinical Trials on the Early Stage of Alzheimer's Disease.

Authors: Chengjie Xiong; Gerald van Belle; Kewei Chen; Lili Tian; Jingqin Luo; Feng Gao; Yan Yan; Ling Chen; John C Morris; Paul Crane
Journal: Stat Biopharm Res Date: 2013-01-01 Impact factor: 1.452

Review 9. Dominantly Inherited Alzheimer Network: facilitating research and clinical trials.

Authors: Krista L Moulder; B Joy Snider; Susan L Mills; Virginia D Buckles; Anna M Santacruz; Randall J Bateman; John C Morris
Journal: Alzheimers Res Ther Date: 2013-10-17 Impact factor: 6.982

10. Optimal composite scores for longitudinal clinical trials under the linear mixed effects model.

Authors: M Colin Ard; Nandini Raghavan; Steven D Edland
Journal: Pharm Stat Date: 2015-07-30 Impact factor: 1.894

7 in total

1. Clinical and cortical decline in the aphasic variant of Alzheimer's disease.

Authors: Emily Joy Rogalski; Jaiashre Sridhar; Adam Martersteck; Benjamin Rader; Derin Cobia; Anupa K Arora; Angela J Fought; Eileen H Bigio; Sandra Weintraub; Marek-Marsel Mesulam; Alfred Rademaker
Journal: Alzheimers Dement Date: 2019-02-11 Impact factor: 21.566

2. Prospective longitudinal atrophy in Alzheimer's disease correlates with the intensity and topography of baseline tau-PET.

Authors: Renaud La Joie; Adrienne V Visani; Suzanne L Baker; Jesse A Brown; Viktoriya Bourakova; Jungho Cha; Kiran Chaudhary; Lauren Edwards; Leonardo Iaccarino; Mustafa Janabi; Orit H Lesman-Segev; Zachary A Miller; David C Perry; James P O'Neil; Julie Pham; Julio C Rojas; Howard J Rosen; William W Seeley; Richard M Tsai; Bruce L Miller; William J Jagust; Gil D Rabinovici
Journal: Sci Transl Med Date: 2020-01-01 Impact factor: 17.956