Evaluation of person-level heterogeneity of treatment effects in published multiperson N-of-1 studies: systematic review and reanalysis.

Gowri Raman1, Ethan M Balk2, Lana Lai3, Jennifer Shi4, Jeffrey Chan5, Jennifer S Lutz3, Robert W Dubois6, Richard L Kravitz7, David M Kent3.   

Abstract

OBJECTIVE: Individual patients with the same condition may respond differently to similar treatments. Our aim is to summarise the reporting of person-level heterogeneity of treatment effects (HTE) in multiperson N-of-1 studies and to examine the evidence for person-level HTE through reanalysis.
STUDY DESIGN: Systematic review and reanalysis of multiperson N-of-1 studies.
DATA SOURCES: Medline, Cochrane Controlled Trials, EMBASE, Web of Science and review of references through August 2017 for N-of-1 studies published in English.
STUDY SELECTION: N-of-1 studies of pharmacological interventions with at least two subjects.
DATA SYNTHESIS: Citation screening and data extractions were performed in duplicate. We performed statistical reanalysis testing for person-level HTE on all studies presenting person-level data.
RESULTS: We identified 62 multiperson N-of-1 studies with at least two subjects. Statistical tests examining HTE were described in only 13 (21%), of which only two (3%) tested person-level HTE. Only 25 studies (40%) provided person-level data sufficient to reanalyse person-level HTE. Reanalysis using a fixed effect linear model identified statistically significant person-level HTE in 8 of the 13 studies (62%) reporting person-level treatment effects and in 8 of the 14 studies (57%) reporting person-level outcomes.
CONCLUSIONS: Our analysis suggests that person-level HTE is common and often substantial. Reviewed studies had incomplete information on person-level treatment effects and their variation. Improved assessment and reporting of person-level treatment effects in multiperson N-of-1 studies are needed. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

Keywords:  N-of-1 studies; heterogeneity of treatment effect; personalized medicine; systematic review

Year:  2018        PMID: 29804057      PMCID: PMC5988083          DOI: 10.1136/bmjopen-2017-017641

Source DB:  PubMed          Journal:  BMJ Open        ISSN: 2044-6055            Impact factor:   2.692


Our analysis suggests that person-level heterogeneity of treatment effects (HTE) is common and often substantial. Our analysis was limited by the paucity of N-of-1 studies in the literature and by the low statistical power in the available studies. Multiperson N-of-1 studies are the best design to estimate individual patient treatment effects and compare the variation in effects between individuals to variation within individuals across different periods.

Introduction

Clinicians commonly observe that individual patients given the same treatment for the same condition appear to respond differently from one another. This observation, combined with our understanding of the complex mechanisms of diseases and therapies and the potential importance of myriad patient-specific factors (eg, age, sex, illness severity, comorbidities, co-treatments and molecular differences influencing pharmacokinetics and dynamics), has led to a widely held assumption that the observed variation in treatment response seen between individuals is not merely random, but stable and potentially predictable. This assumption underpins the field of personalised medicine, which aims to determine the best treatment for an individual patient, as opposed to treating all patients with the intervention found to be most effective for the ‘average’ patient. Nevertheless, statistical analyses aimed at discovering heterogeneity of treatment effects (HTE) among groups of individuals (eg, subgroup analyses of parallel arm randomised trials) typically fail to find compelling and reliable evidence for the presence of such heterogeneity. 
For example, statistically significant differences in treatment effects between men and women are often reported, but a systematic review indicates that the frequency of these interactions across studies suggests that the vast majority occur by chance.1 Similarly, the field of pharmacogenetics, also built on the assumption of stable variation in treatment responses, has largely failed to live up to its promise to broadly improve the targeting of drugs—particularly outside the special case of oncology (where studies generally depend on the subclassification of tumour tissue not on variation in germ line polymorphisms).2 3 This failure to find reproducible HTE has supported the contrarian notion that true individual effects may be a ‘myth’, an overinterpretation of random noise.4 To distinguish between these two possibilities, Kalow et al5 have suggested that carefully designed series of N-of-1 studies could be performed for those chronic conditions amenable to this design (ie, where the disease process is relatively stable over time, treatment effects are transient and outcomes vary and are observable over time). By estimating individual patient treatment effects and comparing the variation in effects between individuals to variation within individuals across different periods, it is possible to determine the non-random component of heterogeneity in individual treatment effects—even if one is unable to identify the variables that predict this variation (ie, even in the absence of group-level HTE, such as men vs women or old vs young). A recent review summarised N-of-1 studies reported in the literature—including multiperson N-of-1 studies—but did not examine whether and how these studies provide information on person-level HTE. 
Therefore, our objectives are (1) to summarise the conduct and reporting of assessments of variation in person-level treatment effects from N-of-1 studies and (2) to extract, reanalyse and report the results from the subset of studies that provided adequate data in their published reports to examine the extent of the evidence for person-level HTE (ie, participant-level outcomes or effects).6

Methods

This review was conducted in accordance with the highest standards for conducting systematic reviews.7 8 We defined N-of-1 studies as crossover trials in which each patient receives two or more treatments in a predefined, often randomised, sequence.

Data sources and searches

We used two separate searches because N-of-1 studies can be indexed differently: (1) a search in Medline, Cochrane Central and EMBASE using terms related to repeated crossover studies (for publications indexed from inception to 17 August 2017) and (2) a Medline, Cochrane Central, EMBASE and Web of Science search using terms that are related to N-of-1 (for publications indexed from 2011 to 17 August 2017). For N-of-1 studies indexed before 2011, we used studies included in a prior published systematic review by Gabler et al.6 Our searches combined terms and Medical Subject Headings for N-of-1, single-subject, single-patient, randomised trials, crossover, multiperiod crossover and rotated or repeated period crossover (see online Supplementary appendix tables 1 and 2 for detailed search terms). The searches were not restricted by disease, condition, organ system or treatment.

Study selection

We selected eligible multiperson N-of-1 studies to describe the frequency of reporting of individual outcomes and effects and of documented HTE in these studies. We required a minimum of two individual subjects per study for evaluation of HTE. We excluded studies that included non-pharmacological interventions, reviews, abstracts and protocols. We included studies with placebo or ‘no treatment’ interventions. Citations were double screened by reviewers using an open-source, online software Abstrackr (http://abstrackr.cebm.brown.edu/). Full-text articles of potentially relevant studies were again double screened for eligibility. Person-level outcomes were defined as outcomes for each person at each point in time when they were measured, reported in tables, text or graphs. Person-level treatment effect was defined as contrasts of outcomes in individuals on one treatment versus the comparator. Person-level HTE was defined as quantified variation in the person-level treatment effects, whereas HTE more broadly includes any type of subgroup analysis (eg, males vs females; older vs younger) as outlined in figure 1.
Figure 1

Schematic description of person-level outcomes (outcomes for each patient during each treatment period); person-level effects (contrasts of the outcomes for each patient in one treatment condition vs another) and person-heterogeneity of treatment effects (between patient contrasts of effects).
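
The three levels in figure 1 can be illustrated with a minimal sketch. The patient labels, treatment names and numbers below are hypothetical and are not taken from any reviewed study; the simple difference in means stands in for whatever contrast a given study reports.

```python
from statistics import mean

# Hypothetical person-level outcomes: one list of period measurements
# per person and treatment (illustrative values only).
outcomes = {
    "P1": {"drug": [3.0, 4.0, 3.5], "placebo": [6.0, 5.0, 5.5]},
    "P2": {"drug": [5.0, 5.5, 4.5], "placebo": [5.0, 4.5, 5.5]},
}


def person_level_effects(data, treatment="drug", comparator="placebo"):
    """Person-level effect: mean outcome on treatment minus mean on comparator."""
    return {
        person: mean(arms[treatment]) - mean(arms[comparator])
        for person, arms in data.items()
    }


effects = person_level_effects(outcomes)

# Person-level HTE concerns how much these effects differ between people.
# The range is only a descriptive summary, not a statistical test.
effect_range = max(effects.values()) - min(effects.values())
```

Here one hypothetical patient improves on the drug and the other does not; whether such a difference exceeds what within-person variability alone would produce is the question addressed by the statistical tests described in the Methods.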


Data extraction and quality assessment

One of the four reviewers extracted data from each publication; a second reviewer verified all numerical information and basic descriptors of the study design and analysis. Operational definitions for extraction items were discussed in weekly project meetings and discrepancies between extractors were resolved by consensus with senior authors (DK, GR, EB). From each study, we extracted bibliographic information, details related to study design (number of patients enrolled, selection criteria, interventions evaluated, randomisation methods, outcomes assessed, follow-up duration), information on patient characteristics and person-level measurements of outcomes or estimates of person-level treatment effects (with corresponding measures of their uncertainty). When necessary, we extracted data by digitising the graphs and the values were estimated using Engauge Digitizer V.2.14 (http://digitizer.sourceforge.net/). We assessed the methodological quality of each study based on predefined criteria, in accordance with the Agency for Healthcare Research and Quality suggested methods and the Cochrane risk of bias for clinical trials.9 10 We generated graphs showing the trajectory of response for each patient in each study and compared them against the published information. We also generated scatterplots of measurements over time for studies that did not present their data in graphical format to help us identify aberrant data points (eg, errors in data extraction). We verified potentially aberrant data points by re-examining the published data and made corrections, when needed.

Data synthesis and analyses

We examined the degree to which studies reported person-level data. This was described using the following items for each reported outcome: (1) qualitative descriptions of HTE (eg, ‘there were eight responders and four non-responders’); (2) details of person-level outcomes (ie, outcomes with each treatment within each period); (3) details of person-level treatment effect (ie, a point estimate of contrasts of outcomes in individuals on one treatment vs the comparator); (4) reporting of person-level statistical effect estimate (eg, SD, exact p values or CIs for treatment effects within individuals); (5) description of statistical tests examining HTE (ie, tests evaluating the contrast of treatment effects between individuals or groups in the study) and (6) claims of HTE. Note that qualitative descriptions of HTE for item 1 would include any description that implied that treatment effects varied, whereas item 6 required a more definite study conclusion (eg, ‘our results demonstrate significant variation across individuals in response to treatment X’), whether or not these conclusions were based on robust statistical tests.

Statistical HTE analysis of extracted study results

We performed statistical analysis testing for person-level HTE on all studies presenting person-level data. We used a consistent analytic strategy across studies, to the extent permitted by the reporting in published papers. Our strategy differed between studies that reported person-level outcome measurements and those that reported estimates of person-level treatment effects with their sampling variances (or adequate information to approximately calculate these statistics).
For studies that only reported (or allowed the calculation of) estimates of person-level treatment effects, we obtained an average effect using a fixed effect inverse variance model and estimated the variance of the person-level treatment effects using the DerSimonian and Laird method of moments estimator.11 12 In addition to a fixed effect model, we also obtained an average effect using a random-effects model. Finally, we tested the hypothesis that all person-level treatment effects were equal using Cochran’s χ2 test and quantified the proportion of observed variation due to ‘true’ person-level effect heterogeneity with the I2 statistic.13
For studies that reported person-level outcomes, we developed a linear model (for continuous outcomes) or generalised linear model (for binary or count outcomes) using the outcome of interest as the response, the intervention(s) as a covariate and indicator variables for different study participants.14 This model estimates a common treatment effect across participants. We also derived a similar model with treatment-by-participant interactions, which allows each patient to have a different effect. The statistical significance of person-level HTE was assessed by a likelihood ratio test comparing the two models.
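
The likelihood ratio comparison of the common-effect and treatment-by-participant interaction models can be sketched as follows for the simplest case: a continuous outcome, two treatments, a common residual variance and uncorrelated errors. This is an illustrative sketch under those assumptions, not the code used for the reanalysis; the function name and data layout are hypothetical, and every participant is assumed to have observations on both treatments.

```python
import math
from collections import defaultdict


def lrt_person_by_treatment(records):
    """Likelihood ratio test of equal treatment effects across participants.

    records: iterable of (person, treated, y), with treated coded 0 or 1.
    Returns (common_effect, lr_statistic, degrees_of_freedom).
    """
    by_person = defaultdict(list)
    for person, treated, y in records:
        by_person[person].append((float(treated), float(y)))

    n = sum(len(obs) for obs in by_person.values())
    sxx = sxy = syy = 0.0   # pooled within-person sums of squares and products
    rss_interaction = 0.0   # residual SS when each person has an own effect

    for obs in by_person.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        pxx = sum((x - x_bar) ** 2 for x, _ in obs)
        pxy = sum((x - x_bar) * (y - y_bar) for x, y in obs)
        pyy = sum((y - y_bar) ** 2 for _, y in obs)
        if pxx == 0:
            raise ValueError("each participant needs both treatments")
        sxx += pxx
        sxy += pxy
        syy += pyy
        rss_interaction += pyy - pxy ** 2 / pxx

    # Participant indicators are absorbed by centring within participant,
    # so the common effect is the pooled within-person slope.
    common_effect = sxy / sxx
    rss_common = syy - sxy ** 2 / sxx

    # Gaussian likelihood ratio statistic with ML variance estimates.
    lr_statistic = n * math.log(rss_common / rss_interaction)
    degrees_of_freedom = len(by_person) - 1
    return common_effect, lr_statistic, degrees_of_freedom


# Hypothetical data: participant A responds to treatment, participant B does not.
example = [
    ("A", 0, 5.0), ("A", 0, 6.0), ("A", 1, 3.0), ("A", 1, 4.0),
    ("B", 0, 5.0), ("B", 0, 6.0), ("B", 1, 5.0), ("B", 1, 6.0),
]
effect, stat, df = lrt_person_by_treatment(example)
```

The statistic would be referred to a χ2 distribution with the returned degrees of freedom. With few participants and periods, as in many of the reviewed studies, this large-sample approximation may be rough.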
In addition to a fixed effect model, we also fit a hierarchical linear or generalised linear mixed model with a random intercept and a random slope (for the treatment effect) to estimate the average treatment effect across all patients (assuming person-level HTE). We tested the hypothesis that all person-level treatment effects were equal and quantified the proportion of observed variation due to ‘true’ person-level effect heterogeneity with the I2 statistic.13 For modelling within-patient variance, we used a common variance with an uncorrelated covariance structure, as was used in a prior N-of-1 study.14 Person-level treatment effect was assumed to be equal across time periods. For the treatment effect, we used more than one random slope when more than two treatments were compared.
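
The heterogeneity statistics named in this section (inverse variance pooling, Cochran’s Q, the I2 statistic and the DerSimonian and Laird estimator) follow standard formulas, sketched here for person-level treatment effects with known sampling variances. The function name and input values are hypothetical, and the sketch omits the p value and the CI for I2.

```python
def heterogeneity_summary(effects, variances):
    """Fixed and random-effects pooling of person-level effects.

    effects: person-level treatment effect estimates.
    variances: their sampling variances (all > 0).
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)

    # Fixed effect (inverse variance) average.
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed effect average.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I2: share of observed variation beyond chance, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian and Laird method of moments estimate of between-person variance.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects average uses weights 1 / (variance + tau2).
    re_weights = [1.0 / (v + tau2) for v in variances]
    random_avg = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)

    return {"fixed": fixed, "Q": q, "df": df, "I2": i2,
            "tau2": tau2, "random": random_avg}


# Hypothetical example: two participants with clearly different effects.
summary = heterogeneity_summary([0.0, 2.0], [0.5, 0.5])
```

A p value for Q would come from the upper tail of a χ2 distribution with `df` degrees of freedom (for example via `scipy.stats.chi2.sf`, if SciPy is available); it is left out so that the sketch needs only the standard library.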

Patient and public involvement

Patients and the public were not involved in the design or analysis of this study.

Results

The searches for repeated crossover studies identified 11 891 citations and those for N-of-1 studies identified 3819 citations (indexed from 2011 onwards). Of these, we retrieved 407 full-text articles for review plus 100 N-of-1 trial articles (indexed before 2011) from an existing systematic review.5 On full-text screening, 62 studies (58 multiperson N-of-1 studies and four repeated period crossover studies) met eligibility criteria (online supplementary appendix table 3) and are referred to as multiperson N-of-1 studies throughout the article. An outline of the search and study selection flow is provided in figure 2.
Figure 2

Study flow diagram represents the flow of eligible studies included in this review.


Description of studies

Table 1 summarises the 62 multiperson N-of-1 studies that were published between 1986 and 2017 reporting a total of 1974 patients. The most common clinical domains in the multiperson N-of-1 studies were neurology (16%), arthritis/rheumatology (10%) and psychiatry (9%). Most studies were described as ‘double blind’, but details about the methods for blinding were often unclear; similarly, studies often provided unclear information about the generation of the randomisation sequence and allocation concealment (online supplementary appendix table 4). Among the studies, 93% compared a pair of treatment strategies, 5% compared three strategies and 2% compared four strategies. Studies had between three and 16 treatment periods and obtained an average of 1–42 outcome measurements per period. Across reported outcomes, 89% of the assessed outcomes were patient reported and 11% were investigator assessed.
Table 1

Evidence map of multiperson N-of-1 and repeated period crossover studies

Description | Multi-person N-of-1 studies (n=62)
Publication years | 1979–2017
Subjects | Total N (median, IQR)
  Enrolled | 2153 (16, 9–42)
  Completed | 1705 (12, 7–32)
Intervention and comparisons
  Head-to-head active drugs | 10
  Placebo | 47
  Active drug and placebo | 1
Population
  Paediatric | 12
  Adults | 50
Major systems studied
  Arthritis/rheumatology | 10
  Cardiovascular | 3
  Gastrointestinal | 7
  Hypertension | 1
  Psychiatry | 9
  Neurology | 16
  Respiratory | 9
  Miscellaneous* | 7
Top 5 disease conditions
  ADHD | 6
  Angina | 3
  Chronic pain | 5
  GORD | 5
  Obstructive airway | 6
  Osteoarthritis | 6

*Sleep disorders, allergy, cancer, muscular, vascular (for multiperson N-of-1); pain, urology, GYN, Heme/Onc, allergy, dermatology, drug abuse, endocrine, lipids, nephrology, ophthalmology, respiratory (for repeated crossover studies).

ADHD, attention-deficit hyperactivity disorder; GORD, gastro-oesophageal regurgitation disorder; n, number of participants.


Reporting person-level outcomes, effects and HTE

While most studies (92%) had some qualitative acknowledgement that the treatment effects appeared to vary across individuals, formal reporting at the participant level was variable (table 2). Person-level outcomes under each treatment were reported in 52% of multiperson N-of-1 studies. Person-level treatment effects with quantitative data (comparing outcomes on each treatment) for each individual who completed the trial were available in 32%, and details on the statistical evaluation of these effects (as SD, exact p values or confidence intervals) were available in 13 (21%) multiperson N-of-1 studies. Only five (8%) studies described statistical tests examining any HTE. However, only two studies (3%) reported person-level HTE, whereas the others examined group-level HTE using conventional subgroup analysis based on observable characteristics.
Table 2

Survey of HTE assessment in multiperson N-of-1 studies

HTE reporting | Multiperson N-of-1 studies (n=62)
Qualitative description | 92%
Person-level outcomes | 52%
Person-level treatment effects | 32%
Statistical analysis of person-level effects (eg, p values) | 21%
Any statistical test for HTE | 8%*
Claims of heterogeneity | 15%

*Only two studies reported person-level HTE, the remaining three studies reported group level effect.

HTE, heterogeneity of treatment effects.


Reanalysis of person-level data

Of the 62 studies, there were 36 studies that provided person-level data, either as outcomes in each treatment period or as person-level treatment effects (table 3). Of these, only 25 studies provided person-level data sufficient to support re-analysis: 14 studies provided person-level outcomes; 13 studies provided person-level treatment effects (two studies provided both). The remaining 11 studies reported either medians or means without data on variance or did not provide sufficient information on completers, so they could not be reanalysed for treatment effect or HTE.
Table 3

Characteristics of studies reporting person-level data

Author, Year | Disease | Number enrolled (analysed) | Intervention | Comparator | Cross-over periods | Total intervention duration | Outcome measures per period
Studies with reanalysable person-level outcomes
Camfield, 1996 | Mental retardation with fragmented sleep | 6 (6) | Melatonin | Placebo | 7 | 10 weeks | 14
Hinderer, 1990 | Traumatic spinal cord injury | 5 (5) | Baclofen | Placebo | 3 | 9 weeks | 2
Langer, 1993 | Gastro-oesophageal reflux | 2 (2) | Cisapride | Placebo | 3 | 6 weeks | 5
Lashner, 1990 | Ulcerative colitis | 7 (6) | Nicotine | Placebo | 4 | 8 weeks | 1
Maier, 1994 | Chronic depression | 10 (9) | Sulpiride | Placebo | 4 | 28 weeks | 42
Mandelcorn, 2004 | Brain injury | 4 (4) | Ondansetron | Placebo | 4 | 5 weeks | 1
McQuay, 1994 | Neuropathic pain | 19 (19) | Dextromethorphan | Placebo | 5 | 20 days | 1
Miyazaki, 1995 | Unstable angina | 22 (22) | Isosorbide dinitrate | Isosorbide dinitrate: intermittent injection | 3 | 9 days | 6
Nathan, 2006 | Paediatric brain tumour | 12 (7) | Ondansetron and metopimazine | Ondansetron and placebo | Unclear | 189 days | Unclear
Parodi, 1979 | Unstable angina | 12 (12) | Verapamil | Placebo | 4 | 10 days | Unclear
Parodi, 1986 | Unstable angina | 10 (10) | Verapamil | Propranolol, placebo | 8 | 18 days | Unclear
Tison, 2012 | Levodopa-induced dyskinesia in patients with Parkinson’s disease | 10 (10) | Simvastatin | Placebo | 6 | 96 days | 1
Studies with reanalysable person-level treatment effects
Emmanuel, 2012 | Chronic intestinal pseudo-obstruction | 7 (4) | Prucalopride | Placebo | 16 | 48 weeks | 21
Haas, 2004 | Chronic tension type and migraine headache | 39 (16) | Dextroamphetamine | Equi-stimulatory caffeine | 8 | 20 days | 20
Jaeschke, 1991 | Fibromyalgia | 22 (23) | Amitriptyline | Placebo | 6 | 12 weeks | 2
Johannessen, 1992 | Dyspepsia | 68 (46) | Cimetidine | Placebo | 12 | 184 days | 15
Lipka, 2017 | Autoimmune myasthenia gravis | 4 (4) | Ephedrine | Placebo | 4 | 6 weeks | 1
Mahon, 1996 | Irreversible chronic airflow limitation | 16 (14) | Theophylline | Placebo | 8 | 73 days | 1
March, 1994 | Osteoarthritis | 25 (15) | Diclofenac | Paracetamol | 6 | 12 weeks | 14
Patel, 1991 | Non-reversible chronic airflow limitation | 26 (18) | Ipratropium bromide/theophylline/salbutamol/beclomethasone | Placebo | 6 | 6 weeks | Unclear
Wallace, 1994 | Attention deficit hyperactivity disorder | 11 (7) | Methylphenidate | Placebo | 14 | 14 days | 1
Woodfield, 2005 | Skeletal muscle cramps | 13 | Quinine | Placebo | 6 | 14 weeks | 2
Zucker, 2006 | Fibromyalgia | 58 | Amitriptyline and placebo | Amitriptyline and fluoxetine combination | 6 | 36 weeks | 1
Study with both person-level data
Pereira, 1995 | Atrial fibrillation/deep venous thrombosis | 7 | Generic warfarin | Coumadin | 10 | 30 weeks | 2
Joy, 2014 | Statin-related myalgia | 8 (7) | Statin | Placebo | 6 | 33 weeks | 3
Study with insufficiently reported person-level data
Person-level outcome data
Denburg, 1994 | Systemic lupus erythematosus | 10 | Prednisone | Placebo | 6 | 30 weeks | 1
Mitchel, 2015 | Fatigue in advanced cancer | 43 (33) | Methylphenidate | Placebo | 6 | 18 days | 6
Nikles, 2000 | Osteoarthritis | 14 | Ibuprofen | Paracetamol; placebo | 6 | 12 weeks | 14
Nikles, 2015 | Dry mouth in advanced cancer | 17 (4) | Pilocarpine | Placebo | 6 | 18 days | 6
Nikles, 2017 | Acquired brain injury | 53 (38) | Nervous system stimulants | Placebo | 6 | 18 days | 6
Reitberg, 2002 | Allergic rhinitis | 36 | Loratadine and chlorpheniramine maleate | Loratadine with placebo | 8 | 32 days | 4
Sheather-Reid, 1998 | Chronic pain | 8 | Ibuprofen/codeine | Placebo | 6 | 12 weeks | 14
Person-level treatment effects
Huber, 2007 | Juvenile idiopathic arthritis | 6 | Amitriptyline | Placebo | 6 | 17 weeks | 12
Privitera, 1994 | Partial seizure | 16 | Dezinamide | Placebo | 6 | 35 weeks | 6
Wegman, 2003 | Osteoarthritis | 13 | Paracetamol | Non-steroidal anti-inflammatory drugs | 10 | 20 weeks | 14
Wegman, 2005 | Regular temazepam users | 15 | Temazepam | Placebo | 10 | 10 weeks | 7

Of 13 studies (with 27 unique comparisons) that reported analysable person-level treatment effect data (table 3), 10 studies had a placebo comparator and three studies had an active comparator. The sample size ranged from 7 to 68; average crossover periods ranged from 6 to 16 days and average outcome measures per period ranged from 1 to 21. The average treatment duration ranged from 14 to 336 days. There were 14 studies (with 27 unique comparisons) that reported analysable person-level outcome data (table 3), including two studies also reporting person-level treatment effects. Of these, 11 compared the intervention with placebo and three studies compared two active interventions. The sample size ranged from 2 to 22; the average number of crossover periods ranged from 3 to 10 and the average number of outcome measures per period ranged from 1 to 42. The average treatment duration ranged from 9 to 210 days.

Reanalysis of studies reporting estimates of person-level treatment effects

Thirteen studies (including 27 comparisons, due to multiple outcomes in some studies) reported estimates of person-level treatment effects sufficient to analyse (online supplementary appendix figures 1–16 display graphs of the person-level treatment effect data). Average fixed effect estimates for each analysis are shown in table 4; random-effects estimates were generally similar (online supplementary appendix tables 5). In 8 of the 13 studies (62%) and 15 of the 27 total unique comparisons (56%), we found evidence of statistically significant HTE for at least one outcome (table 4). Generally, the magnitude in the variation of individual patient effects (as seen in the range) was very large compared with the average effects. Most studies (64%) showed person-level effects that differed qualitatively from one another. Most of the variation in the observed individual effects was attributable to ‘true’ (non-random) heterogeneity of person-level effects; 11 of 27 analyses had I2 >80%.
Table 4

Analysis results of studies reporting person-level treatment effects

Author, year | Outcome | Range of the scales (severity) | Main effect: treatment effect (CI) | Person-level HTE: P for HTE* | Person-level HTE: treatment effect range | Person-level HTE: I2 % (CI)
Emmanuel, 2012 | Bloating | 0–4 (0=absent to 4=worst) | −0.344 (−0.619 to −0.069) | <0.001 | −1.1 to −0.1 | 94 (88 to 97)
 | Pain | 0–4 (0=absent to 4=worst) | −0.440 (−0.771 to −0.110) | <0.001 | −0.2 to −1.4 | 96 (92 to 98)
Haas, 2004 | Chronic tension-type headache grade | 0–3 (0=none to 3=severe) | 0.772 (0.454 to 1.090) | <0.001 | 0.04 to 1.9 | 84 (76 to 90)
 | Chronic migraine headache grade | 0–3 (0=none to 3=severe) | 0.542 (0.354 to 0.731) | 0.067 | 0.2 to 0.83 | 37 (0 to 65)
Jaeschke, 1991 | Seven-point symptom scale | 1–7 (higher scores represent better function) | 0.427 (0.210 to 0.645) | <0.001 | −1.02 to 3.18 | 85 (79 to 89)
 | Tender point changes count | Number of tender points | 1.320 (0.404 to 2.236) | <0.001 | −4.33 to 9.0 | 72 (57 to 82)
Johannessen, 1992 | Six-point symptom scale | 0–6 (0=NR to 6=NR) | 0.698 (0.466 to 0.931) | <0.001 | −1.67 to 3.17 | 66 (53 to 75)
Joy, 2014 | VAS myalgia score | 0–100 mm (0=none to 100=worst) | 0.119 (−2.283 to 2.521) | 0.996 | −8.10 to 9.45 | 0 (0 to 68)
 | Symptom-specific VAS | 0–100 mm (0=none to 100=worst) | 1.937 (0.179 to 3.696) | 0.797 | −8.0 to 18.05 | 0 (0 to 68)
 | Pain severity score | 0–10 (0=none to 10=worst) | 0.086 (−0.215 to 0.387) | 0.986 | 0.0 to 1.0 | 0 (0 to 68)
 | Pain interference score | 0–10 (0=none to 10=worst) | −0.016 (−0.095 to 0.064) | 0.917 | −0.02 to 0.75 | 0 (0 to 68)
Lipka, 2017 | Quantitative myasthenia gravis score | 0–3 (0=none to 3=severe) | 1.006 (0.215 to 1.797) | 0.803 | 0.67 to 1.67 | 0 (0 to 85)
 | Myasthenia gravis (MG) composite | 0–50 | 2.891 (0.348 to 5.433) | 0.177 | −1.05 to 5.12 | 39 (0 to 80)
 | MG-activities of daily living | 0–24 | 1.099 (−0.277 to 2.474) | 0.047 | 0.03 to 3.0 | 62 (0 to 87)
 | VAS score | 0–10 (0=none to 100=worst) | 1.275 (−0.115 to 2.665) | 0.190 | −0.01 to 3.02 | 37 (0 to 78)
Mahon, 1996 | Dyspnoea in Likert Scale | 1–7 (1=extremely short of breath to 7=no shortness) | 0.125 (−0.181 to 0.430) | <0.001 | −0.57 to 0.89 | 78 (58 to 88)
March, 1994 | Mean pain score on VAS | 5 point Likert scale (0–100 mm) | −7.093 (−11.939 to −2.248) | <0.001 | −33.8 to 4.1 | 98 (97 to 98)
 | Mean stiffness score on VAS | 5 point Likert scale (0–100 mm) | −5.992 (−11.280 to −0.704) | <0.001 | −36 to 10.7 | 97 (96 to 98)
Patel, 1991† | Four-item symptom questionnaire (all compared with placebo) | 1–7 (1=extremely short of breath to 7=no shortness of breath) | 0.340 (0.253 to 0.422) | <0.001 | −0.34 to 3.1 | 91 (87 to 94)
 | Four-item symptom questionnaire (use of ipratropium bromide) |  | 0.675 (0.264 to 1.085) | <0.001 | −0.22 to 3.1 | 87 (78 to 92)
 | Four-item symptom questionnaire (use of salbutamol) |  | 0.865 (0.042 to 1.687) | <0.001 | 0.46 to 1.3 | 94 (NA)
 | Four-item symptom questionnaire (use of theophylline) |  | 0.025 (−0.434 to 0.484) | 0.172 | −0.34 to 0.18 | 30 (0 to 93)
Pereira, 1995 | INR (diff) | Target INR range of 2.0–3.0 | 0.027 (−0.155 to 0.209) | 0.477 | −0.28 to 0.37 | 0 (0 to 75)
Wallace, 1994 | Conners 15-item rating scale scores | 0–3 (NR) | 0.759 (0.341 to 1.178) | 0.747 | 0.42 to 1.22 | 0 (0 to 79)
Woodfield, 2005 | Changes in the number of cramps | Number (mean difference) | −18.823 (−28.527 to −9.120) | <0.001 | −77 to −2 | 92 (87 to 95)
 | Total days with cramps | Days | −6.181 (−9.798 to −2.563) | <0.001 | −13 to −1 | 94 (90 to 96)
Zucker, 2006 | FIQ | 0–100 (0=best to 100=worst) | −5.019 (−8.784 to −1.254) | 0.999 | −32.0 to 0.98 | 0 (0 to 37)

*The significance of person-level HTE was assessed by Cochran’s Χ2-based test.

†One subject had beclomethasone.

FIQ, Fibromyalgia Impact Questionnaire; INR, international normalised ratio; NA, not applicable; NR, not reported; VAS, Visual Analogue Scale.


Reanalysis of studies reporting person-level outcome measurements

Because some of the 14 studies providing analysable outcome data had multiple outcomes (or multiple outcomes scales), there were a total of 27 comparisons with analysable data. (The online supplementary appendix figures 17–42 display graphs of the person-level outcome results.) Average fixed effect estimates for each analysis are shown in table 5; random effects estimates were generally similar (online supplementary appendix table 6). In eight of the 14 studies (57%) (17 of the 27 unique comparisons (63%)), there was statistically significant person-level HTE for at least one outcome. Again, the variation in individual effects was often large compared with the average effect. However, given the lower number of participants per study and periods per participant, and the different analytic approach, estimates of I2 were much less precise in these studies.
Table 5

Studies reporting person-level outcomes

| Author, year | Outcome | Definition/range of the scales (severity) | Main effect: fixed treatment effect (CI) | HTE: P for person treatment interaction* | HTE: treatment effect range (lower to upper) | HTE: I² % (CI) |
|---|---|---|---|---|---|---|
| Camfield, 1996 | Nights without awakening | Between 10:00 PM and 7:00 AM per day | 0.865 (0.215 to 1.516) | 0.456 | 0.12 to 2.0 | 0 (0 to 79) |
| Hinderer, 1990 | Anxiety | Beck Inventory-A anxiety scale 0–3 (0=never, 3=almost all the time) | 0.000 (0.000 to 0.000) | <0.001 | −6.38 to 0.000 | 91 (81 to 95) |
| Joy, 2014 | Myalgia score | Visual Analogue Score for myalgia (0=none to 100=worst) | 3.3812 (−2.668 to 9.430) | 0.565 | −11.66 to 60.79 | 0 (0 to 68) |
| Langer, 1993 | Vomiting | Number of episodes | −1.204 (−2.494 to 0.086) | 0.136 | −1.34 to 0.17 | 87 (NA)* |
| Lashner, 1990 | Symptom score: abdominal pain | Symptom scores 0–100 (0=best, 100=worst) | −3.615 (−16.982 to 9.751) | 0.007 | −35.0 to 15.0 | 37 (0 to 73) |
| | Symptom score: bowel movements/day | | −0.538 (−1.215 to 0.138) | 0.001 | −3.0 to 1.0 | 56.6 (0 to 81) |
| | Symptom score: consistency of bowel movements | | 7.000 (−7.551 to 21.551) | 0.013 | −25.5 to 33.0 | 28 (0 to 69) |
| | Symptom score: haematochezia | | 2.308 (−17.210 to 21.826) | 0.003 | −38.0 to 47.5 | 47 (0 to 78) |
| | Symptom score: general sense of well-being | | −6.538 (−25.352 to 12.275) | 0.008 | −43.0 to 35.0 | 35 (0 to 73) |
| Maier, 1994 | SCL-90 subscales: depressed mood | Self-rating inventory to measure the effects of drug | −3.536 (−6.718 to −0.354) | <0.001 | −17.8 to 2.74 | 58 (12 to 80) |
| | SCL-90 subscales: anxiety | | −3.753 (−6.582 to −0.924) | <0.001 | −17.4 to 2.5 | 66 (30 to 83) |
| | SCL-90 subscales: somatisation | | −1.419 (−4.316 to 1.478) | 0.869 | −6.0 to 2.7 | 0 (0 to 65) |
| Mandelcorn, 2004 | Self-assessment score | 0–5 (0=worst, 5=best) | −2.052 (−8.865 to 4.761) | 0.05 | −7.7 to 4.9 | 0 (0 to 85) |
| | Lower extremity ataxia | Fugl-Meyer: three point (0 cannot be performed to 2 can be fully performed) | 12.494 (−3.155 to 28.142) | 0.025 | −6.42 to 36.76 | 35 (0 to 77) |
| | Truncal ataxia | AMTI force plate: NR; Berg Balance Scale 0–56, with a higher score indicating a better performance | 1.196 (−2.866 to 5.257) | 0.690 | −0.52 to 2.20 | 0 (0 to 85) |
| | Upper extremity ataxia | Purdue Pegboard Test: pegs inserted into the board with each hand in 30 s; Minnesota Placing Test: reach out, grasp, and place blocks in a specific order | −0.498 (−3.546 to 2.550) | 0.382 | −3.68 to 1.42 | 0 (0 to 85) |
| McQuay, 1994 | VAS pain intensity | 0–100 (0=no pain, 100=worst possible pain) | −1.094 (−5.572 to 3.383) | 0.004 | −8.0 to 10.1 | 0 (0 to 49) |
| | VAS relief intensity | 0–100 (0=no relief, 100=complete pain relief) | −3.913 (−11.729 to 3.903) | 0.038 | −28.4 to 5.15 | 0 (0 to 49) |
| Miyazaki, 1995 | Incidence of angina | Either ST segment elevation or depression at rest | 0.496 (−0.206 to 1.199) | 0.125 | −16.19 to 17.11 | 0 (0 to 60) |
| Nathan, 2006 | Emetic episodes per day | Complete response (0 episodes/day), major response (1–2 episodes/day) or failure (>2 episodes/day) | −0.095 (−0.514 to 0.325) | 0.001 | −16.5 to 2.08 | 59 (6 to 82) |
| Parodi, 1979 | Ischaemic attacks | ST elevation or depression (details NR) | −1.544 (−1.838 to −1.251) | 0.007 | −16.21 to −0.34 | 48 (0 to 73) |
| Parodi, 1986 | Asymptomatic ST elevation (after verapamil) | 0.1 mV of ST segment elevation measured 20 ms after the J point | −1.637 (−1.994 to −1.279) | 0.110 | −2.37 to −1.30 | 6 (0 to 65) |
| | Asymptomatic ST depression (after verapamil) | More than 0.2 mV of ST segment depression measured 80 ms after the J point | −1.083 (−1.903 to −0.262) | 0.401 | −17.42 to −0.90 | 0 (0 to 62) |
| | Symptomatic ST elevation (after verapamil) | | −1.580 (−1.906 to −1.254) | <0.001 | −15.40 to −1.45 | 0 (0 to 62) |
| | Symptomatic ST depression (after verapamil) | | −0.990 (−1.411 to −0.569) | 0.002 | −2.53 to −0.52 | 6 (0 to 64) |
| | Asymptomatic ST elevation (after propranolol) | | 0.100 (−0.086 to 0.286) | 0.006 | −0.77 to 1.38 | 62 (25 to 81) |
| | Asymptomatic ST depression (after propranolol) | | 0.339 (−0.168 to 0.845) | 0.964 | −18.3 to 0.83 | 0 (0 to 62) |
| | Symptomatic ST elevation (after propranolol) | | −0.002 (−0.177 to 0.173) | 0.063 | −14.9 to 0.68 | 46 (0 to 74) |
| | Symptomatic ST depression (after propranolol) | | −0.374 (−0.709 to −0.039) | 0.023 | −17.1 to −0.73 | 4 (0 to 64) |
| Pereira, 1995 | INR | Target INR range of 2.0–3.0 | −0.126 (−0.312 to 0.060) | 0.433 | −0.42 to 0.16 | 0 (0 to 71) |
| Tison, 2012 | Troublesome dyskinesia | 7 points scale (1=extremely uncomfortable, 7=not at all uncomfortable) | 0.167 (−0.449 to 0.783) | 0.593 | −0.67 to 1.83 | 0 (0 to 62) |

*The significance of person-level HTE was assessed by a likelihood ratio test comparing the two models—model with common treatment effect and model with treatment-by-participant interactions.

INR, international normalised ratio; NR, not reported; SCL, Symptom Checklist.
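The likelihood ratio test described in the table footnote can be sketched as follows. This is an illustrative sketch only, assuming a Gaussian linear model with a separate intercept for each participant, independent errors and a common residual variance; the source does not state these details. The names `lrt_person_hte`, `chi2_sf` and `DEMO_RECORDS` are hypothetical, and the demonstration data are invented.

```python
import math
from collections import defaultdict


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        tail, nu = math.exp(-half), 2          # closed form for df = 2
    else:
        tail, nu = math.erfc(math.sqrt(half)), 1   # closed form for df = 1
    # Recurrence: Q(nu + 2) = Q(nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        tail += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return tail


def lrt_person_hte(records):
    """records: iterable of (person_id, treated as 0 or 1, outcome).

    Each participant needs at least one period on each arm, and the outcomes
    must not be perfectly fitted by the interaction model.
    Returns (common treatment effect, LR statistic, df, p value).
    """
    cells = defaultdict(lambda: {0: [], 1: []})
    for person, treated, y in records:
        cells[person][treated].append(y)

    rss_full = 0.0      # model with treatment-by-participant interactions
    diffs, weights = [], []
    n_obs = 0
    for arms in cells.values():
        means = {}
        for arm in (0, 1):
            ys = arms[arm]
            means[arm] = sum(ys) / len(ys)
            rss_full += sum((y - means[arm]) ** 2 for y in ys)
            n_obs += len(ys)
        n0, n1 = len(arms[0]), len(arms[1])
        diffs.append(means[1] - means[0])       # this person's own effect
        weights.append(n0 * n1 / (n0 + n1))

    # Common-effect model: least-squares effect shared by all participants.
    common = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
    rss_common = rss_full + sum(w * (d - common) ** 2 for w, d in zip(weights, diffs))

    stat = n_obs * math.log(rss_common / rss_full)  # Gaussian LR statistic
    df = len(cells) - 1                             # extra interaction terms
    return common, stat, df, chi2_sf(stat, df)


# Invented data: three participants, two periods per arm each.
DEMO_RECORDS = [
    ("A", 0, 10.0), ("A", 0, 12.0), ("A", 1, 5.0), ("A", 1, 7.0),
    ("B", 0, 10.0), ("B", 0, 12.0), ("B", 1, 10.0), ("B", 1, 12.0),
    ("C", 0, 8.0), ("C", 0, 10.0), ("C", 1, 9.0), ("C", 1, 11.0),
]
print(lrt_person_hte(DEMO_RECORDS))
```

A small p value suggests that person-specific treatment effects fit the data better than a single shared effect. The chi-square reference distribution is a large-sample approximation, so results from studies with few periods per participant should be read cautiously.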


Discussion

This review documents that multiperson N-of-1 studies rarely examine HTE. Only 13 of the 62 multiperson N-of-1 studies (21%) described statistical tests examining HTE, and these generally involved comparisons of treatment effects among groups of patients (eg, based on age or sex) rather than across individuals. Only two studies in the whole of the literature tested for person-level HTE.15 16 Nevertheless, analysable person-level results, reported either as outcomes or as treatment effects, are sometimes available in multiperson N-of-1 studies and are suitable for the analysis of person-level HTE. Our reanalyses of the totality of available data from these studies (n=25) suggested the presence of substantial non-random variation in treatment effects across individuals in most studies. This was evident both in statistical tests for the variation of treatment effects among patients and in qualitative assessment of the magnitude of effect variation. This represents the first broad empirical examination with reanalysis of person-level HTE across multiperson N-of-1 studies, and it provides some general support for the a priori assumption of individual patient variation in treatment response that broadly motivates personalised medicine.

In contrast to parallel-group studies that establish efficacy in a group of patients with a common condition, N-of-1 studies establish the effects of an intervention in an individual.17 In this respect, N-of-1 studies can be thought of as adjuncts to clinical care, where the goal is to select the right treatment for a particular patient, rather than as a research tool, where the goal is to create new generalisable knowledge.18 19 Indeed, the results of traditional N-of-1 studies may be generalisable only to the future treatment response of the patient in the trial, not to other patients.
Nevertheless, using Bayesian meta-analytic techniques, Zucker et al showed how the average treatment effect at the population level can also be estimated by combining multiperson N-of-1 studies testing similar interventions in similar patients with the same outcome measures.14 Similar Bayesian methods have also been suggested for analysis of group-level HTE.20 Herein, we demonstrate yet another application of N-of-1 studies: exploring person-level HTE. This application has important research and clinical implications, even when the determinants of HTE remain unidentified. It is of particular interest that the degree of person-level HTE appeared to vary across conditions and treatments. Since the degree of variation across individuals sets the upper bound for the amount of HTE that might be explainable by observable characteristics, such as clinical or genomic variables, searching for subgroup effects in the absence of person-level HTE is a futile exercise.4 21 22

An interesting example of how person-level HTE can vary across different conditions comes from the study of Johannessen et al (figure 3).15 These investigators conducted N-of-1 studies comparing cimetidine to placebo for patients presenting with dyspeptic symptoms and reported person-level effects by subgroups of disease categories. Among 46 trial completers, cimetidine had a significant effect for most patients (57%), as it did at the aggregate level. However, not only was there substantial person-level HTE, but person-level HTE varied across conditions, being much more pronounced in non-ulcer dyspepsia (I²=75%) than in peptic ulcer disease (I²=35%) (figure 3), despite the very similar overall effects seen in these two conditions.
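The upper-bound argument made earlier in this discussion, that variation across individuals caps the HTE explainable by observable characteristics, can be written out using the law of total variance. This is a sketch for illustration: the symbols θ_i (the true treatment effect for person i) and X_i (that person's observable characteristics) are notation introduced here and do not appear in the original analysis.

```latex
\operatorname{Var}(\theta_i)
  = \underbrace{\operatorname{Var}\!\big(\mathbb{E}[\theta_i \mid X_i]\big)}_{\text{explainable by } X_i}
  + \underbrace{\mathbb{E}\!\big[\operatorname{Var}(\theta_i \mid X_i)\big]}_{\text{unexplained}}
\quad\Longrightarrow\quad
\operatorname{Var}\!\big(\mathbb{E}[\theta_i \mid X_i]\big) \le \operatorname{Var}(\theta_i)
```

Both terms on the right are non-negative, so if the total person-level variance is close to zero, the component explainable by any set of covariates must be close to zero as well.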
Figure 3

Person-level variation across different disease conditions. This figure depicts the results of 46 different N-of-1 trials of cimetidine as reported by Johannessen et al.12 The effect of cimetidine versus placebo was measured in each subject across 12 crossover periods over the span of 184 days. While cimetidine had a similar average effect regardless of the index condition, there was far greater consistency of effect in patients with peptic ulcer disease and much more variation in effect among patients with non-ulcer dyspepsia.

Finding variation in person-level response in multiperson N-of-1 studies identifies those conditions for which N-of-1 studies are likely to be clinically relevant. For condition-treatment combinations shown to have low person-level HTE, single subject studies are highly unlikely to be clinically informative, and the average results from trials (ie, ‘one-size-fits-all’ effects) are more apt to be applicable to individuals.23 24 On the other hand, N-of-1 studies may be highly clinically informative for condition-treatment combinations with a high degree of person-level HTE. These conditions would also be potentially higher yield for examining predictors of HTE (genomic or otherwise).

Our findings also have implications for clinical practice and formulary design. For conditions marked by high person-level HTE, even when trials show that one treatment is better on average than others, having a variety of medication options would be useful to optimise outcomes across all patients. This applies particularly to chronic conditions such as those studied here, where empiric trials of alternative medications to find the best treatment for an individual might be feasible. For example, the study by March et al25 shows that while patients with osteoarthritis on average had less pain and less stiffness with diclofenac, some patients had improved symptoms on paracetamol.
This person-level HTE may not be detectable in conventional parallel-arm trials employing conventional subgroup analysis.21

While more studies combining N-of-1 studies are needed to understand the extent of person-level HTE, future studies need to apply greater methodological rigour to improve the state of the science on evaluation of individual treatment effects.26 While the recently published Consolidated Standards of Reporting Trials Extension for N-of-1 trials may help improve reporting, a tabulation of all information (possibly electronically available) appears the most straightforward way to facilitate the clinical interpretation of these studies.27 Such reporting allows the inspection of trajectories over time and may reveal patterns that are not captured by regression models. Complete reporting would also facilitate the development and evaluation of methods for the analysis of single subject experiments, particularly their use to better understand the extent and importance of person-level HTE.

The limitations of this review reflect, to a large extent, the limitations of the data in primary studies. Many conditions are not amenable to the N-of-1 design (eg, because treatment effects are cumulative or because outcomes are observed only once). Further, even for conditions and treatments that are potentially amenable to this design, many important disease categories lacked published N-of-1 studies. We relied on published studies only, so our analytic cohort may underestimate the true number of these studies, particularly because N-of-1 studies may frequently be conducted without the intention of future publication. In addition, our conclusions regarding the ubiquity of HTE in the data we reanalysed should be interpreted in the context of several important limitations.
First, there were only a limited number of available studies that reported data sufficient to analyse, and therefore we present only a very partial picture of the full scope of interindividual variation in effects across clinical conditions. Furthermore, among the studies that did have data, only a fairly small number of patients were observed over a small number of treatment periods, and we frequently had to rely on data summaries provided by the authors (eg, person-level treatment effects and their sampling variance). These data limitations precluded the use of more complex models, for example, models that account for period effects or other effects of time on the outcome.3

Our review has demonstrated that HTE remains almost totally unexplored in multiperson N-of-1 studies, which are uniquely capable of exploring variations in individual (person-level) treatment effects. Our reanalysis of the data from these studies represents the first systematic attempt to obtain empirical support for the a priori argument that treatment effects vary across individual patients, an assumption which underpins all efforts to personalise treatment selection. In this sample, person-level HTE appears to be common and large enough to be clinically meaningful; the degree of person-level HTE appears to vary across conditions and outcomes. Thus, multiperson N-of-1 studies are an under-utilised tool to identify where person-level HTE may be substantial and where efforts to find molecular or clinical predictors of response heterogeneity should be focused. In such conditions, parallel-arm studies might yield results that are over-generalised for patient-level decision-making.