Ceyda Oksel1, Raquel Granell2, Osama Mahmoud2, Adnan Custovic3, A John Henderson2. 1. Section of Paediatrics, Department of Medicine, Imperial College London, London, United Kingdom. 2. Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom. 3. Section of Paediatrics, Department of Medicine, Imperial College London, London, United Kingdom. Electronic address: a.custovic@imperial.ac.uk.
Abstract
BACKGROUND: Latent class analysis (LCA) has been used extensively to identify (latent) phenotypes of childhood wheezing. However, the number and trajectory of discovered phenotypes differed substantially between studies. OBJECTIVE: We sought to investigate sources of variability affecting the classification of phenotypes, identify key time points for data collection to understand wheeze heterogeneity, and ascertain the association of childhood wheeze phenotypes with asthma and lung function in adulthood. METHODS: We used LCA to derive wheeze phenotypes among 3167 participants in the ALSPAC cohort who had complete information on current wheeze recorded at 14 time points from birth to age 16½ years. We examined the effects of sample size and data collection age and intervals on the results and identified time points. We examined the associations of derived phenotypes with asthma and lung function at age 23 to 24 years. RESULTS: A relatively large sample size (>2000) underestimated the number of phenotypes under some conditions (eg, number of time points <11). Increasing the number of data points resulted in an increase in the optimal number of phenotypes, but an identical number of randomly selected follow-up points led to different solutions. A variable selection algorithm identified 8 informative time points (months 18, 42, 57, 81, 91, 140, 157, and 166). The proportion of asthmatic patients at age 23 to 24 years differed between phenotypes, whereas lung function was lower among persistent wheezers. CONCLUSIONS: Sample size, frequency, and timing of data collection have a major influence on the number and type of wheeze phenotypes identified by using LCA in longitudinal data. Crown
BACKGROUND: Latent class analysis (LCA) has been used extensively to identify (latent) phenotypes of childhood wheezing. However, the number and trajectory of discovered phenotypes differed substantially between studies. OBJECTIVE: We sought to investigate sources of variability affecting the classification of phenotypes, identify key time points for data collection to understand wheeze heterogeneity, and ascertain the association of childhood wheeze phenotypes with asthma and lung function in adulthood. METHODS: We used LCA to derive wheeze phenotypes among 3167 participants in the ALSPAC cohort who had complete information on current wheeze recorded at 14 time points from birth to age 16½ years. We examined the effects of sample size and data collection age and intervals on the results and identified time points. We examined the associations of derived phenotypes with asthma and lung function at age 23 to 24 years. RESULTS: A relatively large sample size (>2000) underestimated the number of phenotypes under some conditions (eg, number of time points <11). Increasing the number of data points resulted in an increase in the optimal number of phenotypes, but an identical number of randomly selected follow-up points led to different solutions. A variable selection algorithm identified 8 informative time points (months 18, 42, 57, 81, 91, 140, 157, and 166). The proportion of asthmatic patients at age 23 to 24 years differed between phenotypes, whereas lung function was lower among persistent wheezers. CONCLUSIONS: Sample size, frequency, and timing of data collection have a major influence on the number and type of wheeze phenotypes identified by using LCA in longitudinal data. Crown
Wheeze is a common symptom in the early years of life, with nearly one third of children experiencing it at least once before their third birthday.1, 2, 3 Although the symptoms of most infants with wheeze seem to remit by the time the child reaches school age, infantile wheeze can also persist into later childhood and adulthood after a period of remission.5, 6 Conversely, the majority of patients with persistent asthma start wheezing in early childhood. However, at the onset of symptoms, patients with “transient wheeze” and “persistent wheeze” look very similar, and it is difficult to predict which of the early childhood wheezers will stop wheezing (and when) and which will have persistent wheezing and asthma.Understanding the heterogeneity of wheezing disorders and distinguishing wheeze phenotypes in early childhood is critical to developing interventions targeted at those who will persist with wheezing into later childhood and to avoid overtreatment of patients with transient wheeze. Over the last 2 decades, substantial effort has been devoted to understanding the heterogeneity of childhood wheezing illness.7, 8, 9, 10 In general, population-based birth cohorts are regarded as optimal data sources for understanding temporal patterns of wheezing and relating them to different risk factors because the information is collected prospectively and therefore free from recall bias.The initial approach of hypothesis testing using data on wheezing collected at the ages of 3 and 6 years in the Tucson Children's Respiratory Study (TCRS) described 3 wheeze phenotypes: transient early, late onset, and persistent. This finding was confirmed in several independent cohorts.3, 12, 13 Subsequently, the methodology to discover “wheeze phenotypes” was extended to the use of unsupervised data-driven approaches, such as latent class analysis (LCA).1, 14, 15, 16, 17, 18 These analyses revealed a different structure within the data and suggested the existence of 119, 20 or 2 further intermediate phenotypes.1, 17, 18 It is important to emphasize that although wheeze phenotypes derived from different analyses tend to share the same nomenclature, phenotypes with the same assignment often differ substantially in terms of the age of onset, temporal trajectory, distributions within a population, and associated risk factors, making comparison between studies difficult and clinical application uncertain.8, 10 For example, late-onset wheezers were reported to start experiencing symptoms after the age of 3, 4, or 5 years in different studies. Inconsistencies between studies can be partly attributed to differences in study design or could be due to true differences between different populations. However, this seems unlikely because most evidence comes from broadly similar population-based studies with comparable ethnic mixes.If we are to understand factors associated with patterns of wheezing with different long-term consequences, then phenotypes must be consistent and reproducible. Despite the widespread use of LCA, little is known about the external factors that influence the outcomes of LCA models in phenotype identification. We propose that sample size and the timing and frequency of data collection affect the number and type of discovered wheeze phenotypes in LCA and that not all time points carry useful information (and therefore some might be redundant or even cause uncertainty in the results).To provide a better understanding of the influence of input data characteristics on the identified longitudinal trajectories of wheezing, we investigated the effect of the number of data points, age at which information was collected, and sample size on the number and/or nature of wheeze phenotypes discovered by LCA. We also sought to identify data collection points, which are most informative in distinguishing wheeze phenotypes.
Methods
Study design, setting, and participants
The Avon Longitudinal Study of Parents and Children (ALSPAC) is a population-based birth cohort established in 1991 in Avon, United Kingdom. It recruited 14,701 children born between April 1, 1991, and December 31, 1992. Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and local research ethics committees. Details of the study protocol can be found elsewhere. The study Web site contains details of all the data that are available through a fully searchable data dictionary at www.bris.ac.uk/alspac/researchers/data-access/data-dictionary/.
Data sources and definition of outcomes
Participating mothers were sent a self-completion questionnaire about the health of their children at 14 time points from birth to age 16½ years: months 6, 18, 30, 42, 57, 69, 81, 91, 103, 128, 140, 157, 166, and 198. Current wheezing was defined as a positive answer to the following question: “In the last 12 months has he/she had any periods when there was wheezing or wheezing with whistling on his/her chest when he/she breathed?”Study subjects attended a research clinic at age 23 to 24 years in which lung function was measured by using spirometry.23, 24 Postbronchodilator FEV1 was ascertained 15 minutes after administration of 400 μg of salbutamol. We expressed FEV1 as percent predicted values against Global Lung Function Initiative curves.
Self-reported asthma ever was defined as a positive answer to the following question: “Have you ever had asthma?” Self-reported current asthma was defined at age 23 years as asthma ever together with a positive answer to either of the following questions: “Have you had any wheezing or whistling in the past 12 months?” or “Have you taken asthma medication in the last 12 months?”
Statistical analysis
Children with complete reports of wheezing at all 14 time points from birth to age 16½ years (n = 3167) were included in the analysis to obtain better representation of the latent structure. We performed LCA to investigate how latent class subpopulation structure varied by the timing and frequency of observations. Starting with a latent model including 4 phenotypes, we compared models with varying sample sizes (3167, 2500, 2000, 1500, 1000, and 500), numbers of latent classes (4-6), and numbers of time points (14, 11, 8, and 6) based on their statistical fit, including the Akaike information criterion, Bayesian information criterion (BIC), Lo–Mendell–Rubin and bootstrapped likelihood ratio, model quality (model entropy), and interpretability. The best-fitting model in each run was selected based on the lowest BIC. We then repeated our analyses among 12,290 participants with at least 2 questionnaire responses. We identified critical data collection points for identification of distinct phenotypes of wheezing based on stochastic evolutionary search through a genetic algorithm (see the Methods section in this article's Online Repository at www.jacionline.org for more details on the methodology for selection of informative data collection points). The adjusted Rand index (ARI) was used as a similarity measure when comparing different clustering results. Variable specific entropy values were used to show how well individual data collection points identify the latent classes. We calculated CIs for the difference of population proportions to compare the frequency of participants with asthma at age 23 years between different phenotypes. Differences in lung function were tested by using 1-way ANOVA and the Tukey honestly significant difference test. All analyses were performed in Stata software (version 15), Mplus 8, and R software by using the packages poLCA, DiagrammeR, and LCAvarsel.
Results
A total of 3167 participants had complete reports of wheeze at all 14 time points. In line with our previous results,17, 18 the best-fitting model resulted in 6 distinct wheeze phenotypes: never/infrequent wheezing; persistent wheezing; 2 early-onset transient classes (early-onset preschool remitting and early-onset midchildhood remitting); and 2 late-onset persisting classes (school-age onset and late-childhood onset).
Influence of sample size
We varied the sample size from 3167 to 500 and developed 11 different models based on randomly selected subsamples of 6 different sizes (n = 500, 1000, 1500, 2000, 2500, and 3167), holding all else constant. Fig 1, A, shows the best-fitting models based on different sample sizes and the prevalence of each phenotype based on the estimated model. Four phenotypes (never/infrequent, persistent, transient early, and late onset) were identified with a sample size of 500. The best-fitting model based on 1000 participants resulted in 4 to 5 phenotypes.
Fig 1
Optimal number, shape, and prevalence of wheeze phenotypes identified by using LCA. A, Eleven latent models based on randomly selected subsamples of 6 different sample sizes (n = 500, 1000, 1500, 2000, 2500, and 3167) while maintaining a constant number of follow-up points (14 time points). B, Ten latent models based on randomly selected time points (6, 8, 11, and 14 time points) while maintaining a constant sample size (n = 3167). C, Twelve latent models based on randomly selected subsamples of 4 different sample sizes (n = 500, 1500, 2500, and 3167) and different numbers of time points (6, 8, and 11 time points).
Optimal number, shape, and prevalence of wheeze phenotypes identified by using LCA. A, Eleven latent models based on randomly selected subsamples of 6 different sample sizes (n = 500, 1000, 1500, 2000, 2500, and 3167) while maintaining a constant number of follow-up points (14 time points). B, Ten latent models based on randomly selected time points (6, 8, 11, and 14 time points) while maintaining a constant sample size (n = 3167). C, Twelve latent models based on randomly selected subsamples of 4 different sample sizes (n = 500, 1500, 2500, and 3167) and different numbers of time points (6, 8, and 11 time points).Larger sample sizes (≥2000 participants) were needed to detect smaller phenotypes (<5% frequency). LCA identified 6 latent wheeze phenotypes in samples of 2000 or more children with complete data (Fig 1, A) and in samples of 5000 or more children with incomplete data (see Fig E1 in this article's Online Repository at www.jacionline.org).
Fig E1
Optimal number, shape, and prevalence of wheeze phenotypes with different (12,290, 7,500, 5,000, 2,500, 1,500, and 500) sample sizes based on children with at least 2 observations of wheezing (the optimal phenotype was chosen based on the lowest BIC).
Influence of data collection frequency
We then varied the frequency of data collection time points from 6 to 14 and developed 10 different models based on randomly selected time points while maintaining a constant sample size (n = 3167). Adding more time points to the latent model increased the number of wheeze phenotypes that were identified (Fig 1, B). However, in some cases an identical number of (randomly selected) data collection points (eg, 11 time points) resulted in different optimal numbers of phenotypes, depending on intervals between time points. This suggests that, in addition to sampling frequency, timing and distribution of time points at which data are collected can influence wheeze phenotype identification and that there might be critical data collection points that are more informative in distinguishing wheeze phenotypes.
Combined effects of sample size and data collection frequency
To examine how both the frequency of data collection (number of time points) and the size of the studied population affects the optimal number, trajectory, and frequency of the identified phenotypes, we varied the number of data collection points from 6 to 11 and randomly selected subsamples of 4 different sizes, resulting in a total of 12 data conditions (Fig 1, C).Models with small sample sizes (n < 2500) did not identify low-frequency phenotypes (<5%), regardless of the frequency of data sampling. However, there was a clear link between sample size, number of data points, and optimal number of wheeze phenotypes. Models with sample sizes of 2500 or greater identified 6 phenotypes when the number of data collection points included in the analysis was relatively high. However, models with decreasing numbers of data points were unable to detect 6 phenotypes, and models with the same sample size did not identify small phenotypes (<5% frequency) under certain conditions (eg, number of time points < 11).
Selection of the most informative data collection points
Fig E2 in this article's Online Repository at www.jacionline.org shows the correlations (phi coefficients) between wheeze reports at different time points. Time points close to each other were moderately correlated (eg, months 157 and 166 and months 81 and 91), suggesting that some of the adjacent time points convey similar information. To discard the noninformative data collection points, we performed stochastic evolutionary search through a genetic algorithm, which retained 8 informative time points (months 18, 42, 57, 81, 91, 140, 157, and 166), and 6 were dropped as uninformative (months 6, 30, 69,103, 128, and 198). Comparing the clustering of the models using 8 time points with the clustering from the model using the full data set showed a satisfactory level of agreement, with a Rand index and ARI of 82 and 64%, respectively (Table I).
Fig E2
Heat map showing the phi coefficient of pairwise comparison between data collection points.
Table I
Clustering summary of the LCA model fitted to the data subset (8 time points identified through a genetic algorithm search) and its comparison with the model fitted to the full data set (14 data collection points) based on 3167 participants with complete information on current wheeze recorded at 14 time points
Model characteristics
Variable selection (stochastic search)
No. of classes
6
Selected time points (mo)
Univariate entropy
BIC
15,508
18
0.502
42
0.581
Entropy
0.87
57
0.590
81
0.578
Rand Index
0.82
91
0.588
ARI
0.64
140
0.549
Jaccard index
0.70
157
0.576
166
0.582
Clustering summary of the LCA model fitted to the data subset (8 time points identified through a genetic algorithm search) and its comparison with the model fitted to the full data set (14 data collection points) based on 3167 participants with complete information on current wheeze recorded at 14 time points
Latent transition probabilities with increasing numbers of classes
To understand how the trajectories and estimated phenotypes changed over a sequence of increasing numbers of classes and how children move from one class to another in models with increasing numbers of classes, we developed 3 LCA models with 4, 5, and 6 classes. Persistent and never/infrequent wheezing classes had similar patterns in all 3 models, with a slight decrease in estimated prevalence from a 4- to 6-class solution (Fig 2, A). With the addition of a fifth latent class, transient early wheezing was divided into 3 remitting classes (preschool and midchildhood resolution; Fig 2, B), whereas late-onset wheezing remained almost identical. The addition of a sixth class resulted in division of late-onset wheezing into 2 similarly sized subgroups (school-age and late-childhood onset; Fig 2, C). We then assigned participants to the most likely phenotype based on the maximum membership probability and calculated transition probabilities reflecting the proportion of participants moving from one phenotype to another when the number of phenotypes increased from 4 up to 6. Fig 3 shows whether members of distinct phenotypes remained in the same phenotype or shift into another one (either existing or newly formed) with increasing numbers of phenotypes. The figure also demonstrates from where the intermediate phenotypes arise and which phenotypes become separated or remain undivided with increasing numbers of latent classes. The results based on analysis of participants with incomplete reports of wheezing (12,290 participants with ≥2 responses to questionnaires about wheezing) did not materially differ from those obtained among children with a complete data set and are presented in Fig E1, Fig E3, Fig E4, Fig E5, Fig E6 in this article's Online Repository at www.jacionline.org.
Fig 2
Estimated prevalence of wheezing for each wheeze phenotype in 4, 5, and 6 latent class solutions identified by using LCA.
Fig 3
Assignment of children into distinct wheeze phenotypes over a sequence of latent class models with 4, 5, and 6 classes based on most likely class membership (cohort of 3167 children with complete reports of wheezing at 14 time points). Ellipse nodes show class membership (most likely phenotype), whereas values along the arrow represent the percentage of children moving from one class to another in models with an increasing number of classes.
Fig E3
Optimal number, shape, and prevalence of wheeze phenotypes with different numbers of data collection points (6, 8, 11, and 14) based on children with at least 2 observations of wheeze.
Fig E4
Optimal number, shape, and prevalence of wheeze phenotypes with combined effects of sample size and data collection frequency based on children with at least 2 observations of wheeze.
Fig E5
Estimated prevalence of wheeze for each wheeze phenotype in 4, 5, and 6 latent class solutions identified by using LCA based on 12,290 children with at least 2 observations of wheeze.
Fig E6
Assignment of children into distinct wheeze phenotypes over a sequence of latent class models with 4, 5, and 6 classes based on most likely class membership (12,290 children with at least 2 observations of wheezing). Ellipse nodes show class membership (most likely phenotype), whereas values along the arrow represent the percentage of children moving from one class to another in models with an increasing number of classes.
Estimated prevalence of wheezing for each wheeze phenotype in 4, 5, and 6 latent class solutions identified by using LCA.Assignment of children into distinct wheeze phenotypes over a sequence of latent class models with 4, 5, and 6 classes based on most likely class membership (cohort of 3167 children with complete reports of wheezing at 14 time points). Ellipse nodes show class membership (most likely phenotype), whereas values along the arrow represent the percentage of children moving from one class to another in models with an increasing number of classes.
Asthma and lung function in adulthood in patients with different wheeze phenotypes
Of 3797 participants who attended follow-up at age 23 to 24 years, 1492 had complete reports of wheezing (14 points), of whom 240 (16%) reported current asthma; 1345 had valid lung function. The proportion of subjects with current asthma was greatest in the persistent wheeze phenotype (99.7%, Table II). In the 2 early-onset transient phenotypes, the proportion of asthmatic patients was significantly greater in midchildhood-remitting (60.4%) compared with the preschool-remitting (6.4%) phenotypes (mean difference, 0.5; 95% CI, 0.40-0.68; P < .0001). In the 2 late-onset phenotypes the proportion of asthmatic patient was significantly greater in the school-age onset (88.4%) compared with late-childhood onset (68.1%) phenotypes (mean difference, 0.20; 95% CI, 0.05-0.36; P < .02). Prebronchodilator and postbronchodilator lung function differed significantly across phenotypes (P = .005 and P = .04, respectively, ANOVA) and was significantly less in the persistent wheezing and early-onset preschool remitting wheeze phenotypes compared with the never/infrequent wheeze phenotype, with little evidence of differences between other phenotypes (Table III and see Table E1, Table E2, Table E3 in this article's Online Repository at www.jacionline.org). The preschool-onset remitting phenotype mostly overlapped with no asthma (94%), but prebronchodilator and postbronchodilator lung function at age 24 years was significantly less in this class compared with the never/infrequent wheeze phenotype.
Table II
Proportion of asthmatic patients at age 23 to 24 years in each phenotype
Wheeze phenotypes, 0-16½ y
Self-reported asthma ever
Current asthma at age 23 y
Asthma medication use at age 23 y
No. of asthmatic patients/total
Percentage∗
No. of asthmatic patients/total
Percentage∗
No. of medication users/total
Percentage∗
Never infrequent
105/1111
9.4
50/985
5.1
33/985
3.3
Transient early
Preschool remitting
54/355
15.1
19/295
6.4
9/295
3.2
Midchildhood remitting
72/95
75.3
30/49
60.4
14/49
29.5
Late onset
School-age onset
56/61
91.3
38/43
88.4
25/43
58.3
Late-childhood onset
58/82
70.0
38/55
68.1
25/55
45.3
Persistent wheeze
81/82
98.5
65/65
99.7
53/65
82.1
The percentage is estimated from weighted cross-tabulations.
Table III
Lung function at age 24 years by wheeze phenotype (restricted to 1343 participants with FEV1 percent predicted data and 1351 with FEV1/FVC data)
Wheeze phenotypes, 0-16½ y
Baseline lung function at 24 y
Postbronchodilator lung function at 24 y
FEV1 (% predicted)
FEV1/FVC ratio
FEV1 (% predicted)
FEV1/FVC ratio
No.
Mean (SD)
No.
Mean (SD)
No.
Mean (SD)
No.
Mean (SD)
Never infrequent
1004
95.0 (11.7)
1009
0.84 (0.06)
830
97.9 (11.7)
834
0.86 (0.06)
Transient early
Preschool remitting
329
93.4 (11.4)
330
0.82 (0.07)
274
96.8 (10.9)
275
0.85 (0.06)
Midchildhood remitting
89
93.5 (11.4)
91
0.82 (0.06)
71
97.5 (11.8)
73
0.84 (0.05)
Late onset
School-age onset
61
95.4 (11.2)
61
0.81 (0.08)
47
100.8 (10.8)
47
0.86 (0.06)
Late-childhood onset
79
94.0 (12.1)
80
0.82 (0.07)
62
98.7 (10.8)
63
0.85 (0.05)
Persistent wheeze
80
91.6 (12.4)
80
0.79 (0.09)
59
96.5 (11.1)
59
0.83 (0.07)
FVC, Forced vital capacity.
Table E1
Differences in lung function (FEV1 percent predicted) at 24 years between wheeze phenotypes: ANOVA and Tukey HSD test of pairwise comparisons
ANOVA
Variation source
df
Sum square
Mean square
F value
Pr(>F)
Pre-BD FEV1 % predicted
Post-BD FEV1 % predicted
Pre-BD FEV1 % predicted
Post-BD FEV1 % predicted
Pre-BD FEV1 % predicted
Post-BD FEV1 % predicted
Pre-BD FEV1 % predicted
Post-BD FEV1 % predicted
Pre-BD FEV1 % predicted
Post-BD FEV1 % predicted
Phenotypes
5
5
2,243
1,463
448.6
292.6
3.313
2.252
0.005561
0.04709
Residual
1,651
1,351
223,565
175,521
135.4
129.9
Significant differences are in boldface.
HSD, Honestly significant difference.
Table E2
Differences in lung function (FEV1/FVC ratio) at 24 years between wheeze phenotypes: ANOVA and Tukey HSD test of pairwise comparisons
FEV1 reversibility at age 24 years by phenotype (restricted to 1364 participants with FEV1 reversibility data)
Wheeze phenotypes, 0-16½ y
FEV1 reversibility at 24 y
P value∗
Percent positive FEV1 reversibility (optional)
No.
Mean (SD)
Never-infrequent
898
2.87 (6.2)
Baseline
80.0
Transient early
Preschool remitting
224
4.02 (5.4)
0.104
81.3
Midchildhood remitting
69
4.03 (5.3)
0.627
84.1
Late onset
School-age onset
49
4.94 (4.9)
0.171
89.8
Late-childhood onset
65
4.22 (5.4)
0.490
83.1
Persistent wheeze
59
6.29 (5.9)
0.0003
91.5
Tukey test.
Proportion of asthmatic patients at age 23 to 24 years in each phenotypeThe percentage is estimated from weighted cross-tabulations.Lung function at age 24 years by wheeze phenotype (restricted to 1343 participants with FEV1 percent predicted data and 1351 with FEV1/FVC data)FVC, Forced vital capacity.
Discussion
Key results
Our results suggest that the number and nature of wheeze phenotypes from infancy to adolescence identified by using LCA are dependent on several factors, including sample size, frequency, timing and distribution of data collection time points, model dimensionality, and the combination of these factors. Transition analysis revealed that subjects assigned to the never or persistent wheeze phenotypes tend to stay in these phenotypes, whereas most of the switching goes on in the intermediate classes. Given the strong interplay between the birth cohort design (including the number of participants, data collection frequency, and distribution) and the optimal number of phenotypes identified by means of developmental trajectory modeling, care should be taken when interpreting wheeze phenotypes emerging from small studies with few data collection points. When the sample size is small, a wheeze phenotype that exists in the population might be unidentifiable, whereas excessive data collection can result in identification of trivial or clinically irrelevant phenotypes. In general, increasing data collection frequency helps detect more complex structure and larger numbers of phenotypes by capturing less frequently observed subgroups. However, it also increases the risk of violating the fundamental assumption of LCA modeling where indicator variables (eg, presence/absence of wheezing at subsequent ages) are independent of each other. When frequent data collection and large sample sizes are not obtainable, collecting data at critical time points might help counterbalance the effects of suboptimal conditions (eg, smaller sample size and infrequent data collection). In our study the time points that proved most informative in distinguishing wheeze phenotypes were months 18, 42, 57, 81, 91, 140, 157, and 166.
Limitations
There are several limitations to our findings. Despite latent models' usefulness in disentangling disease complexity, 1 unresolved issue in the application of LCA is that there is not one commonly accepted statistical indicator for deciding on the number of subgroups in a study population. The limitation of this study is that we do not know how many true phenotypes there are, and we assumed that the classification obtained on the largest sample and using all time points corresponded to the best available approximation of the “true classification.” In the absence of clear statistical requirements for identifying clinically important groups of small size, validation of the phenotypes with late asthma outcomes provides the only clues about their clinical relevance. However, we acknowledge that in our study information on asthma and lung function measures at age 23 to 24 years was available for approximately 45% of participants used to derive wheeze phenotypes.Another limitation is that we could only vary conditions using the sampling framework that was available to us, which was fixed by the study design, and therefore this analysis has limited direct application to other studies that have used different sampling frames. We also acknowledge that the definition of current wheezing, which we used in our models, is based on parental reporting using validated questionnaires (as in most other epidemiologic studies) and that this might lead to overestimation of the true prevalence.As most previous studies, we used information on current wheeze for our modeling. It is possible that a more holistic examination of other features (eg, frequency and severity of wheezing) and/or other symptoms (eg, cough, atopic dermatitis, and rhinitis) and lung function might allow better distinction of the underlying pathophysiologic mechanisms.The key advantage of our study is the large sample size with complete data on wheezing collected frequently and prospectively. Another advantage is that participants were followed from birth to late adolescence, covering a longer period compared with most prior studies.1, 13, 18, 19, 30Finally, it is worth noting that subtypes discovered by using data-driven methods are not observed but are latent by nature and ideally should not be referred to as “phenotypes” (ie, observable characteristics). However, because the term “phenotype” has been used in this context for more than a decade, we have maintained this nomenclature.
Interpretation
A number of previous studies (including our own) embarked on identifying wheeze phenotypes from birth to mid–school age (summarized in Table E4 in this article's Online Repository at www.jacionline.org). However, the inconsistency of findings has led to a debate on the validity and clinical value of phenotyping studies,10, 31, 32 hampering the discovery of pathophysiologic endotypes and translation into clinically actionable insights. The 4 phenotypes of persistent, never, transient early, and late-onset wheezing have been long postulated in descriptive and data-driven studies. We found that when the sample size is relatively small, a particular wheeze phenotype that exists in the population might be undetectable. Therefore relatively smaller sample size in some studies might have contributed to the inability to detect intermediate wheeze phenotypes with a relatively low prevalence. Using more time points allowed identification of less common phenotypes (<5% frequency) by increasing possible response patterns. When the data collection was frequent (>11 time points), a sample size of approximately 2500 was found to be sufficiently large to distinguish 6 phenotypes. However, even a larger sample size of 3167 might be insufficient to detect uncommon phenotypes (<5% frequency) under certain conditions (eg, data collection points <11). Our findings suggest that increasing data collection frequency might help compensate for a modest sample size in phenotype identification. In line with this finding, Depner et al identified an intermediate phenotype in the PASTURE cohort that existed during the first 6 years of life by using a similar sample size but more data collection points than those used in the TCRS. However, the selection of follow-up points needs careful thought. Our analyses have shown that although adding more time points to the latent model increased the number of identified phenotypes with distinguishable interpretations, in some cases the same number of randomly selected data collection points resulted in a different optimal solution. This suggests that the timing and distribution of follow-up is important and that there might be critical data collection points that are more informative than others. A variable selection method that we applied to the data identified 6 time points that were not carrying additional useful information (months 6, 30, 69, 103, 128, and 198).
Table E4
Wheeze phenotypes from birth up to 9 years of age identified based on temporal pattern
Cohort
Sample Size
No. of time points
Years covered
No. of phenotypes
CCCEHE7
689
15
9
4
TUSCONE8
826
2
6
4
PASTUREE9
953
6
6
5
MAASE10
1184
8
8
5
PIAMAE11
2810
8
8
5
ALSPACE11
5760
8
8
6
ALSPACE12
6265
7
7
6
CCCEH, Columbia Center for Children’s Environmental Health; MAAS, Manchester Asthma and Allergy Study; PASTURE, Protection against Allergy—Study in Rural Environments; PIAMA, Prevention and Incidence of Asthma and Mite Allergy; TUCSON, Tucson Children’s Respiratory Study.
The proportion of asthmatic patients was greatest in the persistent wheeze phenotype (98.5%), and subjects in this phenotype had diminished prebronchodilator and postbronchodilator lung function (at the time of maximally attained physiologic lung function plateau) compared with all other phenotypes. The proportion of asthmatic patients differed between intermediate phenotypes (15.1% and 75.3% in 2 transient early phenotypes, preschool remitting and midchildhood remitting, respectively; 91.3% and 70.0% in 2 late-onset phenotypes, late childhood and school-age onset, respectively). These findings suggest that all phenotypes are distinct and that this might be a true classification. However, we acknowledge that the observed associations might also be a proxy of severity.The preschool-onset remitting phenotype mostly overlapped with no asthma (94%), but the prebronchodilator and postbronchodilator lung function at age 24 years was significantly lower in this class compared with the never/infrequent wheeze phenotype. Although this can be seen as a contradiction, we would stress that diminished lung function does not equate to asthma. There is evidence that early transient wheezing is associated with low lung function34, 35, 36, 37; as lungs/airways grow, the symptoms regress, but lung function impairment can persist. In TCRS the lowest infant lung function test values were associated with low lung function at 22 years, and therefore early wheezing that remits might be a marker of low lung function in early life that persists to adulthood but without the development of airway inflammation or asthma.In conclusion, our findings add to the understanding of childhood wheeze phenotypes by extending the knowledge on potential causes of variability in classification of wheezing. Sample size, frequency, and timing of data collection have a major influence on the number and type of phenotypes identified by using data-driven techniques. Our results, which include information on the most informative follow-up points, are important to interpret (or reanalyze) existing studies and to inform better design of future cohorts. However, we wish to note that these data collection points should not be regarded as absolute; rather, they should be treated as relative values with respect to our population and considerations for investigators when designing future studies.The number and nature of wheeze phenotypes identified by using LCA are dependent on the sample size, frequency, timing and distribution of data collection time points; model dimensionality; and combinations of these factors.Not all data collection points carry useful information in distinguishing wheeze phenotypes.
Authors: Wayne J Morgan; Debra A Stern; Duane L Sherrill; Stefano Guerra; Catharine J Holberg; Theresa W Guilbert; Lynn M Taussig; Anne L Wright; Fernando D Martinez Journal: Am J Respir Crit Care Med Date: 2005-08-18 Impact factor: 21.405
Authors: M R Miller; J Hankinson; V Brusasco; F Burgos; R Casaburi; A Coates; R Crapo; P Enright; C P M van der Grinten; P Gustafsson; R Jensen; D C Johnson; N MacIntyre; R McKay; D Navajas; O F Pedersen; R Pellegrino; G Viegi; J Wanger Journal: Eur Respir J Date: 2005-08 Impact factor: 16.671
Authors: Nicole Beydon; Stephanie D Davis; Enrico Lombardi; Julian L Allen; Hubertus G M Arets; Paul Aurora; Hans Bisgaard; G Michael Davis; Francine M Ducharme; Howard Eigen; Monika Gappa; Claude Gaultier; Per M Gustafsson; Graham L Hall; Zoltán Hantos; Michael J R Healy; Marcus H Jones; Bent Klug; Karin C Lødrup Carlsen; Sheila A McKenzie; Francçois Marchal; Oscar H Mayer; Peter J F M Merkus; Mohy G Morris; Ellie Oostveen; J Jane Pillow; Paul C Seddon; Michael Silverman; Peter D Sly; Janet Stocks; Robert S Tepper; Daphna Vilozni; Nicola M Wilson Journal: Am J Respir Crit Care Med Date: 2007-06-15 Impact factor: 21.405
Authors: Paul A Harris; Robert Taylor; Robert Thielke; Jonathon Payne; Nathaniel Gonzalez; Jose G Conde Journal: J Biomed Inform Date: 2008-09-30 Impact factor: 6.317
Authors: R J Kurukulaaratchy; M H Fenn; L M Waterhouse; S M Matthews; S T Holgate; S H Arshad Journal: Clin Exp Allergy Date: 2003-05 Impact factor: 5.018
Authors: L Lowe; C S Murray; L Martin; J Deas; E Cashin; G Poletti; A Simpson; A Woodcock; A Custovic Journal: Arch Dis Child Date: 2004-06 Impact factor: 3.791
Authors: J Henderson; R Granell; J Heron; A Sherriff; A Simpson; A Woodcock; D P Strachan; S O Shaheen; J A C Sterne Journal: Thorax Date: 2008-08-04 Impact factor: 9.102
Authors: Priscila Weber; Ana Maria Baptista Menezes; Helen Gonçalves; Rogelio Perez-Padilla; Deborah Jarvis; Paula Duarte de Oliveira; Fernando César Wehrmeister Journal: ERJ Open Res Date: 2020-08-25
Authors: Ceyda Oksel; Raquel Granell; Sadia Haider; Sara Fontanella; Angela Simpson; Steve Turner; Graham Devereux; Syed Hasan Arshad; Clare S Murray; Graham Roberts; John W Holloway; Paul Cullinan; John Henderson; Adnan Custovic Journal: Ann Am Thorac Soc Date: 2019-07