Reference equations for pulmonary diffusing capacity using segmented regression show similar predictive accuracy as GAMLSS models.

Abstract

PURPOSE: To determine whether generalised additive models of location, scale and shape (GAMLSS) developed for pulmonary diffusing capacity are superior to segmented (piecewise) regression models, and to update reference equations for pulmonary diffusing capacity for carbon monoxide (DLCO) and nitric oxide (DLNO), which may be affected by the equipment used for its measurement.
METHODS: Data were pooled from five studies that developed reference equations for DLCO and DLNO (n=530 F/546 M; 5-95 years old, body mass index 12.4-39.0 kg/m2). Reference equations were created for DLCO and DLNO using both GAMLSS and segmented linear regression. Cross-validation was applied to compare the prediction accuracy of the two models as follows: 80% of the pooled data were used to create the equations, and the remaining 20% was used to examine the fit. This was repeated 100 times. Then, the root-mean-square error was compared between both models.
RESULTS: In males, GAMLSS models were 7% worse to 3% better compared to segmented regression for DLCO and DLNO. In females, GAMLSS models were 2% worse to 5% better compared to segmented linear regression for DLCO and DLNO. The Hyp'Air Compact measured DLNO and alveolar volume (VA) that was approximately 16-20 mL/min/mm Hg and 0.2-0.4 L higher, respectively, compared to the Jaeger MasterScreen Pro. The measured DLCO was similar between devices after controlling for altitude.
CONCLUSIONS: For the development of pulmonary function reference equations, we propose that segmented linear regression can be used instead of GAMLSS due to its simplicity, especially when the predictive accuracy is similar between the two models, overall. © Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.

Keywords: equipment evaluations; lung physiology; respiratory measurement

Year: 2022 PMID: 35172984 PMCID: PMC8852756 DOI: 10.1136/bmjresp-2021-001087

Source DB: PubMed Journal: BMJ Open Respir Res ISSN: 2052-4439

Does segmented (piecewise) regression provide predictive accuracy similar to that of the more complicated generalised additive models of location, scale and shape (GAMLSS) that the Global Lung Function Initiative Network uses for pulmonary function reference equations? Segmented linear regression for pulmonary function shows predictive accuracy similar to that of GAMLSS models. Furthermore, the pooled data from five previously published studies (total pooled subjects=1076) demonstrate that the Hyp'Air Compact device measured pulmonary diffusing capacity for nitric oxide (DLNO) and alveolar volume (VA) that were approximately 18 mL/min/mm Hg and 0.3 L higher, respectively, compared with the Jaeger MasterScreen Pro (CareFusion, Germany; now Vyaire Medical). However, it is not known which device measures DLNO and VA more correctly. The Hyp'Air Compact (Medisoft, Sorinnes, Belgium) measured DLNO and VA that were systematically higher than those of the Jaeger MasterScreen Pro. Thus, the two manufacturers should come together to resolve these between-machine differences. In the meantime, more comprehensive reference equations are updated here, accounting for the lung function testing device, where applicable.

Introduction

In 2000, a new approach for the development of reference equations for spirometry was described that allowed for a smooth, continuous transition between childhood and adulthood.1 This modelling technique prevented discontinuities between paediatric and adult reference equations at the transition point, thereby avoiding misinterpretation. This methodology was based on a semiparametric regression approach of generalised additive models for location, scale and shape (GAMLSS) and was discussed again in the same journal in 2008.2 GAMLSS allowed for age-related differences in between-subject variability, improving the definition of the lower limits of normal (LLN).2 Subsequently, in 2010, a European Respiratory Society (ERS) Task Force was formed to create multiethnic, all-age reference equations for lung function for worldwide use using GAMLSS models.3 This allowed for a single reference source that would be able to monitor patients from childhood into old age.3 As there were over 400 published reference equations describing healthy lung function changes with age, sex and height, professionals were left with a decision of which equation to use.4 Thus, the Global Lung Function Initiative (GLI) Network was created to address this lack of standardisation.4 These new ‘Global’ reference equations, using healthy subjects’ data from around the world, were developed to model changes in lung size with age and height from childhood to adulthood.3 These complex growth patterns were modelled using GAMLSS that smoothed centile curves.5 Since 2012, the GLI Network has produced three significant papers that were endorsed by the ERS and the American Thoracic Society and published in the European Respiratory Journal, which provided global reference equations for spirometry,6 pulmonary diffusing capacity for carbon monoxide (DLCO)7 8 and static lung volumes.9 Those articles presented reference equations using GAMLSS models.
GAMLSS were initially introduced10 in 2005 and updated in 2018,11 allowing for a variety of smoothing functions. Besides pulmonary medicine, GAMLSS have been used in several fields such as exercise science,12 13 chemistry,14 hydrological science,15 genomics16 and psychology,17 to name a few; thus, GAMLSS models have pertinence across many disciplines. However, GAMLSS are highly complex and challenging to implement (see www.gamlss.com). One needs to understand the distribution of a variable (and its properties), and then decisions need to be made regarding the distribution of the response variable, the choice of explanatory variables, the link function (ie, monotonic functions of the distribution parameters) and the amount of smoothing and random effects.15 Because GAMLSS models estimate time-varying quantiles, which are distribution dependent, the selection of a suitable distribution is important.15 As such, a sophisticated understanding of physiology, statistics and computer programming is involved in producing a proper model using GAMLSS. By contrast, segmented or ‘broken-line’ models are regression models that are simpler to use and should be the model of choice for the development of reference equations for lung function across the whole lifespan. Segmented regression is less complex, easier to comprehend and can be applied more readily, as the formulas are easier to understand. Segmented regression allows for predictions to be made without experiencing discontinuities due to transitions from one prediction equation to the next. This is especially important in lung function prediction equations, in which one prediction equation is developed for children, and then another separate equation is developed for adults. Furthermore, once the equations are developed, a simple calculator can be used to obtain the predicted value without the use of splines.
Fitting piecewise or segmented terms in regression models for pulmonary function uses age as the non-linear covariate with two line segments connected at one breakpoint.18 19 From visual observation, this breakpoint occurs somewhere around 20 years of age for forced vital capacity (FVC), forced expiratory volume in one second (FEV1)18 20 and DLCO.21 Thus, it is the premise of this article to demonstrate that segmented (piecewise) linear regression can be used more easily, with prediction errors similar to those of GAMLSS models. We also believe that segmented regression models are more parsimonious compared with GAMLSS models, meaning that segmented regression could achieve goodness of fit using as few explanatory variables as possible. This reasoning comes from the idea of ‘Occam’s razor’, which says that the simplest explanation is probably correct. As such, the primary purpose of this study was to determine whether pulmonary diffusing capacity modelled using segmented linear regression with one breakpoint provides prediction accuracy similar to that of GAMLSS but without the use of complicated splines. It is our assumption that DLCO, pulmonary diffusing capacity for nitric oxide (DLNO) and alveolar volume (VA) could be modelled using separate segmented linear equations for each sex, which would be less complex than GAMLSS while providing similar prediction errors. A secondary purpose was to update the pulmonary diffusing capacity prediction equations published by an ERS Task Force in 2017.22 Nearly 80% of the subjects used in the development of reference equations for the ERS Task Force in 2017 had pulmonary diffusing capacity measured by the Hyp'Air Compact device (Medisoft, Sorinnes, Belgium).
However, evidence suggests that the predicted DLNO varies depending on the reference equation applied,23 24 which can be due to the different pulmonary function devices used between studies.25 Thus, with a much larger pooled dataset to draw on, we also sought to evaluate between-device discrepancies.

Materials and methods

Five previous studies that developed reference equations for DLNO in white individuals without cardiopulmonary disease were pooled and used in this study.26–30 Institutional Review Board approval was not needed as the deidentified data were obtained from previously published work. Data from three separate studies26–28 were obtained from a 2017 ERS Task Force on the technical standards of DLNO22; another set was publicly available online,29 and the fifth dataset was created based on the anthropometric characteristics of another paper.30 (Note: Munkholm et al30 declined to provide us with their data after repeated attempts. As such, we created simulated data that were statistically tested to be similar to their data using a statistical method called truncation. The procedures on how this fifth dataset was created are discussed in the online supplementary material.)

Segmented (piecewise) linear regression models

Reference equations were created for DLCO, DLNO and VA using the ‘R’ language environment (http://www.r-project.org). The ‘segmented’ package (V.1.3–4, April 2021), originally developed in 2008,31 based on previous work on piecewise fitting with at least one breakpoint,32 generated the segmented models.33 The covariate ‘age squared’ (age²) was used to estimate the single breakpoint for the entire age range of the data (5–95 years of age). Based on a visual plot between age² and either DLCO, DLNO or VA, an estimated starting value for the breakpoint is provided, and then an iterative procedure in R is used to estimate the breakpoint32 and the 95% CI of the breakpoint.34 Other covariates used in the models were height (cm) or height², sex (1=male; 0=female), altitude (0–300 m), weight (kg) and the pulmonary function device. The brand of pulmonary function system was listed as a potential predictor of the model since there are discrepancies in DLNO depending on which equipment is used.35 The devices used to measure pulmonary diffusing capacity were the Jaeger MasterScreen PFT Pro (CareFusion, Hochberg, Germany), Jaeger Masterlab Pro (Erich Jaeger, Würzburg, Germany) with NO chemiluminescence (77AM, Eco Physics, Switzerland) and the Hyp'Air Compact device (Medisoft).
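As an illustration of what a one-breakpoint segmented fit involves, the following is a minimal Python sketch. It is not the procedure used in this study: the R ‘segmented’ package estimates the breakpoint iteratively from a starting value, whereas this sketch swaps in a simple grid search over candidate breakpoints. The function name, the candidate grid and the synthetic data are assumptions made only for the example.

```python
import numpy as np

def fit_one_breakpoint(x, y, candidates):
    """Least-squares fit of y = b0 + b1*x + b2*max(x - psi, 0).

    Grid search over candidate breakpoints psi (an illustrative
    stand-in for the iterative estimation in R's 'segmented').
    Returns the best psi and the coefficients (b0, b1, b2).
    """
    best = None
    for psi in candidates:
        # Columns: intercept, slope before psi, change in slope after psi
        X = np.column_stack([np.ones_like(x), x, np.maximum(x - psi, 0.0)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        if best is None or rss < best[0]:
            best = (rss, psi, beta)
    return best[1], best[2]

# Synthetic data for illustration only (not study data)
rng = np.random.default_rng(0)
age = rng.uniform(5, 95, 400)
y = 10 + 1.0 * age - 1.2 * np.maximum(age - 20.0, 0) + rng.normal(0, 1.0, age.size)

psi_hat, beta_hat = fit_one_breakpoint(age, y, np.arange(10.0, 40.0, 0.5))
print(f"estimated breakpoint: {psi_hat:.1f}, slope change: {beta_hat[2]:.2f}")
```

The two segments join at the breakpoint by construction, so the prediction is continuous across it. To mirror the covariate choice described above, age² rather than age could be passed as `x`.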

Generalised additive models of location, scale and shape

The GAMLSS models developed here are implemented in a series of CRAN packages in the R language environment and are currently available for download at http://www.r-project.org.10 The Lambda-Mu-Sigma (LMS) method of Cole and Green was applied as an extension of the normal distribution that adjusts for skewness5 and is embedded in GAMLSS. The LMS method is equivalent to the Box-Cox Cole and Green (BCCG) distribution, BCCG (µ, σ, υ), whose parameters µ, σ and υ are the approximate median, approximate coefficient of variation and approximate skewness of the distribution of the response variable.11 That is, µ controls the location, σ controls the scale and υ controls the skewness of the distribution as people grow and age.11 The complex effects of the predictor variables on the dependent variable were modelled using splines, which allow the dependent variable to vary smoothly (non-linearly) as a function of a predictor. Thus, a continuous, smooth fit over the entire age range can be obtained using splines. The goodness of fit was assessed by Akaike’s Information Criterion,36 Bayesian Information Criterion,37 Quantile–Quantile (Q–Q) plots38 and worm plots.39 The between-individual variability across age was assessed as the predicted SD divided by the predicted mean, multiplied by 100. The predicted mean was determined by taking the median height at each age from the white US population40 and applying a zero altitude for each model. The predicted SD was the residual SD (RSD) obtained from the segmented linear regression models and the sigma value obtained from GAMLSS.
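To make the role of the three LMS parameters concrete, here is a minimal Python sketch of the z-score, fifth-percentile LLN and per cent predicted formulas that accompany the GAMLSS reference equations in this article. The function names and the numerical values of L, M and S are hypothetical and chosen only for illustration; they are not coefficients from the study.

```python
import math

def lms_z_score(measured, L, M, S):
    """Z-score = ((measured / M)**L - 1) / (L * S); requires L != 0."""
    return ((measured / M) ** L - 1.0) / (L * S)

def lms_lln(L, M, S, z=-1.645):
    """Fifth-percentile LLN = exp(ln(M) + ln(1 + z*L*S) / L), with z = -1.645."""
    return math.exp(math.log(M) + math.log(1.0 + z * L * S) / L)

def percent_predicted(measured, M):
    """Per cent predicted = (measured / M) * 100."""
    return measured / M * 100.0

# Hypothetical LMS values for illustration only
L, M, S = 0.5, 30.0, 0.15
lln = lms_lln(L, M, S)
print(f"LLN: {lln:.2f}")
print(f"z-score at the LLN: {lms_z_score(lln, L, M, S):.3f}")
```

A value equal to M gives a z-score of 0 and 100% predicted, and a value at the LLN gives a z-score of −1.645, which is a quick consistency check on any implementation.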

Prediction accuracy between models

To assess the prediction accuracy of the segmented linear regression and GAMLSS models, repeated random subsampling using the Holdout method was used, which randomly split the complete dataset into two mutually exclusive subsets, a training set and a test set (also called a validation or Holdout set), repeated several times.41 Eighty per cent of the pooled data was used to fit both models (training set), and then the fitted equation predicted the remaining 20% of the test subjects (validation set). This process was implemented for 100 replicates. The median, minimum, maximum and 95% CI of the root-mean-square error (ie, the square root of the average of the squared errors) from the 100 random samplings of the pooled data were compared between both models. The average correlation coefficients between each predicted value and the actual values obtained for 20% of the test data were also reported. The results of the repeated sampling would demonstrate whether GAMLSS or segmented linear regression models would be systematically favoured. The LLN for both models was chosen as the fifth percentile, that is, the value below which only 5% of values from a healthy population are expected to fall. For the segmented regression models, this was calculated by subtracting from the predicted value the product of the one-sided z value (1.645) and the equation’s RSD (predicted value – 1.645·RSD).
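The repeated 80/20 Holdout procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's code: the `fit` and `predict` callables are hypothetical placeholders for any model (an ordinary least-squares fit stands in here for the segmented and GAMLSS models), and the data are synthetic.

```python
import numpy as np

def repeated_holdout_rmse(X, y, fit, predict, n_rep=100, train_frac=0.8, seed=0):
    """Return the test-set RMSE from each of n_rep random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    rmses = np.empty(n_rep)
    for r in range(n_rep):
        idx = rng.permutation(n)                  # new random split each replicate
        train, test = idx[:n_train], idx[n_train:]
        model = fit(X[train], y[train])           # fit on the 80% training set
        err = y[test] - predict(model, X[test])   # predict the held-out 20%
        rmses[r] = np.sqrt(np.mean(err ** 2))     # root-mean-square error
    return rmses

# Stand-in model: ordinary least squares with an intercept
def ols_fit(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def ols_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

# Synthetic data for illustration only (not study data)
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 3 + X @ np.array([2.0, -1.0]) + rng.normal(0, 0.5, 500)

rmses = repeated_holdout_rmse(X, y, ols_fit, ols_predict)
median_rmse = float(np.median(rmses))
print(f"median RMSE: {median_rmse:.2f}, range: {rmses.min():.2f}-{rmses.max():.2f}")
```

Running the same splits for two competing models (by reusing the seed) gives paired RMSE values, from which a per cent difference and its interval across replicates can be summarised.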

Other analyses

Correlations were used to examine associations between variables. The GLI equations for DLCO7 8 were also used to compare DLCO and VA against both segmented linear regression and GAMLSS models. A 2×3 repeated measures analysis of variance (RmANOVA) compared fitted z-scores across the three types of prediction models (segmented linear regression, GAMLSS and GLI GAMLSS) and the pulmonary function device used, for DLCO and VA. A 2×2 RmANOVA did the same for DLNO. A Passing-Bablok linear regression42 and Bland-Altman plots43 were used to examine the agreement of the LLN between models. To determine whether the models agreed on whether a measured value was below the LLN, a Kappa statistic was calculated, with values coded as 1 if below the LLN and 0 if ≥LLN. The strength of agreement for the Kappa statistic was interpreted as: ≤0.20=none; 0.21–0.39=minimal; 0.40–0.59=weak; 0.61–0.80=moderate; 0.80–0.90=strong; ≥0.90=almost perfect.44 Receiver-operating characteristic (ROC) analysis was also used to evaluate the performance of DLCO, DLNO and VA between models. To classify the impairment in DLNO, DLCO and VA based on z-scores, a linear regression analysis was performed between the average per cent predicted values for DLNO, DLCO and VA and the corresponding average fitted z-scores for both models. This would allow an examination of the variability in per cent predicted values matched to z-score classifications.
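The agreement check on LLN classification can be illustrated with a short Python sketch of Cohen's Kappa for two binary classifications (1=below the LLN, 0=at or above the LLN). The function name and the example classifications are invented for illustration and are not study data.

```python
def cohen_kappa(a, b):
    """Cohen's Kappa for two equal-length lists of 0/1 classifications.

    Undefined (division by zero) when chance agreement equals 1,
    e.g. when both raters assign every subject to the same class.
    """
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    p_a1, p_b1 = sum(a) / n, sum(b) / n
    # Chance agreement: both say 1, or both say 0
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (observed - expected) / (1 - expected)

# Hypothetical below-LLN flags from two models for ten subjects
model_1 = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
model_2 = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
kappa = cohen_kappa(model_1, model_2)
print(f"kappa = {kappa:.3f}")
```

In this toy example the two models disagree on one subject in ten, yet Kappa is well below 0.9 because few subjects fall below the LLN, so chance agreement is already high.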

Patient and public involvement

Neither patients nor members of the public were involved in the design, conduct, reporting or dissemination of this research study.

Results

Pooled data from five studies were used to produce reference equations for DLCO, DLNO and VA.26–30 Age groups are displayed in figure 1 for a visual representation of the number of subjects in each age category. The five studies used three different pulmonary function machines. The numbers of subjects that were tested on each of these pulmonary function machines are presented in figure 2. The two Jaeger pulmonary function systems were combined into one pulmonary function system since there were no meaningful differences between them.

Figure 1

The pooled data used in the analysis display the number of subjects per age group. After removing outliers, 1076 subjects remained for analysis.

Figure 2

A representative breakdown of the pooled data and the equipment used in the development of reference equations for pulmonary diffusing capacity.

Outliers were screened and removed from the analysis. About 7% of the complete dataset was eliminated during initial screening, in which multiple linear regression models were used to examine studentised residuals. Any raw data point that had a studentised residual ≥3.0 was eliminated. There were similar numbers of males and females with wide age ranges and heights, totalling 1076 never-smokers. Fractional age was not available in the datasets. As DLNO is minimally affected by haemoglobin concentration,45 DLNO was not adjusted for haemoglobin concentration. Likewise, DLCO was not adjusted for haemoglobin concentration since correcting for it does not improve the model fit for DLCO.7 There was a 2%–5% shared variance between breath-hold time and DLCO or DLNO (and no shared variance with VA). As such, breath-hold time was also not included as a covariate in the models. The subjects are presented in table 1.

Table 1

Pooled anthropometric data from previously published studies from which reference equations were made26–30

	Males (n=546)	Females (n=530)	Combined (n=1076)
Age (years)	38 (23) (5 to 95)	38 (23) (5 to 95)	38 (23) (5 to 95)
Weight (kg)	68.3 (19.5) (18.1 to 110.0)	57.1 (15.0) (14.8 to 101.2)	62.8 (18.3) (14.8 to 110.0)
Height (cm)	170 (17) (105 to 200)	159 (14) (109 to 182)	165 (16) (105 to 200)
Body mass index (kg/m²)	23.0 (4.0) (14.0 to 35.5)	22.0 (3.9) (12.4 to 39.0)	22.5 (4.0) (12.4 to 39.0)
DLNO (mL/min/mm Hg)	138 (42) (36 to 235)	101 (27) (40 to 179)	120 (40) (36 to 235)
DLCO (mL/min/mm Hg)	29.3 (8.7) (8.5 to 49.9)	21.9 (5.7) (9.1 to 36.8)	25.7 (8.3) (8.5 to 49.9)
VA (L)	5.95 (1.71) (1.60 to 9.22)	4.52 (1.12) (1.70 to 7.50)	5.25 (1.62) (1.60 to 9.22)
KCO (mL/min/mm Hg/L)	5.0 (0.8) (2.1 to 7.2)	4.9 (0.8) (2.7 to 6.9)	5.0 (0.8) (2.1 to 7.2)
KNO (mL/min/mm Hg/L)	23.6 (4.0) (9.6 to 34.9)	22.6 (3.4) (10.8 to 31.5)	23.1 (3.7) (9.6 to 34.9)
DLNO/DLCO ratio	4.73 (0.56) (2.92 to 7.63)	4.63 (0.52) (2.64 to 6.98)	4.69 (0.54) (2.64 to 7.63)
Breath-hold time (s)	6.2 (1.4) (4.6 to 10.0)	6.2 (1.3) (4.8 to 10.0)	6.2 (1.3) (4.6 to 10.0)
Altitude of testing (m)	88 (114) (0 to 300)	86 (112) (0 to 300)	87 (113) (0 to 300)

*Mean (SD). Brackets represent ranges. The correlation (Spearman’s rho) between height and weight was 0.66 for females and 0.72 for males.

DLCO, pulmonary diffusing capacity for carbon monoxide; DLNO, pulmonary diffusing capacity for nitric oxide; VA, alveolar volume.

Simulated raw data were created from the anthropometric characteristics of Munkholm et al,30 as that group was unwilling to provide us with the actual raw data. The simulated data represented 24% of the total data set and resembled the actual data (online supplemental tables S1, S2A, B, S3); thus, the simulated data were used in the overall analysis. Measured DLCO and measured DLNO were highly correlated with each other. The Jaeger MasterScreen Pro produced a correlation of 0.922 between DLCO and DLNO (R²=0.85), and the Hyp'Air Compact produced a correlation of 0.951 between DLCO and DLNO (R²=0.90) (combined R² using both machines=0.87). For the Jaeger MasterScreen Pro, DLNO=4.20‧(DLCO)+8.42 (adjusted R²=0.85, p<0.001, with a residual SE=14.1 mL/min/mm Hg). The 95% CI of the slope was 4.07 to 4.33. For the Hyp’Air Compact, DLNO=4.69‧(DLCO)+4.78 (adjusted R²=0.90, p<0.001, with a residual SE=11.9 mL/min/mm Hg). The 95% CI of the slope was 4.54 to 4.85. Measured VA was correlated with measured DLCO (r=0.88, Jaeger MasterScreen Pro; r=0.80, Hyp’Air Compact). The DLNO to DLCO ratio was relatively stable from 5 to 95 years of age (online supplemental figure S1). However, the Jaeger MasterScreen Pro yielded a ratio approximately 0.29 units lower than that of the Hyp'Air Compact, owing to the systematically larger DLNO values of the Hyp'Air Compact, with DLCO values being relatively unchanged between machine types.
Prediction equations for the DLNO to DLCO ratio were not developed as the pulmonary function testing device (6.6% shared variance), altitude (2.2% shared variance), age (1.3% shared variance) and sex (0.6% shared variance) accounted for only about 10% of the total shared variance. Segmented linear reference equations and GAMLSS equations separated by sex are presented in tables 2 and 3. Segmented regression equations that include sex as a covariate are presented in online supplemental table S4. Weight was not a factor in any prediction equation since there was only a 1% shared variance between weight and DLCO or DLNO and 5% shared variance between weight and VA when controlling for height. The influence of the pulmonary function testing (PFT) device on DLCO was minor and therefore was not included in segmented reference equations. The Hyp'Air Compact PFT device produced an approximately 18 mL/min/mm Hg (15%) higher DLNO compared with the Jaeger MasterScreen Pro when all other variables were controlled for (online supplemental table S4). Controlling for all other variables, VA was found to be 0.76 L larger in males compared with females (online supplemental table S4). The Hyp'Air Compact PFT device was also found to produce a 0.28 L (5%) larger VA compared with the Jaeger MasterScreen PFT device. When standardising for the mean height (online supplemental table S10) and PFT device, both models show similar predicted values (figure 3A–C) and similar LLN (figure 4A–C).

Figure 3

(A) predicted pulmonary diffusing capacity for carbon monoxide (DLCO) versus age, (B) predicted pulmonary diffusing capacity for nitric oxide (DLNO) versus age, (C) predicted alveolar volume (VA) versus age. The various fitted curves/lines are based on the median height for age and sex in the white US population,40 an altitude of 0 m and use of the Jaeger MasterScreen Pro equipment. Online supplemental table S10 lists the heights for each age and sex. For DLCO and VA, the updated Global Lung Function Initiative (GLI) generalised additive models of location, scale and shape (GAMLSS) reference equations were included as a comparison.8 Notice that the GLI DLCO curves (grey in females, and purple in males) are lower compared with both GAMLSS and segmented regression models. The GLI GAMLSS prediction model is based on a 10 s breath-hold, which allows for a more homogenous inspired gas penetration in the lung, and thus a lower DLCO compared with the 5–6 s breath-hold manoeuvres. The GAMLSS and segmented linear regression curves/lines for DLCO, DLNO and VA are comparable.

Figure 4

(A) pulmonary diffusing capacity for carbon monoxide (DLCO) versus age at the lower limits of normal (LLN), (B) pulmonary diffusing capacity for nitric oxide (DLNO) versus age at the LLN, (C) alveolar volume (VA) versus age at the LLN. The various fitted curves/lines are based on the median height for age and sex in the white US population,40 an altitude of 0 m and use of the Jaeger MasterScreen Pro equipment. Online supplemental table S10 lists the heights for each age and sex. For DLCO and VA, the updated Global Lung Function Initiative (GLI) generalised additive models of location, scale and shape (GAMLSS) reference equations were included as a comparison.8 Notice that the GLI DLCO curves (grey in females, and purple in males) are lower compared with both GAMLSS and segmented regression models. The GLI GAMLSS prediction model is based on a 10 s breath-hold, which allows for a more homogenous inspired gas penetration in the lung, and thus a lower DLCO compared with the 5–6 s breath-hold manoeuvres. The segmented linear regression lines for DLCO and DLNO tend to show a lower LLN compared with the GAMLSS models, especially after 60 years of age for DLNO and after 80 years of age for DLCO.

Table 2

Reference equations using segmented regression

For the PFT equipment, 1=Hyp’Air Compact, and 0=Jaeger Masterscreen. For example, for a man who is 26.9 years old, 180 cm tall and measured on the Hyp’Air Compact, the predicted alveolar volume (VA) (L)=0.0027‧(26.9²)+0.06‧(180)+0.24–5.64=7.35 L, with a lower limit of normal (LLN) of 7.35–(0.46‧1.645)=6.59 L. For a man who is 27 years old, with the same height and equipment used, the predicted VA (L)=–0.00013‧(27²)+0.06‧(180)+0.24–3.61=7.34 L, with the LLN=7.34–(0.73‧1.645)=6.14 L.

DLCO, pulmonary diffusing capacity for carbon monoxide; DLNO, pulmonary diffusing capacity for nitric oxide; RSE, residual SE.

Table 3

Reference equations using generalised additive models of location, scale and shape models

Height is in cm, age in years; Machine=1 for Hyp’Air Compact and 0 for the Jaeger Masterscreen; lower limits of normal (fifth percentile)=exp(ln(M)+ln(1–1.645‧L‧S)/L); per cent predicted=(measured/M)‧100; z-score=((measured value/M)^L–1)/(L‧S); exp()=natural exponential; ln()=natural logarithm; Mspline and Sspline correspond to the age-varying coefficients provided in the supplementary materials. The model is valid for ages 5–95 years and altitudes of 0–300 m. Note: If pulmonary diffusing capacity for carbon monoxide (DLCO) is measured at an altitude of more than 300 m, we recommend converting the measured DLCO to sea level first, based on the data by Gray et al,80 and then omitting the altitude covariate from the equation (as the converted DLCO will be at an altitude of 0 m).
Adjusted DLCO to sea level (mL/min/mm Hg)=measured DLCO at altitude‧(0.505+0.00065‧barometric pressure in mm Hg at altitude). The formula to estimate barometric pressure at altitude in mm Hg is: 760‧exp(–0.284‧altitude in m/(8.314‧temperature in Kelvin)), where Kelvin=°C+273.15 (see: https://planetcalc.com/938/). DLCO, pulmonary diffusing capacity for carbon monoxide; DLNO, pulmonary diffusing capacity for nitric oxide; VA, alveolar volume.

Both segmented linear regression and GAMLSS models were fitted to the raw data (online supplemental table S5, online supplemental figures S2-S8). The mean (SD) z-scores of the fitted data in both segmented regression and GAMLSS models were alike. For DLCO and VA (online supplemental figures S6, S8), the fitted z-scores made by the DLCO and VA GLI GAMLSS reference equations7 were affected by use of the Hyp'Air Compact device. For DLNO, the fitted z-scores were similar between models and pulmonary function devices used. There were no GLI reference equations made for DLNO. Q–Q plots demonstrate that the fitted z-scores for DLNO, DLCO and VA can be approximated by a normal distribution in both models (online supplemental figures S4 and S5); however, there were some outliers remaining when the per cent predicted values were fitted to the segmented regression models (online supplemental figure S4).

A correlational matrix of fitted z-scores between models shows strong associations in z-scores between models for DLNO and DLCO (online supplemental table S6). The predicted VA obtained from all models is highly associated with the measured VA (online supplemental table S7). The coefficient of variation between subjects was larger in the segmented regression models at <10 years of age for DLCO, DLNO and VA (figure 5). Segmented regression also had a larger variability for DLNO at >60 years of age (figure 5). The variability was greater in those <10 and >70 years of age when using the segmented regression models (figure 5).
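The sea-level adjustment of DLCO given in the note to the GAMLSS reference equations can be written as a short Python sketch. The function names and the example altitude, temperature and DLCO value are assumptions for illustration only; the constants are those stated in the note.

```python
import math

def barometric_pressure_mmhg(altitude_m, temp_c):
    """Estimated barometric pressure (mm Hg) at a given altitude and temperature."""
    temp_k = temp_c + 273.15
    return 760.0 * math.exp(-0.284 * altitude_m / (8.314 * temp_k))

def dlco_to_sea_level(dlco_measured, pb_mmhg):
    """Adjust a DLCO measured at altitude (mL/min/mm Hg) to sea level."""
    return dlco_measured * (0.505 + 0.00065 * pb_mmhg)

# Hypothetical measurement at 1600 m and 20 °C (illustration only)
pb = barometric_pressure_mmhg(1600, 20.0)
adjusted = dlco_to_sea_level(30.0, pb)
print(f"barometric pressure: {pb:.0f} mm Hg, adjusted DLCO: {adjusted:.1f} mL/min/mm Hg")
```

At 0 m the pressure estimate is 760 mm Hg and the adjustment factor is 0.999, so a sea-level measurement is left essentially unchanged.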

Figure 5

Between-subject variability across age for (A) pulmonary diffusing capacity for carbon monoxide (DLCO) versus age, (B) pulmonary diffusing capacity for nitric oxide (DLNO) versus age, (C) alveolar volume (VA) versus age. The C.V. (%)=(predicted SD/predicted mean)×100. The predicted mean and SD for each age were calculated using the median height of the US population at each age.40 The graph is also standardised for altitude (0 m). For DLCO and VA, the between-subject variability was unaffected by the pulmonary function testing device. However, when measuring DLNO in females, those that were tested with the Hyp'Air Compact device showed a between-subject variation of 14% compared with 10.5% in males throughout all ages (not shown). When using the Jaeger MasterScreen Pro, the between-subject variation for DLNO was similar between sexes (panel B; 10.5% in females, 11.2% in males). GAMLSS, generalised additive models of location, scale and shape.

Both models had similar prediction accuracies (table 4). There was no clear model winner. Both models were comparable as the 95% CI of improvement overlapped zero for all cases. The average correlation coefficients of the predicted values associated with the actual values were similar between the two models (table 5).

Table 4

Prediction accuracy between both models

	GAMLSS models				Segmented linear regression				Per cent improvement (95% CI)
	AIC	BIC	Median RMSE	RMSE range	AIC	BIC	Median RMSE	RMSE range	
Males
DLNO	4602	4649	17.7	15.3–20.5	4667	4697	17.4	15.2–19.9	−2% (−7% to 3%)
DLCO	2984	3053	4.0	3.4–4.8	3040	3070	3.9	3.3–4.7	−2% (−7% to 3%)
VA	959	1002	0.64	0.53–0.72	1082	1112	0.65	0.55–0.75	−2% (−4% to 8%)
Females
DLNO	4104	4151	12.1	10.0–13.8	4161	4191	12.3	10.1–14.0	2% (−1% to 5%)
DLCO	2538	2602	2.8	2.4–3.3	2608	2638	2.8	2.4–3.4	1% (−2% to 5%)
VA	670	717	0.50	0.39–0.59	788	818	0.50	0.41–0.61	4% (−1% to 9%)

Note: a better model fit is usually indicated by a lower Akaike information criterion (AIC) or Bayesian information criterion (BIC). Thus, it may seem that generalised additive models of location, scale and shape (GAMLSS) are a better fit to the data. However, notice that this may not be correct. Under the per cent of improvement column, a positive percentage suggests that GAMLSS is the better model, a negative percentage value suggests segmented linear regression is the better model. One can see that both models are comparable because the 95% CI of the per cent of improvement overlaps zero. The 95% CI was developed after 100 random samplings of 80% of the pooled data.

Under the Median and Range columns, the square root of the average of the squared errors is presented after 100 samplings of 80% of the pooled data.

DLCO, pulmonary diffusing capacity for carbon monoxide; DLNO, pulmonary diffusing capacity for nitric oxide; VA, alveolar volume.
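The repeated 80/20 hold-out comparison summarised in table 4 can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the two fitting functions are hypothetical stand-ins (a straight-line fit and a mean-only baseline, not GAMLSS or segmented regression), the data are synthetic, and both the per cent improvement formula, (RMSE_B − RMSE_A)/RMSE_B × 100, and the percentile-based 95% interval are assumptions, since the source does not give the exact calculations.

```python
import math
import random
import statistics

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def fit_line(xs, ys):
    """Hypothetical model A: ordinary least-squares straight line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_mean(xs, ys):
    """Hypothetical model B: always predicts the training mean."""
    my = statistics.fmean(ys)
    return lambda x: my

def repeated_holdout(xs, ys, fit_a, fit_b, n_repeats=100, train_frac=0.8, seed=1):
    """Fit both models on a random 80% and score them on the remaining 20%, repeatedly."""
    rng = random.Random(seed)
    n_train = round(train_frac * len(xs))
    rmse_a, rmse_b, improvement = [], [], []
    for _ in range(n_repeats):
        idx = list(range(len(xs)))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        model_a = fit_a([xs[i] for i in train], [ys[i] for i in train])
        model_b = fit_b([xs[i] for i in train], [ys[i] for i in train])
        observed = [ys[i] for i in test]
        err_a = rmse(observed, [model_a(xs[i]) for i in test])
        err_b = rmse(observed, [model_b(xs[i]) for i in test])
        rmse_a.append(err_a)
        rmse_b.append(err_b)
        # Assumed definition: a positive value means model A has the lower error.
        improvement.append(100.0 * (err_b - err_a) / err_b)
    cuts = statistics.quantiles(improvement, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return {
        "median_rmse_a": statistics.median(rmse_a),
        "median_rmse_b": statistics.median(rmse_b),
        "mean_improvement": statistics.fmean(improvement),
        "ci_95": (cuts[0], cuts[-1]),
    }

# Synthetic data only: a linear trend with age plus Gaussian noise.
data_rng = random.Random(0)
ages = [data_rng.uniform(5, 95) for _ in range(200)]
values = [30.0 - 0.2 * a + data_rng.gauss(0, 3) for a in ages]
result = repeated_holdout(ages, values, fit_line, fit_mean)
print(result)
```

With real data, `fit_a` and `fit_b` would be replaced by functions that fit the two reference-equation models on the training subset and return a predictor; two models would be judged comparable in the sense used here when the interval in `ci_95` overlaps zero.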

Table 5

The correlation coefficients from the 100 samplings are compared between both models

	GAMLSS		Segmented regression
	Average	95% CI	Average	95% CI
Males
DLNO	0.91	0.88 to 0.93	0.91	0.88 to 0.94
DLCO	0.89	0.86 to 0.92	0.90	0.86 to 0.92
VA	0.93	0.91 to 0.95	0.92	0.90 to 0.94
Females
DLNO	0.89	0.85 to 0.92	0.89	0.85 to 0.92
DLCO	0.87	0.83 to 0.90	0.86	0.82 to 0.90
VA	0.89	0.85 to 0.93	0.88	0.84 to 0.92

Eighty per cent of the pooled data was sampled 100 times, and the remaining 20% was used to test the fit of each model 100 times.

DLCO, pulmonary diffusing capacity for carbon monoxide; DLNO, pulmonary diffusing capacity for nitric oxide; GAMLSS, generalised additive models of location, scale and shape; VA, alveolar volume.

There was a moderate agreement for DLCO and DLNO between both models (table 6). In the same vein, the Youden Index J (sensitivity+specificity–1) was determined from ROC analyses and described the overall diagnostic accuracy46 (table 7). Diagnostic accuracy was the highest for DLCO, then DLNO and then VA when segmented regression was compared against GAMLSS, given that the estimated prevalence of abnormal results (values below the LLN) is 5%. Another ROC analysis was performed comparing DLNO to DLCO when the estimated prevalence of abnormal results is 5% in a population (online supplemental table S8). All characteristics of the ROC curve were similar between models for DLCO and DLNO (online supplemental table S8).

Table 6

A breakdown of the percentage of subjects below the lower limits of normal (LLN), including the agreement between the two models for each variable

	DLCO	DLNO	VA
Number and percentage of the fitted data below the LLN (z score < –1.645)
GAMLSS	60 (5.7%)	81 (7.5%)	54 (5.0%)
Segmented linear regression	71 (6.6%)	57 (5.3%)	40 (3.7%)
Percentage below the LLN by age group
GAMLSS (5–49 years of age) (n=727)	42 (5.8%)	28 (3.9%)	37 (5.1%)
GAMLSS (50–95 years of age) (n=349)	18 (5.2%)	33 (9.5%)	17 (4.9%)
Segmented linear regression (5–49 years of age) (n=727)	53 (7.2%)	39 (5.4%)	31 (4.3%)
Segmented linear regression (50–95 years of age) (n=349)	18 (5.2%)	18 (5.2%)	9 (2.6%)
Agreement between the two models (Kappa statistic)	0.67 [0.57 to 0.76]	0.64 [0.54 to 0.74]	0.58 [0.46 to 0.70]

Agreement between models for each variable was determined by the Kappa statistic, where 1 = below the LLN and 0 = at or above the LLN. Strength of agreement: ≤0.20=none; 0.21–0.39=minimal; 0.40–0.59=weak; 0.61–0.80=moderate; ≥0.80–0.90=strong; ≥0.90=almost perfect.44 Brackets represent the 95% CI of the Kappa statistic.

DLCO, pulmonary diffusing capacity for carbon monoxide; DLNO, pulmonary diffusing capacity for nitric oxide; GAMLSS, generalised additive models of location, scale and shape; VA, alveolar volume.
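The agreement statistic in table 6 can be illustrated with a short sketch of Cohen's kappa for two binary classifications (1 = below the LLN, 0 = at or above it). The z-scores below are made up for illustration, and the function names are hypothetical; this is not the authors' implementation and it does not compute the 95% CI.

```python
def flag_below_lln(z_scores, lln_z=-1.645):
    """Code each z-score as 1 (below the LLN) or 0 (at or above the LLN)."""
    return [1 if z < lln_z else 0 for z in z_scores]

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length lists of 0/1 labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed proportion of subjects classified identically by both models.
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    # Agreement expected by chance from each model's marginal rates.
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0  # both models put every subject in the same single class
    return (observed - expected) / (1 - expected)

# Made-up z-scores from two hypothetical models for the same eight subjects.
z_model_1 = [-2.1, -1.9, -0.3, 0.4, 1.2, -1.7, 0.0, -0.8]
z_model_2 = [-1.8, -1.5, -0.2, 0.5, 1.0, -1.9, 0.1, -1.0]
kappa = cohen_kappa(flag_below_lln(z_model_1), flag_below_lln(z_model_2))
print(round(kappa, 2))
```

Because kappa corrects for chance agreement, it can be much lower than the raw percentage agreement when abnormal results are rare, as they are at a 5% prevalence.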

Table 7

Receiver-operating characteristic (ROC) analysis for evaluating the performance of both statistical models for pulmonary diffusing capacity for carbon monoxide (DLCO), pulmonary diffusing capacity for nitric oxide (DLNO) and alveolar volume (VA) when the estimated prevalence of an abnormal result in a population is 5% (ie, when 5% of the population is below the lower limits of normal (LLN))

	DLCO	DLNO	VA
Area under the ROC curve (AUC)	0.86 (0.84, 0.88)	0.81 (0.79, 0.83)	0.75 (0.73, 0.78)
Youden’s J statistic	0.72 (0.59, 0.82)	0.62 (0.50, 0.72)	0.51 (0.38, 0.64)
Sensitivity	0.75 (0.62, 0.85)	0.64 (0.51, 0.76)	0.52 (0.38, 0.66)
Specificity	0.97 (0.96, 0.98)	0.98 (0.97, 0.99)	0.99 (0.98, 0.99)
Positive predictive value	61% (51%, 70%)	66% (54%, 76%)	70% (56%, 81%)
Negative predictive value	99% (98%, 99%)	98% (97%, 99%)	98% (97%, 98%)
Positive likelihood ratio	29.3 (19.5, 44.0)	34.5 (22.0, 59.2)	44.2 (23.8, 82.0)
Negative likelihood ratio	0.26 (0.17, 0.40)	0.37 (0.26, 0.51)	0.49 (0.37, 0.64)

Youden’s J statistic (sensitivity+specificity–1): measures the effectiveness of using the segmented regression models as a diagnostic test compared with generalised additive models of location, scale and shape (GAMLSS).

Sensitivity (true positive rate): probability of an abnormal result (ie, DLCO, DLNO or VA

Specificity (true negative rate): probability of a normal test result (ie, DLCO, DLNO or VA≥LLN) as identified by segmented regression when GAMLSS also show a normal test result (≥LLN) in that same variable.

Positive predictive value (precision): probability of an abnormal result (

Negative predictive value: probability of a normal result (≥LLN) in one variable as identified by GAMLSS when segmented regression also show a normal result in that same variable.

Positive likelihood ratio (true positive rate ÷ false positive rate): the ratio between the probability of an abnormal result (

Negative likelihood ratio (false-negative rate ÷ true negative rate): the ratio between the probability of a normal test result identified by segmented regression (≥LLN) when there is an abnormal test result as identified by GAMLSS (

AUC, The per cent chance that when GAMLSS is used detect abnormal results (values
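The quantities reported in table 7 all derive from a 2×2 table in which GAMLSS is treated as the reference classification and segmented regression as the test. A minimal sketch follows; the counts are hypothetical (chosen to give a 5% prevalence in 1000 subjects) and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Diagnostic accuracy measures from the four cells of a 2x2 table.

    tp: abnormal by both models      fp: abnormal by the test model only
    fn: abnormal by reference only   tn: normal by both models
    """
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden_j": sensitivity + specificity - 1,
        "ppv": tp / (tp + fp),                          # precision
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),  # TPR / FPR
        "lr_negative": (1 - sensitivity) / specificity,  # FNR / TNR
    }

# Hypothetical counts: 50 of 1000 subjects are abnormal by the reference model.
metrics = diagnostic_metrics(tp=40, fp=20, fn=10, tn=930)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The example shows why a high specificity can coexist with a modest positive predictive value: at low prevalence, even a small false positive rate produces a number of false positives comparable to the number of true positives.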

The derived LLN obtained from segmented linear regression models was compared with the derived LLN from GAMLSS models (online supplemental figures S9-S14). There were systematic and proportional differences between models. The impairment in DLNO, DLCO and VA was classified based on z-scores (table 8). As the per cent predicted matched to the LLN (z-score = – 1.645) varies with age (online supplemental table S9), and throughout a wide range of z-score values (online supplemental figure S15A, B), the classification of impairment is best defined via the z-scores. However, the per cent predicted value along with its variability is also provided in table 8 as it not only may be more intuitive than z-scores, but it can be an easier way for clinicians to assess the severity of a pulmonary function abnormality.

Table 8

Classification of impairment in pulmonary diffusing capacity for nitric oxide, pulmonary diffusing capacity for carbon monoxide and alveolar volume, using the modified one-step NO–CO technique (4–6 s breath-hold manoeuvres)

	Severe decrease	Moderate decrease	Mild decrease	Normal	Increased
Suspected or prior evidence of lung disease
z-score	−5.01 and below	−5.00 to −3.51	−3.50 to −1.65	−1.645 to +1.645	>+1.645
% Predicted	≤41 (3)%	42 (3) to 59 (3)%	60 (3) to 80 (3)%	81 (3) to 119 (3)%	>119 (3)%
Screening and case finding purposes only
z-score	−5.01 and below	−5.00 to −3.51	−3.50 to −1.961	−1.96 to +1.96	>+1.96
% Predicted	≤41 (3)%	42 (3) to 59 (3)%	60 (3) to 77 (3)%	78 (3) to 123 (3)%	>123 (3)%

The z-scores should ultimately be used for the classification of diffusion impairment or low alveolar volume. The advantage of using z-scores to define the lower limits of normal (LLN) (as opposed to per cent predicted) is that the z-scores apply to all populations. However, the per cent (%) predicted may be more intuitive than z-scores, and the % predicted may be an easier way for clinicians to assess the severity of a pulmonary function abnormality. Nevertheless, there is a large SD of 2.8% (rounded to 3%) for each per cent predicted value that is matched to each z-score category, and the SD is depicted within the parentheses. The LLN is normally at the fifth percentile (z = –1.645). However, if interpreting multiple related lung function tests, there is an increase in false-positive rates when using the fifth percentile.81 As such, the LLN at a z-score of −1.96 is recommended for screening and case-finding purposes.6
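The z-score categories of table 8 translate directly into a small lookup function. The sketch below is illustrative only: the function name and labels are hypothetical, and because the printed bounds are rounded (for example, −5.01 versus −5.00), the handling of values that fall between two printed bounds is an assumption.

```python
def classify_impairment(z, screening=False):
    """Classify a DLNO, DLCO or VA z-score using the table 8 cut-offs.

    screening=False: suspected or prior evidence of lung disease (limits of +/-1.645).
    screening=True:  screening and case-finding purposes only (limits of +/-1.96).
    """
    limit = 1.96 if screening else 1.645
    if z < -5.00:
        return "severe decrease"
    if z < -3.50:
        return "moderate decrease"
    if z < -limit:
        return "mild decrease"
    if z <= limit:
        return "normal"
    return "increased"

for z in (-5.2, -4.0, -1.8, 0.0, 2.1):
    print(z, classify_impairment(z), classify_impairment(z, screening=True))
```

A z-score of −1.8, for instance, is a mild decrease when lung disease is suspected but falls within the normal range under the wider screening limits.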

Discussion

GAMLSS have been used by the GLI Network to develop reference equations for lung function for the world to use,6–9 but they are too complicated to implement (see online supplemental table S11 for a worked example). The first purpose of this study was to examine the accuracy of complicated GAMLSS models compared with simpler segmented (piecewise) linear regression models when developing reference equations for pulmonary diffusing capacity. We showed that segmented regression models are comparable to GAMLSS models in terms of prediction accuracy (tables 4 and 5). When identifying subjects below the LLN, the true positive rate was 75% for DLCO and 64% for DLNO when segmented regression was compared with GAMLSS, assuming a 5% prevalence of abnormal results (table 7).

Few studies have compared regression with GAMLSS when evaluating reference equations for lung function indices. All of them compared FVC and FEV1 between models, and none compared pulmonary diffusing capacity. Martínez-Briseño et al47 compared spirometric reference equations between similar models and determined that, while GAMLSS displayed a slightly better fit than multiple linear regression, the differences were minimal. Brisman et al48 used a piecewise regression approach as discussed by Lubiński and Gólczewski49 and found that the mean square errors of the models were similar to those of the GAMLSS developed by the GLI. In a follow-up study, Brisman et al50 further determined that segmented linear regression should be used for the development of spirometric reference equations, as the GLI GAMLSS equations identified too few subjects below the LLN. Kubota et al also compared multiple linear regression against GAMLSS for FVC and FEV1 in Japanese subjects.51 In that study, they claimed that their GAMLSS models more accurately reflected the transition in pulmonary function during young adulthood.
However, they did not provide any information on prediction accuracy between models, nor did the study include children, and there was no real transition between adolescence and adulthood. Therefore, the results of this current study are particularly novel, as we show similar prediction errors for DLCO, DLNO and VA between GAMLSS and segmented linear regression.

Nevertheless, the Q–Q plots for per cent predicted generated by GAMLSS demonstrate a better fit to the normal distribution compared with segmented regression at the extreme ends of the plot. The Q–Q plot for VA, for example, shows that when the observed values are ≥140% predicted, the expected normal value is much different; hence, about 12 values deviate from the line (online supplemental figure S4). Similarly, there are 1–2 subjects for DLCO and DLNO in which the expected normal value was much different compared with the observed per cent predicted values. In comparison, no subjects strayed off the per cent predicted Q–Q plot line when GAMLSS were used for DLNO, DLCO or VA, even at the extreme ranges (online supplemental figure S5). However, these instances are rare (≤1% of the subject pool), and when comparing models (table 4), the overall prediction accuracies were similar.

As the validity of different reference sets for DLNO has been questioned,23 24 the second purpose of this study was to update the prediction equations from the ERS 2017 Technical standards document22 based on more available data so that between-machine comparisons could be verified. We confirmed that the Hyp'Air Compact measured DLNO values that were larger than the Jaeger MasterScreen Pro by 16–20 mL/min/mm Hg (13%–16%) (online supplemental table S4).
These data agree with another study that demonstrated similar findings, although the differences between devices were slightly larger, at 22–26 mL/min/mm Hg (17%).25 The slightly smaller difference between devices observed in the current study is because our models include children, and their study did not. This study pooled all the available reference sets for DLNO that were published in the literature for white subjects from Europe27–30 and North America26 and confirmed a systematic increase in DLNO when the Hyp'Air Compact PFT device was used. The pooled data also demonstrate a 0.2–0.4 L (6%–8%) larger VA when the Hyp'Air Compact was used, which is slightly smaller than the between-machine differences from Radtke et al.35 Discrepancies between the two systems in estimating VA and the rate of alveolar uptake for nitric oxide per unit time and pressure (KNO) could explain the differences in DLNO and VA between devices.35 Furthermore, as the Jaeger MasterScreen Pro uses a demand valve, whereas the Hyp'Air Compact uses a reservoir bag from which the mixture of gases is inspired, this would alter the expired to inspired nitric oxide ratio.

The results presented here are concerning since the lung function testing device is now an important covariate to consider when measuring DLNO and VA. A 2017 ERS Task Force Report on the standardisation of DLNO22 presented reference equations based on pooled data of three studies. However, about 75% of the pooled data from those three studies were based on DLNO data determined by the Hyp'Air Compact PFT system,26–28 whereas 36% of the current pooled data was determined by the Hyp'Air Compact device. Thus, the results present a more balanced view of the between-device findings, and we have updated the prediction equations here. This study did not determine which pulmonary function testing device was more accurate, only that the two devices were different.
For us to determine which is a more accurate device, a comparison would have to be made against a gold standard device. Chemiluminescence NO analysers are considered the gold standard of NO analysers, but they are costly. Even so, van der Lee et al used a nitric oxide chemiluminescence analyser (along with the Jaeger Masterlab Pro system) in their development of reference equations for DLNO.28 Our analysis showed no meaningful differences between DLNO measured by van der Lee et al28 versus the studies that used the Jaeger MasterScreen PFT Pro with the NO electrochemical cell.30 However, both the Jaeger Masterlab Pro system (with NO chemiluminescence) and Jaeger MasterScreen PFT Pro displayed lower DLNO values than the Hyp'Air Compact system.26 27 This would suggest that either (1) the Jaeger MasterScreen PFT Pro provides more accurate diffusing capacity values or (2) the software calculations provided by Jaeger were different compared with the calculations of the Hyp'Air Compact device.

We also examined agreement between models using a kappa statistic and a ROC analysis. The kappa statistic showed moderate agreement between models for DLCO and DLNO and a weak agreement for VA (table 6). When compared against GAMLSS, segmented regression demonstrated ≥97% specificity (true negative rate) when the prevalence of an abnormal result in a population is 5% (ie, when 5% of the population is below the LLN). Moreover, when compared against GAMLSS, segmented regression was able to identify 75% of abnormal results for DLCO, 64% of abnormal results for DLNO, and 52% of abnormal results for VA, considering the prevalence of abnormal results in a population is 5%. This is termed the true positive rate. The precision between both models was between 61% and 70%.
That is, the probability that an actual abnormal result (ie,

Segmented (piecewise) regression makes a series of assumptions: linearity (the relationship between X and the mean of Y is linear), homoscedasticity (the variance of the residual is the same for any value of X), independence (observations are independent of each other) and normality (for any fixed value of X, Y is normally distributed). Overall, there was linearity (table 2, online supplemental table S4), homoscedasticity (online supplemental figure S2), independence (each subject is tested only once) and normality (online supplemental figure S4). Still, the Q–Q plot for the per cent predicted VA from segmented regression is not perfect; it has about 10 outliers (online supplemental figure S4).

Table 2

Reference equations using segmented regression

	Estimate	SE	95% CI	Adjusted R²	RSE
DLCO, females (n=530) (mL/min/mm Hg); breakpoint=24.3 (95% CI 22.7 to 25.8) years old
Intercept₁ (for 5.0–24.2 years old)	−11.82	1.87	−15.5 to −8.1	0.76	2.12₁
Intercept₂ (for 24.3–95.0 years old)	−1.54				3.13₂
Age²₁ (for 5.0–24.2 years old)	0.01534	0.00183	0.012 to 0.019
Age²₂ (for 24.3–95.0 years old)	−0.0018	0.000081	−0.002 to −0.002
Height (cm)	0.183	0.014	0.156 to 0.210
Altitude (m)	0.0041	0.0012	0.002 to 0.006
DLCO, males (n=546) (mL/min/mm Hg); breakpoint=22.7 (95% CI 21.2 to 24.0) years old
Intercept₁ (for 5.0–22.6 years old)	−15.22	2.3	−19.7 to −10.7	0.80	2.62₁
Intercept₂ (for 22.7–95.0 years old)	2.5				4.35₂
Age²₁ (for 5.0–22.6 years)	0.0323	0.0038	0.025 to 0.039
Age²₂ (for 22.7–95.0 years old)	−0.00246	0.00011	−0.009 to −0.008
Height (cm)	0.206	0.017	0.173 to 0.239
Altitude (m)	0.0041	0.0016	0.001 to 0.007
DLNO, females (n=530) (mL/min/mm Hg); breakpoint=22.6 (95% CI 20.6 to 24.5) years old
Intercept₁ (for 5.0–22.5 years old)	−66.43	8.4	−82.9 to −50.0	0.79	8.60₁
Intercept₂ (for 22.6–95.0 years old)	−30.74				13.63₂
Age²₁ (for 5.0–22.5 years old)	0.0616	0.01	0.042 to 0.082
Age²₂ (for 22.6–95.0 years old)	−0.00832	0.00034	−0.028 to 0.011
Height (cm)	0.947	0.063	0.824 to 1.070
PFT equipment	15.17	1.31	12.6 to 17.7
DLNO, males (n=546) (mL/min/mm Hg); breakpoint=22.2 (95% CI 20.7 to 23.5) years old
Intercept₁ (for 5.0–22.1 years old)	−87.15	10.5	−107.7 to −66.6	0.83	11.81₁
Intercept₂ (for 22.2–95.0 years old)	−14.02				19.25₂
Age²₁ (for 5.0–22.1 years old)	0.1375	0.018	0.103 to 0.173
Age²₂ (for 22.2–95.0 years old)	−0.012	0.00048	−0.013 to −0.011
Height (cm)	1.086	0.08	0.93 to 1.24
PFT equipment	18.00	1.83	14.4 to 21.6
VA, females (n=530) (L); breakpoint=30.3 (95% CI 28.0 to 32.4) years old
Intercept₁ (for 5.0–30.2 years old)	−4.16	0.32	−4.8 to −3.5	0.80	0.39₁
Intercept₂ (for 30.3–95.0 years old)	−2.79				0.58₂
Age²₁ (for 5.0–30.2 years old)	0.00132	0.00017	0.001 to 0.002
Age²₂ (for 30.3–95.0 years old)	−0.00018	0.00002	−0.0002 to −0.0001
Height (cm)	0.05	0.0023	0.045 to 0.055
PFT equipment	0.2545	0.054	0.15 to 0.36
VA, males (n=546) (L); breakpoint=27.0 (95% CI 25.1 to 28.8) years old
Intercept₁ (for 5.0–26.9 years old)	−5.64	0.36	−6.4 to −4.9	0.86	0.46₁
Intercept₂ (for 27.0–95.0 years old)	−3.61				0.73₂
Age²₁ (for 5.0–26.9 years old)	0.00265	0.0003	0.002 to 0.003
Age²₂ (for 27.0–95.0 years old)	−0.00013	0.00002	−0.0003 to −0.0001
Height (cm)	0.060	0.0026	0.055 to 0.065
PFT equipment	0.241	0.07	0.11 to 0.37

For the PFT equipment, 1=Hyp’Air Compact, and 0=Jaeger Masterscreen. For example, for a man who is 26.9 years old, 180 cm tall, and who had the measurement performed on the Hyp’Air, the predicted alveolar volume (VA) (L)=0.0027‧(26.9²)+0.06‧(180)+0.24–5.64=7.35 L with a lower limit of normal (LLN) of 7.35 – (0.46‧1.645)=6.59 L. For a man 27 years old with the same height and equipment used, the predicted VA (L) = –0.00013‧(27²)+0.06‧(180)+0.24–3.61=7.34 L with the LLN=7.34 – (0.73‧1.645)=6.14 L.
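The worked example in the table 2 footnote can be reproduced with a few lines of code. This sketch covers only the male VA equation and uses the rounded coefficients quoted in the footnote (0.0027, 0.06 and 0.24); the unrounded table values (0.00265, 0.060 and 0.241) give slightly different results. The function name is hypothetical.

```python
Z_LLN = 1.645          # fifth percentile
BREAKPOINT_YEARS = 27.0  # male VA breakpoint from table 2

def predicted_va_male(age_years, height_cm, hypair):
    """Predicted VA (L) and LLN for males from the segmented regression equation.

    hypair: 1 for the Hyp'Air Compact, 0 for the Jaeger MasterScreen.
    """
    if age_years < BREAKPOINT_YEARS:
        # First segment (5.0-26.9 years): intercept, age-squared coefficient, residual SE.
        intercept, age2_coef, rse = -5.64, 0.0027, 0.46
    else:
        # Second segment (27.0-95.0 years).
        intercept, age2_coef, rse = -3.61, -0.00013, 0.73
    predicted = age2_coef * age_years ** 2 + 0.06 * height_cm + 0.24 * hypair + intercept
    lln = predicted - Z_LLN * rse
    return predicted, lln

young_pred, young_lln = predicted_va_male(26.9, 180, hypair=1)
older_pred, older_lln = predicted_va_male(27.0, 180, hypair=1)
print(f"26.9 years: predicted {young_pred:.2f} L, LLN {young_lln:.2f} L")
print(f"27.0 years: predicted {older_pred:.2f} L, LLN {older_lln:.2f} L")
```

The LLN values can differ from the footnote in the second decimal place because the footnote rounds the predicted value before subtracting 1.645 × RSE. The example also shows that the predicted value is nearly continuous across the breakpoint while the LLN drops, because the residual SE is larger in the older segment.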

DLCO, pulmonary diffusing capacity for carbon monoxide; DLNO, pulmonary diffusing capacity for nitric oxide; RSE, residual SE.

Establishing categories of diffusion impairment based solely on per cent predicted values, as reported back in 2005,58 is not appropriate. The LLN expressed as a percentage of the predicted value changes with age for several different lung function indices, such as FEV1,6 FVC6 and DLCO,7 and we have demonstrated this to be true for DLCO, DLNO and VA (online supplemental table S9). As such, the z-scores should be used to define the severity of diffusion impairment (table 8). Nevertheless, the per cent predicted value along with its variability is also provided in table 8 as it may be more intuitive than z-scores, and it could be an easier way for clinicians to assess the severity of a pulmonary function abnormality.

In the past, an upper limit of normal (ULN) was not formally established for spirometry6 because high values are not clinically meaningful. A ULN for DLCO was also not established when the 2017 GLI DLCO reference equations were published.7 Nonetheless, abnormally high DLCO values may be pathologic, even though they are rare. In those rare cases where high values are seen (eg, pulmonary haemorrhage, polycythaemia, obesity, asthma),59–64 the DLCO test is not the standard for diagnosis.7 Even so, we believe that there is a role for the ULN moving forward, and we incorporate it here. An increase in DLNO and DLCO above the ULN may not be pathologic; an increase could be caused by a negative intrathoracic pressure during breath-hold (Müller manoeuvre),65 66 or lung size may be very large. In fact, most nationally ranked swimmers have DLNO and DLCO above the ULN,67 which are strongly associated with their large lung volumes. Regular swimming training throughout childhood and adolescence may have aided in the development of larger lungs,68–70 although this remains controversial.71 There also may be a selection bias such that those with larger lungs become good swimmers.
Furthermore, given an apparent association of DLNO and DLCO with cardiorespiratory fitness,72 setting a ULN could identify individuals with supra-normal diffusing capacities. As such, we agree with Quanjer et al,6 in that for those individuals suspected of lung disease, an LLN of the fifth percentile (z = –1.645) should be used; and if lung function testing is for screening and case-finding purposes only, the 2.5th and 97.5th percentiles should be used (z-scores of ±1.96). Nonetheless, this classification of diffusion impairment in table 8 does not necessarily correlate with symptomatology, mortality and/or morbidity.

There are reasons why the shared variance between DLNO and DLCO is not 100%. Approximately 70%–80% of the barrier to carbon monoxide uptake resides within the red cell (ie, red cell resistance), while the remaining 25% or so is in the alveolar membrane (see figure 1 elsewhere22). In contrast, the main barrier to NO uptake resides between the alveolar and red cell membranes (about 60%) (ie, membrane resistance).73 Thus, DLNO is better represented by gas transfer through the alveolar-capillary membrane compared with DLCO, and DLNO is more affected by changes in lung volume.73 Consequently, DLNO provides a more sensitive evaluation of fibrotic changes in the lung compared with DLCO, and DLCO provides a more sensitive evaluation of pulmonary vascular disorders than DLNO. Unlike DLCO, DLNO is relatively unaffected by changes in haemoglobin concentration45 or carboxyhaemoglobin concentration.74 From the pooled data in this study of non-diseased subjects, 88% of the variance in DLNO is shared by DLCO, yet DLNO z-scores share only about 39%–47% of the variance in DLCO z-scores (online supplemental table S6).
Indeed, it seems logical that measuring DLNO and DLCO together would provide a better assessment of a patient’s pulmonary condition than measuring either one of them on its own since approximately 53%–61% of the total variance between the fitted DLCO z-scores and fitted DLNO z-scores is not shared. Moreover, the fact that there is a low true positive rate and a low positive predictive value between DLNO and DLCO when the prevalence of an abnormal result is 5% further demonstrates that DLNO and DLCO measure different things, even though there is considerable overlap (online supplemental table S8). Regardless of whether segmented linear regression or GAMLSS models are used in predicting DLNO and DLCO, there is only a 38%–42% probability that when DLNO is abnormal (

The current GLI DLCO reference equations7 8 and the reference equations updated here for DLNO, DLCO and VA are for white subjects only. As there are slight but important differences in DLNO,75 76 DLCO77 78 and VA between various ethnic groups,75 76 it is crucial to develop multiethnic reference equations.79 For example, lung disease could be overdiagnosed by about 8% in the black population if reference equations for white subjects were used.75 This false-positive misdiagnosis could increase patient stress, and healthcare resources would be extended, resulting in a higher cost for a non-illness.75

In conclusion, when developing pulmonary function reference equations, we propose that segmented (piecewise) linear regression can be used instead of GAMLSS due to its simplicity, especially when overall prediction errors are similar between the two types of models. Still, the Q–Q plots of observed versus expected per cent predicted reveal a better fit to the normal distribution when GAMLSS models are used, but only at the upper end of per cent predicted (ie, ≥140% predicted), and these were rare occurrences.
The reference equations for DLNO, DLCO and VA developed here are robust and should be used moving forward for any clinical assessment that uses the NO–CO double diffusion technique and a breath-hold time of about 6 seconds. Since the Hyp'Air Compact device measures DLNO and VA values that are systematically higher than those of the Jaeger MasterScreen Pro, we urge the two manufacturers to come together to resolve these differences.

Table 3

Reference equations using generalised additive models of location, scale and shape

	M=mu, median	S=sigma, coefficient of variation, which explains the variability around the median	L=lambda, which is the index of skewness
Females (n=530)
DLCO (mL/min/mm Hg)	exp(– 4.481+1.406‧ln(height)+0.194‧ln(age)+0.0002‧altitude+Mspline)	exp(0.642‧ln(age) – 1.018·ln(height)+Sspline)	0.325
DLNO (mL/min/mm Hg)	exp(– 3.777+0.144‧machine+1.510‧ ln(height)+0.3405‧ln(age)+Mspline)	0.1053 for Jaeger Masterscreen,0.1401 for Hyp’Air	0.836
VA (L)	exp(– 8.323+0.060‧machine+1.842‧ ln(height)+0.1705‧ln(age)+Mspline)	exp(–0.616‧ln(height)+0.2485‧ln(age))	0.577
Males (n=546)
DLCO (mL/min/mm Hg)	exp(– 5.163+1.500‧ln(height)+0.3507‧ ln(age)+0.0002‧altitude+Mspline)	exp(8.365+0.914‧ln(age) –2.503·ln(height)+Sspline)	0.632
DLNO (mL/min/mm Hg)	exp(– 4.339+0.138‧machine+1.617‧ ln(height)+0.410‧ln(age)+Mspline)	Exp(0.230‧machine – 2.191)	1.113
VA (L)	exp(– 9.443+0.0569‧machine+2.076‧ ln(height)+0.169‧ln(age)+Mspline)	0.1016	0.0635

Height is in cm and age in years; machine = 1 for the Hyp'Air Compact and 0 for the Jaeger MasterScreen. Lower limit of normal (fifth percentile) = exp(ln(M) + ln(1 – 1.645·L·S)/L); per cent predicted = (measured/M)·100; z-score = ((measured/M)^L – 1)/(L·S). exp() = natural exponential; ln() = natural logarithm; Mspline and Sspline correspond to the age-varying coefficients provided in the supplementary materials. The models are valid for ages 5–95 years and altitudes of 0–300 m.
Note: If pulmonary diffusing capacity for carbon monoxide (DLCO) is measured at an altitude above 300 m, we recommend first converting the measured DLCO to sea level, based on the data of Gray et al,80 and then omitting the altitude covariate from the equation (as the converted DLCO corresponds to an altitude of 0 m). DLCO adjusted to sea level (mL/min/mm Hg) = measured DLCO at altitude·(0.505 + 0.00065·barometric pressure at altitude in mm Hg). Barometric pressure at altitude (mm Hg) can be estimated as 760·exp(–0.284·altitude in m/(8.314·temperature in kelvin)), where kelvin = °C + 273.15 (see https://planetcalc.com/938/).

DLCO, pulmonary diffusing capacity for carbon monoxide; DLNO, pulmonary diffusing capacity for nitric oxide; VA, alveolar volume.
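
The footnote formulas for Table 3 can be applied as in the following minimal Python sketch. It is an illustration under stated assumptions, not code from the study: the function names are hypothetical, and because Mspline and Sspline are only available in the supplementary materials, the predicted median M is passed in as an already-computed number rather than derived from the table.

```python
import math


def lms_summary(measured, M, S, L):
    """Apply the Table 3 footnote formulas for one measurement.

    M, S and L are the median, coefficient of variation and skewness
    for the subject (L must be non-zero, and 1 - 1.645*L*S must be > 0).
    Returns (per cent predicted, z-score, lower limit of normal).
    """
    percent_predicted = measured / M * 100.0
    z_score = ((measured / M) ** L - 1.0) / (L * S)
    # Fifth percentile, i.e. the value whose z-score is -1.645.
    lln = math.exp(math.log(M) + math.log(1.0 - 1.645 * L * S) / L)
    return percent_predicted, z_score, lln


def barometric_pressure_mmhg(altitude_m, temp_c):
    """Estimate barometric pressure (mm Hg) at altitude, per the footnote."""
    temp_k = temp_c + 273.15
    return 760.0 * math.exp(-0.284 * altitude_m / (8.314 * temp_k))


def dlco_at_sea_level(measured_dlco, pb_mmhg):
    """Convert a DLCO measured at altitude to its sea-level equivalent."""
    return measured_dlco * (0.505 + 0.00065 * pb_mmhg)


if __name__ == "__main__":
    # S and L for male VA are the constants in Table 3; the median of
    # 6.5 L and the measured 5.2 L are made-up values for illustration.
    pp, z, lln = lms_summary(measured=5.2, M=6.5, S=0.1016, L=0.0635)
    print(f"VA: {pp:.0f}% predicted, z = {z:.2f}, LLN = {lln:.2f} L")

    # A DLCO measured at a hypothetical 1500 m and 15 degrees C.
    pb = barometric_pressure_mmhg(altitude_m=1500.0, temp_c=15.0)
    print(f"Pb = {pb:.0f} mm Hg, "
          f"sea-level DLCO = {dlco_at_sea_level(25.0, pb):.1f}")
```

By construction, a measurement equal to M gives 100% predicted and a z-score of 0, and a measurement equal to the lower limit of normal gives a z-score of –1.645. After converting DLCO to sea level, the altitude term is dropped from the M equation, as the note recommends.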
