
Reference Curves for Pediatric Endocrinology: Leveraging Biomarker Z-Scores for Clinical Classifications.

Andre Madsen¹, Bjørg Almås¹, Ingvild S Bruserud²,³, Ninnie Helen Bakken Oehme³, Christopher Sivert Nielsen⁴,⁵, Mathieu Roelants⁶, Thomas Hundhausen⁷,⁸, Marie Lindhardt Ljubicic⁹, Robert Bjerknes³,¹⁰, Gunnar Mellgren¹,¹⁰,¹¹, Jørn V Sagen¹,¹⁰, Pétur B Juliusson³,¹⁰,¹², Kristin Viste¹.

Abstract

CONTEXT: Hormone reference intervals in pediatric endocrinology are traditionally partitioned by age and lack a framework for benchmarking individual blood test results as normalized z-scores and for plotting sequential measurements on a chart. Reference curve modeling is applicable to endocrine variables and provides a standardized method to account for variation with gender and age.
OBJECTIVE: We aimed to establish gender-specific biomarker reference curves for clinical use and benchmark associations between hormones, pubertal phenotype, and body mass index (BMI).
METHODS: Using cross-sectional population sample data from 2139 healthy Norwegian children and adolescents, we analyzed the pubertal status, ultrasound measures of glandular breast tissue (girls) and testicular volume (boys), BMI, and laboratory measurements of 17 clinical biomarkers modeled using the established "LMS" growth chart algorithm in R.
RESULTS: Reference curves for puberty hormones and pertinent biomarkers were modeled to adjust for age and gender. Z-score equivalents of biomarker levels and anthropometric measurements were compiled in a comprehensive beta coefficient matrix for each gender. In an excerpt from this analysis, and independent of age, BMI was positively associated with female glandular breast volume (β = 0.5, P < 0.001) and leptin (β = 0.6, P < 0.001), and inversely associated with serum levels of sex hormone-binding globulin (SHBG) (β = −0.4, P < 0.001). Biomarker z-score profiles differed significantly between cohort subgroups stratified by puberty phenotype and BMI weight class.
CONCLUSION: Biomarker reference curves and corresponding z-scores provide an intuitive framework for clinical implementation in pediatric endocrinology and facilitate the application of machine learning classification and covariate precision medicine for pediatric patients.


Keywords: biomarker; machine learning; pediatric endocrinology; references


Substances: Biomarkers

Year: 2022; PMID: 35299255; PMCID: PMC9202734; DOI: 10.1210/clinem/dgac155

Source DB: PubMed; Journal: J Clin Endocrinol Metab; ISSN: 0021-972X; Impact factor: 6.134

Reproductive hormone references for evaluating blood test results in pediatric patients are essential during clinical investigations of a wide range of conditions, including hypo- and hypergonadism, differences of sex development (DSDs), and neoplastic and autoimmune conditions that compromise endocrine function. Such pathologies may be associated with abnormal somatic development and altered puberty timing. At the population level, the age at female puberty onset has decreased by 3 months per decade since the 1970s and appears to be declining further (1). Given this secular trend toward earlier puberty, observed particularly in girls, the references applied in pediatric endocrinology should be updated periodically. Further, the association between childhood obesity and earlier puberty warrants quantitative benchmarking and further attention (2, 3). During childhood and puberty, circulating levels of hormones and biochemical markers frequently vary considerably with both gender and age. Typically, awakening of the adrenal cortex (adrenarche) precedes the appearance of pubic hair (pubarche) and the onset of gonadal function (gonadarche) (4). Biochemical reference intervals are fundamental tools for evaluating samples and ensuring correct diagnosis and treatment. In pediatric endocrinology, this requires appropriate adjustment or stratification by the major covariates of age, gender, and puberty stage. The well-established and widely applied nonparametric method imposes an arbitrary partitioning into age groups and defines a series of central 95% reference intervals in table format (5). However, when a pediatric patient is assigned to such a predetermined age partition, the corresponding reference interval does not account for the age-dependent skewness that biochemical observations within most such partitions are likely to exhibit, which yields non-Gaussian distributions.
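To make the partitioned approach concrete, the Python sketch below computes nonparametric central 95% reference intervals (2.5th and 97.5th percentiles) within fixed age partitions. It is an illustration under stated assumptions, not the study's procedure (which used the R “referenceIntervals” package with bootstrapping and outlier removal): the function name `partitioned_reference_intervals`, the half-open age partitions, and the linear-interpolation percentile rule (`statistics.quantiles` with `method="inclusive"`) are our choices. Each interval also reports its sample size, so that sparsely populated partitions can be flagged.

```python
import statistics

def partitioned_reference_intervals(ages, values, age_edges):
    """Nonparametric central 95% interval for each age partition.

    age_edges defines half-open partitions [edge_i, edge_i+1), eg, 6 to <9 y.
    Returns {(lower_age, upper_age): (p2.5, p97.5, n)}.
    """
    intervals = {}
    for lo, hi in zip(age_edges[:-1], age_edges[1:]):
        subset = sorted(v for a, v in zip(ages, values) if lo <= a < hi)
        if len(subset) < 2:
            continue  # too few observations to estimate percentiles
        # n=40 gives cut points at 2.5%, 5%, ..., 97.5%; keep the outermost two.
        cuts = statistics.quantiles(subset, n=40, method="inclusive")
        intervals[(lo, hi)] = (cuts[0], cuts[-1], len(subset))
    return intervals
```

Note that every child in a partition is compared against the same pair of limits, regardless of where within the partition their age falls.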
Further, the Clinical and Laboratory Standards Institute (CLSI) C28-A3c guideline states that clinically valid reference intervals should ideally be based on at least 120 observations (6). Ethical limitations make it notoriously challenging to recruit cohorts of healthy children large enough to establish sufficiently powered and comprehensive pediatric references (7-9). Notably, the Canadian Laboratory Initiative on Pediatric Reference Intervals (CALIPER) and the Nordic Reference Interval Project (NORIP) have previously established comprehensive ranges by nonparametric partitioning (10, 11) and quantile regression (12). Conventional growth charts are used ubiquitously in clinical practice to benchmark the gross anthropometric status of pediatric patients. At the heart of this framework is the LMS method, originally described by Cole and Green (13, 14). The LMS framework has been adopted in most contemporary national growth references, including the growth charts provided by the World Health Organization (WHO) and the Centers for Disease Control and Prevention (CDC) (15, 16). Briefly, the LMS algorithm applies a Box-Cox transformation and uses 3 age-varying parameters to describe the skewness (L), median (M), and coefficient of variation (S) of each local distribution, thereby providing nonlinear adjustment for age while accounting for heteroscedasticity (eg, variance that increases with age). Final LMS models enable calculation of standard deviation scores (z-scores) that are typically adjusted for the main covariates age and gender. Such z-scores are centered at zero (ie, the median for age), uniformly scaled in units of SD, and approximately normally distributed, properties that make them ideal input variables for statistical modeling and machine learning (ML).
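The LMS parameters can also be run in reverse to draw the centile lines of a reference chart: for a given z-score, the corresponding measurement is M(1 + L·S·z)^(1/L), or M·exp(S·z) in the limiting case L = 0. The minimal Python sketch below uses hypothetical L, M, and S values for a single age to show how a positive L skews the curve, so that the +2 SD centile lies farther from the median than the −2 SD centile; `statistics.NormalDist` converts each z-score to its percentile. The function name `lms_centile` is illustrative.

```python
import math
from statistics import NormalDist

def lms_centile(z, L, M, S):
    """Measurement value on the centile curve for z-score z at one age.

    Inverse of the LMS z-score transform: M * (1 + L*S*z)**(1/L),
    with the limiting form M * exp(S*z) when L is (numerically) zero.
    """
    if abs(L) < 1e-8:
        return M * math.exp(S * z)
    base = 1 + L * S * z
    if base <= 0:
        raise ValueError("z lies outside the range supported by these LMS parameters")
    return M * base ** (1 / L)

# Hypothetical LMS parameters for a single age (not taken from the study).
L, M, S = 0.5, 10.0, 0.2
for z in (-2, 0, 2):
    pct = 100 * NormalDist().cdf(z)
    print(f"z = {z:+d} (p{pct:.1f}): {lms_centile(z, L, M, S):.2f}")
```

With these values the −2, 0, and +2 SD centiles are 6.40, 10.00, and 14.40: the median is M, and the upper tail is wider than the lower one. Repeating the calculation across ages with age-specific L, M, and S values traces the continuous centile curves.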
Briefly, supervised ML is an artificial intelligence (AI) approach in which an algorithm learns the relationship between several independent feature variables (eg, a biochemical profile) and one dependent variable (eg, a known dichotomy of “disease” or “healthy”) in order to make robust predictions for new observations (17). The Bergen Growth Study 2 (BGS2) was conducted in 2016 and has provided new anthropometric puberty references for the Norwegian pediatric population (18, 19) as well as simple nonparametric hormone references (20, 21). In the current study, we aimed to provide a comprehensive set of gender-specific LMS reference curves for 17 pediatric biomarkers. We hypothesized that biomarker z-scores may be useful to quantify associations between hormone levels, pubertal status, and weight class, and thus enable clinical classifications irrespective of patient gender or age. Furthermore, we hypothesized that pediatric overweight may be associated with an altered endocrine profile characterized, in particular, by increased estrogen levels due to increased aromatase activity in adipose tissue. The current reference curves have not been published previously but were recently used to benchmark biomarker z-scores in an unrelated cohort of children exposed to metformin in utero due to maternal polycystic ovary syndrome (22).

Materials and Methods

Cohort Description

The Bergen Growth Study 2 (BGS2) was conducted in 2016 and comprised a population sample of Norwegian children representative of the general Norwegian demographic composition, approximately 90% of whom were Caucasian, as described previously (20, 21). Exclusion criteria in the BGS2 cohort included self-reported chronic disease or a medical history of cancer or epilepsy. Data describing puberty status, anthropometric profile, and biochemistry were available for 650 healthy girls and 465 healthy boys aged 6 to 16 years. The BGS2 total participation rate, that is, the proportion of invited children who enrolled in the study, was 43%. The Norwegian Fit Futures 1 youth study was conducted in 2010 and 2011 to benchmark public health parameters pertaining to lifestyle choices, bone health, and inflammation, as described previously (23, 24). All 1117 first-year high school students in the Tromsø and Balsfjord municipalities were invited and 1038 participated, yielding a response rate of 93%. From this cohort, previously unpublished data on steroid hormone levels in girls and boys aged 15 to 18 years were used in the current study. An overview of the current sample sizes and applied exclusion criteria is provided in Table 1. Puberty status in the Fit Futures 1 cohort was self-reported and not comparable with the Tanner staging performed in the BGS2 cohort; these observations were therefore not included in the current analyses arranged by puberty stage.

Table 1.

Cohort sample overview

Sample description	BGS2 cohort (6 to <16 y)		Fit Futures cohort (15-18 y)
Gender	Boys	Girls	Boys	Girls
Unique blood samples, n	451	650	509	486
Excluded due to chronic disease, n	25	27	8	0
Excluded due to oral contraceptives, n	0	12	0	167
Excluded due to corticosteroid use, n	7	10	10	6
Viable blood samples for references, n	414	601	491	319

The current study included observations sourced from the 2 population-based samples of Norwegian children and adolescents enrolled in the Bergen Growth Study 2 (BGS2) and Fit Futures 1 cohorts. Indicated numbers of girls and boys were enrolled and the following exclusion criteria were applied: self-reported history of chronic disease or cancer (excluded from all biomarker references); use of oral contraceptives (excluded from all female biomarker references); use of corticosteroid medication (excluded from cortisol references). The number of viable blood samples used in the current references excludes serum samples that were discarded due to hemolysis or insufficient blood draw volume.

Both written parental consent and child assent were required for any examination, and sourcing biochemical data from the biobanks was approved by the Norwegian Regional Committees for Medical and Health Research Ethics (approval references REK-2015/128 for BGS2 and REK-2017/1976 for Fit Futures).

Methods

Puberty Evaluation Protocols

For girls in the BGS2 cohort, the depth and diameter of the fibroglandular area were systematically measured with ultrasound in the sagittal plane of each girl’s left breast, unless the right breast visually appeared more mature. Methodological documentation of the ultrasonographic measurement of glandular depth and diameter was described previously (25). Briefly, breasts were palpated and evaluated according to Tanner’s classification (26). Ultrasound evaluation of the larger breast was performed with a SonoSite Edge device (FUJIFILM SonoSite, USA) with a 15-16 MHz (5 cm) linear transducer. When the diameter was 5 to 10 cm, two consecutive scans were performed and merged, and the measurements were summed. Diameters above 10 cm were not measured or included in analyses. The more mature breast, according to Tanner B or ultrasound breast staging, was used for the analyses. To calculate glandular volume, the formula for a cone was applied: volume = (π/3) × radius² × depth. This formula has also been used by others (27, 28). Using Tanner stages as the gold standard marker of thelarche (Tanner B1 vs B2+), the optimal cutoff to classify puberty onset in girls corresponded to a glandular breast tissue volume of 0.5 mL; this threshold had a positive predictive value of 60.4%, a negative predictive value of 98.6%, and an accuracy of 85.2%.
Methodological documentation for ultrasound evaluation of male testicular volume and its mathematical relation to conventional orchidometer milliliters was detailed previously (29). Briefly, the dimensions of the larger testis were recorded, and the ellipsoid volume was calculated using the Lambert equation (length × width × depth × 0.71). The Norwegian growth chart describing male testicular volume-for-age was sourced from the BGS2 cohort and recently published (19).
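The two volume formulas above translate directly into code. The Python sketch below makes a few assumptions that are not stated in the text: the measured diameter is halved to obtain the cone radius; all dimensions are in centimeters, so cm³ equals mL; and, hypothetically, a volume at or above the 0.5 mL cutoff counts as ultrasound-defined thelarche. The function names are illustrative only.

```python
import math

THELARCHE_CUTOFF_ML = 0.5  # cutoff reported in the text; using ">=" is an assumption

def glandular_volume_ml(diameter_cm, depth_cm):
    """Cone volume (pi/3) * r^2 * h, taking r as half the measured diameter."""
    radius_cm = diameter_cm / 2
    return (math.pi / 3) * radius_cm ** 2 * depth_cm

def lambert_testicular_volume_ml(length_cm, width_cm, depth_cm):
    """Lambert ellipsoid approximation: length x width x depth x 0.71."""
    return length_cm * width_cm * depth_cm * 0.71

def ultrasound_thelarche(diameter_cm, depth_cm):
    """Hypothetical classifier applying the 0.5 mL glandular volume cutoff."""
    return glandular_volume_ml(diameter_cm, depth_cm) >= THELARCHE_CUTOFF_ML
```

For example, a 2 cm diameter with 0.5 cm depth gives π/6 ≈ 0.52 mL, just above the cutoff.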
Anthropometric body mass index (BMI), waist circumference, and subscapular triceps z-scores assigned to participants in the current study were interpolated from the Norwegian national growth charts according to gender and age (30).

Blood Sample Analyses

Venous blood samples were collected with both parental and child consent if the child was younger than 16 years, and with the participant’s own consent if he or she was 16 years or older. Blood samples were collected between 8 am and 2 pm in both studies. Isolated serum was stored in registered biobanks at −80 °C prior to analyses. All biomarkers were analyzed and reported in International System of Units (SI) at the Hormone Laboratory, Department of Medical Biochemistry and Pharmacology, Haukeland University Hospital, which is accredited in compliance with ISO 15189:2012. Androgens and corticosteroids were analyzed by a liquid chromatography–tandem mass spectrometry (LC-MS/MS) multiplex method as described previously (31). For testosterone, the analytical inter-assay coefficient of variation (CVA) was 4% in the range 1.5 to 37 nmol/L and the lower limit of quantification (LLOQ) was 0.02 nmol/L. For BGS2 samples, serum levels of estradiol (E2) were quantified using an ultrasensitive LC-MS/MS method documented previously (32); here, the E2 inter-assay CVA was 9.1% in the range 1.7 to 153.3 pmol/L and the LLOQ was 0.58 pmol/L. In the Fit Futures cohort, E2 levels were determined by an LC-MS/MS method with intermediate sensitivity (CVA 13% at 57 pmol/L; range, 13-2508 pmol/L; LLOQ 13 pmol/L). Both E2 methods are traceable to the certified reference material (CRM) BCR-576, and no significant bias was detected between them when biological samples were run in parallel (R² = 0.96; average difference, 1.7%; t test P = 0.053). Estrogen data from the 2 cohorts were therefore merged without mathematical adjustment. Follicle-stimulating hormone (FSH; CVA 5% at 5 IU/L; LLOQ 0.1 IU/L), luteinizing hormone (LH; CVA 7% at 10 IU/L; LLOQ 0.1 IU/L), sex hormone-binding globulin (SHBG; CVA 6% at 60 nmol/L; LLOQ 2 nmol/L), and insulin-like growth factor 1 (IGF1; CVA 7% at 18 nmol/L; LLOQ 4 nmol/L) were quantified in BGS2 serum samples using a Siemens Immulite 2000 XPi.
Enzyme-linked immunosorbent assay (ELISA) kits were used to quantify serum leptin (Mediagnost Cat# E07, RRID: AB_2813737) and adiponectin (Mediagnost Cat# E09, RRID: AB_2813736) in samples from the BGS2 cohort. The inter-assay CVA was 5% at 8.3 µg/L leptin (LOQ 1-100 µg/L) and 8% at 14 µg/mL adiponectin (LOQ 0.6-31 µg/mL). HDL cholesterol (CVA 3% at 1.9 mmol/L), LDL cholesterol (CVA 2.5% at 3.4 mmol/L), total cholesterol (CVA 3% at 4.4 mmol/L), and triglycerides (CVA 3% at 1.5 mmol/L) were quantified in BGS2 serum samples on a Cobas 8000 analyzer.
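The inter-assay CVA values quoted above follow the usual definition of a coefficient of variation: the SD of repeated measurements of one control material across separate runs, divided by their mean. The minimal Python sketch below uses hypothetical control results, and its function name is illustrative.

```python
import statistics

def inter_assay_cv_percent(control_results):
    """CV (%) = 100 * SD / mean of one control material measured in separate runs."""
    if len(control_results) < 2:
        raise ValueError("at least two runs are needed to estimate an SD")
    return 100 * statistics.stdev(control_results) / statistics.mean(control_results)

# Hypothetical leptin control results (µg/L) from repeated runs.
runs = [8.1, 8.6, 8.0, 8.5, 8.3]
print(f"CV = {inter_assay_cv_percent(runs):.1f}%")
```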

Hormone Reference Intervals

Biomarker reference curves were modeled using the LMS method provided in the “gamlss” package in R (33). No outliers were removed beyond the cohort exclusion criteria. The triplet of L, M, and S values at a given age enables calculation of z-scores adjusted for gender and age by the formula z = ((X/M)^L − 1)/(L × S), where X is the relevant blood test result in SI units. All LMS models in the current study are provided in Supplemental Table 1 (34). The fit and residual distribution of each LMS reference model were checked using QQ-plots, worm plots, and Q-tests for normality. The R script used to perform the above LMS operations is available (35). Traditionally partitioned, nonparametric 95% reference intervals for all biomarkers were established by bootstrapping and Dixon’s outlier removal using the “referenceIntervals” package in R (36) and are provided in table format in Supplemental Table 2 (37). Partitioning of the reference ranges was determined according to CLSI guidelines (6). Girls using oral contraceptives were not included in any reference intervals, and children using glucocorticoid medication were excluded specifically from the cortisol and 11-deoxycortisol references.
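The z-score formula above can be applied directly once the L, M, and S values for a patient's gender and age have been looked up (eg, from Supplemental Table 1). The Python sketch below is illustrative. The parameter values in the example are hypothetical, and the function falls back to the standard limiting form ln(X/M)/S when L is zero, where the general formula is undefined. In gamlss, the LMS method corresponds to the Box-Cox Cole and Green (BCCG) family, whose mu, sigma, and nu parameters play the roles of M, S, and L.

```python
import math

def lms_zscore(x, L, M, S):
    """Age- and gender-adjusted z-score from LMS parameters.

    z = ((x/M)**L - 1) / (L*S); when L is (numerically) zero the limiting
    form ln(x/M) / S is used instead.
    """
    if x <= 0:
        raise ValueError("LMS z-scores require a positive measurement")
    if abs(L) < 1e-8:
        return math.log(x / M) / S
    return ((x / M) ** L - 1) / (L * S)

# Hypothetical LMS values for one gender and age (not from the study).
print(round(lms_zscore(14.4, L=0.5, M=10.0, S=0.2), 3))  # 2.0
```

In practice, L, M, and S would be interpolated to the patient's exact age before this formula is applied.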

Statistical Analyses

Correlation matrices were computed using the Pearson method with the “reshape” and “ggplot2” packages in R. The P values for the correlation matrices are provided in Supplemental Table 3 (38). Total variance in the biomarker z-score dataset was explored by principal component analysis (PCA) using the “prcomp” and “ggbiplot” functions in R, as described previously (21). Supervised ML to predict weight class (BMI-SDS ≥ 1 vs BMI-SDS ≤ −1) from all featured biomarker variables was performed by training a “randomForest” model and evaluating the resulting confusion matrix with the “caret” package in R. The pipeline script used to perform these operations is available in R code (39). Complete observations for all biomarker z-scores were available for 122 boys and 172 girls from the BGS2 cohort. These were combined into one data frame, and the random forest model was trained on 75% of the data with 10-fold cross-validation and tested on the remaining 25% of unseen data. Receiver operating characteristic (ROC) curves were constructed using the “pROC” package in R to evaluate the ability of single biomarkers to distinguish between the weight classes specified above. ROC accuracy was calculated as (true negatives + true positives)/all classification outcomes.
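To illustrate how a single biomarker can act as a classifier scored with the accuracy definition above, the Python sketch below scans every observed z-score as a candidate cutoff. It keeps the cutoff that maximizes Youden's J (sensitivity + specificity − 1). This cutoff rule is an assumption for illustration, because the text does not state how cutoffs were chosen in the “pROC” analyses. The function name and data are hypothetical.

```python
def youden_cutoff_accuracy(scores, labels):
    """Choose a cutoff for one biomarker z-score by maximizing Youden's J.

    labels: 1 = overweight, 0 = underweight; score >= cutoff predicts 1.
    Returns (cutoff, sensitivity, specificity, accuracy), where accuracy is
    (true positives + true negatives) / all classifications. Ties keep the
    lowest cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        sens, spec = tp / positives, tn / negatives
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec, (tp + tn) / len(labels))
    return best[1:]

# Hypothetical leptin z-scores with known weight class.
leptin_z = [-1.5, -1.0, -0.2, 0.3, 0.8, 1.6]
overweight = [0, 0, 1, 0, 1, 1]
print(youden_cutoff_accuracy(leptin_z, overweight))
```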

Results

Continuous Hormone References

We combined data from the 2 Norwegian cohorts of healthy children and adolescents and modeled circulating steroid hormone levels in girls and boys in relation to chronological age using the LMS growth chart algorithm (Fig. 1). Additional biomarkers analyzed exclusively in the BGS2 cohort included peptide hormones and lipids (Fig. 2). The reference curves shown in Fig. 1 and Fig. 2 are provided as supplementary information and enable anyone to calculate biomarker z-scores adjusted for gender and age. Notably, observations located on the median-for-age centile have a z-score of 0, corresponding to the 50th percentile.

Figure 1.

Continuous steroid hormone reference curves. Steroid hormone levels in individuals enrolled in the Bergen Growth Study 2 (black dots) and the Fit Futures cohort (green dots) were quantified by LC-MS/MS. Male (left column panels a, c, e, g, i, k) and female (right column panels b, d, f, h, j, l) references were modeled separately for the indicated hormones. Continuous centiles indicating the median for age (p50, solid lines) and whole-SD distances from it (dashed lines) were fitted using the LMS algorithm. The −2 and +2 SD curves correspond to percentiles p2.3 and p97.7, respectively, and the vertical range between these centiles approximates the central 95% reference range at any age. Abbreviations: 11-DOC, 11-deoxycortisol; 17-OHP, 17-hydroxyprogesterone; y, chronological age in years.

Figure 2.

Continuous biomarker reference curves. Biomarker levels quantified in serum samples from the BGS2 cohort were modeled as reference curves using the LMS algorithm. Male (left column panels a, c, e, g, i, k) and female (right column panels b, d, f, h, j, l) references were modeled separately. Abbreviations: FSH, follicle-stimulating hormone; IGF1, insulin-like growth factor 1; LH, luteinizing hormone; SHBG, sex hormone-binding globulin; y, chronological age in years.

Age-adjusted Associations Between Endocrine, Pubertal, and Anthropometric Variables

From the current biomarker reference curves (Fig. 1 and Fig. 2), age-adjusted z-scores were calculated for each cohort participant. Combining these biomarker z-scores with conventional anthropometric z-scores, we next calculated the Pearson correlation between all variables, by gender (Fig. 3). Because all variables are expressed as z-scores, the resulting correlation coefficients are standardized beta coefficients that describe relationships between variables in units of SD, irrespective of age. Sample sizes for these analyses ranged from 552 to 995 girls and from 419 to 910 boys, because puberty endpoints were not included in the Fit Futures dataset. No correlation was observed between z-scores and chronological age, indicating successful adjustment for age. In boys, both total testosterone and LH z-scores were positively associated with testicular volume-for-age (β = 0.4 and P < 0.001 for both). In girls, LH, FSH, and IGF1 z-scores were positively associated with E2 (β = 0.5 to 0.6; P < 0.001 for all). BMI z-scores were positively associated with male testicular volume-for-age (β = 0.2, P < 0.001) and female glandular breast tissue volume-for-age (β = 0.5, P < 0.001), but negatively with SHBG in both genders (β = −0.4 and P < 0.001 for both). Strong β coefficients were observed among obesometric variables, and likewise between related biomarkers in the steroid hormone synthesis pathway, eg, between estrone (E1) and estradiol (E2).

Figure 3.

Standardized β coefficient matrices for puberty development, anthropometry, and hormone profile. Age-adjusted z-scores derived from anthropometric LMS growth charts and the current biomarker LMS reference curves were correlated to obtain standardized beta coefficients that describe relationships between all variables. To exemplify the readout, a 1 SD increase in BMI is associated with a 0.6 SD increase in circulating leptin, regardless of age. Testicular volume-for-age z-scores were included in the top (a) male matrix, and corresponding z-scores for female glandular tissue volume-for-age were included in the bottom (b) female matrix. Standardized β coefficients were calculated as the Pearson correlation coefficient (r) between pairwise z-scores and colored according to the indicated heatmap scale. Complete statistical analyses including β coefficient P values are available in Supplemental Table 3 (38).

Endocrine Features of Pubertal Phenotypes

A refined analysis of the hormone profile in relation to pubertal phenotype was performed by stratifying the BGS2 cohort according to attainment of pubarche or central puberty onset at the time of examination (Table 2). Specifically, participants were grouped according to attainment of pubic hair (Tanner pubic hair stage PH2) and/or canonical markers of pubertal onset for boys (testicular volume ≥ 4 mL) and girls (Tanner breast stage B2+, ie, thelarche). The earliest and latest ages at puberty onset in the dataset were 10 and 13 years for boys and 8 and 12 years for girls. Boys with central pubertal onset but without pubarche had significantly higher z-scores for LH, total testosterone, and testicular volume than prepubertal peers with or without pubarche. Girls with gonadarche but without pubarche had significantly higher z-scores for FSH, E2, and glandular breast tissue volume than prepubertal peers without pubarche. Participants presenting with both pubarche and gonadarche had pubertal and endocrine z-scores markedly above the mean for age.

Table 2.

Baseline characteristics of male and female puberty phenotypes

Male baseline characteristics (puberty onset age range, 10-13 years)
Boys, ages 10-13	TV < 4 mL; PH1	TV ≥ 4 mL; PH1	TV < 4 mL; PH2+	TV ≥ 4 mL; PH2+
Sample size, n	69	23	20	37
Attained testicular vol. ≥ 4 mL, %	0%	100%	0%	100%
Attained pubic hair ≥ PH2, %	0%	0%	100%	100%
Age, y	10.74 (10.07 to 12.54)	11.82 (10.37 to 12.92)	11.64 (10.36 to 12.78)	12.47 (11.01 to 12.93)
Testicular volume, z-score	−0.55 (−1.84 to 1.01)	0.79 (−0.69 to 3.13)	−0.72 (−1.82 to 0.71)	0.34 (−0.95 to 2.30)
LH, z-score	−0.75 (−1.97 to 1.44)	0.81 (−0.93 to 1.99)	−0.30 (−1.95 to 1.92)	0.54 (−0.55 to 2.39)
FSH, z-score	−0.25 (−2.39 to 1.35)	0.07 (−0.86 to 1.93)	0.16 (−1.66 to 1.42)	0.27 (−1.43 to 1.77)
Testosterone, z-score	−0.37 (−1.59 to 0.85)	0.19 (−1.10 to 2.92)	−0.46 (−1.43 to 1.34)	0.64 (−0.66 to 2.29)
Female baseline characteristics (puberty onset age range, 8-12 years)
Girls, ages 8-12	No thelarche; PH1	Thelarche; PH1	No thelarche; PH2+	Thelarche; PH2+
Sample size, n	92	28	11	35
Attained breasts ≥ Tanner B2, %	0%	100%	0%	100%
Attained pubic hair ≥ PH2, %	0%	0%	100%	100%
Age, y	9.23 (8.08 to 11.67)	10.46 (8.58 to 12.30)	9.93 (8.15 to 11.26)	11.31 (9.90 to 11.95)
Glandular tissue volume, z-score	−0.49 (−2.09 to 1.24)	0.68 (−1.01 to 1.82)	−0.13 (−1.61 to 1.44)	1.16 (−0.86 to 1.96)
LH, z-score	0.20 (−1.95 to 1.90)	0.06 (−1.54 to 2.34)	−0.41 (−1.39 to 1.85)	1.05 (−2.05 to 1.85)
FSH, z-score	−0.15 (−2.06 to 1.59)	0.53 (−0.72 to 2.41)	−0.05 (−1.44 to 1.14)	0.53 (−1.49 to 1.84)
E2, z-score	−0.40 (−2.55 to 1.21)	0.42 (−1.26 to 2.64)	−0.12 (−1.61 to 1.24)	0.88 (−1.73 to 2.03)

Participants in the BGS2 cohort were stratified by differential puberty phenotypes at the time of examination, and the resulting sample sizes and baseline characteristics are presented as median (p2.5 to p97.5). The earliest and latest occurrences of puberty onset in the dataset, defined by attainment of 4 mL orchidometer testicular volume (boys) or Tanner stage B2 (girls), were set as respective age limits for this stratification analysis. Abbreviations: E2, estradiol; FSH, follicle-stimulating hormone; LH, luteinizing hormone; PH, Tanner pubic hair stage; SDS, z-score measured in SD from the mean for age; US, ultrasound; y, years.
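The summaries in Table 2, a median with its 2.5th to 97.5th percentile range for each phenotype group, take only a few lines of code. In the Python sketch below, the grouping rule for boys mirrors the table columns (testicular volume ≥ 4 mL and pubic hair stage ≥ PH2). The record field names (`tv_ml`, `ph_stage`) and the linear-interpolation percentile method are assumptions, not details taken from the study.

```python
import statistics

def boy_phenotype(testicular_volume_ml, pubic_hair_stage):
    """Column label used in Table 2 for boys aged 10-13 years."""
    tv = "TV ≥ 4 mL" if testicular_volume_ml >= 4 else "TV < 4 mL"
    ph = "PH2+" if pubic_hair_stage >= 2 else "PH1"
    return f"{tv}; {ph}"

def median_p95_range(values):
    """Format values as 'median (p2.5 to p97.5)'; needs at least two values."""
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return f"{statistics.median(values):.2f} ({cuts[0]:.2f} to {cuts[-1]:.2f})"

def summarize_by_phenotype(records, field):
    """Group hypothetical records (dicts) by phenotype and summarize one z-score field."""
    groups = {}
    for rec in records:
        label = boy_phenotype(rec["tv_ml"], rec["ph_stage"])
        groups.setdefault(label, []).append(rec[field])
    return {label: median_p95_range(vals) for label, vals in groups.items()}
```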


Leveraging Biomarker Z-Scores for Clinical Classifications

We initially hypothesized that weight classes would be associated with distinct biomarker profiles, and we pursued this hypothesis as a classification problem. To characterize the biomarker profile and explore the utility of z-scores in clinical classifications, we applied principal component analysis (PCA) to “fingerprint” the biomarker profiles associated with BMI weight class. This analysis included the biomarker z-score profiles of 154 “underweight” (BMI z-score ≤ −1.0) and all 140 “overweight” (BMI z-score ≥ 1.0) girls and boys in the BGS2 study. In addition to biomarker z-scores, female glandular tissue volume-for-age and male testicular volume-for-age were included to evaluate associations between gonadal development and weight class. The resulting PCA biplot showed partially distinct clustering of underweight and overweight biomarker z-score profiles (Fig. 4). In the beta coefficient matrices (Fig. 3), we observed a positive association between BMI and circulating leptin in both genders (β = 0.6 and P < 0.001). In line with this finding, the PCA loadings (arrows) indicated that leptin, along with SHBG and IGF1, were important biomarkers of weight class. Subsequent ROC analyses confirmed that overweight could be classified by leptin (88.8% accuracy), SHBG (75.2% accuracy), and IGF1 (69.4% accuracy). As a proof of concept for weight class prediction using the entire biomarker z-score profile, we trained a supervised ML model using the decision tree–based random forest algorithm. Using only biomarker z-scores, this ML classification model predicted BMI weight class with an accuracy of 94.5% (95% CI, 86.6% to 98.5%), as shown in the classification table (Table 3).
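The PCA step can be sketched with NumPy. Centering the z-score matrix and taking its singular value decomposition yields three things: the sample coordinates plotted as dots in a biplot, the variable loadings drawn as arrows, and the share of variance explained by each component. This mirrors the defaults of R's `prcomp` (centering without additional scaling, which is reasonable here because z-scores already share an SD scale). It is an illustrative re-implementation with hypothetical data, not the study's pipeline.

```python
import numpy as np

def pca(z_matrix, n_components=2):
    """PCA by SVD of a column-centered matrix (rows = children, columns = z-scores).

    Returns sample scores, variable loadings (biplot arrows), and the
    proportion of total variance explained by each retained component.
    """
    X = np.asarray(z_matrix, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

# Hypothetical z-scores for 6 children: leptin, SHBG, IGF1.
Z = [[ 1.8, -1.2,  0.9],
     [ 1.2, -0.9,  0.4],
     [ 0.9, -0.4,  0.6],
     [-0.8,  0.7, -0.3],
     [-1.1,  1.0, -0.6],
     [-1.5,  1.3, -0.8]]
scores, loadings, explained = pca(Z)
print("variance explained:", np.round(explained, 3))
```

The sign of each component is arbitrary, so arrows may appear mirrored relative to another implementation without any change in interpretation.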

Figure 4.

Association between biomarker levels and weight class. Dimension reduction by principal component analysis (PCA) was applied to 17 biomarkers and puberty status in terms of testicular volume or glandular tissue volume in 154 underweight (BMI-SDS ≤ −1.0) and 140 overweight (BMI-SDS ≥ 1.0) boys and girls. Directional contribution of individual variables to dataset variance is shown in the biplot in relation to clusters for underweight (red dots) and overweight (blue dots) BMI weight classes. The 1.5 SD confidence ellipses define each weight class cluster in terms of the dataset variance.

Table 3.

Classification of BMI weight class by applying machine learning to the biomarker profile

Prediction \ Reference	Underweight	Overweight
Underweight	37	3
Overweight	1	32

Biomarker z-scores from 294 underweight (BMI z-score ≤ −1.0) and overweight (BMI z-score ≥ 1.0) children were included in the analysis. The random forest classification model was trained on 75% of the data before predicting BMI weight class in the remaining 25% of unseen data, shown in the current confusion matrix. Classification performance of the ML model exceeded that of any individual biomarker of BMI weight class. Agreement between predicted and reference classes was satisfactory for the ML model: Cohen’s kappa, 0.89 (95% CI, 0.74-0.89).
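The accuracy and Cohen's kappa reported for Table 3 follow directly from its four cell counts. The Python sketch below recomputes both, assuming the layout of Table 3 (rows are predictions, columns are reference classes). It does not reproduce the confidence intervals, which depend on the method used by the “caret” package. The function name is illustrative.

```python
def accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] = number of children predicted as class i whose reference class is j.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement expected from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Table 3: rows = prediction (underweight, overweight), columns = reference.
table3 = [[37, 3],
          [1, 32]]
accuracy, kappa = accuracy_and_kappa(table3)
print(f"accuracy = {accuracy:.1%}, kappa = {kappa:.2f}")  # accuracy = 94.5%, kappa = 0.89
```

Kappa discounts the agreement expected by chance from the marginal totals, which is why it is lower than the raw accuracy.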


Discussion

A critical function of clinical laboratories is to construct and maintain up-to-date biochemical references to guide medical decision making. Establishing suitable references for the pediatric population is especially challenging because of ethical, practical, and regulatory impediments to recruiting healthy blood donors, and because references must keep pace with the secular trend toward earlier pubertal onset. The current, widely adopted nonparametric method of constructing reference intervals for arbitrarily partitioned age groups may not adequately capture age-dependent trends that are continuous in nature. Unlike the continuous distributions and centiles obtained by the LMS method, nonparametric reference intervals necessarily require partitioning into discrete age groups (eg, 6 to < 9 years, 9 to < 13 years, and so on). Furthermore, there is a demand to devise new reference frameworks that enable quantitative precision medicine and integrate with AI approaches to improve clinical investigations and patient treatment strategies. Although growth curves were implemented decades ago and have become standardized tools in pediatrics, the potential for more advanced clinical use of this framework remains unexplored, particularly for nonanthropometric variables. While several AI tools have been implemented to enhance radiologic predictions of disease and bone age in pediatrics (40, 41) and to interpret biochemistry profiles for hematologic disease in adults (42), equivalent progress has not been made toward computer-aided diagnosis in pediatric endocrinology. The current study applied the conventional “LMS” growth chart framework to model reference curves for 17 biomarkers in the pediatric population. Serum levels of several of these biomarkers increase by orders of magnitude during puberty and are difficult to resolve with age-partitioned reference intervals.
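For contrast with the continuous LMS curves, the age-partitioned nonparametric approach can be sketched as follows. This minimal Python example estimates the central 95% interval (2.5th and 97.5th percentiles) within each age partition, using the commonly applied rank convention r = p(n + 1) with linear interpolation between order statistics. The function names, the partition edges, and the simulated analyte are hypothetical. Each partition yields one flat interval, so an analyte that rises steeply with age produces a step change at every partition boundary.

```python
import math
import random

def nonparametric_limit(values, p):
    """Percentile at 1-based rank r = p * (n + 1), interpolating between order statistics."""
    x = sorted(values)
    n = len(x)
    r = min(max(p * (n + 1), 1.0), float(n))  # clamp the rank to the observed data
    lo = math.floor(r)
    frac = r - lo
    if lo >= n:
        return x[-1]
    return x[lo - 1] + frac * (x[lo] - x[lo - 1])

def partitioned_intervals(ages, values, edges):
    """Central 95% reference interval for each age partition [edges[i], edges[i + 1])."""
    intervals = {}
    for lower, upper in zip(edges[:-1], edges[1:]):
        group = [v for a, v in zip(ages, values) if lower <= a < upper]
        intervals[(lower, upper)] = (
            nonparametric_limit(group, 0.025),
            nonparametric_limit(group, 0.975),
        )
    return intervals

# Hypothetical analyte rising about 10-fold between 6 and 13 years, with multiplicative noise.
rng = random.Random(1)
ages = [rng.uniform(6, 13) for _ in range(600)]
values = [2 * 10 ** ((a - 6) / 7) * math.exp(rng.gauss(0, 0.3)) for a in ages]

for (lo_age, hi_age), (low, high) in partitioned_intervals(ages, values, [6, 9, 13]).items():
    print(f"{lo_age} to < {hi_age} years: {low:.1f}-{high:.1f}")
```

A child just below 9 years and one just above are judged against different limits, even though the underlying distribution changes smoothly; an LMS curve avoids this discontinuity.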
Precedent for using the semiparametric LMS algorithm to model anthropometric parameters of pediatric development is provided by the official WHO, CDC, and national growth charts worldwide. Furthermore, previous publications have successfully applied this framework to steroid sex hormones (43, 44) and to the biomarker IGF1 (45, 46). Importantly, disclosing the L, M, and S parameters enables health personnel elsewhere to implement the relevant reference curves and calculate z-scores according to their patients’ gender and age. Reference curves in the current study describe the gender-specific and age-dependent variation of 17 different biomarkers and are available in Supplemental Table 1 (34). The references presented in the current manuscript were generated from a population sample representative of the general Norwegian demography, which is approximately 89% Caucasian, 6% Asian, 3% African, and 0.5% Hispanic according to the latest census update (47). Although our use of LC-MS/MS and common mainstream commercial instruments to quantify biomarkers may support generalization to other laboratories, the current references should be interpreted with caution and validated before clinical implementation elsewhere, including for other ethnicities. Diurnal variation was not accounted for in the current reference curves, but the decline in hormone levels over the course of the day should be considered in clinical practice, particularly for cortisol and testosterone, for which we observed significantly higher levels in morning samples (before 10:00) than in later samples (after 10:00) in pubertal children.
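The z-score calculation that published L, M, and S parameters make possible can be sketched as follows. This minimal Python example assumes Cole’s standard LMS transformation, z = ((y/M)^L − 1)/(L·S), with the limiting form z = ln(y/M)/S when L = 0, and its inverse for centile values. It further assumes that L, M, and S are tabulated at discrete ages and linearly interpolated to the patient’s exact age. The table layout, function names, and parameter values are hypothetical and are not taken from Supplemental Table 1.

```python
import bisect
import math

def lms_z(y, L, M, S):
    """Z-score of a positive measurement y under Cole's LMS transformation."""
    if y <= 0:
        raise ValueError("LMS z-scores require a positive measurement")
    if abs(L) < 1e-12:  # limiting case L -> 0
        return math.log(y / M) / S
    return ((y / M) ** L - 1) / (L * S)

def lms_value(z, L, M, S):
    """Inverse transformation: the measurement at z-score z (e.g. a centile curve)."""
    if abs(L) < 1e-12:
        return M * math.exp(S * z)
    base = 1 + L * S * z
    if base <= 0:
        raise ValueError("z-score outside the range supported by these LMS parameters")
    return M * base ** (1 / L)

def interpolate_lms(table, age):
    """Linearly interpolate (L, M, S) at an exact age from rows (age, L, M, S) sorted by age."""
    ages = [row[0] for row in table]
    if not ages[0] <= age <= ages[-1]:
        raise ValueError("age outside the tabulated range")
    i = bisect.bisect_left(ages, age)
    if ages[i] == age:
        return tuple(table[i][1:])
    a0, l0, m0, s0 = table[i - 1]
    a1, l1, m1, s1 = table[i]
    w = (age - a0) / (a1 - a0)
    return (l0 + w * (l1 - l0), m0 + w * (m1 - m0), s0 + w * (s1 - s0))

# Hypothetical LMS table for an unnamed hormone: (age in years, L, M, S).
HYPOTHETICAL_TABLE = [
    (8.0, -0.2, 1.0, 0.5),
    (10.0, 0.1, 3.0, 0.6),
    (12.0, 0.3, 9.0, 0.5),
]

# Benchmark a result of 5.0 (arbitrary units) for a child aged 11.0 years.
L, M, S = interpolate_lms(HYPOTHETICAL_TABLE, 11.0)
z = lms_z(5.0, L, M, S)
p97 = lms_value(1.881, L, M, S)  # z of about 1.881 corresponds to the 97th centile
print(f"L={L:.2f}, M={M:.2f}, S={S:.3f}, z={z:.2f}, 97th centile={p97:.2f}")
```

Repeating the calculation with each new sample and the child’s age at sampling gives a z-score series that can be plotted on the chart for longitudinal tracking.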
Statistical tests were performed according to CLSI guidelines to determine whether stratifying reference ranges by time of blood draw was warranted, and appropriately stratified nonparametric reference intervals accounting for sampling time of day are provided in Supplemental Table 2 (37). Further, cyclical hormone variation should be considered when sampling gonadotropins, estradiol, 17-hydroxyprogesterone, and 4-androstenedione in girls. For girls with a regular menstrual cycle in the Fit Futures cohort, we were able to partition reference intervals by both menstrual cycle week and use of oral contraceptives; for menarcheal patients, we therefore recommend consulting the reference limits provided as supplemental information. Anthropometric growth charts are integral to pediatric practice, and we propose that the same statistical framework is applicable to biomarkers in pediatric endocrinology. Whereas nonparametric hormone references represent clinical cutoff values and primarily indicate, qualitatively, whether a patient is within or outside the reference interval, LMS models enable quantitative benchmarking and longitudinal tracking of patients’ biomarker levels as z-scores. This feature was recently used to quantify endocrine abnormalities in a pediatric cohort exposed in utero to metformin given as treatment for maternal polycystic ovary syndrome (22). Applying reference curves to monitor individual patients over time may also be useful for evaluating pediatric endocrinopathies and differences of sex development, and for general follow-up. In this respect, the steroid hormone 11-deoxycortisol is an integral biomarker of congenital adrenal hyperplasia due to 11β-hydroxylase deficiency and a general biomarker of virilization and hirsutism, and it is also used in other diagnostic contexts such as suspected Cushing disease or adrenal insufficiency (48).
We were unfortunately not able to quantify levels of dehydroepiandrosterone (DHEA) or DHEA sulfate (DHEAS) in the current study. Biomarker z-scores are subject to the same considerations of biological ($\mathrm{CV}_I$) and analytical ($\mathrm{CV}_A$) variation as results reported in absolute concentration units. The “critical difference” threshold, obtained by calculating the reference change value (RCV) with the classical formula $\mathrm{RCV} = 2^{1/2} \times Z \times (\mathrm{CV}_A^2 + \mathrm{CV}_I^2)^{1/2}$, defines whether a new sample result (eg, during longitudinal tracking) should be regarded as a significant change in the patient. The critical difference, obtained by multiplying the absolute value of the previous sample by the RCV percentage, can be readily converted to equivalent z-scores using the LMS formula outlined in the “Methods” section. By applying the current reference curves and calculating multiple z-scores for each cohort participant, we were able to parameterize a quantitative profile of biochemical and anthropometric measures that may enable personalized medicine. The standardized correlations between biomarker and anthropometric z-scores in Fig. 3 are novel and meaningful, because it is otherwise not feasible to obtain the absolute-unit correlation between 2 variables (eg, testicular volume measured in mL and serum LH measured in IU/L) that are both confounded by age-dependent, nonlinear variation. Similarly, the differences in biomarker z-scores according to pubertal phenotype in Table 2 are not attributable to variation in age. Results from Table 2 show that boys and girls exhibiting pubarche without testicular maturation or breast development had significantly lower LH-for-age than children with established pubertal onset. Surprisingly, no significant differences in adrenal steroid hormones were observed between any of the groups. The gender-specific beta correlation matrices in Fig.
3 provide quantitative adjustments that can be applied to patient endpoints during a clinical investigation. The standardized beta coefficients featured in Fig. 3 warrant some further discussion. First, the association between weight class and early pubertal timing is well established, particularly for female development (2). The results in Fig. 3 quantify the association between BMI and female glandular tissue volume (β = 0.5; P < 0.001). For boys, we observed only a minor standardized association between BMI and testicular volume (β = 0.2; P < 0.001). Irrespective of age and gender, BMI was negatively associated with SHBG (β = −0.4; P < 0.001). In clinical practice, the results in Fig. 3 can be used as follows: because a 1 SD increase in BMI z-score is associated with a 0.4 SD decline in SHBG z-score, blood sample results of underweight or overweight patients can be calibrated according to the patient’s weight class. Accordingly, weight class and adiposity may be regarded as significant covariates of SHBG levels in children. Our reference beta coefficient matrices should enable clinicians to adjust for gender, age, BMI, and other clinically relevant features when evaluating patient biomarker levels. We propose that applying such quantitative adjustments for key clinical covariates is an effective practice of personalized precision medicine in pediatric endocrinology. Lastly, we visualized systematic differences in the biomarker profile according to BMI weight class and demonstrated a use case for applying the biomarker z-scores interpolated from the current reference curves to ML classification. This analysis examined whether the endocrine profiles (comprising all biomarkers featured in Figs. 1 and 2, total cholesterol, HDL cholesterol, LDL cholesterol, and triglycerides) associated with “overweight” and “underweight” BMI weight classes could be resolved by a machine learning model.
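Two interpretive steps from this discussion, checking a follow-up result against the reference change value and removing the BMI-associated component of a biomarker z-score, can be sketched together. This minimal Python sketch rests on stated assumptions: Z = 1.96 (a two-sided 95% threshold), the symmetric classical RCV, Cole’s LMS transformation for converting the critical difference to z-scores, and a simple linear adjustment z_adj = z − β·z_BMI as one possible reading of the calibration described for SHBG. The coefficient β = −0.4 is the SHBG value reported in Fig. 3; all CVs, concentrations, and LMS parameters are hypothetical, as are the function names.

```python
import math

def lms_z(y, L, M, S):
    """Z-score of a positive measurement y under Cole's LMS transformation."""
    if abs(L) < 1e-12:
        return math.log(y / M) / S
    return ((y / M) ** L - 1) / (L * S)

def rcv_percent(cv_analytical, cv_intraindividual, z=1.96):
    """Classical reference change value (%): 2**0.5 * Z * (CV_A**2 + CV_I**2)**0.5."""
    return math.sqrt(2) * z * math.sqrt(cv_analytical**2 + cv_intraindividual**2)

def critical_difference_z(previous, rcv, L, M, S):
    """Express the critical difference around a previous result as z-score offsets."""
    delta = previous * rcv / 100  # critical difference in concentration units
    z_prev = lms_z(previous, L, M, S)
    z_up = lms_z(previous + delta, L, M, S) - z_prev
    # If the lower threshold is not a positive concentration, no decrease can reach it.
    z_down = lms_z(previous - delta, L, M, S) - z_prev if previous > delta else -math.inf
    return z_down, z_up

def bmi_adjusted_z(biomarker_z, bmi_z, beta):
    """Remove the BMI-associated component from a biomarker z-score (simple linear model)."""
    return biomarker_z - beta * bmi_z

# Hypothetical inputs: CV_A = 5 %, CV_I = 10 %, LMS parameters for an unnamed analyte.
rcv = rcv_percent(5.0, 10.0)
z_down, z_up = critical_difference_z(40.0, rcv, L=0.5, M=50.0, S=0.3)
print(f"RCV = {rcv:.1f} %; significant if z shifts below {z_down:.2f} or above {z_up:.2f}")

# SHBG example with beta = -0.4 (Fig. 3): observed SHBG z = -1.5, BMI z = +2.0.
print(f"BMI-adjusted SHBG z = {bmi_adjusted_z(-1.5, 2.0, -0.4):.2f}")
```

With these made-up inputs, the RCV is about 31%, and the adjustment moves the SHBG z-score from −1.5 to −0.7, because part of the low value is what a BMI z-score of +2 would predict. Because the LMS transformation is nonlinear, the z-score thresholds for an increase and a decrease are generally not symmetric.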
Importantly, because the biomarker z-scores are normalized for age and gender, girls and boys of all ages could be combined in this analysis, which would not be possible with biomarkers expressed in absolute concentration units. Notably, adiponectin and leptin are adipocyte-derived adipokines that modulate whole-body energy balance and exhibit profoundly dysfunctional signaling in obese and insulin-resistant individuals (49). Peripheral insulin resistance and hyperinsulinemia further dysregulate circulating cholesterol composition and hepatic synthesis of SHBG and IGF1 (50, 51). Results from the current PCA demonstrated that children with mild overweight (BMI z-score ≥ 1) exhibit a materially altered biomarker profile compared with underweight (BMI z-score ≤ −1) peers. Moreover, the corresponding PCA biplot showed that more advanced pubertal characteristics for age were a defining feature of overweight children. Supervised machine learning is an AI method that infers a function from labeled data (eg, using several feature variables to explain a known phenotype dichotomy) in order to automate the classification of new cases. The current “random forest” model for inferring BMI weight class from the biomarker profile outperformed the predictive value of any individual biomarker. Although the population samples in the current study included only healthy children, we propose that supervised training of such classification models may provide useful clinical tools to diagnose and manage pediatric diseases, unfavorable metabolic profiles, and endocrinopathies.

In conclusion, the LMS framework was used to construct reference charts for 17 circulating biomarkers, against which patients may be benchmarked in terms of age- and sex-adjusted z-scores. Differential attainment of pubic hair and/or gonadarche during the age window of pubertal onset was associated with distinct differences in both anthropometric and biomarker z-scores.
Finally, we compiled a comprehensive association map of clinical variables and demonstrated high-accuracy machine-aided classification of a clinical dichotomy based solely on the biomarker z-score profile.

42 in total

1. Descriptive analytical data and consequences for calculation of common reference intervals in the Nordic Reference Interval Project 2000.

Authors: P Rustad; P Felding; A Lahti; P Hyltoft Petersen
Journal: Scand J Clin Lab Invest Date: 2004

2. Development of paediatric biochemistry centile charts as a complement to laboratory reference intervals.

Authors: Tze Ping Loh; Georgia Antoniou; Peter Baghurst; Michael P Metz
Journal: Pathology Date: 2014-06

3. Age- and sex-specific dynamics in 22 hematologic and biochemical analytes from birth to adolescence.

Authors: Jakob Zierk; Farhad Arzideh; Tobias Rechenauer; Rainer Haeckel; Wolfgang Rascher; Markus Metzler; Manfred Rauh
Journal: Clin Chem Date: 2015-05-12

4. Smoothing reference centile curves: the LMS method and penalized likelihood.

Authors: T J Cole; P J Green
Journal: Stat Med Date: 1992-07

5. Worldwide Secular Trends in Age at Pubertal Onset Assessed by Breast Development Among Girls: A Systematic Review and Meta-analysis.

Authors: Camilla Eckert-Lind; Alexander S Busch; Jørgen H Petersen; Frank M Biro; Gary Butler; Elvira V Bräuner; Anders Juul
Journal: JAMA Pediatr Date: 2020-04-06

6. Construction of LMS parameters for the Centers for Disease Control and Prevention 2000 growth charts.

Authors: Katherine M Flegal; Tim J Cole
Journal: Natl Health Stat Report Date: 2013-02-11

7. Multisteroid LC-MS/MS assay for glucocorticoids and androgens, and its application in Addison's disease.

Authors: Paal Methlie; Steinar Simon Hustad; Ralf Kellmann; Bjørg Almås; Martina Moter Erichsen; Eystein Husebye; Kristian Løvås
Journal: Endocr Connect Date: 2013-06-14

8. An Ultrasensitive Routine LC-MS/MS Method for Estradiol and Estrone in the Clinically Relevant Sub-Picomolar Range.

Authors: Bjørn-Erik Bertelsen; Ralf Kellmann; Kristin Viste; Anne Turid Bjørnevik; Hans Petter Eikesdal; Per Eystein Lønning; Jørn V Sagen; Bjørg Almås
Journal: J Endocr Soc Date: 2020-04-21

9. Childhood overweight and obesity and timing of puberty in boys and girls: cohort and sibling-matched analyses.

Authors: Nis Brix; Andreas Ernst; Lea Lykke Braskhøj Lauridsen; Erik Thorlund Parner; Onyebuchi A Arah; Jørn Olsen; Tine Brink Henriksen; Cecilia Høst Ramlau-Hansen
Journal: Int J Epidemiol Date: 2020-06-01

10. References for Ultrasound Staging of Breast Maturation, Tanner Breast Staging, Pubic Hair, and Menarche in Norwegian Girls.

Authors: Ingvild Særvold Bruserud; Mathieu Roelants; Ninnie Helén Bakken Oehme; Andre Madsen; Geir Egil Eide; Robert Bjerknes; Karen Rosendahl; Petur B Juliusson
Journal: J Clin Endocrinol Metab Date: 2020-05-01

1 in total

1. The Beauty of Age-dependent Standardization in Pediatric Endocrine Research and Practice.

Authors: Jaakko J Koskenniemi; Jorma Toppari
Journal: J Clin Endocrinol Metab Date: 2022-07-14

1 in total