
Dataset-chemokines, cytokines, and biomarkers in the saliva of children with Sjögren's syndrome.

Miyuraj Harishchandra Hikkaduwa Withanage, M Paula Gomez Hernandez, Emily E Starman, Andrew B Davis, Erliang Zeng, Scott M Lieberman, Kim A Brogden, Emily A Lanzel.

Abstract

Sjögren's syndrome is an autoimmune disease that can also occur in children. The disease is not well defined and there is limited information on the presence of chemokines, cytokines, and biomarkers (CCBMs) in the saliva of children that could improve their disease diagnosis. In a recent study [1], we reported a large dataset of 105 CCBMs that were associated with both lymphocyte and mononuclear cell functions [2] in the saliva of 11 children formally diagnosed with Sjögren's syndrome and 16 normal healthy children. Here, we extend those findings and use the Mendeley dataset [2] to identify CCBMs that have predictive power for Sjögren's syndrome in female children. Datasets of CCBMs from all saliva samples and female children saliva samples were standardized. We used machine learning methods to select Sjögren's syndrome associated CCBMs and assessed the predictive power of selected CCBMs in these two datasets using receiver operating characteristic (ROC) curves and associated areas under curve (AUC) as metrics. We used eight classifiers to identify 16 datasets that contained from 2 to 34 CCBMs with AUC values ranging from 0.91 to 0.94.
© 2021 The Author(s). Published by Elsevier Inc.


Keywords:  Biomarkers; Chemokines; Children; Cytokines; Saliva; Sjögren's syndrome

Year:  2021        PMID: 34095386      PMCID: PMC8165406          DOI: 10.1016/j.dib.2021.107139

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

The Mendeley dataset is among the first to report 105 CCBMs that were associated with both lymphocyte and mononuclear cell functions in the saliva of children with Sjögren's syndrome. The dataset can be used to identify smaller groups of CCBMs that can serve as predictor biomarkers for Sjögren's syndrome diagnosis in children. Receiver operating characteristic (ROC) curves and the associated areas under curve (AUC) are the machine learning metrics that were used to evaluate the predictive power of these CCBMs for Sjögren's syndrome. Datasets with high AUC values in saliva samples from female children indicated that those CCBMs can serve as predictor biomarkers for Sjögren's syndrome diagnosis in children.

Data Description

Sjögren's syndrome in children is not well defined. Xerostomia and xerophthalmia are not often the primary manifestations in children. Half of the children with Sjögren's syndrome present with parotitis and the other half present with less-specific clinical features. Therefore, there is a need to identify specific CCBMs in the saliva of children with Sjögren's syndrome to improve child-specific diagnostic criteria. In a recent study, we used multiplex fluorescent microparticle-based immunoassays to determine the concentrations of 105 CCBMs in the saliva of 11 children with Sjögren's syndrome and in the saliva of 16 normal healthy children [1]. The CCBMs we selected for that study were related to leukocyte activities and functions [1]. The dataset is accessible in the Mendeley Data Repository [2]. Specifically, 43/105 CCBMs were different (p < 0.05) in children with Sjögren's syndrome compared to the healthy study controls [1]. In Ingenuity Pathway Analysis (IPA) annotations, elevated CCBMs were associated with autoimmune diseases and specific leukocyte functions, including those associated with cellular movement, immune cell trafficking, and cell signaling. ROC curves and AUC values identified smaller datasets of CCBMs (e.g., IL27 and CCL4, k-Nearest Neighbor; AUC=0.93) that could be used as predictors for Sjögren's syndrome in children. In adults, the ratio of females to males with Sjögren's syndrome is 9:1, but in children the ratio of females to males is 5:1. Previously, we used the Mendeley dataset [2] to identify the CCBMs that can have predictive power for Sjögren's syndrome in children [1]. In this report, we used the Mendeley dataset [2] to identify CCBMs that can have predictive power for Sjögren's syndrome in female children. The data for this study are in an Excel file called “Dataset-CCBMs, saliva, children, SS (01–02–21).xlsx”, posted in the Mendeley database: http://dx.doi.org/10.17632/yphm77tg24.1.
This file contains the replicate concentrations (pg/ml) of 105 CCBMs for the 27 subjects. The Excel file is a workbook of multiple spreadsheets identified by different tabs (startingData, step2_replicateGroups, stp3_groupMeans, stp4_meanDataRowsAdded, stp5_OnlyMeanRowsRemain, input_allData, input_FemaleData_only, Description). Each spreadsheet stores the aforementioned data at a different preprocessing stage, from the starting raw data to the final preprocessed data. Tab names of the spreadsheets are enumerated and briefly described below.
startingData: The spreadsheet contains the CCBM measurements before any preprocessing.
step2_replicateGroups: The spreadsheet contains the grouping of the three replicates. Grouped replicates are annotated as 1, 2, 3, ..., 26, 27 in the second column of the spreadsheet.
stp3_groupMeans: The spreadsheet contains group means. The rows containing group means are annotated as “new” in the second column of the spreadsheet.
stp4_meanDataRowsAdded: The spreadsheet in which a labeled row (i.e., labeled to represent each group) was added and defined for mean values. These rows are annotated as “new” in the second column of the spreadsheet.
stp5_OnlyMeanRowsRemain: The spreadsheet contains only the mean data rows after the raw data rows are discarded.
input_allData: The spreadsheet was created from the stp5_OnlyMeanRowsRemain spreadsheet. It contains data from both female and male samples. All the CCBM data columns were standardized using the scikit-learn MinMaxScaler (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler).
In Fig. 1, we compared the maximum AUC values resulting from different classifiers (i.e., best performing models) constructed based on all samples (red) and constructed based only on female samples (blue).
For every classifier, the maximum AUC value reported from models constructed based only on female samples was higher than the maximum AUC value reported by models constructed based on all samples. Differences appeared greater using the classifiers Logistic Regression, Naïve Bayes, Support Vector Machine with radial basis function (rbf) Kernel, Support Vector Machine with Linear Kernel, and AdaBoost than with the classifiers Gaussian Process, Random Forest, and k-Nearest Neighbor.
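The replicate averaging and min-max standardization described for the spreadsheet tabs can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names `group_means` and `min_max_scale` are hypothetical, the toy concentrations are invented, and the hand-written scaling stands in for the scikit-learn MinMaxScaler named in the text (each column is mapped to [0, 1] by (x - min) / (max - min)). Mapping constant columns to 0.0 is a choice made here to avoid division by zero.

```python
from collections import defaultdict
from statistics import mean


def group_means(rows):
    """Average replicate measurements per subject.

    rows: list of (subject_id, [concentration per analyte]) tuples,
    one tuple per replicate.
    Returns {subject_id: [mean concentration per analyte]}.
    """
    grouped = defaultdict(list)
    for subject_id, values in rows:
        grouped[subject_id].append(values)
    return {
        subject_id: [mean(column) for column in zip(*replicates)]
        for subject_id, replicates in grouped.items()
    }


def min_max_scale(matrix):
    """Rescale each column to [0, 1] using (x - min) / (max - min).

    Constant columns are mapped to 0.0 (an assumption of this sketch).
    """
    columns = list(zip(*matrix))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ]
        for row in matrix
    ]


# Toy data (invented): 3 subjects x 3 replicates x 2 analytes, in pg/ml.
raw = [
    (1, [10.0, 200.0]), (1, [12.0, 220.0]), (1, [14.0, 240.0]),
    (2, [20.0, 100.0]), (2, [22.0, 100.0]), (2, [24.0, 100.0]),
    (3, [30.0, 300.0]), (3, [32.0, 320.0]), (3, [34.0, 340.0]),
]
means = group_means(raw)            # one row of means per subject
subjects = sorted(means)
scaled = min_max_scale([means[s] for s in subjects])
```

In this sketch the scaling is applied to the table of per-subject means, which mirrors the order of the tabs (mean rows first, standardized input tables last).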
Fig. 1

Comparison of the highest area under curve (AUC) values resulting from classifiers using two datasets: maximum AUC of all samples (red) and maximum AUC of female samples only (blue). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

In Fig. 2, we compared the ROC curves of classifiers on different feature sets. The higher AUC values indicated that CCBMs can serve as predictor biomarkers for Sjögren's syndrome diagnosis in children. Classifiers included the k-Nearest Neighbor, Random Forest, Gaussian Process, Support Vector Machine with rbf Kernel, Support Vector Machine with Linear Kernel, Logistic Regression, AdaBoost, and Naïve Bayes.
Fig. 2

Comparison of receiver operating characteristic (ROC) curves of classifiers on different feature sets. The area under curve (AUC) values indicate that CCBMs can serve as predictor biomarkers for Sjögren's syndrome diagnosis in children. Eight classifiers were used: k-NN: k-Nearest Neighbor, RF: Random Forest, GP: Gaussian Process, SVM (rbf): Support Vector Machine with rbf Kernel, SVM (Linear): Support Vector Machine with Linear Kernel, LR: Logistic Regression, AB: AdaBoost, and NB: Naïve Bayes.

Eight classifiers identified 16 datasets containing 2–34 CCBMs with AUC values of 0.91 to 0.94 (Table 1). It is interesting to note that different feature sets could have the same prediction power, that is, the same AUC values, yet contain different datasets of CCBMs. This ranged from the Gaussian Process classifier, which contained 4 datasets of 4–11 CCBMs with an AUC of 0.91 to the Nearest Neighbor, SVM (rbf), SVM (Linear), AdaBoost, and Naïve Bayes classifiers, which contained 8 datasets of 2–34 CCBMs with an AUC of 0.94.
Table 1

The area under curve (AUC) values of best performing models.

| Classifier | Model name | Feature set (a) | AUC (a) |
|---|---|---|---|
| Random Forest (RR) | RR_fe_1 | IL27, MIA, CCL4, CXCL11 | 0.93 |
| Random Forest (RR) | RR_fe_2 | IL27, MIA, CCL4, CXCL11, IL23A | 0.93 |
| Random Forest (RR) | RR_fe_3 | IL27, MIA, CCL4, CXCL11, TNFRSF18 | 0.93 |
| Logistic Regression (LR) | LR_fe_1 | IL27, CCL4, CXCL11 | 0.93 |
| Gaussian Process (GP) | GP_fe_1 | IL27, CCL4, CXCL11, IL23A | 0.91 |
| Gaussian Process (GP) | GP_fe_2 | IL27, CCL4, CXCL11, MIA | 0.91 |
| Gaussian Process (GP) | GP_fe_3 | IL27, CCL4, CXCL11, MIA, TNFRSF18, CCL19, IL12B, TNFRSF9, LIF, CCM2, GZMB | 0.91 |
| Gaussian Process (GP) | GP_fe_4 | IL27, CCL4, CXCL11, IL23A, TNFRSF18, CCL19, IL12B, TNFRSF9, LIF | 0.91 |
| k-Nearest Neighbor (NN) | NN_fe_1 | IL27, CCL4 | 0.94 |
| SVM (rbf) | SVMr_fe_1 | CCL4, IL27, CXCL11, TNFRSF18 | 0.94 |
| SVM (rbf) | SVMr_fe_2 | CCL4, IL27, CXCL11 | 0.94 |
| SVM (Linear) | SVML_fe_1 | IL27, CCL4, CXCL11, TNFRSF18 | 0.94 |
| SVM (Linear) | SVML_fe_2 | IL27, CCL4, CXCL11, TNFRSF18, MIA | 0.94 |
| AdaBoost (AB) | AB_fe_1 | CCL4, IL27, CXCL11, TNFRSF18, IL12B, ALCAM, CCL19, IL23A, TSLP, IL16, LIF, TNFSF5, TNFRSF8, CCL20, IRX1, CCL15, IL15, TNFRSF9, CXCL13, CCM2, CD40, CCL21, IL1B, MIA, XCL1, MMP9, CCL11, S100A8, GZMB, ULBP2, TNFSF11, CCL5, CXCL10, IFNB1 | 0.94 |
| Naïve Bayes (NB) | NB_fe_1 | IL27, CCL4, CXCL11, MIA, IL23A, CCL21, CCL19, ACAN, CCM2, TNFRSF9 | 0.94 |
| Naïve Bayes (NB) | NB_fe_2 | IL27, CCL4, CXCL11, MIA, IL23A, CCL21, CCL19, ACAN, CCM2, TNFRSF9, IL12B | 0.94 |

(a) Two different feature sets can have the same predictive power, that is, the same AUC value. For example, models “SVML_fe_1” and “SVML_fe_2” have the same AUC value (0.94).


Experimental Design, Materials and Methods

Samples

Unstimulated whole saliva was collected as approved by the Human Institutional Review Board of the University of Iowa (IRB ID #: 200907702) [1]. Consent was obtained, background material was recorded, and 1.0 to 7.0 ml of saliva was collected from 11 children formally diagnosed with Sjögren's syndrome and 16 normal healthy children, matched for gender and age. After collection, saliva was stored at −80 °C until analysis. The features of these children have been previously reported [1]. After all saliva samples were collected, they were thawed on ice and centrifuged at 16,100 RCF (13,200 RPM, Eppendorf, 5415D centrifuge, Brinkmann Instruments, Inc., Westbury, NY) for 5 min at 24 °C to pellet particulates and debris. Sample supernatants were decanted to another tube and held on ice.

Determination of CCBMs

The concentrations (pg/ml) of 105 CCBMs were determined in each supernatant, in triplicate, using multiplex fluorescent microparticle-based immunoassays (Luminex Human Magnetic Assay, R&D Systems, Minneapolis, MN). Detection of the 105 CCBM analytes was performed in 4 assay runs, and the product catalog numbers, lot numbers, and CCBM composition of each kit are listed in Table 2. Briefly, three 50 µl aliquots of saliva supernatant were added to immunoassay plates. Magnetic microparticles with attached anti-human analyte-specific antibodies were added and the immunoassay plates were incubated on an orbital shaker (Titer Plate Shaker, Lab-Line Instruments, Inc., Melrose Park, IL USA) at 4 °C. After 18.0 h, the microparticles were washed (ELx405TS magnetic plate washer, BioTek, Winooski, VT USA) and incubated with biotinylated anti-human analyte-specific antibodies at room temperature for 1.0 h in the dark. These antibodies were diluted as indicated in the instructions on the inserts in each kit. The microparticles were washed, incubated with streptavidin-phycoerythrin conjugate at room temperature for 0.5 h, washed, and suspended in buffer. Median fluorescent intensity (MFI) values of the bound phycoerythrin were determined in the Luminex model 100 IS (Luminex, Austin, TX USA).
Table 2

Catalog and lot numbers for kits of fluorescent microparticle-based immunoassays used to determine the concentrations of chemokines, cytokines, and biomarkers (CCBMs) in the saliva of children formally diagnosed with Sjögren's syndrome and from normal healthy children, matched for gender and age, who served as study controls.

| Kit/Lot (a) | No. of CCBMs in plex | CCBMs in kit |
|---|---|---|
| LXSAHM-07 (lot 129,053) | 7 | CCL5, CCL17, IL12A, TIMP1, TNFRSF13B, TNFRSF17, TNFRSF1A |
| LXSAHM-36 (lot 129,052) | 36 | B2M, CALCA, CCL1, CCL11, CCL20, CCL22, CCL26, CCL28, CCL3, CCL7, CCL8, CD274, CD40, CXCL13, CXCL14, CXCL4, CXCL9, FCER1G, GZMA, IFNA1, IFNB1, IFNG, IFNGR1, IL16, IL1A, IL1B, IL1R2, IL21, IL6, LGALS3, LGALS9, SLPI, TNFRSF7, TNFRSF8, TNFSF11, TNFSF5 |
| LXSAHM-22 (lot 128,990) | 22 | IL11RA, C9, CCL21, CST3, FSTL1, IFNL3, IGFBP3, IL12B, IL15, IL27, LGALS3BP, LIF, LTF, MIA, NAGLU, S100A8, S100A9, TNFRSF18, TNFRSF1B, TSLP, ULBP2, XCL1 |
| LXSAHM-40 (lot 129,245) | 40 | A2M, ACAN, ALCAM, AMBP, C5, CA9, CCL13, CCL14, CCL15, CCL18, CCL19, CCL2, CCL23, CCL24, CCL25, CCL27, CCL4, CCM2, CD276, CTSS, CXCL10, CXCL11, FASLG, FSTL3, GAS6, GDF2, GZMB, IFNL2, IL10, IL23A, IL2RA, IL7, IRX1, MMP9, PECAM1, SELL, TNFA, TNFRSF9, TNFSF13, TNFSF13B |

(a) Luminex Human Magnetic Assay, R&D Systems, Minneapolis, MN USA.

CCBM concentrations were interpolated from their MFI values using five-parameter logistic curves created from the standard concentrations and their respective MFI readings on the Luminex 100 IS using xPonent v3.1 software (Luminex, Austin, TX) or on the readout files using Milliplex Analyst v5.1 software (EMD Millipore, Billerica, MA) as previously described [1]. CCL27 concentrations were below the standard curve and these concentrations were extrapolated from their MFI values using curves created from zero concentration to the lowest standard concentration and their respective MFI readings. B2M concentrations were above the standard curve and these concentrations were extrapolated from their MFI values using curves extended beyond the highest concentration and their respective MFI readings.
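As an illustration of interpolating a concentration from an MFI reading, the sketch below uses one common parameterization of the five-parameter logistic (5PL) curve, MFI = d + (a - d) / (1 + (x / c)^b)^g, together with its algebraic inverse. This is an assumption-laden example, not the procedure implemented in xPonent or Milliplex Analyst: the function names are hypothetical, the parameter values are invented, and in practice the parameters are fitted to the kit standards by the vendor software. The sketch covers interpolation only, not the extrapolation described for CCL27 and B2M.

```python
def five_pl(x, a, b, c, d, g):
    """Five-parameter logistic curve: predicted MFI at concentration x.

    a: response at zero concentration, d: response at infinite concentration,
    c: concentration scale, b: slope factor, g: asymmetry factor.
    """
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


def inverse_five_pl(y, a, b, c, d, g):
    """Concentration whose predicted MFI equals y.

    y must lie strictly between the two asymptotes a and d.
    """
    if not min(a, d) < y < max(a, d):
        raise ValueError("MFI lies outside the asymptotes of the curve")
    ratio = (a - d) / (y - d)
    return c * (ratio ** (1.0 / g) - 1.0) ** (1.0 / b)


# Hypothetical fitted parameters (invented, not taken from the study).
params = dict(a=20.0, b=1.2, c=500.0, d=25000.0, g=0.8)

mfi = five_pl(150.0, **params)                   # forward: pg/ml -> MFI
concentration = inverse_five_pl(mfi, **params)   # inverse: MFI -> pg/ml
```

The round trip returns the starting concentration (150 pg/ml here, up to floating-point error), which is a quick sanity check on any 5PL inverse.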

Analysis

For analysis, the data went through a series of transformations as described in the Mendeley dataset. The replicates were first grouped, a mean was calculated, and the means were used for statistical comparisons among groups, hierarchical clustering, principal component analysis, and Ingenuity Pathway Analysis [1]. The dataset of mean CCBM concentrations from both female and male children's saliva samples was then standardized, and the dataset of mean CCBM concentrations from female children's saliva samples was also standardized. Machine learning methods were then used to assess the predictive power of the CCBMs in these two datasets using ROC and AUC metrics [1]. Correlation [3], Information Gain [4], Information Gain Ratio [5], Symmetrical Uncertainty [6], and ReliefF [7] feature selection methods were used to rank CCBMs. An ensemble method based on the aggregated ranks of the five feature selection methods was also used to rank CCBMs. Discrete sets of CCBMs (i.e., Union, AtLeast2, AtLeast3, AtLeast4, SelectedByFive) were obtained using a Venn diagram representing the overlapping features of the five feature selection methods. The “Union” feature set at a given rank k is the union of the top k features selected by each of the five methods. The “SelectedByFive” feature set at a given rank k contains the features common to all five top k feature sets. The “AtLeast2”, “AtLeast3”, and “AtLeast4” feature sets are defined analogously. For example, the “AtLeast2” feature set at a given rank k includes the top k features that are selected by at least 2 feature selection methods. Each top k feature set was evaluated using eight classification methods: k-Nearest Neighbor (k = 3) [8], AdaBoost (trees=100) [9], Support Vector Machine (Linear Kernel) [10], Support Vector Machine (radial basis function (rbf) Kernel) [11], Naïve Bayes [12], Random Forest (trees=100) [13], Logistic Regression, and Gaussian Process [14].
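The Union, AtLeast2, AtLeast3, AtLeast4, and SelectedByFive definitions can be sketched by counting, for each CCBM, how many of the five rankers place it in their top k. The helper name `voted_feature_sets` is hypothetical, and the rankings below are invented orderings used only to exercise the code; they are not results from the study.

```python
from collections import Counter


def voted_feature_sets(rankings, k):
    """Build overlap-based feature sets from the top-k lists of several rankers.

    rankings: {method_name: [features ordered from best to worst]}
    Returns {set_name: set of features}. The name "SelectedByFive"
    assumes exactly five ranking methods, as in the text.
    """
    votes = Counter()
    for ordered in rankings.values():
        votes.update(set(ordered[:k]))  # one vote per method per feature
    n_methods = len(rankings)
    sets = {"Union": set(votes)}
    for threshold in range(2, n_methods):
        sets[f"AtLeast{threshold}"] = {
            f for f, v in votes.items() if v >= threshold
        }
    sets["SelectedByFive"] = {f for f, v in votes.items() if v == n_methods}
    return sets


# Invented rankings for illustration only (not study output).
rankings = {
    "Correlation": ["IL27", "CCL4", "CXCL11", "MIA"],
    "InformationGain": ["IL27", "CCL4", "MIA", "CXCL11"],
    "InformationGainRatio": ["CCL4", "IL27", "CXCL11", "IL23A"],
    "SymmetricalUncertainty": ["IL27", "CXCL11", "CCL4", "MIA"],
    "ReliefF": ["CCL4", "IL27", "TNFRSF18", "CXCL11"],
}
feature_sets = voted_feature_sets(rankings, k=3)
```

With these toy rankings and k = 3, IL27 and CCL4 receive five votes, CXCL11 three, and MIA and TNFRSF18 one each, so the sets shrink from Union (five features) to SelectedByFive (two features).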
Each classifier was trained on the selected features, and the resulting model was then used to predict Sjögren's syndrome. The performance of a classifier on a specific feature set was evaluated using leave-one-out cross-validation and ROC curves. The average AUC was used to measure the performance.
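A minimal sketch of leave-one-out evaluation with a k-Nearest Neighbor classifier (k = 3) is shown below. Several points are assumptions of this example rather than details given in the text: the held-out scores are pooled into a single AUC (the text does not say how the AUC was averaged), the AUC is computed through its rank interpretation (the probability that a case scores higher than a control, with ties counted as one half), and the toy feature values and function names are invented. The study itself used library classifiers; this plain-Python version only illustrates the procedure.

```python
import math


def knn_score(train_x, train_y, query, k=3):
    """Fraction of the k nearest training samples that carry label 1."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    return sum(label for _, label in nearest) / k


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def loocv_auc(x, y, k=3):
    """Hold out each sample once, score it with k-NN, pool scores into one AUC."""
    scores = []
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        scores.append(knn_score(train_x, train_y, x[i], k))
    return auc(y, scores)


# Invented, already min-max scaled values for two features; 1 = case, 0 = control.
x = [[0.9, 0.8], [0.8, 0.9], [1.0, 0.7], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]
y = [1, 1, 1, 0, 0, 0]
result = loocv_auc(x, y)
```

Because each held-out sample is scored by a model that never saw it, the pooled AUC is less optimistic than an AUC computed on the training data, which matters with only 27 subjects.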

Ethics Statement

Consent was obtained for 27 children, and 1.0 to 7.0 ml of unstimulated whole saliva was collected as approved by the Human Institutional Review Board of the University of Iowa (IRB ID #: 200907702).

CRediT Author Statement

Miyuraj Harishchandra Hikkaduwa Withanage: Methodology, Formal analysis, Writing – original draft, Writing – review & editing; M. Paula Gomez Hernandez: Methodology, Conceptualization, Writing – original draft, Writing – review & editing; Emily E. Starman: Methodology, Writing – original draft, Writing – review & editing; Andrew B. Davis: Methodology; Erliang Zeng: Formal analysis, Writing – original draft, Writing – review & editing; Scott M. Lieberman: Conceptualization, Writing – original draft, Writing – review & editing, Funding acquisition; Kim A. Brogden: Conceptualization, Methodology, Writing – original draft, Writing – review & editing, Project administration, Funding acquisition; Emily A. Lanzel: Conceptualization, Writing – original draft, Writing – review & editing, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.

Subject: Immunology
Specific subject area: Sjögren's syndrome in children
Type of data: Figure and Table
How data were acquired:
 1. Fluorescent microparticle-based immunoassays (Luminex Human Magnetic Assay, R&D Systems, Minneapolis, MN)
 2. Luminex model 100 IS (Luminex, Austin, TX USA)
 3. xPonent v3.1 software (Luminex, Austin, TX)
 4. Milliplex Analyst v5.1 software (EMD Millipore, Billerica, MA)
Data format: Raw and analyzed
Parameters for data collection: CCBM concentrations in saliva samples from children with Sjögren's syndrome and from healthy children of the same gender and age
Description of data collection: Saliva samples were collected from August 30, 2016 to May 23, 2017. CCBM data were collected from July 12, 2019 to August 16, 2019
Data source location: Institution: University of Iowa College of Dentistry; City/Town: Iowa City; Country: USA; Location: 41.6628 (41°39′46″N), −91.5511 (91°33′4″W)
Data accessibility: Repository name: Mendeley Data; Identification number: http://dx.doi.org/10.17632/yphm77tg24.1; Direct URL to data: https://data.mendeley.com/datasets/yphm77tg24/1
Related research article: M.P. Gomez Hernandez, E.E. Starman, A.B. Davis, M.H. Hikkaduwa Withanage, E. Zeng, S.M. Lieberman, K.A. Brogden, E.A. Lanzel. A unique profile of chemokines, cytokines, and biomarkers in the saliva of children with Sjögren syndrome. Rheumatology (2021) 1–13. https://doi.org/10.1093/rheumatology/keab098

1.  Learning Predictive Interactions Using Information Gain and Bayesian Network Scoring.

Authors:  Xia Jiang; Jeremy Jao; Richard Neapolitan
Journal:  PLoS One       Date:  2015-12-01       Impact factor: 3.240

2.  A distinguishing profile of chemokines, cytokines and biomarkers in the saliva of children with Sjögren's syndrome.

Authors:  M Paula Gomez Hernandez; Emily E Starman; Andrew B Davis; Miyuraj Harishchandra Hikkaduwa Withanage; Erliang Zeng; Scott M Lieberman; Kim A Brogden; Emily A Lanzel
Journal:  Rheumatology (Oxford)       Date:  2021-10-02       Impact factor: 7.580
