Zhan Dong Li, Xiangtian Yu, Zi Mei, Tao Zeng, Lei Chen, Xian Ling Xu, Hao Li, Tao Huang, Yu-Dong Cai.
Abstract
The mammary gland is present in all mammals and usually functions to produce milk for feeding young offspring. Mammogenesis refers to the growth and development of the mammary gland, which begins at puberty and ends after lactation. Pregnancy is regulated by various cytokines, which further contribute to mammary gland development. Epithelial cells, including basal and luminal cells, are one of the major components of the mammary gland. The development of basal and luminal cells has been observed to differ significantly across stages. However, the mechanisms underlying the differences between basal and luminal cells have not been fully studied. To explore the mechanisms underlying the differentiation of mammary progenitors or their offspring into luminal and myoepithelial cells, this work investigated in depth the single-cell sequencing data on mammary epithelial cells of virgin and pregnant mice. We evaluated features by using Monte Carlo feature selection (MCFS) and plotted incremental feature selection (IFS) curves with a support vector machine (SVM) or RIPPER to find the optimal gene features and rules that divide epithelial cells into four clusters, defined by cell subtype (basal or luminal) and phase (pregnancy or virginity). As representatives, the feature genes Cldn7, Gjb6, Sparc, Cldn3, Cited1, Krt17, Spp1, Cldn4, Gjb2 and Cldn19 might play an important role in classifying mammary epithelial cells. Notably, the seven most important rules, based on combinations of cell-specific and tissue-specific expression of feature genes, classify mammary epithelial cells effectively in a quantitative and interpretable manner.
Year: 2022 PMID: 35486595 PMCID: PMC9053804 DOI: 10.1371/journal.pone.0267211
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Fig 1. Overall procedure for analyzing the single-cell sequencing data of virgin and pregnant mouse mammary epithelial cells.
The data are retrieved from Gene Expression Omnibus. Monte Carlo feature selection is applied to these data, resulting in a feature list. This list is fed into incremental feature selection, which incorporates two classification algorithms, to build efficient classifiers and extract discriminative genes and rules.
Fig 2. IFS curves of the two classification algorithms based on the feature list yielded by MCFS.
The highest MCC values for SVM and RIPPER are 0.976 and 0.905, obtained with the top 420 and top 120 features in the list, respectively. The SVM classifier using the top 90 features also performs well.
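The IFS procedure behind these curves can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `evaluate` callback stands in for training a classifier such as SVM on the top-n features and returning its MCC, and the step size, the placeholder feature names `g1` to `g6`, and the `toy_score` function are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
    step: int = 1,
) -> Tuple[List[Tuple[int, float]], int, float]:
    """Score the top-n features for n = step, 2*step, ... and pick the best n."""
    curve = []
    for n in range(step, len(ranked_features) + 1, step):
        curve.append((n, evaluate(ranked_features[:n])))
    # max() keeps the first maximum, so ties resolve to the smaller feature set.
    best_n, best_score = max(curve, key=lambda point: point[1])
    return curve, best_n, best_score


def toy_score(subset: Sequence[str]) -> float:
    """Hypothetical stand-in for a cross-validated MCC (not real data)."""
    informative = {"g1", "g2", "g3"}  # assumed informative features
    hits = len(informative.intersection(subset))
    noise = len(subset) - hits
    return hits / 3 - 0.05 * noise


ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]  # placeholder ranking
curve, best_n, best_score = incremental_feature_selection(ranked, toy_score)
print(best_n, best_score)
```

Plotting the `(n, score)` pairs in `curve` gives an IFS curve of the kind shown in Fig 2; with the real feature list, `evaluate` would wrap the chosen classifier and its validation scheme.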
Performance of some key classifiers.
| Classification algorithm | Number of features | Overall accuracy | MCC |
|---|---|---|---|
| Support vector machine | 420 | 0.983 | 0.976 |
| Support vector machine | 90 | 0.933 | 0.906 |
| Repeated incremental pruning to produce error reduction | 120 | 0.933 | 0.905 |
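The two metrics in the table can be computed directly from true and predicted labels. The sketch below assumes the multi-class generalization of MCC (Gorodkin's form), which this record does not spell out; the function names are illustrative and the label lists in the usage lines are made-up examples, not study data.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def overall_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_mcc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multi-class MCC computed from per-class true and predicted counts."""
    s = len(y_true)                                      # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))      # correct predictions
    t_k = Counter(y_true)                                # true count per class
    p_k = Counter(y_pred)                                # predicted count per class
    labels = set(t_k) | set(p_k)
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in labels)
    cov_tt = s * s - sum(v * v for v in t_k.values())
    cov_pp = s * s - sum(v * v for v in p_k.values())
    denom = sqrt(cov_tt * cov_pp)
    # Convention: return 0.0 when either side has only one class.
    return cov_tp / denom if denom else 0.0


# Made-up labels for illustration only.
truth = ["VB", "VB", "PB", "PB"]
guess = ["VB", "PB", "PB", "PB"]
print(overall_accuracy(truth, guess), multiclass_mcc(truth, guess))
```

For two classes this reduces to the familiar binary MCC, which is why it is a common companion to overall accuracy when class sizes are unbalanced.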
Fig 3. Individual accuracies of the three key classifiers.
The optimum SVM classifier achieves high, in some cases perfect, accuracy on the four classes. The SVM classifier with the top 90 features and the optimum SVM classifier provide almost equal performance.
Rules generated by RIPPER.
| ID | Rule | Class |
|---|---|---|
| Rule-1 | (Tbata >= 1.3033) and (Ssh2 >= 0.1463) | VB |
| Rule-2 | (Apoe >= 1564.7779) and (Acta2 <= 6013.4274) | VB |
| Rule-3 | (Iqub >= 0.0098) | VB |
| Rule-4 | (Tusc5 >= 3.6569) | VB |
| Rule-5 | (Gjb6 >= 0.0158) and (Mal2 >= 0.1745) | PL |
| Rule-6 | (Cldn7 >= 0.8431) | VL |
| Rule-7 | Others | PB |
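The rule table above can be read as an ordered decision list, with Rule-7 as the default when nothing else fires. The following sketch transcribes the thresholds from the table and assumes first-match evaluation in table order; treating a missing gene as expression 0.0 is an assumption of this example, and the class labels presumably abbreviate virgin or pregnant (V, P) and basal or luminal (B, L).

```python
import operator
from typing import Dict, List, Tuple

Condition = Tuple[str, str, float]  # (gene symbol, comparison, threshold)

# Rules 1-6 transcribed from the table; Rule-7 ("Others") is the default.
RULES: List[Tuple[str, List[Condition], str]] = [
    ("Rule-1", [("Tbata", ">=", 1.3033), ("Ssh2", ">=", 0.1463)], "VB"),
    ("Rule-2", [("Apoe", ">=", 1564.7779), ("Acta2", "<=", 6013.4274)], "VB"),
    ("Rule-3", [("Iqub", ">=", 0.0098)], "VB"),
    ("Rule-4", [("Tusc5", ">=", 3.6569)], "VB"),
    ("Rule-5", [("Gjb6", ">=", 0.0158), ("Mal2", ">=", 0.1745)], "PL"),
    ("Rule-6", [("Cldn7", ">=", 0.8431)], "VL"),
]
DEFAULT_RULE = ("Rule-7", "PB")
OPS = {">=": operator.ge, "<=": operator.le}


def classify(expression: Dict[str, float]) -> Tuple[str, str]:
    """Return (rule id, class) for the first rule whose conditions all hold."""
    for rule_id, conditions, label in RULES:
        # Assumption: a gene absent from the profile has expression 0.0.
        if all(OPS[op](expression.get(gene, 0.0), thr) for gene, op, thr in conditions):
            return rule_id, label
    return DEFAULT_RULE


# Hypothetical expression profiles, for illustration only.
print(classify({"Cldn7": 1.0}))
print(classify({"Gjb6": 0.02, "Mal2": 0.2, "Cldn7": 1.0}))
print(classify({}))
```

Because evaluation stops at the first match, a cell that satisfies both Rule-5 and Rule-6 is assigned PL rather than VL, so rule order is part of the classifier.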
Important discriminative genes for distinguishing mammary epithelial cells.
| Gene symbol | Description | RI score |
|---|---|---|
| Cldn7 | Claudin 7 | 0.6574 |
| Gjb6 | Gap Junction Protein Beta 6 | 0.5942 |
| Sparc | Secreted Protein Acidic And Cysteine Rich | 0.5584 |
| Cldn3 | Claudin 3 | 0.3598 |
| Cited1 | Cbp/P300 Interacting Transactivator With Glu/Asp Rich Carboxy-Terminal Domain 1 | 0.3334 |
| Krt17 | Keratin 17 | 0.3098 |
| Spp1 | Secreted Phosphoprotein 1 | 0.2669 |