Hui Zhao1,2,3, Ying Guo1,2, Yanan Ma1,2, Yunping Chen1,2, Haiming Sun1,2, Donglin Sun1,2, Nan Wu1,2, Yan Jin1,2.
Abstract
BACKGROUND: The identification of disease-related biological modules plays an important role in understanding disease processes. Although single-cell RNA sequencing (scRNA-seq) provides high-resolution transcriptome data that can potentially characterize subtle gene expression changes within cells, the susceptibility of the gene expression information to the influence of individual genes also makes it difficult to distinguish biological modules.
Keywords: Single-cell RNA sequencing (scRNA-seq); biological module; blood; colorectal cancer; python package
Year: 2021 PMID: 35071482 PMCID: PMC8756206 DOI: 10.21037/atm-21-6401
Source DB: PubMed Journal: Ann Transl Med ISSN: 2305-5839
Figure 1 Scatter diagrams showing the characteristics of entropy information and correlation coefficients of the tested GO terms. (A) Scatter plot of annotation gene number (x-axis) and original entropy information (y-axis). Entropy information increased with the annotation gene number. (B) Scatter plot of annotation gene number (x-axis) and normalized entropy information (y-axis). (C) Scatter plot of annotation gene number (x-axis) and correlation coefficient (y-axis). The correlation coefficients were more volatile at smaller annotation gene numbers and became stable as the annotation gene number increased. (D) Scatter plot of normalized entropy information difference (x-axis) and correlation coefficient (y-axis). The IScores were computed as the Euclidean distance between the coordinate point of each GO term and the coordinates [0, 1]. GO, Gene Ontology; IScore, integrated score.
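The distance calculation described for panel D can be sketched as follows. This is a minimal illustration, not the NonLoss package's own code: it assumes that each GO term is placed at (normalized entropy information difference, correlation coefficient) and that "[0, 1]" denotes the reference point x = 0, y = 1. The function name `iscore` and the example values are hypothetical.

```python
import math


def iscore(entropy_diff: float, corr: float) -> float:
    """Euclidean distance from the point (entropy_diff, corr) to (0, 1).

    Assumption: x is the normalized entropy information difference and
    y is the correlation coefficient, as in panel D of Figure 1.
    """
    return math.hypot(entropy_diff - 0.0, corr - 1.0)


# Hypothetical coordinates for illustration only (not data from the paper).
example_terms = {
    "term_a": (0.0, 1.0),   # sits exactly on the reference point
    "term_b": (0.3, 0.6),
    "term_c": (0.8, -0.2),
}

scores = {name: iscore(x, y) for name, (x, y) in example_terms.items()}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, a term with no entropy difference and a correlation of 1 scores 0, and the score grows as the term moves away from that point in either direction.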
Figure 2 Biological function module analysis of CRC-data. (A) The heatmap shows the IScore of each cell. The top 5 and last 5 BFMs of KEGG pathways are presented to show differences between the tumor (Tu) and normal mucosa (Mu) groups. (B) The violin plot shows gene expression patterns in the Tu and Mu groups. The genes displayed are annotated in the hsa04612 KEGG pathway. (C) The violin plot shows gene expression patterns of the hsa05034 KEGG pathway. Gene names in scarlet font indicate significantly differentially expressed genes; the others are shown in gray font. To better present the expression trend of genes, gene expression values were log-transformed and shown as colored circles for each gene, whereas the empty circles indicate the median gene expression value across cells. FDR* indicates the FDR after log transformation and conversion to a positive value. KEGG, Kyoto Encyclopedia of Genes and Genomes; BFM, biological function module; FDR, false discovery rate; CRC, colorectal cancer.
The statistical information of the top 5 and last 5 KEGG pathways identified by NonLoss
| Accession ID | Function module | Number of genes (Total) | Number of genes (Sig) | Ratio |
|---|---|---|---|---|
| Top 5 BFMs | | | | |
| hsa04612 | Antigen processing and presentation | 22 | 14 | 0.64 |
| hsa03320 | PPAR signaling pathway | 13 | 7 | 0.54 |
| hsa04978 | Mineral absorption | 13 | 8 | 0.62 |
| hsa05164 | Influenza A | 36 | 14 | 0.39 |
| hsa04919 | Thyroid hormone signaling pathway | 23 | 7 | 0.30 |
| Last 5 BFMs | | | | |
| hsa05031 | Amphetamine addiction | 16 | 4 | 0.25 |
| hsa05034 | Alcoholism | 24 | 4 | 0.17 |
| hsa04071 | Sphingolipid signaling pathway | 16 | 1 | 0.06 |
| hsa05206 | MicroRNAs in cancer | 31 | 8 | 0.26 |
| hsa04211 | Longevity regulating pathway | 12 | 2 | 0.17 |
Note: ‘Total’ is the number of genes annotated to the module and ‘Sig’ is the number of significantly differentially expressed genes. The last column is the ratio of ‘Sig’ to ‘Total’. KEGG, Kyoto Encyclopedia of Genes and Genomes; BFM, biological function module.
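As a quick consistency check on the table, the ratio column can be recomputed from the two gene counts. The sketch below only restates the table's own numbers; the helper name `sig_ratio` and the 0.005 rounding tolerance are choices made for this illustration.

```python
# (accession, total genes, significant genes, reported ratio), copied from the table.
rows = [
    ("hsa04612", 22, 14, 0.64),
    ("hsa03320", 13, 7, 0.54),
    ("hsa04978", 13, 8, 0.62),
    ("hsa05164", 36, 14, 0.39),
    ("hsa04919", 23, 7, 0.30),
    ("hsa05031", 16, 4, 0.25),
    ("hsa05034", 24, 4, 0.17),
    ("hsa04071", 16, 1, 0.06),
    ("hsa05206", 31, 8, 0.26),
    ("hsa04211", 12, 2, 0.17),
]


def sig_ratio(total: int, sig: int) -> float:
    """Fraction of a module's annotated genes that are significant."""
    return sig / total


# Collect rows whose reported ratio differs from Sig/Total by more than
# what rounding to two decimals can explain.
mismatches = [
    accession
    for accession, total, sig, reported in rows
    if abs(sig_ratio(total, sig) - reported) > 0.005
]
print("rows inconsistent with Sig/Total:", mismatches)
```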
Figure 3 Representative dot plot of DIS (x-axis) versus log-transformed P value (y-axis) for the three GO categories. The red spots indicate GO terms with significant differences (P<0.05 and DIS ≥0.5) and the black spots indicate GO terms with no major differences (P≥0.05 or DIS <0.5) between conditions. DIS, difference of IScore; GO, Gene Ontology.
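The two thresholds in this caption amount to a simple rule, sketched below. The cutoffs (P<0.05, DIS ≥0.5) come from the caption; the function names, the example values, and the use of −log10 for the y-axis are assumptions, since the caption does not state the base or sign of the log transformation.

```python
import math


def is_significant(dis: float, p_value: float,
                   p_cut: float = 0.05, dis_cut: float = 0.5) -> bool:
    """Red spot in the plot: P below p_cut AND DIS at or above dis_cut."""
    return p_value < p_cut and dis >= dis_cut


def plot_y(p_value: float) -> float:
    """Assumed y-axis transform: -log10(P). The caption gives no base."""
    return -math.log10(p_value)


# Hypothetical GO terms as (DIS, P value) pairs, for illustration only.
terms = {
    "term_a": (0.72, 0.001),  # passes both thresholds
    "term_b": (0.72, 0.20),   # fails the P threshold
    "term_c": (0.10, 0.001),  # fails the DIS threshold
}

labels = {name: is_significant(dis, p) for name, (dis, p) in terms.items()}
print(labels)
```

A term is marked significant only when both conditions hold, which matches the caption's description of black spots as those with P≥0.05 or DIS <0.5.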
Figure 4 Robustness evaluation of the NonLoss method. (A) Repeat rates of the top 15 GO terms, showing how frequently each term occurred for different SN values over 100 random samplings. (B) The overlap rate was calculated by taking the intersection of the DBM results of random sample sets with a specific SN. The x-axis shows the average overlap rate of DBMs over 100 random samplings. The y-axis displays the number of cells randomly sampled from the datasets. The gray bars represent calculations in which the 2 groups were randomly sampled from the normal mucosa and tumor cell populations, respectively (paired), whereas the crimson bars represent calculations in which the 2 groups were randomly sampled from the CRC-data (random). GO, Gene Ontology; SN, sampling number; DBM, differential biological module.
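One way to picture the overlap rate in panel B is sketched below. The caption only says that the intersection of DBM results is taken, so the normalization used here (intersection size divided by union size) is an assumption, and the paper may define the rate differently. The function names and the example module sets are hypothetical.

```python
from statistics import mean


def overlap_rate(dbms_a: set, dbms_b: set) -> float:
    """Assumed definition: |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = dbms_a | dbms_b
    if not union:
        return 0.0
    return len(dbms_a & dbms_b) / len(union)


def average_overlap(runs: list) -> float:
    """Mean overlap rate over repeated samplings (100 in the caption)."""
    return mean(overlap_rate(a, b) for a, b in runs)


# Hypothetical DBM result sets from three sampling runs, for illustration only.
runs = [
    ({"m1", "m2", "m3"}, {"m2", "m3", "m4"}),  # 2 shared of 4 -> 0.5
    ({"m1", "m2"}, {"m1", "m2"}),              # identical -> 1.0
    ({"m1"}, {"m5"}),                          # disjoint -> 0.0
]

print(average_overlap(runs))
```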
Figure 5 Overlap analysis of DBMs between different time pair comparisons. (A) The Venn diagram shows the number of overlapping DBMs between cells of DM and cells collected 24, 48, or 72 h after serum switch, respectively. The upper panel is for bulk RNA-seq data and the lower panel is for scRNA-seq data. (B) The Venn diagram shows the number of overlapping DBMs between successive time pair comparisons. The upper panel is for bulk RNA-seq data and the lower panel is for scRNA-seq data. (C) Histograms of the number of DBMs. The upper panel shows the number of overlapping DBMs between cells of DM and cells collected 24, 48, or 72 h after serum switch, respectively. The lower panel shows the number of overlapping DBMs between successive time pair comparisons. DBM, differential biological module.