Abstract
BACKGROUND: DNA microarrays offer motivation and hope for the simultaneous study of variations in multiple genes. Gene expression is a temporal process in which the expression level of a gene with a characterized function varies over a period of time. Temporal gene expression curves can be treated as functional data, since they can be regarded as independent realizations of a stochastic process. Such data require appropriate models to identify patterns of gene function. Partitioning the functional data can find homogeneous subgroups among the large number of genes within the inherent biological networks. Therefore, it can be a useful technique for the analysis of time-course gene expression data. We propose a new self-consistent partitioning method that uses the functional coefficients of individual expression profiles with respect to an orthonormal basis system.
Keywords: Escherichia coli Microarray expression data; Fourier coefficients; K-means clustering; Legendre polynomials; Principal points; Silhouette; Yeast cell-cycle data
Year: 2017 PMID: 29025390 PMCID: PMC5639779 DOI: 10.1186/s12859-017-1860-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
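The approach outlined in the abstract, representing each expression profile by its coefficients in an orthonormal basis and then partitioning those coefficients, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes orthonormal Legendre polynomials on [-1, 1], approximates each coefficient by a trapezoidal inner product, and uses plain k-means as the empirical estimator of principal points. The function names (`legendre_orthonormal`, `legendre_coefficients`, `kmeans`) are hypothetical.

```python
import math
import random


def legendre_orthonormal(x, n_coef):
    """Return [phi_0(x), ..., phi_{n_coef-1}(x)], orthonormal on [-1, 1]."""
    p = [1.0, x]
    for j in range(1, n_coef - 1):
        # Bonnet recurrence: (j+1) P_{j+1} = (2j+1) x P_j - j P_{j-1}
        p.append(((2 * j + 1) * x * p[j] - j * p[j - 1]) / (j + 1))
    # Scale P_j so that the integral of phi_j^2 over [-1, 1] equals 1.
    return [math.sqrt((2 * j + 1) / 2.0) * p[j] for j in range(n_coef)]


def legendre_coefficients(times, values, n_coef):
    """Approximate c_j = integral of y * phi_j by the trapezoidal rule.

    Assumes `times` is sorted; the time range is mapped linearly to [-1, 1].
    """
    lo, hi = times[0], times[-1]
    xs = [2.0 * (t - lo) / (hi - lo) - 1.0 for t in times]
    basis = [legendre_orthonormal(x, n_coef) for x in xs]
    coefs = []
    for j in range(n_coef):
        f = [v * b[j] for v, b in zip(values, basis)]
        coefs.append(sum((xs[i + 1] - xs[i]) * (f[i] + f[i + 1]) / 2.0
                         for i in range(len(xs) - 1)))
    return coefs


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd iterations; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(p, centers[c])))
                  for p in points]
        # Move each center to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

In use, each gene's time course would be passed through `legendre_coefficients` and the resulting coefficient vectors handed to `kmeans`. Working in an orthonormal basis means Euclidean distance between coefficient vectors approximates the L2 distance between the smoothed curves, which is why partitioning the coefficients is a reasonable stand-in for partitioning the curves themselves.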
Comparison of partitioning with principal points for original data, Legendre polynomial coefficients and Fourier coefficients in 500 repetitions and m = 20 repeated design points with low noise σ = 0.5 and high noise σ = 1.5
| K = 6 subsets | σ = 0.5 | | σ = 1.5 | |
|---|---|---|---|---|
| Number of coeff | Mean | Connectivity | Mean | Connectivity |
| | | | | |
| Original data: y | 0.114 | 102.05 | 0.076 | 105.54 |
| Legendre coeff: LPC | 0.531 | 25.036 | 0.511 | 23.932 |
| Fourier coeff: FC | 0.270 | 61.628 | 0.235 | 63.621 |
| | | | | |
| Original data: y | 0.118 | 102.691 | 0.082 | 105.497 |
| Legendre coeff: LPC | 0.534 | 22.699 | 0.539 | 22.614 |
| Fourier coeff: FC | 0.235 | 68.572 | 0.224 | 73.308 |
| | | | | |
| Original data: y | 0.116 | 101.743 | 0.081 | 105.343 |
| Legendre coeff: LPC | 0.547 | 22.526 | 0.539 | 22.846 |
| Fourier coeff: FC | 0.212 | 74.110 | 0.198 | 77.572 |
Fig. 1 Flowchart of the whole methodology of the proposed partitioning
Fig. 2 GAP statistics from K = 4 to K = 8
Principal points partitioning results in K = 5 subsets, by the number J of Legendre polynomial coefficients and Fourier coefficients, with yeast data
| Number of LPC | LPC (a): Average | FC (b): Average |
|---|---|---|
| | 0.485 | 0.2256 |
| | 0.494 | 0.1954 |
| | 0.511 | 0.2118 |
| | 0.520 | 0.1417 |
| | 0.516 | 0.1298 |
| | 0.500 | 0.1394 |

(a) LPC: Legendre polynomial coefficients
(b) FC: Fourier coefficients
Principal points partitioning results with original data, Legendre polynomial coefficients and Fourier coefficients in K = 5 subsets with yeast data
| K = 5 | Components | | |
|---|---|---|---|
| | Y ( | LPC ( | FC ( |
| Number of genes in 5 subsets | 1232, 743, 484, 147, 1883 | 120, 128, 914, 1241, 2086 | 2625, 495, 40, 1160, 169 |
| Average Silhouette | 0.095 | 0.511 | 0.2118 |
| Connectivity | 2273.658 | 61.53 | 1018.696 |
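The two validation indices reported in the table above can be sketched as follows. The source does not spell out its definitions, so this assumes the standard ones: the average silhouette width (higher is better) and the connectivity index, which adds a penalty of 1/j whenever a point's j-th nearest neighbour lies in a different cluster (lower is better). The neighbourhood size `n_neighbors=10` is an arbitrary choice for illustration, and the function names are hypothetical.

```python
import math


def _dist(p, q):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    clusters = sorted(set(labels))
    total = 0.0
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            d = [_dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            if d:
                mean_dist[c] = sum(d) / len(d)
        own = labels[i]
        others = [v for c, v in mean_dist.items() if c != own]
        if own not in mean_dist or not others:
            continue  # singleton cluster or only one cluster: s(i) = 0
        a, b = mean_dist[own], min(others)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def connectivity(points, labels, n_neighbors=10):
    """Sum of 1/rank over nearest neighbours that fall in another cluster."""
    total = 0.0
    for i, p in enumerate(points):
        ranked = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: _dist(p, points[j]))
        for rank, j in enumerate(ranked[:n_neighbors], start=1):
            if labels[j] != labels[i]:
                total += 1.0 / rank
    return total
```

Read together, a partition with a high average silhouette and a low connectivity value keeps neighbouring profiles in the same subset, which is the pattern the LPC column shows relative to the original data and the Fourier coefficients.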
Fig. 3 Silhouette values in 5 subsets with principal points partitioning with J = 4 Legendre polynomial coefficients for yeast data
Fig. 4 Loess smoothed gene score means in 5 subsets based on five Legendre polynomial coefficients of yeast data
Fig. 5 Means of Legendre polynomial coefficients in five subsets of yeast data
Summary of over-represented KEGG pathway terms in each subset of yeast data
| Category | Term | KEGG id | Count | P-value | FDR |
|---|---|---|---|---|---|
| Subset 1 | DNA replication | ko03030 | 10 | 6.10E-09 | 2.40E-07 |
| | Mismatch repair | ko03430 | 7 | 2.20E-06 | 4.30E-05 |
| | Cell cycle - yeast | ko04111 | 11 | 1.80E-04 | 2.30E-03 |
| | Amino sugar and nucleotide sugar metabolism | ko00520 | 6 | 4.70E-04 | 4.70E-03 |
| | Pyrimidine metabolism | ko00240 | 8 | 6.70E-04 | 5.40E-03 |
| | Base excision repair | ko03410 | 4 | 6.00E-03 | 3.90E-02 |
| | Nucleotide excision repair | ko03420 | 5 | 7.30E-03 | 4.10E-02 |
| | Starch and sucrose metabolism | ko00500 | 5 | 9.60E-03 | 4.70E-02 |
| | Galactose metabolism | ko00052 | 4 | 1.40E-02 | 5.90E-02 |
| | Purine metabolism | ko00230 | 7 | 1.50E-02 | 5.90E-02 |
| | Meiosis - yeast | ko04113 | 7 | 4.90E-02 | 1.70E-01 |
| | Homologous recombination | ko03440 | 3 | 6.10E-02 | 1.90E-01 |
| | Fructose and mannose metabolism | ko00051 | 3 | 7.30E-02 | 2.10E-01 |
| Subset 2 | MAPK signaling pathway - yeast | ko04011 | 6 | 6.00E-04 | 1.20E-02 |
| | Cell cycle - yeast | ko04111 | 8 | 1.20E-03 | 1.20E-02 |
| | Meiosis - yeast | ko04113 | 7 | 7.10E-03 | 4.90E-02 |
| | DNA replication | ko03030 | 3 | 7.00E-02 | 3.20E-01 |
| Subset 3 | Metabolic pathways | map01100 | 136 | 3.90E-09 | 3.80E-07 |
| | Biosynthesis of secondary metabolites | map01110 | 65 | 1.20E-05 | 5.80E-04 |
| | Glycerophospholipid metabolism | ko00564 | 14 | 5.50E-04 | 1.70E-02 |
| | Carbon metabolism | ko01200 | 29 | 5.70E-04 | 1.40E-02 |
| | Tyrosine metabolism | ko00350 | 7 | 6.10E-03 | 1.10E-01 |
| | Glycolysis / Gluconeogenesis | ko00010 | 16 | 6.70E-03 | 1.00E-01 |
| | Propanoate metabolism | ko00640 | 6 | 9.30E-03 | 1.20E-01 |
| | Fatty acid elongation | ko00062 | 5 | 1.40E-02 | 1.50E-01 |
| | Biosynthesis of antibiotics | map01130 | 41 | 1.70E-02 | 1.70E-01 |
| | Fatty acid metabolism | ko01212 | 8 | 1.90E-02 | 1.70E-01 |
| | Oxidative phosphorylation | ko00190 | 17 | 2.30E-02 | 1.80E-01 |
| | Pyruvate metabolism | ko00620 | 11 | 2.70E-02 | 2.00E-01 |
| | Starch and sucrose metabolism | ko00500 | 11 | 3.20E-02 | 2.10E-01 |
| | Glycosylphosphatidylinositol (GPI)-anchor biosynthesis | ko00563 | 8 | 3.90E-02 | 2.40E-01 |
| | Mismatch repair | ko03430 | 7 | 4.00E-02 | 2.30E-01 |
| | Phenylalanine metabolism | ko00360 | 5 | 4.70E-02 | 2.50E-01 |
| | Biosynthesis of unsaturated fatty acids | ko01040 | 5 | 4.70E-02 | 2.50E-01 |
| | Protein processing in endoplasmic reticulum | ko04141 | 18 | 5.50E-02 | 2.70E-01 |
| | Arginine biosynthesis | ko00220 | 6 | 6.40E-02 | 3.00E-01 |
| | MAPK signaling pathway - yeast | ko04011 | 12 | 6.60E-02 | 2.90E-01 |
| | Methane metabolism | ko00680 | 8 | 6.70E-02 | 2.90E-01 |
| | Degradation of aromatic compounds | ko01220 | 4 | 7.80E-02 | 3.10E-01 |
| | Other types of O-glycan biosynthesis | ko00514 | 5 | 8.20E-02 | 3.10E-01 |
| | N-Glycan biosynthesis | ko00510 | 8 | 9.20E-02 | 3.30E-01 |
| | Fatty acid degradation | ko00071 | 6 | 9.60E-02 | 3.30E-01 |
| Subset 4 | Ribosome biogenesis in eukaryotes | ko03008 | 33 | 2.40E-05 | 2.30E-03 |
| | RNA transport | ko03013 | 34 | 2.40E-05 | 1.20E-03 |
| | Purine metabolism | ko00230 | 34 | 5.10E-05 | 1.70E-03 |
| | RNA polymerase | ko03020 | 15 | 2.40E-04 | 5.70E-03 |
| | Steroid biosynthesis | ko00100 | 9 | 5.20E-03 | 9.50E-02 |
| | Biosynthesis of amino acids | ko01230 | 33 | 1.30E-02 | 1.80E-01 |
| | Proteasome | ko03050 | 13 | 1.40E-02 | 1.80E-01 |
| | Non-homologous end-joining | ko03450 | 6 | 2.00E-02 | 2.20E-01 |
| | Pyrimidine metabolism | ko00240 | 21 | 2.20E-02 | 2.20E-01 |
| | RNA degradation | ko03018 | 18 | 3.30E-02 | 2.80E-01 |
| | Cysteine and methionine metabolism | ko00270 | 12 | 4.30E-02 | 3.20E-01 |
| | Phosphatidylinositol signaling system | ko04070 | 7 | 5.00E-02 | 3.40E-01 |
| | Biosynthesis of antibiotics | map01130 | 49 | 6.00E-02 | 3.70E-01 |
| Subset 5 | Metabolic pathways | map01100 | 239 | 2.60E-05 | 2.70E-03 |
| | Biosynthesis of secondary metabolites | map01110 | 113 | 1.90E-04 | 1.00E-02 |
| | Protein processing in endoplasmic reticulum | ko04141 | 40 | 6.50E-04 | 2.20E-02 |
| | Biosynthesis of antibiotics | map01130 | 84 | 1.40E-03 | 3.60E-02 |
| | Basal transcription factors | ko03022 | 18 | 3.10E-03 | 6.30E-02 |
| | mRNA surveillance pathway | ko03015 | 23 | 4.50E-03 | 7.50E-02 |
| | Endocytosis | ko04144 | 31 | 9.50E-03 | 1.30E-01 |
| | Ubiquitin mediated proteolysis | ko04120 | 22 | 1.40E-02 | 1.70E-01 |
| | Spliceosome | ko03040 | 33 | 1.50E-02 | 1.70E-01 |
| | Phagosome | ko04145 | 17 | 3.20E-02 | 2.90E-01 |
| | Biosynthesis of amino acids | ko01230 | 46 | 3.40E-02 | 2.80E-01 |
| | Glycine, serine and threonine metabolism | ko00260 | 15 | 5.00E-02 | 3.60E-01 |
| | Citrate cycle (TCA cycle) | ko00020 | 15 | 5.00E-02 | 3.60E-01 |
| | Arginine and proline metabolism | ko00330 | 11 | 5.20E-02 | 3.50E-01 |
| | Proteasome | ko03050 | 16 | 5.20E-02 | 3.30E-01 |
| | Phenylalanine, tyrosine and tryptophan biosynthesis | ko00400 | 9 | 8.50E-02 | 4.60E-01 |
| | Glyoxylate and dicarboxylate metabolism | ko00630 | 12 | 9.80E-02 | 4.90E-01 |
| | Valine, leucine and isoleucine biosynthesis | ko00290 | 7 | 9.90E-02 | 4.80E-01 |
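How an over-representation table of this kind might be produced can be sketched as follows. The source does not name the test or the adjustment used, so this assumes a one-sided hypergeometric test for each pathway term and a Benjamini-Hochberg adjustment for the FDR column; the tool the authors used may differ in both. The function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_subset, n_term, n_total):
    """P(X >= k) for X ~ Hypergeometric.

    n_total genes overall, n_term of them annotated with the term,
    n_subset genes in the subset, k annotated genes observed in the subset.
    """
    denom = comb(n_total, n_subset)
    upper = min(n_subset, n_term)
    numer = sum(comb(n_term, i) * comb(n_total - n_term, n_subset - i)
                for i in range(k, upper + 1))
    return numer / denom


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each term's raw p-value would come from `hypergeom_upper_tail`, and the list of p-values for one subset would then be passed to `benjamini_hochberg`. An adjusted value is never smaller than the raw one, which matches the ordering of the last two columns in the table.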
Fig. 6 Plot of linear and quadratic coefficients for Legendre polynomials in each subset of yeast data
Fig. 7 Heatmap of Legendre polynomial coefficients of yeast data
Fig. 8 Top 10 over-represented GO terms in each subset (Subset 1: red, Subset 2: orange, Subset 3: blue, Subset 4: green, Subset 5: purple) in yeast data. Subset 3 has only four over-represented GO terms. For comparison, PAM was performed with the number of centers ranging from 3 to 15. A cell is colored dark gray or light gray if PAM found the same GO terms as the ORA test
Fig. 9 Silhouette values in 4 subsets with principal points partitioning with J = 4 Legendre polynomial coefficients of E. coli data
Fig. 10 Expression patterns of genes in each subset. Red lines represent the smoothed gene expression mean of E. coli microarray data
Summary of over-represented GO terms (molecular function) in each subset of E.coli data
| Category | Term | Count | |
|---|---|---|---|
| Subset 1 | DNA recombination | 49 | 8.80E-06 |
| | RNA-binding | 58 | 2.00E-04 |
| | Transposition | 31 | 2.60E-04 |
| | Protein biosynthesis | 26 | 3.60E-04 |
| | Transposable element | 31 | 3.90E-04 |
| | tRNA-binding | 16 | 7.90E-04 |
| | DNA-binding | 159 | 1.70E-03 |
| | tRNA processing | 26 | 2.90E-03 |
| | Nucleotide-binding | 180 | 2.90E-03 |
| | ATP-binding | 155 | 2.90E-03 |
| | Cytoplasm | 232 | 4.30E-03 |
| | Nucleotidyltransferase | 23 | 6.90E-03 |
| Subset 2 | Acetylation | 21 | 6.20E-07 |
| | Purine biosynthesis | 9 | 2.10E-06 |
| | Oxidoreductase | 43 | 6.10E-06 |
| | Nitrate assimilation | 8 | 6.50E-05 |
| | Metal-binding | 59 | 2.40E-04 |
| | Ligase | 15 | 9.20E-04 |
| | Tricarboxylic acid cycle | 7 | 1.40E-03 |
| | Pyridoxal phosphate | 11 | 2.20E-03 |
| | NADP | 14 | 2.30E-03 |
| | Pyrimidine biosynthesis | 5 | 2.90E-03 |
| | Enterobactin biosynthesis | 4 | 6.10E-03 |
| | Transferase | 48 | 8.80E-03 |
| Subset 3 | Oxidoreductase | 137 | 5.80E-03 |
| | Iron-sulfur | 58 | 7.40E-03 |
| Subset 4 | Transposition | 26 | 3.10E-12 |
| | Transposable element | 26 | 5.20E-12 |
| | DNA recombination | 29 | 6.30E-09 |
| | Cytoplasm | 96 | 2.80E-06 |
| | Ribonucleoprotein | 16 | 2.00E-04 |
| | Bacterial flagellum | 9 | 2.90E-04 |
| | Ribosomal protein | 15 | 5.70E-04 |
| | Transmembrane beta strand | 13 | 6.00E-04 |
| | RNA-binding | 24 | 9.80E-04 |
| | DNA replication | 12 | 1.20E-03 |
| | Cell outer membrane | 18 | 2.30E-03 |
| | Ion transport | 21 | 3.20E-03 |
| | Bacterial flagellum biogenesis | 7 | 3.60E-03 |
| | rRNA processing | 10 | 4.50E-03 |
| | Methyltransferase | 13 | 7.40E-03 |