Consolata Gakii, Billiah Kemunto Bwana, Grace Gathoni Mugambi, Esther Mukoya, Paul O Mireji, Richard Rimiru.
Abstract
BACKGROUND: High-throughput sequencing generates large volumes of biological data that must be interpreted to make meaningful inferences about biological function. Problems arise from the large number of characteristics p (dimensions) that describe each record n in the database. Feature selection, which works with a subset of variables extracted from the large dataset, is one approach to solving this problem.
Keywords: Association rule mining; Co-expression network; Discretization; In silico analysis; RNASeq data
Year: 2021 PMID: 34249514 PMCID: PMC8255069 DOI: 10.7717/peerj.11691
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Top 50 chemosensory genes.
Top 50 differentially expressed genes after exposure to repellent (δ-nonalactone) or attractant (ε-nonalactone). (A) Top 50 chemosensory genes and associating genes with no assigned function. (B) Top 50 non-chemosensory genes and associating genes with no assigned function.
Figure 2. Co-expression network analysis using WGCNA.
(A) Scale-free fit index versus soft-thresholding power. (B) Mean connectivity versus soft-thresholding power of 12. (C) Cluster dendrogram based on dissimilarity measures (1-TOM). The branches correspond to modules of highly interconnected groups of genes. Colors in the horizontal bar represent the modules. A total of 12 modules were identified. (D) Module network dendrogram constructed by clustering module eigengene distances. The red line shows the merging threshold.
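The quantities behind panels (A) to (C) can be illustrated with a minimal sketch. This is not the WGCNA R implementation: it assumes an unsigned network in which adjacency is the absolute Pearson correlation raised to the soft-thresholding power, and it uses the standard topological overlap formula to derive the 1-TOM dissimilarity. The function names and the toy expression values are hypothetical and are not data from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def soft_threshold_adjacency(expr, power):
    """Unsigned adjacency a_ij = |cor(i, j)| ** power, with a zero diagonal."""
    genes = list(expr)
    n = len(genes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                r = pearson(expr[genes[i]], expr[genes[j]])
                adj[i][j] = abs(r) ** power
    return genes, adj

def tom_dissimilarity(adj):
    """Return the matrix 1 - TOM, where
    TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    diss = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n))
            tom = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
            diss[i][j] = 1 - tom
    return diss

# Toy expression profiles (hypothetical values, four samples per gene).
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],    # perfectly correlated with g1
    "g3": [4, 3, 2, 1],    # perfectly anti-correlated with g1
    "g4": [1, -1, -1, 1],  # uncorrelated with g1
}
genes, adj = soft_threshold_adjacency(expr, power=12)
diss = tom_dissimilarity(adj)
mean_connectivity = sum(sum(row) for row in adj) / len(adj)
```

Raising the correlation to a higher power pushes weak correlations toward zero, which is why mean connectivity falls as the power increases. Hierarchical clustering on `diss` would then yield a dendrogram of the kind shown in panel (C).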
Figure 3. Modules identified along with the number of genes in each module.
Total number of modules identified along with the number of genes in each module.
Figure 4. Co-expression networks for the genes in the turquoise, blue, brown and yellow modules.
Co-expression network analysis. (A) Subnetwork for genes in the turquoise, blue, brown and yellow modules. (B) Filtered co-expression network for the genes with a degree value greater than five. (C) Network summary statistics before filtering. (D) Network summary statistics after filtering.
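The degree filter described for panel (B) can be sketched as follows. This is a minimal illustration, assuming that degree is computed once on the unfiltered network and that only edges between retained genes are kept; the record does not say how the authors implemented the filter. The function name `filter_by_degree` and the toy edge list are hypothetical.

```python
from collections import Counter

def filter_by_degree(edges, min_degree=5):
    """Keep nodes whose degree in the unfiltered network exceeds min_degree,
    together with the edges that connect two retained nodes."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    keep = {node for node, d in degree.items() if d > min_degree}
    kept_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return keep, kept_edges

# Toy network (hypothetical): two hubs joined to each other, each with five leaves.
edges = [("hubA", "hubB")]
edges += [("hubA", f"a{i}") for i in range(5)]
edges += [("hubB", f"b{i}") for i in range(5)]

keep, kept_edges = filter_by_degree(edges)
```

Both hubs have degree six and survive the filter, while every leaf has degree one and is dropped.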
Network topology for the top genes.
| No. | Gene symbol | Clustering coefficient | Degree | Betweenness centrality |
|---|---|---|---|---|
| 1 | | 0.47 | 11 | 0.06 |
| 2 | | 0.53 | 11 | 0.07 |
| 3 | | 0.38 | 10 | 0.1 |
| 4 | | 0.44 | 10 | 0.2 |
| 5 | | 0.6 | 10 | 0.03 |
| 6 | | 0.29 | 10 | 0.08 |
| 7 | | 0.56 | 9 | 0.02 |
| 8 | | 0.58 | 9 | 0.01 |
| 9 | | 0.31 | 9 | 0.24 |
| 10 | | 0.5 | 9 | 0.11 |
| 11 | | 0.36 | 9 | 0.32 |
| 12 | | 0.57 | 8 | 0.18 |
| 13 | | 0.61 | 8 | 0.03 |
| 14 | | 0.39 | 8 | 0.05 |
| 15 | | 0.54 | 8 | 0 |
| 16 | | 0.32 | 8 | 0.22 |
| 17 | | 0.46 | 8 | 0.2 |
| 18 | | 0.68 | 8 | 0.02 |
| 19 | | 0.46 | 8 | 0.09 |
| 20 | | 0.43 | 8 | 0.14 |
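The three topology measures in this table have standard definitions, sketched below for an undirected, unweighted network. This is an illustrative implementation and not the tool the authors used. Scaling betweenness by the number of node pairs, (n-1)(n-2)/2, is an assumption made here so that values fall between 0 and 1, as they do in the table. The toy graph and all function names are hypothetical.

```python
from collections import deque

def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u in nbrs for v in nbrs if u < v and v in graph[u])
    return 2.0 * links / (k * (k - 1))

def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted graphs, scaled to the range [0, 1]."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(graph)
    # Each undirected pair is visited from both endpoints, so dividing by
    # (n-1)(n-2) equals dividing the pair count by (n-1)(n-2)/2.
    scale = (n - 1) * (n - 2)
    return {v: (b / scale if scale > 0 else 0.0) for v, b in bc.items()}

# Toy graph (hypothetical): a triangle a-b-c with a pendant node d attached to c.
graph = build_graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
degree = {node: len(nbrs) for node, nbrs in graph.items()}
clustering = {node: clustering_coefficient(graph, node) for node in graph}
betweenness = betweenness_centrality(graph)
```

In the toy graph, node `c` has the highest degree and is the only node lying on shortest paths between other nodes, yet its clustering coefficient is lower than that of `a` and `b`. The same trade-off is visible in the table, where rows with high betweenness centrality tend to have lower clustering coefficients.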
Association rules among genes that showed significant upregulation.
| No. | Association rule | Set |
|---|---|---|
| 1. | {Ir84a,Or2a,Or42a,Or56a} => {CG3679} | A |
| 2. | {Or2a,Or42a,Or56a,Or49b} => {CG3679} | |
| 3. | {Ir84a,Or42a,Or56a,Or49b} => {CG3679} | |
| 4. | {Or88a,Gr63a,CG5273,CG17572} => {CG18480} | |
| 5. | {CG4950,Or88a,Gr63a,CG5273} => {CG18480} | |
| 6. | {Or88a,Gr63a,CG17572,CG31663} => {CG18480} | |
| 7. | {Or88a,Gr63a,CG5273,CG17572} => {CG31663} | |
| 8. | {CG4950,Or88a,Gr63a,CG5273} => {CG31663} | |
| 9. | {CG4950,Or88a,CG18480,Gr63a} => {CG31663} | |
| 10. | {CG4950,Or88a,Gr63a,CG31663} => {CG17572} | |
| 11. | {Or88a,Gr63a,CG5273,CG31663} => {CG17572} | |
| 12. | {Or88a,Gr63a,CG17572,CG31663} => {CG5273} | |
| 13. | {CG4950,Or88a,Gr63a,CG17572} => {CG5273} | |
| 14. | {CG4950,Or88a,CG18480,Gr63a} => {CG5273} | |
| 15. | {Tsf1,Scp1,vkg,Adgf.A} => {CG6126} | B |
| 16. | {Ppn,Sp7,NtR,Adgf.A} => {CG6126} | |
| 17. | {vkg,Sp7,NtR,Adgf.A} => {CG6126} | |
| 18. | {LanB2,Sp7,NtR,Adgf.A} => {CG6126} | |
| 19. | {Sp7,NtR,Adgf.A,CG6126} => {CG3168} | |
| 20. | {Ppn,NtR,Adgf.A,CG6126} => {CG3168} | |
| 21. | {Idgf4,NtR,Adgf.A,CG6126} => {CG3168} | |
| 22. | {Ppn,Sp7,Adgf.A,CG6126} => {CG3168} |
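Rules of the form {antecedent} => {consequent} are usually ranked by support, confidence and lift. The sketch below shows how these measures are computed for rule 1, assuming each sample has been discretized into the set of genes called upregulated in it. The transactions are made-up toy values and are not data from the study, and the function names are hypothetical. The thresholds and mining algorithm the authors used are not stated in this record.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so confidence is reported as 0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base

def lift(transactions, antecedent, consequent):
    """Confidence relative to the baseline frequency of the consequent."""
    base = support(transactions, consequent)
    if base == 0:
        return 0.0
    return confidence(transactions, antecedent, consequent) / base

# Toy transactions (hypothetical): each set lists genes upregulated in one sample.
transactions = [
    {"Ir84a", "Or2a", "Or42a", "Or56a", "CG3679"},
    {"Ir84a", "Or2a", "Or42a", "Or56a", "CG3679"},
    {"Ir84a", "Or2a", "Or42a", "Or56a"},
    {"CG3679"},
    {"Or2a"},
]
antecedent = {"Ir84a", "Or2a", "Or42a", "Or56a"}
consequent = {"CG3679"}

rule_support = support(transactions, antecedent | consequent)
rule_confidence = confidence(transactions, antecedent, consequent)
rule_lift = lift(transactions, antecedent, consequent)
```

A lift above 1 indicates that the consequent gene is upregulated more often alongside the antecedent genes than its overall frequency would predict, which is the kind of signal used to link genes with no assigned function to annotated ones.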
Figure 5. Gene set enrichment analysis (GSEA) of differentially expressed transcripts in G. m. morsitans antennae.