Furqan Awan, Muhammad Muddassir Ali, Muhammad Hamid, Muhammad Huzair Awan, Muhammad Hassan Mushtaq, Saeeda Kalsoom, Muhammad Ijaz, Khalid Mehmood, Yongjie Liu.
Abstract
The main aim of this study was to develop a set of functions that can analyze genomic data using less time and memory. Epi-gene is presented as a solution to the problems of handling large sequence files and long computation times. It requires less time and fewer programming skills to work with a large number of genomes. In the current study, some features of the Epi-gene R package are described and illustrated using a dataset of 14 Aeromonas hydrophila genomes. Joining, relabeling, and conversion functions are also included in the package to handle FASTA-formatted sequences. Various Epi-gene functions were used to calculate the subsets of core, accessory, and unique genes. Heat maps and phylogenetic genome trees were also constructed. The whole procedure was completed in less than 30 minutes. The package works only on Windows operating systems. Functions from other packages available in the R computing environment, such as dplyr and ggtree, were also used.
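The split into core, accessory, and unique genes mentioned in the abstract can be illustrated with a small sketch. This is not the Epi-gene API (the package is written in R and its function names are not given here). It is a minimal Python illustration that assumes the usual pan-genome definitions: a cluster present in every genome is core, a cluster present in exactly one genome is unique, and everything in between is accessory. The function name `classify_clusters` and the toy data are hypothetical.

```python
from collections import Counter


def classify_clusters(presence):
    """Split gene clusters into core, accessory, and unique sets.

    presence: dict mapping a genome ID to the set of cluster IDs found in it.
    Assumed definitions (not taken from the Epi-gene source):
      core      = present in every genome
      unique    = present in exactly one genome
      accessory = present in more than one genome but not all
    """
    n_genomes = len(presence)
    # Count in how many genomes each cluster occurs.
    counts = Counter(c for clusters in presence.values() for c in clusters)
    core = {c for c, n in counts.items() if n == n_genomes}
    unique = {c for c, n in counts.items() if n == 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Hypothetical toy input: three genomes, six gene clusters.
toy = {
    "org1": {"c1", "c2", "c3", "c5"},
    "org2": {"c1", "c2", "c4"},
    "org3": {"c1", "c3", "c4", "c6"},
}
core, accessory, unique = classify_clusters(toy)
print(sorted(core), sorted(accessory), sorted(unique))
```

With a single genome the core and unique definitions coincide, so the sketch is only meaningful for two or more genomes.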
Year: 2021 PMID: 34595238 PMCID: PMC8478537 DOI: 10.1155/2021/5585586
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1: Steps involved in the workflow of the Epi-gene package.
A. hydrophila genomes included in this study, with a summary of the calculated datasets.
| Bacterial strains | ID | Total number of genes | Number of accessory genes | Number of unique genes |
|---|---|---|---|---|
| 4AK4 | org1 | 3928 | 323 | 445 |
| Ah10 | org2 | 4178 | 847 | 171 |
| AHNIH1 | org3 | 4176 | 854 | 162 |
| AL0606 | org4 | 4252 | 922 | 170 |
| AL0971 | org5 | 4319 | 1158 | 1 |
| ATCC7966 | org6 | 4076 | 812 | 104 |
| D4 | org7 | 4371 | 1201 | 10 |
| GYK1 | org8 | 4226 | 1039 | 27 |
| J1 | org9 | 4307 | 1141 | 6 |
| JBN2301 | org10 | 4404 | 1237 | 7 |
| ML09-119 | org11 | 4320 | 1159 | 1 |
| NJ35 | org12 | 4512 | 1199 | 153 |
| PC104A | org13 | 4322 | 1161 | 1 |
| YL17 | org14 | 4099 | 694 | 245 |
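The table does not list a core gene count, but one can be derived as a consistency check. The sketch below assumes that each genome's genes fall into exactly one of the three categories, so that total = core + accessory + unique. That partition is an assumption, not something the table states. Under it, subtracting the accessory and unique counts from the total gives the same value, 3160, for all 14 strains.

```python
# (strain, total genes, accessory genes, unique genes), copied from the table.
rows = [
    ("4AK4", 3928, 323, 445),
    ("Ah10", 4178, 847, 171),
    ("AHNIH1", 4176, 854, 162),
    ("AL0606", 4252, 922, 170),
    ("AL0971", 4319, 1158, 1),
    ("ATCC7966", 4076, 812, 104),
    ("D4", 4371, 1201, 10),
    ("GYK1", 4226, 1039, 27),
    ("J1", 4307, 1141, 6),
    ("JBN2301", 4404, 1237, 7),
    ("ML09-119", 4320, 1159, 1),
    ("NJ35", 4512, 1199, 153),
    ("PC104A", 4322, 1161, 1),
    ("YL17", 4099, 694, 245),
]

# Assumption: total = core + accessory + unique for every genome.
implied_core = {
    strain: total - accessory - unique
    for strain, total, accessory, unique in rows
}

for strain, core in implied_core.items():
    print(f"{strain}: implied core genes = {core}")
```

A constant remainder across strains is what a shared core genome would produce, which supports reading the three columns as a partition.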
Figure 2: Graphical representation of unique genes across the included A. hydrophila strains.
Figure 3: Dendrogram showing the phylogenetic relation of A. hydrophila genomes.
Figure 4: Heat map showing the graphical representation of phylogenetic relation of A. hydrophila genomes.
Figure 5: Heat map showing the phylogenetic relation of A. hydrophila genomes along with the presence or absence of clusters.
Figure 6: PCA-based results performed by utilizing the pan-matrix based on the sequence identity. (a) Scree plot describes the reduced dimensions and eigenvalues. (b) Individuals included in PCA show the clustering that is quite similar to the bin-matrix-based clustering.
Figure 7: PCA-based biplot describing the genomes and homogenous gene clusters (colors filled based on cos2 character of variables).
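The dendrogram and heat maps are built from a binary presence/absence (bin) matrix of gene clusters. The record does not say which distance measure Epi-gene uses, so the sketch below substitutes the Jaccard distance, a common choice for presence/absence data. The function names and toy data are hypothetical. The resulting pairwise distances are the kind of input a hierarchical clustering step would take to produce a dendrogram.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0  # two empty genomes are treated as identical
    return 1.0 - len(a & b) / len(union)


def distance_matrix(presence):
    """Pairwise Jaccard distances keyed by (genome, genome) tuples."""
    names = sorted(presence)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(presence[x], presence[y])
        dist[(x, y)] = dist[(y, x)] = d  # the matrix is symmetric
    return dist


# Hypothetical toy input: genome ID -> set of gene clusters present.
toy = {
    "org1": {"c1", "c2", "c3", "c5"},
    "org2": {"c1", "c2", "c4"},
    "org3": {"c1", "c3", "c4", "c6"},
}
dist = distance_matrix(toy)
for (x, y), d in sorted(dist.items()):
    print(f"{x} vs {y}: {d:.3f}")
```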