Yongbing Zhao, Chen Sun, Dongyu Zhao, Yadong Zhang, Yang You, Xinmiao Jia, Junhui Yang, Lingping Wang, Jinyue Wang, Haohuan Fu, Yu Kang, Fei Chen, Jun Yu, Jiayan Wu, Jingfa Xiao.
Abstract
BACKGROUND: Since PGAP (pan-genome analysis pipeline) was published in 2012, it has been widely used in bacterial genomics research. Although PGAP integrates several modules for pan-genome analysis, properly and effectively interpreting and visualizing its result data is still a challenge.
RESULTS: To better present bacterial genomic characteristics, we developed a novel cross-platform software package, PGAP-X. It integrates four data analysis modules: whole-genome sequence alignment, orthologous gene clustering, pan-genome profile analysis, and genetic variant analysis. The results of these analyses can be visualized directly in PGAP-X. Its visualization modules include genome structure comparison, gene distribution by conservation, the pan-genome profile curve, and variation in genic and genomic regions. In addition, result data produced by other programs with similar functions can be imported into PGAP-X for further analysis and visualization. To test the performance of PGAP-X, we comprehensively analyzed 14 Streptococcus pneumoniae strains and 14 Chlamydia trachomatis strains. The results show that S. pneumoniae strains have higher diversity in genome structure and gene content than C. trachomatis strains. Moreover, S. pneumoniae strains might have undergone many evolutionary events, such as genomic rearrangements, frequent horizontal gene transfer, homologous recombination, and other evolutionary processes.
Keywords: Genetic variation; Genome visualization; Pan-genomics
Year: 2018 PMID: 29363431 PMCID: PMC5780747 DOI: 10.1186/s12864-017-4337-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 The framework of PGAP-X. All analytical modules are grouped into three layers, separated by dashed lines: the data visualization layer, the data interface layer, and the data analysis layer. The data interface layer is divided into two parts: raw input data (grey) and output data (black). Output data can also be imported from external programs for visualization
Fig. 2 Snapshot of the PGAP-X graphical interface, showing the interface for different analytical modules and the hierarchical interactive interface
Fig. 3 Whole-genome alignment of 14 Streptococcus pneumoniae strains. (a) Default view of the visualized alignment. (b) The alignment re-plotted after clicking the black triangle, with the whole genomes realigned around the clicked site. Colored blocks represent homologous genomic fragments in each strain, and each fragment has the same color as its homologous fragments in the other strains. Blocks below the center line indicate regions aligned in the reverse-complement orientation (inversions)
Fig. 4 Snapshot of gene distribution by conservation in 14 S. pneumoniae genomes. Gene distribution can be displayed in three modes: (a) gene locations are filled with gradient colors according to each gene's conservation level; (b) from a pan-genome perspective, genes are classified into core genes, dispensable genes, and strain-specific genes, and the locations of these three classes are filled with three different colors; (c) genes are classified into core genes, highly conserved dispensable genes, low-conserved genes, and strain-specific genes, and the locations of these four classes are filled with four different colors. (d) Distribution of the strain-specific genes
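The three-class and four-class schemes in Fig. 4 can be illustrated with a minimal Python sketch. The sketch assumes each orthologous cluster is represented by the set of strains that carry it. A cluster present in every strain is treated as core, a cluster present in a single strain as strain-specific, and all remaining clusters as dispensable. The cutoff that separates highly conserved from low-conserved dispensable clusters is not stated here, so `high_cutoff` is a hypothetical parameter. The function name `classify_clusters` is illustrative and not part of PGAP-X.

```python
def classify_clusters(presence, n_strains, high_cutoff=None):
    """Assign each orthologous cluster to a pan-genome class (illustrative sketch).

    presence: dict mapping a cluster ID to the set of strains carrying it.
    n_strains: total number of strains in the analysis.
    high_cutoff: hypothetical fraction of strains; if given, dispensable
        clusters at or above it are "highly conserved", the rest "low-conserved".
    """
    classes = {}
    for cluster, strains in presence.items():
        k = len(strains)
        if k == n_strains:
            label = "core"                          # present in every strain
        elif k == 1:
            label = "strain-specific"               # present in exactly one strain
        elif high_cutoff is None:
            label = "dispensable"                   # three-class scheme (Fig. 4b)
        elif k / n_strains >= high_cutoff:
            label = "highly conserved dispensable"  # four-class scheme (Fig. 4c)
        else:
            label = "low-conserved"
        classes[cluster] = label
    return classes


# Toy presence data for four hypothetical strains A-D.
presence = {
    "c1": {"A", "B", "C", "D"},
    "c2": {"A", "B", "C"},
    "c3": {"A", "B"},
    "c4": {"C"},
}
print(classify_clusters(presence, 4))
print(classify_clusters(presence, 4, high_cutoff=0.75))
```

Counting strains per cluster keeps the classification independent of gene order. This means the same table could also drive a gradient coloring such as Fig. 4a.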
Fig. 5 Diversity of gene content in 14 S. pneumoniae genomes. (a) Distribution of orthologous gene clusters by conservation level. (b) Pan-genome profile curves for both pan-genome size and core genome size
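A pan-genome profile curve like the one in Fig. 5b plots two quantities for each number of genomes k. The first is the size of the union (pan-genome) of the gene-cluster sets of k strains. The second is the size of their intersection (core genome). The sketch below assumes each strain is given as a set of orthologous-cluster IDs. It averages over every combination of k strains. This exhaustive enumeration (2^N - 1 subsets, about 16,000 for 14 strains) is an illustrative choice, and PGAP-X may sample combinations differently. The name `pan_core_profile` is hypothetical.

```python
from itertools import combinations
from statistics import mean


def pan_core_profile(genomes):
    """Return (k, mean pan-genome size, mean core-genome size) for k = 1..N.

    genomes: dict mapping a strain name to the set of orthologous-cluster IDs
    it carries. Every combination of k strains is enumerated, which is only
    practical for small N (2^N - 1 subsets in total).
    """
    strains = sorted(genomes)
    profile = []
    for k in range(1, len(strains) + 1):
        pan_sizes, core_sizes = [], []
        for combo in combinations(strains, k):
            gene_sets = [genomes[s] for s in combo]
            pan_sizes.append(len(set().union(*gene_sets)))        # clusters in any strain
            core_sizes.append(len(set.intersection(*gene_sets)))  # clusters in all strains
        profile.append((k, mean(pan_sizes), mean(core_sizes)))
    return profile


# Toy gene-cluster sets for three hypothetical strains.
genomes = {
    "A": {"c1", "c2", "c3"},
    "B": {"c1", "c2", "c4"},
    "C": {"c1", "c5"},
}
for k, pan, core in pan_core_profile(genomes):
    print(f"{k} genomes: pan = {pan:.2f}, core = {core:.2f}")
```

As more strains are added, the pan-genome size can only grow and the core size can only shrink. The shapes of these two curves summarize how open or closed the gene pool is.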
Fig. 6 Genetic variation in 14 S. pneumoniae genomes at both the genome and gene scales. (a) Genetic variation at the genome scale, based on the whole-genome alignment; the extended panel is a zoomed-in view of the region in the red rectangle. (b) Genetic variation at the gene scale, based on the alignment of each orthologous cluster; the extended panel is a zoomed-in view of the region in the red rectangle. The status bar shows the genetic variation information for the region selected by right-clicking (red circle)
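At the gene scale, variable sites can be read column by column from the alignment of each orthologous cluster. The following sketch assumes the aligned sequences are equal-length strings with '-' marking gaps. It labels a column as an indel if it mixes gaps and bases, or as a SNP if it contains more than one distinct base. This simple rule and the name `variable_sites` are illustrative assumptions, not PGAP-X's actual variant-detection procedure.

```python
def variable_sites(alignment):
    """Report variable columns in a multiple alignment of one orthologous cluster.

    alignment: dict mapping a strain name to its aligned sequence; all
    sequences must have the same length, with '-' marking gaps.
    Returns a list of (0-based column, "SNP" or "indel") tuples.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    sites = []
    for pos in range(length):
        column = [s[pos].upper() for s in seqs]
        bases = set(column) - {"-"}
        if "-" in column and bases:
            sites.append((pos, "indel"))  # gap in some strains, base in others
        elif len(bases) > 1:
            sites.append((pos, "SNP"))    # more than one distinct base, no gaps
    return sites


# Toy alignment of one hypothetical orthologous cluster in three strains.
aln = {"A": "ATG-CA", "B": "ATGGCA", "C": "ACGGCT"}
print(variable_sites(aln))
```

The same column-wise scan, applied to windows of a whole-genome alignment, would give a variation density comparable in spirit to the genome-scale view in Fig. 6a.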