Jason W Shapiro, Catherine Putonti.
Abstract
BACKGROUND: A pangenome is the collection of all genes found in a set of related genomes. For microbes, these genomes are often different strains of the same species, and the pangenome offers a means to compare gene content variation with differences in phenotypes, ecology, and phylogenetic relatedness. Though most frequently applied to bacteria, there is growing interest in adapting pangenome analysis to bacteriophages. However, working with phage genomes presents new challenges. First, most phage families are under-sampled, and homologous genes in related viruses can be difficult to identify. Second, homing endonucleases and intron-like sequences may be present, resulting in fragmented gene calls. Each of these issues can reduce the accuracy of standard pangenome analysis tools.
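To make the definitions above concrete, the following minimal Python sketch computes a pangenome, its core genes, and its single-copy core genes from a toy input. The function name `summarize_pangenome`, the genome names, and the cluster IDs are all hypothetical, and the code illustrates the general concept rather than the method used by any specific tool.

```python
from collections import Counter


def summarize_pangenome(genomes):
    """Summarize gene content for a set of genomes.

    genomes: dict mapping a genome name to a list of gene-cluster IDs,
    one entry per gene call (hypothetical input format).
    Returns (pangenome, core, single_copy_core) as sets of cluster IDs.
    """
    # Count how many gene calls each genome contributes to each cluster.
    counts = {name: Counter(clusters) for name, clusters in genomes.items()}
    # Pangenome: every cluster seen in at least one genome.
    pangenome = set().union(*(c.keys() for c in counts.values()))
    # Core: clusters present in every genome.
    core = {g for g in pangenome if all(g in c for c in counts.values())}
    # Single-copy core: core clusters with exactly one gene call per genome.
    single_copy_core = {g for g in core if all(c[g] == 1 for c in counts.values())}
    return pangenome, core, single_copy_core


# Toy data: cluster "c3" has two gene calls in phage_A, as might happen
# if one gene were split into two fragments.
toy = {
    "phage_A": ["c1", "c2", "c3", "c3"],
    "phage_B": ["c1", "c2", "c3"],
    "phage_C": ["c1", "c3", "c4"],
}
pangenome, core, scg = summarize_pangenome(toy)
print(sorted(pangenome))  # ['c1', 'c2', 'c3', 'c4']
print(sorted(core))       # ['c1', 'c3']
print(sorted(scg))        # ['c1']
```

In this toy case, the duplicated call in `phage_A` keeps `c3` out of the single-copy core even though every genome carries it, which is one way a fragmented gene call could distort a pangenome summary.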
Keywords: Bacteriophage; Fragmented genes; Gene clustering; Pangenome
Year: 2021 PMID: 34434663 PMCID: PMC8351571 DOI: 10.7717/peerj.11950
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Flowchart of the Rephine.r pipeline.
Figure 2. Fragmented gene calls can be identified from alignments.
(A) An original multiple sequence alignment in which the gene from NC_041902 has been split into two fragments by an indel. (B) The corrected alignment after running Rephine.r. Highlight colors mark the regions of each fragment and the positions they correspond to within an intact homolog.
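One way to picture the idea in this caption is a check for gene calls from the same genome that occupy non-overlapping parts of a cluster alignment. The Python sketch below is an illustration under stated assumptions, not the Rephine.r implementation: the function names, the input format, and the toy sequences are hypothetical, and a real pipeline would need further criteria (for example, gene adjacency or alignment quality).

```python
from itertools import combinations


def covered_span(aligned_seq, gap="-"):
    """Return (first, last) alignment columns holding a residue, or None if all gaps."""
    cols = [i for i, ch in enumerate(aligned_seq) if ch != gap]
    return (cols[0], cols[-1]) if cols else None


def candidate_fragments(alignment):
    """Flag possible fragmented gene calls within one gene cluster.

    alignment: list of (genome_id, gene_id, aligned_seq) tuples, where all
    aligned sequences share the same length (hypothetical input format).
    Returns (genome_id, gene_a, gene_b) for gene pairs from the same genome
    whose covered column ranges do not overlap.
    """
    by_genome = {}
    for genome_id, gene_id, seq in alignment:
        span = covered_span(seq)
        if span is not None:
            by_genome.setdefault(genome_id, []).append((gene_id, span))

    pairs = []
    for genome_id, genes in by_genome.items():
        for (g1, s1), (g2, s2) in combinations(genes, 2):
            # Non-overlapping spans: one ends before the other begins.
            if s1[1] < s2[0] or s2[1] < s1[0]:
                pairs.append((genome_id, g1, g2))
    return pairs


# Toy alignment: genome_Y contributes two calls that tile the intact homolog.
toy_alignment = [
    ("genome_X", "X_gene1", "MKTLLVAGDQRSTW"),
    ("genome_Y", "Y_gene1", "MKTLLV--------"),
    ("genome_Y", "Y_gene2", "--------DQRSTW"),
]
print(candidate_fragments(toy_alignment))
# [('genome_Y', 'Y_gene1', 'Y_gene2')]
```

Comparing only the first and last covered columns keeps the check simple. It would miss fragments that interleave with internal gaps, so it should be read as a starting heuristic.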
Summary of results of running Rephine.r for each phage group.
| Metric | Studiervirinae | Tevenvirinae | Pbunaviruses |
|---|---|---|---|
| Number of genomes | 145 | 127 | 30 |
| Mean genome size (bp) | 39696 | 174775 | 66068 |
| Initial gene calls | 6956 | 35436 | 3540 |
| Initial gene clusters | 558 | 4067 | 195 |
| Initial core genes | 12 | 27 | 28 |
| Initial SCG size | 3 | 13 | 19 |
| New clusters after merging | 16 | 64 | 2 |
| Clusters involved in a merger | 63 | 270 | 5 |
| Biggest merger | 7 | 30 | 3 |
| Core genes after merging | 14 | 37 | 28 |
| SCG size after merging | 3 | 13 | 19 |
| Defragmented clusters | 14 | 99 | 17 |
| SCG size after fusion and merge | 8 | 22 | 26 |
| Additional fusions after merge | 1 | 7 | 1 |
| New core genes after final fusion | 0 | 0 | 0 |
| Total SCG gain | 5 | 9 | 7 |
| Mean tree support before | 77.14 | 87.24 | 63.6 |
| Mean tree support after | 90.55 | 93.44 | 69.57 |
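The derived rows of the table can be checked with simple arithmetic: the total SCG gain is the SCG size after fusion and merge minus the initial SCG size, and the change in mean tree support is the "after" value minus the "before" value. The short Python sketch below recomputes both from the table's numbers; the dictionary layout and key names are illustrative choices, not part of the original analysis.

```python
# Values copied from the summary table; key names are hypothetical.
table = {
    "Studiervirinae": {"scg_initial": 3, "scg_final": 8,
                       "support_before": 77.14, "support_after": 90.55},
    "Tevenvirinae": {"scg_initial": 13, "scg_final": 22,
                     "support_before": 87.24, "support_after": 93.44},
    "Pbunaviruses": {"scg_initial": 19, "scg_final": 26,
                     "support_before": 63.6, "support_after": 69.57},
}


def gains(row):
    """Return (SCG gain, change in mean tree support) for one phage group."""
    scg_gain = row["scg_final"] - row["scg_initial"]
    support_gain = round(row["support_after"] - row["support_before"], 2)
    return scg_gain, support_gain


for group, row in table.items():
    scg_gain, support_gain = gains(row)
    print(f"{group}: SCG gain = {scg_gain}, tree support change = {support_gain:+.2f}")
# Studiervirinae: SCG gain = 5, tree support change = +13.41
# Tevenvirinae: SCG gain = 9, tree support change = +6.20
# Pbunaviruses: SCG gain = 7, tree support change = +5.97
```

The recomputed SCG gains (5, 9, and 7) match the "Total SCG gain" row.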
Figure 3. Studiervirinae phylogeny before (A) and after (B) using Rephine.r to correct the SCG.
Bootstrap support is shown by coloring branches preceding nodes, with low support (from 0 to 70) ranging from white to red. Increasing the size of the SCG reduced the number of low-support branches.