Abstract
BACKGROUND: Reconstructing ancestral gene orders in the presence of duplications is important for a better understanding of genome evolution. Current methods for ancestral reconstruction are limited by either computational constraints or the availability of reliable gene trees, and often ignore duplications altogether. Recently, methods that consider duplications in ancestral reconstructions have been developed, but the quality of reconstruction, counted as the number of contiguous ancestral regions found, decreases rapidly with the number of duplicated genes, complicating the application of such approaches to mammalian genomes. However, such high fragmentation is not encountered when reconstructing mammalian genomes at the synteny-block level, although the relative positions of genes in such reconstruction cannot be recovered.
Keywords: Ancestral genome reconstruction; Duplications; Gene orders; Synteny blocks
Year: 2016 PMID: 28185565 PMCID: PMC5123302 DOI: 10.1186/s12859-016-1262-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 MULTIRES flowchart. A high-level overview of the MULTIRES pipeline
Fig. 2 a Estimating localizations. This panel shows how MULTIRES defines and infers localizations. Coloured solid wedges represent gene families, with wedges of the same colour belonging to the same family. Synteny blocks are indicated by hollow wedges, with colour indicating homology. The orientation of a wedge represents the orientation of the corresponding gene or block. Inferring adjacencies between gene families parsimoniously results in the ancestral adjacency graph shown on the top left, with edges representing adjacencies between gene ends. We also have a set of contiguous ancestral regions (CARs) reconstructed at the ancestor, each of which consists of an ordering of the ancestral synteny blocks. On the right of the tree, we display an example of a CAR, and the same CAR after all synteny blocks have been doubled into head and tail extremities. We define windows of length 3 as consecutive subsequences of 3 extremities on the CAR. The windows, indicated by coloured line segments in the figure, are used to partition the CAR. In the diagram, we observe after partitioning that one copy of the brown gene (g1) always occurs in the red window and one always occurs in the blue window, and never in their intersection. This allows us to partition the brown gene family into two subfamilies, called localizations, which are restricted to appear only in the relevant blocks, leading to the localized adjacency graph at the bottom. b Optimization and consensus. This panel shows a localized adjacency graph (top) with the copy number associated with each localization (numbers under the genes). Partitioning the ancestral CARs into segments (black line segments) defines an ordered sequence of induced subgraphs. Applying the algorithm of [24] to each induced subgraph results in the set of adjacencies shown at each layer, with each localization incident to at most twice as many adjacencies as its copy number.
For example, the brown localization can have at most 1 copy in the gene order, making it adjacent to at most 2 other localizations. The algorithm indicates that these adjacencies are to the red and orange localizations. Finally, we combine the subgraphs and find a linear gene order by selecting the most frequently conserved adjacencies and using the order of the segments. In the example, since the purple localization is only conserved in Segment 3, while the cyan localization is conserved in Segment 2 as well, we can resolve the gene order around the duplicated orange localization
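The doubling and windowing step described in panel a can be sketched as follows. This is a minimal illustration, not the MULTIRES implementation: the function names `double_blocks` and `windows`, the `(name, sign)` encoding of an oriented block, and the convention that a forward block reads tail then head are all assumptions made for the example.

```python
from typing import List, Tuple

Block = Tuple[str, int]        # (block name, +1 forward / -1 reversed); assumed encoding
Extremity = Tuple[str, str]    # (block name, "t" for tail or "h" for head)


def double_blocks(car: List[Block]) -> List[Extremity]:
    """Replace each oriented synteny block on a CAR by its two extremities."""
    extremities: List[Extremity] = []
    for name, sign in car:
        # Assumed convention: a forward block reads tail -> head,
        # a reversed block reads head -> tail.
        ends = ("t", "h") if sign > 0 else ("h", "t")
        extremities.extend((name, end) for end in ends)
    return extremities


def windows(extremities: List[Extremity], size: int = 3) -> List[Tuple[Extremity, ...]]:
    """Return every run of `size` consecutive extremities along the CAR."""
    return [
        tuple(extremities[i:i + size])
        for i in range(len(extremities) - size + 1)
    ]


# Hypothetical CAR with three blocks, the middle one reversed.
car = [("A", 1), ("B", -1), ("C", 1)]
doubled = double_blocks(car)
for w in windows(doubled):
    print(w)
```

Consecutive windows overlap in two extremities, which is why the caption distinguishes copies that fall in one window, in the other, or in their intersection.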
Fig. 3 Mammalian species tree. The species tree, with the ancestor of interest marked in red
Fig. 4 Adjacency recovery comparison. Comparison of adjacency true positive (TP), false positive (FP) and false negative (FN) rates for MULTIRES against FPMAG (cf. [28]) and MGRA2 [19] on both the low rearrangement rate and high rearrangement rate simulations. FPMAG fails to recover a number of ancestral adjacencies, despite using repeat spanning intervals. The results using MGRA2 are provided to contrast how much of a difference the presence of duplications can make in a reconstruction
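For readers who want to reproduce this kind of comparison, the TP, FP and FN counts can be computed by treating adjacencies as unordered pairs of gene extremities. The sketch below is an assumption-laden illustration: the function name `adjacency_counts` and the extremity labels are hypothetical, adjacencies are handled as a set (multiplicities from duplicated genes are ignored), and the normalization used to turn counts into the rates plotted in the figure is not specified here.

```python
from typing import Dict, Iterable, Tuple

Adjacency = Tuple[str, str]  # pair of gene extremities, e.g. ("a_h", "b_t"); assumed labels


def adjacency_counts(
    true_adj: Iterable[Adjacency],
    predicted_adj: Iterable[Adjacency],
) -> Dict[str, int]:
    """Count true positive, false positive and false negative adjacencies."""
    # frozenset makes ("a_h", "b_t") and ("b_t", "a_h") the same adjacency.
    truth = {frozenset(a) for a in true_adj}
    pred = {frozenset(a) for a in predicted_adj}
    return {
        "TP": len(truth & pred),   # reconstructed and present in the true ancestor
        "FP": len(pred - truth),   # reconstructed but absent from the true ancestor
        "FN": len(truth - pred),   # present in the true ancestor but missed
    }


# Hypothetical simulated ancestor and reconstruction.
truth = [("a_h", "b_t"), ("b_h", "c_t")]
reconstruction = [("b_t", "a_h"), ("c_h", "d_t")]
print(adjacency_counts(truth, reconstruction))
```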
Fig. 5 Interval recovery. The ratios of intervals recovered against the size of the intervals, for both simulation sets. Note the steady difference in the ratio of recovered intervals: fewer intervals in the high rearrangement set are recovered. The red plot has been shifted by 0.1 along the x-axis for easier viewing. Longer intervals are lost due to the number of genes which are not recovered in the reconstruction
Comparison of the gene order reconstruction of the primate-rodent ancestral X-chromosome using MGRA2, FPMAG and MULTIRES
| | Conserved | MGRA2 | FPMAG | MULTIRES, mean (SD) |
|---|---|---|---|---|
| Genes | 746 | 132 | 429 | 518.12 (19.81) |
| Adjacencies | 749 | 130 | 350 | 468.31 (6.05) |
| Recovered | - | 42 | 350 | 468.31 (6.05) |
| Fragments | N/A | 1 | 79 | 53.16 (4.76) |
The row for total adjacencies indicates the number of adjacencies found in the reconstruction. The third row indicates the number of reconstructed adjacencies which are conserved in 2 or more descendant species. Note that FPMAG and MultiRes only recover conserved adjacencies. MGRA2 can also limit the number of CARs reconstructed and find a single CAR. The results for MultiRes are averaged over all parameter combinations. The low standard deviations demonstrate the robustness of the method to parameter choices