Shenghan Gao, Xiaofei Yang, Jianyong Sun, Xixi Zhao, Bo Wang, Kai Ye.
Abstract
Significant improvements in genome sequencing and assembly technology have led to increasing numbers of high-quality genomes, revealing complex evolutionary scenarios such as multiple whole-genome duplication (WGD) events, which hinder ancestral genome reconstruction via the currently available computational frameworks. Here, we present the Inferring Ancestor Genome Structure (IAGS) framework, a novel block/endpoint matching optimization strategy with single-cut-or-join distance, to allow ancestral genome reconstruction under both simple (single-copy ancestor) and complex (multicopy ancestor) scenarios. We evaluated IAGS with two simulated data sets and applied it to four different real evolutionary scenarios to demonstrate its performance and general applicability. IAGS is available at https://github.com/xjtu-omics/IAGS.
Keywords: IAGS; WGD; inferring ancestral genome; multicopy ancestor
Year: 2022 PMID: 35176153 PMCID: PMC8896626 DOI: 10.1093/molbev/msac041
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Overview of the four computational models of IAGS. (A) Genome median problem (GMP) model. (B) Guided genome halving problem (GGHP) model. (C) GMP with a multicopy ancestral genome (multicopy GMP model). (D) GGHP with a multicopy ancestral genome (multicopy GGHP model). The red stars denote WGD events. The blue point denotes the divergent ancestor, and the red point denotes the preduplicated ancestor. The green circles indicate species whose syntenic block sequences were used as the input for the calculation: the child and outgroup species in GMP and multicopy GMP, and the duplicated child and outgroup species in GGHP and multicopy GGHP. The ellipses represent four IP formulations. The rectangles represent the output of each step. The dashed line indicates the guide species used for endpoint matching optimization (EMO) and self-block matching optimization (self-BMO).
Fig. 2. Performance of IAGS under two simulated evolutionary scenarios. (A) Non-CRE evolutionary scenarios. The red stars represent WGD events. The blue and red points indicate the ancestors. The green circles represent species in the evolutionary trees with the target copy number labeled. (B) The result of the reconstruction of intermediate ancestors in the non-CRE simulation. The species block copy number is in brackets, and the small plus “+” indicates matching (EMO) with this species. (C) Quadratic polynomial fitting of the relationship between the CRE ratio and the adjacency inconsistency ratio for the four models. The red points and red lines represent the IAGS results. The blue points and blue lines represent the MGRA2 results. The green points and green lines represent the Gapadj results. The quadratic fitting functions are provided at the bottom. P values were calculated with the Wilcoxon rank-sum test.
Fig. 3. Inferring ancestral genome structures of Brassica and yeast species. (A) Evolutionary history of three Brassica species. The black point is the location of the inferred ancestor. (B) Dotplot comparing the genome structure with the Perumal et al. MRCA of Brassica rapa and Brassica oleracea. The y axis represents the previously reported ancestral Brassica genome. The x axis represents the ancestral genome reconstructed by IAGS. Adjacency inconsistency was computed and compared with the published ancestor. (C) The number of supporting adjacencies in the input species for the IAGS result and the Perumal et al. ancestor. (D) Evolutionary history of nine yeast species. The blue ellipse labels the WGD event at approximately 100 Ma. “*” indicates that the numbers of shuffling events were directly computed against the pre-WGD ancestor. The black triangle represents six non-WGD species. A detailed diagram of all outgroup species is shown in supplementary figure 8, Supplementary Material online. (E) Comparison of the ancestral genome reconstructed by IAGS and the Gordon et al. pre-WGD yeast ancestor. (F) The number of supporting adjacencies in the input species for the IAGS result and the Gordon et al. ancestor. The squares containing colored blocks represent the ancestral chromosomes and show how the syntenic blocks are rearranged in the different species. BP, breakpoint.
Fig. 4. Applying IAGS to complex scenarios with multicopy ancestors. (A) Evolutionary history of five Gramineae species. “*” indicates that the shuffling events leading to ancestor 3 were computed relative to ancestor 1. (B) Dotplot comparing the genome structure with the Murat et al. post-ρ ancestral grass karyotype (AGK). The y axis represents the post-ρ AGK. The x axis represents IAGS ancestor 1. Adjacency inconsistency was computed and compared with the post-ρ AGK. (C) The number of supporting adjacencies in the input species for the IAGS result and the post-ρ AGK. (D) Dotplot comparing ancestor 4 and Sorghum bicolor. (E) Dotplot comparing ancestor 3 and S. bicolor. (F) Evolutionary history of three Papaver species. “*” indicates that the shuffling events leading to Papaver rhoeas were computed relative to ancestor 1. The squares containing colored blocks represent the ancestral chromosomes and show how the syntenic blocks are rearranged in the different species. BP, breakpoint.
Notations Used in the GMP Formulation.
| Notation | Meaning |
|---|---|
| | A list of genome adjacency matrices for input species |
| | 2D variable representing the ancestor adjacency matrix |
| | Number of genome blocks |
| | Target copy number of the ancestor |
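The adjacency matrices listed above, together with the single-cut-or-join (SCJ) distance named in the abstract, can be illustrated with a minimal sketch. It is not taken from the IAGS code base: the function names, the endpoint indexing (tail of block b at index 2(b-1), head at 2(b-1)+1), and the restriction to linear, single-copy chromosomes are all assumptions made for illustration. The SCJ distance between two genomes is the number of adjacencies present in one genome but not the other.

```python
def block_ends(block):
    """Return (left_end, right_end) endpoint indices of a signed block.

    Assumed indexing (hypothetical, not the IAGS encoding):
    block b has tail index 2*(b-1) and head index 2*(b-1)+1.
    A negative sign means the block is read in reverse.
    """
    tail = 2 * (abs(block) - 1)
    head = tail + 1
    return (tail, head) if block > 0 else (head, tail)


def adjacency_matrix(genome, n_blocks):
    """Build a symmetric endpoint adjacency matrix.

    genome: list of linear chromosomes, each a list of signed block IDs.
    Telomeres are not recorded, since SCJ only counts adjacencies.
    """
    size = 2 * n_blocks
    matrix = [[0] * size for _ in range(size)]
    for chromosome in genome:
        for left, right in zip(chromosome, chromosome[1:]):
            u = block_ends(left)[1]   # right end of the left block
            v = block_ends(right)[0]  # left end of the right block
            matrix[u][v] += 1
            matrix[v][u] += 1
    return matrix


def scj_distance(a, b):
    """Count adjacencies found in exactly one of two single-copy genomes."""
    size = len(a)
    return sum(
        abs(a[u][v] - b[u][v])
        for u in range(size)
        for v in range(u + 1, size)
    )


# One chromosome with three blocks, and the same chromosome with block 2 inverted.
original = adjacency_matrix([[1, 2, 3]], n_blocks=3)
inverted = adjacency_matrix([[1, -2, 3]], n_blocks=3)
print(scj_distance(original, inverted))
```

In this toy case the inversion breaks two adjacencies and creates two new ones, so each of the four contributes one cut or one join to the distance.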
Notations Used in the GGHP Formulation.
| Notation | Meaning |
|---|---|
| | Genome adjacency matrix for a duplicated species |
| | Genome adjacency matrix for an outgroup species |
| | Number of genome blocks |
| | 2D variable representing the ancestor adjacency matrix |
| | Target copy number of the ancestor |
Notations Used in the Reduced Variable GMP Formulation.
| Notation | Meaning |
|---|---|
| | A list of genome adjacencies for input species |
| | A list of variables representing ancestor adjacencies |
| | Each endpoint adjacency options’ index range in |
| | Self-connection adjacency option indexes in |
| | Symmetry adjacency index of each item in |
| | Number of genome blocks |
| | Number of all endpoint adjacency options (length of |
| | Target copy number of the ancestor |
Notations Used in the Reduced Variable GGHP Formulation.
| Notations | Meaning |
|---|---|
|
| Genome adjacencies for a duplicated species |
|
| Genome adjacencies for an outgroup species |
|
| A list of variables representing ancestor adjacencies |
|
| Each endpoint adjacency options’ index range in |
|
| Self-connection adjacency option indexes in |
|
| Symmetry adjacency index of each item in |
|
| Number of genome blocks |
|
| All endpoint adjacency options number (length of |
|
| Target copy number of ancestor |
Notations Used in the EMO and BMO Formulations.
| Notation | Meaning |
|---|---|
| | 3D variable representing the matching matrix list |
| | Matching pair data set |
| | Copy numbers of the target genome and guide genome |
| | Number of matching matrices in |
| | Matching ratio between target genome and guide genome |