Mathieu Rouard, Gaetan Droc, Guillaume Martin, Julie Sardos, Yann Hueber, Valentin Guignon, Alberto Cenci, Björn Geigle, Mark S Hibbins, Nabila Yahiaoui, Franc-Christophe Baurens, Vincent Berry, Matthew W Hahn, Angelique D'Hont, Nicolas Roux.
Abstract
Edible bananas result from interspecific hybridization between Musa acuminata and Musa balbisiana, as well as among subspecies in M. acuminata. Four particular M. acuminata subspecies have been proposed as the main contributors of edible bananas, all of which radiated in a short period of time in southeastern Asia. Clarifying the evolution of these lineages at a whole-genome scale is therefore an important step toward understanding the domestication and diversification of this crop. This study reports the de novo genome assembly and gene annotation of a representative genotype from three different subspecies of M. acuminata. These data are combined with the previously published genome of the fourth subspecies to investigate phylogenetic relationships. Analyses of shared and unique gene families reveal that the four subspecies are quite homogeneous, with a core genome representing at least 50% of all genes and very few M. acuminata species-specific gene families. Multiple alignments indicate high sequence identity between homologous single-copy genes, supporting the close relationships of these lineages. Interestingly, phylogenomic analyses demonstrate high levels of gene tree discordance, due to both incomplete lineage sorting and introgression. This pattern suggests rapid radiation within Musa acuminata subspecies that occurred after the divergence from M. balbisiana. Introgression between M. a. ssp. malaccensis and M. a. ssp. burmannica was detected across the genome, though multiple approaches to resolve the subspecies tree converged on the same topology. To support evolutionary and functional analyses, we introduce the PanMusa database, which enables researchers to explore individual gene families and trees.
Year: 2018 PMID: 30321324 PMCID: PMC6282646 DOI: 10.1093/gbe/evy227
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Summary of the Gene Clustering Statistics Per (Sub)Species
| # genes | 35,276 | 45,069 | 32,692 | 44,702 | 36,836 |
| # genes in orthogroups | 31,501 | 34,947 | 26,490 | 33,059 | 29,225 |
| # unassigned genes | 3,775 | 10,122 | 6,202 | 11,643 | 7,611 |
| % genes in orthogroups | 89.3 | 77.5 | 81 | 74 | 79.3 |
| % unassigned genes | 10.7 | 22.5 | 19 | 26 | 20.7 |
| # orthogroups containing species | 24,074 | 26,542 | 21,446 | 25,730 | 23,935 |
| % orthogroups containing species | 74.4 | 82 | 66.2 | 79.5 | 73.9 |
| # species-specific orthogroups | 6 | 46 | 47 | 11 | 9 |
| # genes in species-specific orthogroups | 14 | 104 | 110 | 23 | 21 |
| % genes in species-specific orthogroups | 0 | 0.2 | 0.3 | 0.1 | 0.1 |
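The percentage rows of this table follow from its count rows. As a minimal consistency check, the Python sketch below recomputes the number of unassigned genes and the two percentage rows from the first two rows. The variable and function names are illustrative assumptions, and the columns are kept in the order shown in the table.

```python
# Counts copied from the table above, one entry per column in table order.
genes = [35276, 45069, 32692, 44702, 36836]
genes_in_og = [31501, 34947, 26490, 33059, 29225]


def clustering_summary(total, assigned):
    """Return (# unassigned, % in orthogroups, % unassigned) for one genome."""
    unassigned = total - assigned
    pct_assigned = round(100 * assigned / total, 1)
    pct_unassigned = round(100 * unassigned / total, 1)
    return unassigned, pct_assigned, pct_unassigned


# One summary tuple per column of the table.
summary = [clustering_summary(t, a) for t, a in zip(genes, genes_in_og)]
for row in summary:
    print(row)
```

Run on these counts, the recomputed values agree with the rows reported in the table.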
Fig. 1.—Intersection diagram showing the distribution of shared gene families (at least two sequences per OG) among M. a. banksii “Banksii,” M. a. zebrina “Maia Oa,” M. a. burmannica “Calcutta 4,” M. a. malaccensis “DH Pahang,” and M. balbisiana “PKW” genomes. The figure was created with UpSetR (Lex et al. 2014).
Fig. 2.—Illustration of gene tree discordance. (A) Cloudogram of single copy OGs (CDS) visualized with DensiTree. The blue line represents the consensus tree as provided by DensiTree. (B) Species tree with bootstrap-like support based on corresponding gene tree frequency from table 2 (denoted topology number 2). PKW, M. balbisiana “PKW”; C4, M. acuminata burmannica “Calcutta 4”; M, M. acuminata zebrina “Maia Oa”; DH, M. acuminata malaccensis “DH Pahang”; B, M. acuminata banksii “Banksii”.
Frequency of Gene Tree Topologies of the 8,030 Single Copy OGs
| No. | Topology | # CDS (%) | # Protein (%) | # Gene (%) | # Gene Bootstrap >90 (%) |
|---|---|---|---|---|---|
| 1 | (PKW,(C4,(M,(DH, B)))) | 10.58 | 13.72 | ||
| 2 | (PKW,(C4,(DH,(B, M)))) | 10.8 | 10.48 | 11.92 | 14.88 |
| 3 | (PKW,((DH, C4),(B, M))) | 9.59 | 7.28 | 12.73 | |
| 4 | (PKW,(M,(C4,(DH, B)))) | 9.53 | 7.78 | 5.91 | |
| 5 | (PKW,(C4,(B,(DH, M)))) | 8.02 | 7.37 | 8.89 | 8.44 |
| 6 | (PKW,((DH, B),(C4, M))) | 7.67 | 6.55 | 9.16 | 12.56 |
| 7 | (PKW,(M,(B,(DH, C4)))) | 6.66 | 8.21 | 5 | 3.06 |
| 8 | (PKW,(B,(M,(DH, C4)))) | 5.58 | 5.23 | 4.61 | 2.53 |
| 9 | (PKW,(DH,(C4,(B, M)))) | 5.41 | 5.21 | 5.18 | 4.96 |
| 10 | (PKW,(B,(C4,(DH, M)))) | 5.26 | 4.45 | 6.2 | 7.07 |
| 11 | (PKW,(B,(DH,(C4, M)))) | 5.02 | 6.82 | 3.36 | 1.9 |
| 12 | (PKW,(M,(DH,(B, C4)))) | 4.23 | 4.68 | 2.84 | 1.16 |
| 13 | (PKW,((DH, M),(B, C4))) | 4.037 | 3.61 | 4.79 | 5.06 |
| 14 | (PKW,(DH,(B,(C4, M)))) | 3.85 | 4.18 | 2.44 | 0.63 |
| 15 | (PKW,(DH,(M,(B, C4)))) | 2.38 | 2.77 | 1.92 | 0.52 |
Note.—In bold, the most frequent topology.
PKW, Musa balbisiana “PKW”; C4, Musa acuminata burmannica “Calcutta 4”; M, Musa acuminata zebrina “Maia Oa”; DH, Musa acuminata malaccensis “DH Pahang”; B, Musa acuminata banksii “Banksii”.
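The table lists 15 topologies because four ingroup taxa admit exactly 15 rooted binary trees once the outgroup (PKW) fixes the root. The Python sketch below enumerates them and gives a simple canonical form so that trees differing only in the order of sister taxa, such as (B, M) and (M, B), are tallied together. This is an illustrative sketch, not the pipeline used in the study; the function names are assumptions.

```python
from itertools import combinations


def rooted_topologies(taxa):
    """Yield every rooted binary tree on `taxa` as nested 2-tuples."""
    taxa = tuple(taxa)
    if len(taxa) == 1:
        yield taxa[0]
        return
    first, rest = taxa[0], taxa[1:]
    # Keep `first` on the left side so each unordered split is visited once.
    for k in range(len(rest)):
        for joined in combinations(rest, k):
            left = (first,) + joined
            right = tuple(t for t in rest if t not in joined)
            for left_tree in rooted_topologies(left):
                for right_tree in rooted_topologies(right):
                    yield (left_tree, right_tree)


def canonical(tree):
    """Newick-like string with children sorted, so sibling order is ignored."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(sorted(canonical(child) for child in tree)) + ")"


ingroup = ["C4", "M", "DH", "B"]
# Attach the outgroup PKW at the root of every ingroup tree.
topologies = {canonical(("PKW", t)) for t in rooted_topologies(ingroup)}
print(len(topologies))
```

Counting gene trees per topology then reduces to applying `canonical` to each parsed gene tree and tallying the resulting strings.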
Fig. 3.—Species topologies computed with three different approaches. (A) Maximum likelihood tree inferred from a concatenated alignment of single-copy genes (CDS). (B) Supertree-based method applied to single and multilabelled gene trees. (C) Quartet-based model applied to protein, CDS, and gene alignments.
Four-Taxon ABBA-BABA Test (D-Statistic) Used for Introgression Inference from the Well-Supported Topology from Fig. 3
| P1 | P2 | P3 | BBAA | ABBA | BABA | Discordance | D | P-value |
|---|---|---|---|---|---|---|---|---|
| Malaccensis (DH) | Banksii (B) | Burmannica (C4) | 12185 | 4289 | 8532 | 0.51 | −0.33 | <2.2e-16 |
| Malaccensis (DH) | Zebrina (M) | Burmannica (C4) | 9622 | 5400 | 9241 | 0.6 | −0.26 | <2.2e-16 |
| Zebrina (M) | Banksii (B) | Burmannica (C4) | 11204 | 6859 | 6782 | 0.54 | 0.005 | 0.5097 |
| Malaccensis (DH) | Banksii (B) | Zebrina (M) | 10450 | 7119 | 6965 | 0.57 | 0.02 | 0.1944 |
Discordance = (ABBA + BABA)/Total, where Total = BBAA + ABBA + BABA.
D = (ABBA − BABA)/(ABBA + BABA).
P-value based on Pearson chi-squared test.
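The two formulas above can be applied directly to the site-pattern counts in the table. The Python sketch below does so and adds a P-value under one assumption that the source does not spell out: that the Pearson chi-squared test compares the ABBA and BABA counts against a 1:1 expectation with one degree of freedom. The function name is hypothetical.

```python
import math


def abba_baba(bbaa, abba, baba):
    """Return (discordance, D, p_value) for one four-taxon comparison."""
    total = bbaa + abba + baba
    discordance = (abba + baba) / total
    d_stat = (abba - baba) / (abba + baba)
    # Assumed test: Pearson chi-squared goodness of fit of ABBA vs BABA
    # against equal expected counts, which simplifies to the expression below.
    chi2 = (abba - baba) ** 2 / (abba + baba)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return discordance, d_stat, p_value


# First row of the table: P1 = Malaccensis, P2 = Banksii, P3 = Burmannica.
disc, d_stat, p_value = abba_baba(12185, 4289, 8532)
print(round(disc, 2), round(d_stat, 2), p_value < 2.2e-16)
```

For the first row this gives a discordance of about 0.51 and D of about −0.33, in line with the table. A negative D here reflects an excess of BABA sites, that is, sites where P1 (malaccensis) shares the derived allele with P3 (burmannica).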
Fig. 4.—Overview of available interfaces for the PanMusa database. (A) Homepage of the website. (B) List of functionally annotated OGs. (C) Graphical representation of the number of sequences per species. (D) Consensus InterPro domain schema by OG. (E) Individual gene trees visualized with PhyD3. (F) Multiple alignment of OG with MSAviewer.
Fig. 5.—Area of distribution of Musa species in Southeast Asia as described by Perrier et al. (2011), including the species tree of Musa acuminata subspecies based on results described in figure 3. Areas of distribution are approximately represented by colors; the hatched zone shows the area of overlap between two subspecies where introgression may have occurred.