Christophe Guyeux1, Jean-Claude Charr1, Hue T M Tran2, Agnelo Furtado2, Robert J Henry2, Dominique Crouzillat3, Romain Guyot4,5, Perla Hamon6.
Abstract
Chloroplast sequences are widely used for phylogenetic analysis due to their high degree of conservation in plants. Whole chloroplast genomes can now be readily obtained for plant species using new sequencing methods, giving invaluable data for plant evolution. However, new annotation methods are required for the efficient analysis of these data to deliver high quality phylogenetic analyses. In this study, the two main tools for chloroplast genome annotation were compared. More consistent detection and annotation of genes were produced with GeSeq when compared to the currently used Dogma. This suggests that the annotation of most of the previously annotated chloroplast genomes should now be updated. GeSeq was applied to species related to coffee, including 16 species of the Coffea and Psilanthus genera, to reconstruct the ancestral chloroplast genomes and to evaluate their phylogenetic relationships. Eight genes in the plant chloroplast pan genome (consisting of 92 genes) were always absent in the coffee species analyzed. Notably, the two main cultivated coffee species (i.e. Arabica and Robusta) did not group into the same clade and differed in their pattern of gene evolution. While Arabica coffee (Coffea arabica) belongs to the Coffea genus, Robusta coffee (Coffea canephora) is associated with the Psilanthus genus. A more extensive survey of related species is required to determine if this is a unique attribute of Robusta coffee or a more widespread feature of coffee tree species.
Year: 2019 PMID: 31188829 PMCID: PMC6561552 DOI: 10.1371/journal.pone.0216347
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Species considered in this study for chloroplast genome analyses.
Herbarium codes after Holmgren et al. (1990). Germplasm sources: BR (Botanic Gardens of Belgium), BRC (Biological Resources Center, Reunion), CBI (Coffee Board of India), K (Royal Botanic Gardens, Kew, UK), KCRS (Kianjavato Coffee Research Station, Madagascar), P (Natural History Museum, Paris, France).
| Species name | Plant, voucher herbarium code | Country of origin [African sub-region] | Germplasm collection source |
|---|---|---|---|
| | PBT1 (CCRI) | India | CBI |
| | PBT5 (CCRI) | India | CBI |
| | D. Crayn 1196 (CNS) | Australia | CNS |
| | PSI11 (K, P) | Ivory Coast | BRC |
| | HOR (K) | Indonesia | ICRI |
| | 2003 1365–45 (BR) | Cameroon | BR |
| | NA | NA | NC_008535 |
| | DH200-94 | Democratic Republic of Congo | BRC |
| | BM19/20 (K, MO, TAN) | Comoros | BRC |
| | G57 (K) | Ivory Coast | BRC |
| | PET (P, K) | Mauritius | BRC |
| | A.206 (P) | Madagascar | KCRS |
| | H53 (K) | Kenya | BRC |
| | IB62 (K) | Mozambique | BRC |
| | FB55 (K) | Ivory Coast | BRC |
| | A.252 (K, MO, TAN) | Madagascar | KCRS |
Fig 1. Annotation differences between Dogma and GeSeq, using the C. arabica Cp genome as a case study.
The Cp genome, split into two parts, is described as an ordered list of predicted coding sequences, giving two black dotted lines for each part, surrounded by gene names written in blue. In each part, the upper line, labeled C. arabica, corresponds to the Dogma annotations, while the lower line corresponds to the GeSeq annotations. A red edge indicates paralogous genes within a given genome, while a blue line is drawn when predicted sequences are similar between the two annotations.
Fig 2. Evolution of gene content based on Dogma annotations.
Each node lists the pan-genome genes that are missing in the genome under consideration: leaves correspond to extant species and internal nodes to their ancestors. Gene insertion or deletion events are indicated at the branch level.
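The bookkeeping behind such a figure can be illustrated with plain set operations. The sketch below is not the authors' pipeline: the function names are hypothetical and the gene sets are made-up toy data, used only to show how missing pan-genome genes per node and gain or loss events per branch could be derived.

```python
def missing_genes(pan_genome, genome):
    """Return the pan-genome genes that are absent from one genome."""
    return set(pan_genome) - set(genome)


def branch_events(parent_genes, child_genes):
    """Return the genes gained and lost along a parent -> child branch."""
    parent, child = set(parent_genes), set(child_genes)
    return {"gained": child - parent, "lost": parent - child}


# Toy data (hypothetical, not taken from the study).
pan_genome = {"matK", "rbcL", "psbA", "ycf1", "accD"}
ancestor = {"matK", "rbcL", "psbA", "ycf1"}
species = {"matK", "rbcL", "psbA", "accD"}

node_label = missing_genes(pan_genome, species)   # genes shown at the leaf
events = branch_events(ancestor, species)         # events shown on the branch
print(sorted(node_label), events)
```

With real annotations, each gene set would come from the annotated genome of a species or from a reconstructed ancestral genome.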
Fig 3. Maximum likelihood nuclear phylogeny of coffee trees.
The molecular tree was based on 22,800 SNPs as described in [18] and a larger set of species (see S1 Table). Branch lengths are proportional to inferred nucleotide substitutions. The values at the nodes indicate the bootstrap support of each branch in percent. The main clades are shaded, i.e., blue for Psilanthus, pink for Coffea and among this latter group the main geographic areas are shaded pale green for Africa, yellow for Mascarenes and pale grey for Madagascar and Comoros (yellowish branch).
Fig 4. Nuclear phylogenetic network representation.
The phylogenetic network was obtained by applying the NeighborNet method to the maximum likelihood nuclear phylogeny of coffee trees presented in Fig 3.
Fig 5. Maternal phylogenetic relationships.
The ML phylogenetic tree was constructed from the whole chloroplast sequences of 16 coffee trees representative of the main geographic clades, using Emmenopterys henryi as outgroup. Psilanthus (in blue) and Coffea (in red) species clustered into two well-supported clades, with the exception of C. canephora, which belongs to the Psilanthus clade.
Fig 6. Cp genome gene contents for some pairwise comparisons based on GeSeq annotations.
Each of the five images compares the genomes of two sister species in the phylogeny. Each genome is represented as a dotted line, each dot being a gene labeled with its name. A red curve indicates paralogy between genes inside a given genome, while a blue line indicates gene orthology between two distinct genomes.
Fig 7. Phylogenetic relationships of Gentianales and its neighboring orders.
The tree is based on the taxonomy and data available on the NCBI website.
Fig 8. Gene mutation rates per species within each clade.
The gene mutation rate between a species and its ancestor for a given gene is equal to the Levenshtein distance between the translated gene sequences of the species and the ancestor, divided by the length of their alignment. (A) Gene mutation rates between the species in clade 1 (Psilanthus) and the reconstructed ancestor of the clade. (B) Gene mutation rates between the species in clade 2 (Coffea) and the reconstructed ancestor of the clade.
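The definition above can be made concrete with a short sketch. It assumes that "length of their alignment" means the number of columns in one optimal edit-distance alignment; the source does not say which aligner was used, so this is an illustrative reading rather than the authors' implementation. The function name `mutation_rate` and the toy protein strings are hypothetical.

```python
def mutation_rate(seq_a: str, seq_b: str) -> float:
    """Levenshtein distance divided by the length of one optimal alignment.

    Assumption: the alignment length is the number of columns (matches,
    substitutions, insertions, deletions) in an optimal edit alignment.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 and m == 0:
        return 0.0
    # dist[i][j] = edit distance between seq_a[:i] and seq_b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if seq_a[i - 1] == seq_b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    # Trace back one optimal path and count its alignment columns.
    i, j, columns = n, m, 0
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and dist[i][j] == dist[i - 1][j - 1] + (seq_a[i - 1] != seq_b[j - 1])):
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
        columns += 1
    return dist[n][m] / columns


# Toy translated sequences (hypothetical): one substitution over three columns.
print(mutation_rate("MKV", "MRV"))
```

Identical sequences give a rate of 0, and the rate grows with the share of alignment columns that carry a substitution or an indel.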
Number of species in which one or several genes are different from the ancestral version.
Genes have been ordered according to their variability, from highly to less variable and according to the number of species in which these genes are different from the ancestral version. For instance, all the species of Clade 1 (7) have rpl16 and rps16 different from their ancestor.
| Number of species | Clade 1 (Psilanthus) | Clade 2 (Coffea) |
|---|---|---|
| 9 | | matK, rpoC2, rpoC1, rpoB, rbcL, clpP, psbB, rps3, ycf1 |
| 8 | | rps16, ccsA, ndhA, ndhF, ndhD |
| 7 | matK, rps16, atpF, rpoC2, rpoC1, psaA, ycf3, rbcL, accD, clpP, rpoA, rpl16, rps3, ycf1, ndhF, ccsA, ndhD, ndhA, rps19 | accD, ycf2, ndhH |
| 6 | rpoB, rps4, psbB, rps8, petA, rpl20, ycf2 | rps8, rpl22, ycf3, psaA |
| 5 | atpA, cemA, rps2, ndhC, psaB | atpF, psbC, rpoA, rpl2, rpl32 |
| 4 | rpl22, rps18, ndhI, psbT, petD, rpl14, rpl2, rps12 | rps14, rps4, atpA, atpB, rps19 |
| 3 | psbD, psbC, rps15, ycf4, ndhB | rps18, atpI, rpl20, petD, rpl36, ndhI, psaB |
| 2 | atpI, ndhK, ndhG, rpl32, ndhE, rps11, ndhH | psbA, ndhK, petA, rpl33, rpl14, ndhJ, ycf4, rpl16, rps12, ndhG, psbD, petG, ndhB |
| 1 | psbA, rpl33, atpE, atpB, psbZ, ndhJ, rpl23 | psbK, ndhC, rps15, rpl23, atpE, cemA, rps11, psaJ |
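A table of this shape can be produced by counting, for each gene, the species whose translated sequence differs from the ancestral one, and then grouping genes by that count. The sketch below is a hedged illustration only: the function name, the data layout (gene name mapped to protein string) and the toy sequences are assumptions, not the authors' code or data.

```python
from collections import defaultdict


def genes_by_species_count(ancestor, species_genes):
    """Group genes by the number of species that differ from the ancestor.

    ancestor: dict mapping gene name -> ancestral protein sequence
    species_genes: dict mapping species name -> {gene name -> protein sequence}
    Returns a dict {count: sorted gene names}, highest count first.
    """
    counts = defaultdict(int)
    for genes in species_genes.values():
        for gene, protein in genes.items():
            if gene in ancestor and protein != ancestor[gene]:
                counts[gene] += 1
    grouped = defaultdict(list)
    for gene, count in counts.items():
        grouped[count].append(gene)
    return {count: sorted(names)
            for count, names in sorted(grouped.items(), reverse=True)}


# Toy data (hypothetical sequences, not from the study).
ancestor = {"matK": "MEE", "rbcL": "MSP", "psbA": "MTA"}
species_genes = {
    "sp1": {"matK": "MEK", "rbcL": "MSP", "psbA": "MTA"},
    "sp2": {"matK": "MDE", "rbcL": "MSQ", "psbA": "MTA"},
}
table = genes_by_species_count(ancestor, species_genes)
print(table)
```

Genes that never differ from the ancestor (psbA in the toy data) do not appear in the result, which mirrors how the table lists only genes with at least one divergent species.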
Fig 9. Phylogenetic tree showing the number of mutated genes per species.
Phylogenetic tree with the number of mutated genes per species when compared to the ancestor of the clade (numbers next to the species names). (A) Clade 1 (Psilanthus). (B) Clade 2 (Coffea). Branch values are bootstrap supports from the maximum likelihood phylogenetic reconstruction. A suffix has been added to provide unique names to ancestral nodes.