Luiz H M Fonseca, Lúcia G Lohmann.
Abstract
The chloroplast is one of the most important organelles of plants. This organelle has a circular DNA with approximately 130 genes. The use of plastid genomic data in phylogenetic and evolutionary studies became possible with high-throughput sequencing methods, which allowed us to rapidly obtain complete genomes at a reasonable cost. Here, we use high-throughput sequencing to study the "Adenocalymma-Neojobertia" clade (Bignonieae, Bignoniaceae). More specifically, we use Illumina HiSeq technology to sequence 10 complete plastid genomes. Plastomes were assembled using selected plastid reads and a de novo approach with SPAdes. The 10 assembled genomes were analyzed in a phylogenetic context using five different partition schemes: (1) 91 protein-coding genes ("coding"); (2) 76 introns and spacers with alignments manually edited ("non-coding edited"); (3) 76 non-coding regions with poorly aligned regions removed using T-Coffee ("non-coding filtered"); (4) 91 protein-coding regions plus the 76 edited non-coding regions ("coding + non-coding edited"); and (5) 91 protein-coding regions plus the 76 filtered non-coding regions ("coding + non-coding filtered"). Fragmented regions were aligned using MAFFT. Phylogenetic analyses were conducted using Maximum Likelihood (ML) and Bayesian Criteria (BC). The analyses of the individual plastomes consistently recovered an expansion of the Inverted Repeat (IR) regions and a compression of the Small Single Copy (SSC) region. Major genomic translocations were observed in the Large Single Copy (LSC) region and the IRs. ML phylogenetic analyses of the individual datasets led to the same topology, with the exception of the analysis of the "non-coding filtered" dataset. Overall, relationships were strongly supported, with the highest support values obtained through the analysis of the "coding + non-coding edited" dataset. Four regions in the LSC, SSC, and IR were selected for primer development.
The "Adenocalymma-Neojobertia" clade shows an unusual pattern of plastid structure variation, including four major genomic translocations. These rearrangements challenge the current view of conserved plastid genome architecture in terms of gene order. They also complicate both genomic assemblies using reference genomes and sequence alignments using whole plastomes. Therefore, strategies that employ de novo assemblies and manual evaluation of sequence alignments are required to prevent assembly and alignment errors.
Keywords: DNA sequence alignment; cp genome; genomic rearrangements; phylogenomics; plastid primers
Year: 2017 PMID: 29163600 PMCID: PMC5672021 DOI: 10.3389/fpls.2017.01875
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Taxa, vouchers, collection sites, and accession numbers of Adenocalymma and Neojobertia specimens sampled.
| Voucher | Collection site |
|---|---|
| B.M. Gomes 671 (SPF) | Brazil; Pará; Santarém |
| L.G. Lohmann 658 (SPF) | Brazil; Espírito Santo; Linhares |
| M.R. Pace 521 (SPF) | Peru; Loreto; Iquitos |
| L.H.M. Fonseca 100 (SPF) | Brazil; Paraná; Jundiaí do Sul |
| L.G. Lohmann 705 (SPF) | Brazil; Paraíba; Alagoa Grande |
| L.H.M. Fonseca 262 (SPF) | Brazil; Minas Gerais; Catugi |
| L.H.M. Fonseca 267 (SPF) | Brazil; Minas Gerais; Diamantina |
| L.H.M. Fonseca 444 (SPF) | Brazil; Goiás; São Jorge da Chapada |
| R.G. Udulutsch 2758 (SPF) | Brazil; Ceará; Tianguá |
| L.G. Lohmann 363 (SPF) | Brazil; Bahia; Mucugê |
Summary of sequenced plastomes of Adenocalymma and Neojobertia.
| Total reads | Plastid reads | Mean coverage (×) | Plastome size (bp) | LSC (bp) | SSC (bp) | IR (bp) | GC content (%) | Protein-coding genes | tRNA genes | rRNA genes |
|---|---|---|---|---|---|---|---|---|---|---|
| 14,546,510 | 284,977 | 364.4 | 157,952 | 84,668 | 12,804 | 30,240 | 38.2 | 85 | 37 | 8 |
| 9,142,072 | 713,911 | 904.6 | 159,407 | 84,934 | 12,585 | 30,954 | 37.4 | 85 | 37 | 8 |
| 30,862,472 | 501,539 | 645.2 | 157,025 | 84,059 | 12,723 | 30,097 | 38.2 | 84 | 37 | 8 |
| 17,633,356 | 762,288 | 964 | 159,725 | 85,665 | 12,632 | 30,835 | 38.3 | 85 | 37 | 8 |
| 10,369,744 | 399,680 | 507.7 | 159,010 | 85,654 | 12,677 | 30,310 | 38 | 85 | 37 | 8 |
| 19,780,568 | 432,099 | 549.7 | 158,786 | 84,999 | 12,616 | 30,561 | 41.6 | 85 | 37 | 8 |
| 13,881,445 | 305,321 | 390 | 158,103 | 85,043 | 12,730 | 30,147 | 37.6 | 85 | 37 | 8 |
| 18,872,525 | 506,711 | 643 | 159,187 | 85,106 | 12,804 | 30,614 | 38.3 | 84 | 37 | 8 |
| 11,761,741 | 239,286 | 307.7 | 157,089 | 84,219 | 12,765 | 30,084 | 37.7 | 85 | 37 | 8 |
| 8,532,329 | 738,770 | 942 | 158,409 | 85,192 | 12,737 | 30,211 | 38.1 | 85 | 37 | 8 |
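A plastome with the usual quadripartite layout has one LSC, one SSC, and two IR copies, so its length should be close to LSC + SSC + 2 × IR. The sketch below applies that check to the first two rows of the table, reading the fourth to seventh columns as total, LSC, SSC, and IR sizes in base pairs (an assumption, since the column headers are not preserved here). The function names are illustrative only.

```python
# (reported total, LSC, SSC, IR) in bp, copied from the first two table rows.
# Column meanings are an assumption; the table headers are not preserved.
ROWS = [
    (157_952, 84_668, 12_804, 30_240),
    (159_407, 84_934, 12_585, 30_954),
]


def quadripartite_size(lsc, ssc, ir):
    """Expected length of a plastome with one LSC, one SSC, and two IR copies."""
    return lsc + ssc + 2 * ir


def size_gap(total, lsc, ssc, ir):
    """Summed region sizes minus the reported total, in bp."""
    return quadripartite_size(lsc, ssc, ir) - total


for row in ROWS:
    print(f"reported total {row[0]:,} bp, gap {size_gap(*row):+d} bp")
```

A nonzero gap does not say which value is off; it only flags a row worth re-checking against the published table.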
Figure 1. Gene map of the A. peregrinum chloroplast genome. Genes drawn inside the circle are transcribed clockwise, and those outside are transcribed counterclockwise. Genes belonging to different functional groups are color-coded. The darker gray in the inner circle corresponds to GC content, and the lighter gray corresponds to AT content.
Genes recovered within the “Adenocalymma-Neojobertia” clade.
| Self-replication | • rRNA genes | |
| Photosynthesis | • Photosystem I | |
| Other genes | • Translational initiator | |
| Unknown function | Conserved open reading frames |
Gene with one intron.
Gene with two introns.
Gene with two copies.
Pseudogene in some species.
Figure 2. Comparisons of the Large Single Copy (LSC), Small Single Copy (SSC), and Inverted Repeat (IR) region borders among four Lamiales chloroplast genomes. Genes shown above the lines are transcribed in the forward direction, while genes shown below the lines are transcribed in the reverse direction. Two-headed arrows indicate plastome partition sizes in base pairs, and single-headed arrows indicate the size of features or the distances between plastome partition borders and features.
Figure 3. Phylogeny of the “Adenocalymma-Neojobertia” clade recovered from the analysis of the combined datasets from 10 representative species, followed by the linear plastid maps of all species sampled. Plastome regions are depicted with different colors; salmon lines link conserved regions, while blue lines link rearranged homologous regions. LSC, Large Single Copy region; SSC, Small Single Copy region; IR, Inverted Repeat region.
Summary of partition schemes.
| Coding | 71,395 | 19,912 | 27.9 |
| Non-coding edited | 48,469 | 19,052 | 39.3 |
| Non-coding filtered | 48,319 | 19,005 | 39.3 |
| Coding + non-coding edited | 119,864 | 38,964 | 32.5 |
| Coding + non-coding filtered | 119,714 | 38,917 | 32.5 |
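In the table, the first numeric column of each combined scheme equals the sum of its two components (71,395 + 48,469 = 119,864 and 71,395 + 48,319 = 119,714), which is what end-to-end concatenation of alignments would give. The following is a minimal sketch of such a concatenation that also records partition coordinates. It is not the authors' pipeline; the function name, the toy sequences, and the taxon labels are hypothetical.

```python
def concatenate(alignments, taxa):
    """Join aligned blocks end to end and record 1-based partition coordinates.

    alignments: list of (block_name, {taxon: aligned_sequence}) pairs.
    taxa: taxon labels to include; a taxon missing from a block is gap-filled.
    """
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for name, block in alignments:
        lengths = {len(seq) for seq in block.values()}
        if len(lengths) != 1:
            raise ValueError(f"block {name!r} is empty or not aligned")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(block.get(taxon, "-" * length))
        partitions.append((name, start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


# Toy data, invented for illustration only.
coding = {"sp1": "ATGAAA", "sp2": "ATGAAG"}
noncoding = {"sp1": "TT-A", "sp2": "TTCA"}
matrix, partitions = concatenate(
    [("coding", coding), ("noncoding", noncoding)], ["sp1", "sp2"]
)
print(matrix["sp1"], partitions)
```

Keeping the coordinates alongside the matrix makes it possible to assign separate substitution models to each block later on.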
Figure 4. Maximum Likelihood (ML) trees derived from the analyses of five different partition schemes. Nodes A, B, C, and D are shown on the tree derived from the analysis of the “coding” region dataset. Values shown next to nodes are likelihood bootstrap support.
Summary statistics of the five most useful introns and intergenic spacers for phylogeny reconstruction.
| 1173 | 958 | 1109 | 1 | 0.85 | 0.89 | |
| 1233 | 986 | 994 | 0.39 | 1 | 0.79 | |
| 706 | 670 | 699 | 0.2 | 1 | 0.91 | |
| 852 | 732 | 750 | 0.21 | 0.89 | 0.9 | |
| 961 | 863 | 889 | 0.64 | 0.59 | 0.75 |
Regions were selected based on a standardized mean of three variables:
(1) Percentage of variable sites;
(2) Phylogenetic tree topology distance; and
(3) Phylogenetic tree branch-length distance.
Values standardized.
Distances computed following Kendall and Colijn.
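The footnotes describe ranking regions by a standardized mean of three variables. One way to read this, sketched below, is to divide each variable by its maximum across regions so that the top region scores 1, then average the three scaled values. The scaling rule is an assumption: the notes do not say how values were standardized, nor whether the two distances were inverted so that smaller distances score higher. Region names and input values are invented.

```python
def standardized_mean(table):
    """Scale each variable by its column maximum, then average per region.

    table: {region: (var1, var2, var3)}; column maxima must be nonzero.
    Max-scaling is an assumed standardization, not one stated in the source.
    """
    n_vars = len(next(iter(table.values())))
    maxima = [max(values[i] for values in table.values()) for i in range(n_vars)]
    return {
        region: sum(v / m for v, m in zip(values, maxima)) / n_vars
        for region, values in table.items()
    }


# Hypothetical inputs, for illustration only.
toy = {
    "regionA": (10.0, 0.5, 2.0),
    "regionB": (5.0, 1.0, 4.0),
}
scores = standardized_mean(toy)
for region, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{region}: {score:.2f}")
```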
Sequences of primer pairs designed in this study for selected regions.
| TCAATATCTCTACGTGCGATTCG | CCGTCGCTATTACAGAACCGT | 59.5 | |
| TGGGGAAGAAGTGGGCTCTA | AGTTCCTACCGCTTTTCTACTT | 59.9 | |
| TGTAGCGGGTATAGTTTAGTGGT | CGCATCGTTAGCTTGGAAGG | 59 | |
| AGGACCGTACATGCACCTTT | CCTCTGTTTCGCCCAAGAAA | 58.9 |
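Basic properties of the primer sequences listed above, such as length and GC content, can be recomputed directly from the strings. The sketch below does so for the four pairs, taking the first sequence in each row as the forward primer and the second as the reverse primer (an assumption, since the column headers are not preserved here). The helper names are illustrative.

```python
# Primer pairs copied from the table; forward/reverse order is assumed.
PRIMER_PAIRS = [
    ("TCAATATCTCTACGTGCGATTCG", "CCGTCGCTATTACAGAACCGT"),
    ("TGGGGAAGAAGTGGGCTCTA", "AGTTCCTACCGCTTTTCTACTT"),
    ("TGTAGCGGGTATAGTTTAGTGGT", "CGCATCGTTAGCTTGGAAGG"),
    ("AGGACCGTACATGCACCTTT", "CCTCTGTTTCGCCCAAGAAA"),
]

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_content(seq):
    """Percentage of G and C bases in an unambiguous DNA sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


for forward, reverse in PRIMER_PAIRS:
    print(
        f"F {len(forward)} nt, GC {gc_content(forward):.1f}% | "
        f"R {len(reverse)} nt, GC {gc_content(reverse):.1f}%"
    )
```

`reverse_complement` is included because a reverse primer is normally compared with the template strand through its reverse complement.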