
Complete plastome sequences from Bertholletia excelsa and 23 related species yield informative markers for Lecythidaceae.

Ashley M Thomson, Oscar M Vargas, Christopher W Dick

Abstract

PREMISE OF THE STUDY: The tropical tree family Lecythidaceae has enormous ecological and economic importance in the Amazon basin. Lecythidaceae species can be difficult to identify without molecular data, however, and phylogenetic relationships within and among the most diverse genera are poorly resolved.
METHODS: To develop informative genetic markers for Lecythidaceae, we used genome skimming to de novo assemble the full plastome of the Brazil nut tree (Bertholletia excelsa) and 23 other Lecythidaceae species. Indices of nucleotide diversity and phylogenetic signal were used to identify regions suitable for genetic marker development.
RESULTS: The B. excelsa plastome contained 160,472 bp and was arranged in a quadripartite structure. Using the 24 plastome alignments, we developed primers for 10 coding and non‐coding DNA regions containing exceptional nucleotide diversity and phylogenetic signal. We also developed 19 chloroplast simple sequence repeats for population‐level studies.
DISCUSSION: The coding region ycf1 and the spacer rpl16‐rps3 outperformed plastid DNA markers previously used for barcoding and phylogenetics. Used in a phylogenetic analysis, the matrix of 24 plastomes showed with 100% bootstrap support that Lecythis and Eschweilera are polyphyletic. The plastomes and primers presented in this study will facilitate a broad array of ecological and evolutionary studies in Lecythidaceae.


Keywords:  Amazonian trees; Bertholletia excelsa; DNA barcoding; Lecythidaceae; genetic markers; plastome

Year: 2018 | PMID: 30131893 | PMCID: PMC5991589 | DOI: 10.1002/aps3.1151

Source DB: PubMed | Journal: Appl Plant Sci | ISSN: 2168-0450 | Impact factor: 1.936


Lecythidaceae (sensu lato) is a pantropical family of trees with three subfamilies: Foetidioideae, which is restricted to Madagascar; Barringtonioideae, found in the tropical forests of Asia and Africa; and the Neotropical clade Lecythidoideae, which contains approximately 234 of the approximately 278 known species in the broader family (Mori et al., 2007, 2017; Huang et al., 2015; Mori, 2017). Neotropical Lecythidaceae are understory, canopy, or emergent trees with distinctive floral morphology and woody fruit capsules. Among Lecythidaceae species are the iconic Brazil nut tree, Bertholletia excelsa Bonpl.; the oldest documented angiosperm tree, Cariniana micrantha Ducke (dated at >1400 years old in Manaus, Brazil; Chambers et al., 1998); the cauliflorous cannonball tree commonly grown in botanical gardens, Couroupita guianensis Aubl.; and important timber species (e.g., Cariniana legalis (Mart.) Kuntze). Lecythidaceae is the third most abundant family of trees in the Amazon forest, following Fabaceae and Sapotaceae (ter Steege et al., 2013). The most species‐rich genus, Eschweilera Mart. ex DC., with approximately 99 species (Mori, 2017), is also the most abundant tree genus in the Amazon basin (ter Steege et al., 2013), and E. coriacea (DC.) S. A. Mori is the most common tree species in much of Amazonia (ter Steege et al., 2013). Lecythidaceae provide important ecological services such as carbon sequestration and are food resources for pollinators (bats and large bees) and seed dispersers (monkeys and agoutis) (Prance and Mori, 1979; Mori and Prance, 1990). Tools for species‐level identification and phylogenetic analyses of Lecythidaceae could significantly advance research on Amazon tree diversity. 
However, despite their ease of identification at the family level, species‐level identification of many Lecythidaceae (especially Eschweilera) is notoriously difficult when based on sterile (i.e., without fruit or floral material) herbarium specimens, and flowering of individual trees often occurs only at multi‐year intervals (Mori and Prance, 1987). As a complement to other approaches, DNA barcoding (Dick and Kress, 2009; Dexter et al., 2010) may help to identify species or clades of Lecythidaceae. A combination of two protein‐coding plastid regions (matK and rbcL) has been proposed as a core plant DNA barcode (Hollingsworth et al., 2009), although other coding and non‐coding plastome regions (psbA‐trnH, rpoB, rpoC1, trnL, and ycf5) and the ITS of nuclear ribosomal genes have been recommended as supplemental barcodes for vascular plants (Kress et al., 2005; Lahaye et al., 2008; Li et al., 2011). However, an evaluation of a subset of these markers (ITS, psbA‐trnH, matK, rbcL, rpoB, rpoC1, and trnL) on Lecythidaceae in French Guiana (Gonzalez et al., 2009) showed poor performance for species identification. Furthermore, the use of traditional markers (plastid ndhF, trnL‐F, and trnH‐psbA, and nuclear ITS) for phylogenetic analysis has produced weakly supported trees (Mori et al., 2007; Huang et al., 2015), indicating a need to develop more informative markers and/or increase molecular sampling. The main objectives of this study were to (1) assemble, annotate, and characterize the first complete plastome sequence of Lecythidaceae from the iconic Brazil nut tree, B. excelsa; (2) obtain a robust backbone phylogeny for the Neotropical clade using newly assembled draft plastome sequences for an additional 23 species; and (3) develop a novel set of informative molecular markers for DNA barcoding and broader evolutionary studies.

METHODS

Plant material and DNA library preparation

We performed genome skimming on 24 Lecythidaceae species, including 23 Lecythidoideae and one outgroup species (Barringtonia edulis Seem.) from the Barringtonioideae. The sampling included all 10 Lecythidoideae genera (Appendix 1). Silica‐dried leaf tissue from herbarium‐vouchered collections was collected by Scott Mori and colleagues and loaned by the New York Botanical Garden. Total genomic DNA was extracted from 20 mg of dried leaf tissue using the NucleoSpin Plant II extraction kit (Macherey‐Nagel, Bethlehem, Pennsylvania, USA) with SDS lysis buffer. Prior to library preparation, 5 μg of total DNA was fragmented using a Covaris S‐series sonicator (Covaris Inc., Woburn, Massachusetts, USA) following the manufacturer's protocol to obtain insert sizes of approximately 300 bp. We prepared the sequencing library using the NEBNext DNA Library Prep Master Mix and Multiplex Oligos for Illumina Sets (New England BioLabs Inc., Ipswich, Massachusetts, USA) according to the manufacturer's protocol. Size selection was carried out prior to PCR using Pippin Prep (Sage Science, Beverly, Massachusetts, USA). The finished paired‐end library was quantified using an Agilent 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, California, USA) and by quantitative PCR using an ABI PRISM 7900HT (Thermo Fisher Scientific, Waltham, Massachusetts, USA) at the University of Michigan DNA Sequencing Core (Ann Arbor, Michigan, USA). We sequenced the libraries on one lane of an Illumina HiSeq 2000 (Illumina Inc., San Diego, California, USA) with a paired‐read length of 100 bp.

Plastome assembly

Illumina adapters and barcodes were removed from raw reads using Cutadapt version 1.4.2 (Martin, 2011). Reads were then quality filtered using Prinseq version 0.20.4 (Schmieder and Edwards, 2011), which trimmed 5ʹ and 3ʹ read ends with a Phred quality score <20 and removed all trimmed reads that were <50 bp long, contained >5% ambiguous bases, or had a mean Phred quality score <20. A combination of de novo and reference‐guided approaches was used to assemble the plastomes. First, chloroplast reads were separated from the raw read pool by BLAST‐searching all raw reads against a database of all complete angiosperm plastome sequences available on GenBank (accessed in 2014). Reads with hits at an E‐value <1 × 10⁻⁵ were retained for subsequent analysis. The filtered chloroplast reads were assembled de novo using Velvet version 7.0.4 (Zerbino and Birney, 2008) with k‐mer values of 71, 81, and 91, a low‐coverage cutoff of 5, and a minimum contig length of 300 bp. The assembled contigs were then mapped to a reference genome (see below) with the reference‐guided assembly tool of Geneious version R8 (Kearse et al., 2012), using medium sensitivity and iterative fine‐tuning, to determine their order and orientation. Finally, raw reads were iteratively mapped onto the draft assembly with the low‐sensitivity, reference‐guided assembly in Geneious to extend contigs and fill gaps. We first assembled the draft genome of B. excelsa; the plastomes of the remaining 23 species were then assembled using the B. excelsa plastome as the reference. The B. excelsa plastome was annotated using DOGMA (Wyman et al., 2004) with the default settings for chloroplast genomes. Start and stop codon positions were determined using the open reading frame finder in Geneious and by comparison with the plastome sequence of Camellia sinensis (L.) Kuntze var. pubilimba Hung T. Chang (GenBank ID: KJ806280). A circular map of the B. excelsa plastome was drawn using OGDraw V1.2 (Lohse et al., 2007). The complete annotated plastome of B. excelsa and the draft plastomes of the remaining 23 Lecythidaceae species sampled were deposited in GenBank (Appendix 1).
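The Prinseq filtering thresholds described above can be illustrated with the short Python sketch below. It is not Prinseq's code. It assumes that each read has already been decoded from FASTQ into a (sequence, list of Phred scores) pair and that only "N" counts as an ambiguous base. The function names trim_ends, passes_filters, and filter_reads are hypothetical.

```python
# Minimal sketch of Prinseq-style read filtering (illustrative, not Prinseq itself).
# Assumes each read is a (sequence, list_of_phred_scores) pair already decoded
# from FASTQ; only "N" is counted as an ambiguous base.

def trim_ends(seq, quals, min_q=20):
    """Trim bases with Phred score < min_q from the 5' and 3' ends."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def passes_filters(seq, quals, min_len=50, max_n_frac=0.05, min_mean_q=20):
    """Keep a trimmed read only if it is >= 50 bp, has <= 5% N, and mean Q >= 20."""
    if len(seq) < min_len:
        return False
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False
    return sum(quals) / len(quals) >= min_mean_q


def filter_reads(reads):
    """Yield (sequence, scores) for reads that survive trimming and filtering."""
    for seq, quals in reads:
        tseq, tquals = trim_ends(seq, quals)
        if passes_filters(tseq, tquals):
            yield tseq, tquals
```

Trimming runs before the length check, so a read whose low-quality ends shrink it below 50 bp is discarded. This matches the order given in the text.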

Identification of molecular markers

Chloroplast simple sequence repeats (cpSSRs) in B. excelsa were identified using the Phobos Tandem Repeat Finder version 3.3.12 (Mayer, 2010). We searched for uninterrupted repeats of 1‐ to 6‐bp units, with minimum thresholds of 12 copies for mononucleotide, 6 for dinucleotide, 4 for trinucleotide, and 3 for tetra‐, penta‐, and hexanucleotide repeats (Sablok et al., 2015). We designed primers to amplify the cpSSRs using Primer3 version 2.3.4 (Untergasser et al., 2012) with default options and a PCR product size range of 100 to 300 bp. The 24 plastomes were aligned with MAFFT version 7.017 (Katoh et al., 2002). The alignment was then scanned for regions of high nucleotide diversity (π; Nei, 1987) using a sliding window analysis in DnaSP version 5.10.1 (Librado and Rozas, 2009), with a window size and step size of 600 bp. Nucleotide diversity was plotted with the base R function “plot” (R Core Team, 2017), and windows with values above the 95th percentile were considered to have high π. DNA barcodes can also be used in phylogenetic analyses, and regions with high π do not necessarily carry high phylogenetic signal (e.g., unalignable hypervariable regions). We therefore employed a log‐likelihood approach modified from Walker et al. (2017) to identify phylogenetically influential regions. First, we inferred a phylogenetic tree from the plastome alignment (including only one inverted repeat) by performing 100 independent maximum likelihood (ML) searches under the GTRGAMMA model in RAxML version 8.2.9 (Stamatakis, 2014). All searches converged on the same topology, which was then annotated with support values from 100 bootstrap replicates using “sumtrees.py” version 4.10 (Sukumaran and Holder, 2010). 
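The sliding-window π scan and the 95th-percentile cutoff can be sketched in Python as follows. This is an illustration, not DnaSP's implementation, and it rests on three assumptions. First, the aligned sequences are equal-length strings. Second, any site with a gap or ambiguity code in any sequence is dropped (DnaSP's own gap handling may differ). Third, the percentile uses linear interpolation, which is the rule of R's default quantile; the text does not say which rule was used.

```python
from itertools import combinations
from statistics import quantiles

VALID = set("ACGT")


def window_pi(seqs):
    """Nei's nucleotide diversity: mean pairwise differences per site.

    Sites with a gap or ambiguity code in any sequence are dropped
    (an assumption for this sketch). Returns None if no site remains.
    """
    sites = [i for i in range(len(seqs[0])) if all(s[i] in VALID for s in seqs)]
    if not sites:
        return None
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a[i] != b[i] for i in sites) for a, b in pairs)
    return diffs / (len(pairs) * len(sites))


def sliding_pi(alignment, window=600, step=600):
    """Return (start, end, pi) per window, 1-based inclusive; partial tail dropped."""
    length = len(alignment[0])
    result = []
    for start in range(0, length - window + 1, step):
        block = [s[start:start + window].upper() for s in alignment]
        result.append((start + 1, start + window, window_pi(block)))
    return result


def high_pi_windows(alignment, window=600, q=95):
    """Windows whose pi exceeds the q-th percentile (needs >= 2 scored windows)."""
    scored = [w for w in sliding_pi(alignment, window, window) if w[2] is not None]
    # method="inclusive" interpolates linearly between order statistics.
    cutoff = quantiles([w[2] for w in scored], n=100, method="inclusive")[q - 1]
    return [w for w in scored if w[2] > cutoff]
```

Because the window and step are both 600, the windows do not overlap, and a trailing partial window is ignored.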
We then calculated site‐specific log‐likelihoods of the alignment on the plastome phylogeny. For each site, we computed the difference between this value and the mean log‐likelihood of that site across 1000 randomly permuted trees (trees with randomly shuffled tips). Log‐likelihood scores were calculated with RAxML under the GTRGAMMA model. The site‐wise log‐likelihood differences (LD) were summarized in 600‐bp non‐overlapping windows with a custom R script (see below). We interpreted greater LD as an indication of greater phylogenetic signal, and windows with an LD above the 95th percentile were considered to have exceptional phylogenetic signal. Primers flanking the top 10 regions with high π were designed using Primer3 with default program options. We allowed a maximum product size of 1300 bp because lower cutoff values (e.g., 600 bp) made primer design extremely difficult owing to the lack of conserved regions. Primers were designed to amplify all 23 Neotropical species, where possible without degenerate bases. A small number of degenerate bases was permitted for some regions where primer development would otherwise have been impossible because of high sequence variability at the priming sites. We investigated the potential of our markers to produce robust phylogenies by inferring individual gene trees in RAxML version 8.2.9, using an ML search with 100 rapid bootstraps (option “-f a”) under the GTRGAMMA model. We also evaluated how many markers were needed to obtain a resolved tree with an average bootstrap support (BS) of ~90%. We first concatenated the two markers with the highest π and inferred a tree, then added the marker with the next highest π and inferred a new tree. We iterated this process until the matrix included all 10 markers developed. 
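The LD arithmetic can be re-expressed in Python as shown below. The study used RAxML and a custom R script; this sketch is hypothetical. It assumes that per-site log-likelihoods for the plastome tree and for each random tree are already available as lists of floats, for example after parsing RAxML output. It also assumes that a window's score is the sum of its site-wise differences, which the text does not specify.

```python
from statistics import fmean, quantiles


def site_ld(best_site_lnl, random_site_lnls):
    """Site-wise LD: lnL on the best tree minus the mean lnL over random trees."""
    n_sites = len(best_site_lnl)
    if any(len(r) != n_sites for r in random_site_lnls):
        raise ValueError("every tree must be scored on the same sites")
    return [best_site_lnl[i] - fmean(r[i] for r in random_site_lnls)
            for i in range(n_sites)]


def window_ld(ld, window=600):
    """Sum LD in non-overlapping windows: (start, end, score), 1-based; tail dropped."""
    return [(start + 1, start + window, sum(ld[start:start + window]))
            for start in range(0, len(ld) - window + 1, window)]


def exceptional_windows(windows, q=95):
    """Windows whose score exceeds the q-th percentile of all window scores."""
    # Linear interpolation between order statistics (assumed percentile rule).
    cutoff = quantiles([w[2] for w in windows], n=100, method="inclusive")[q - 1]
    return [w for w in windows if w[2] > cutoff]
```

A positive LD means a site fits the plastome tree better than it fits an average random tree. Summing over a window therefore rewards regions where many sites agree with the inferred topology.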
For every tree obtained, we calculated its average BS and its Robinson–Foulds distance (RF; Robinson and Foulds, 1981) from the plastome phylogeny using a custom R script employing the packages APE (Paradis et al., 2004) and Phangorn (Schliep, 2011). The scripts and alignments used in this study can be found at https://bitbucket.org/oscarvargash/lecythidaceae_plastomes.
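The Robinson–Foulds distance counts the bipartitions (splits) present in one unrooted tree but not in the other. The study computed it with APE and phangorn in R. The Python sketch below is an independent illustration that uses a hypothetical nested-tuple tree representation: a leaf is a string and an internal node is a tuple of subtrees. It assumes both trees share the same taxa and reports the unnormalized distance.

```python
def leaves(tree):
    """Leaf names under a node; a tree is a leaf name (str) or a tuple of subtrees."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of an unrooted tree.

    Each split is stored as the side that does not contain a fixed reference
    taxon, so the same bipartition is recognized however the tree is rooted.
    """
    taxa = leaves(tree)
    ref = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = clade if ref not in clade else taxa - clade
        if 1 < len(side) < len(taxa) - 1:  # skip trivial and empty splits
            found.add(side)
        return clade

    walk(tree)
    return found


def rf_distance(t1, t2):
    """Unnormalized Robinson-Foulds distance: size of the symmetric difference of splits."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must share the same taxa")
    return len(splits(t1) ^ splits(t2))
```

For example, swapping two neighboring taxa in a five-taxon tree changes one split in each tree, giving RF = 2. Re-rooting a tree leaves RF = 0.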

RESULTS

Lecythidaceae plastome features

The sequenced plastome of B. excelsa contained 160,472 bp and 115 genes, of which four were rRNAs and 30 were tRNAs (Fig. 1, Table 1). The B. excelsa plastome had the typical angiosperm quadripartite structure, with a large single‐copy region of 85,840 bp, a small single‐copy region of 18,950 bp, and two inverted repeat regions of 27,841 bp each (Table 2). Relative to C. sinensis var. pubilimba, we found no gene gains or losses in B. excelsa. The only structural difference we found is that B. excelsa contains the contiguous genes trnH‐GUG, rps3, rpl22, and rps19 in the inverted repeat region, whereas C. sinensis var. pubilimba contains these genes in the large single‐copy region. Similarly, no gene gains or losses were found when B. excelsa was compared with the other Neotropical Lecythidaceae plastomes assembled here (Table 2). In addition to B. excelsa, the plastome of Eschweilera alata A. C. Sm. was also completely assembled; coverage of the remaining plastomes ranged from 85% to 99.60% (Appendix 1).
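As a quick consistency check, the Bertholletia excelsa values in Table 2 should add up to the totals reported here. A quadripartite plastome has one large single-copy region, one small single-copy region, and two copies of the inverted repeat. The gene total is the sum of the protein-coding, rRNA, and tRNA counts.

```python
# Bertholletia excelsa region lengths (bp) and gene counts, taken from Table 2.
lsc, ssc, ir = 85_840, 18_950, 27_841
protein_coding, rrna, trna = 81, 4, 30

# A quadripartite plastome has one LSC, one SSC and two copies of the IR.
total_bp = lsc + ssc + 2 * ir
total_genes = protein_coding + rrna + trna

print(total_bp)     # 160472
print(total_genes)  # 115
```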
Figure 1

Plastome map of the Brazil nut tree, Bertholletia excelsa. Genes outside of the circle are transcribed clockwise; genes inside of the circle are transcribed counterclockwise. Gray bars in the inner ring show the GC content percentage.

Table 1

Genes contained within the chloroplast genome of Bertholletia excelsa

Function | Gene group | Gene names
Self‐replication | Ribosomal proteins (large subunit) | rpl2, rpl14, rpl16, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36
Self‐replication | Ribosomal proteins (small subunit) | rps2, rps3, rps4, rps7, rps8, rps11, rps12, rps14, rps15, rps16, rps18, rps19
Self‐replication | RNA polymerase subunits | rpoA, rpoB, rpoC1, rpoC2
Self‐replication | Ribosomal RNAs | rrn4.5, rrn5, rrn16, rrn23
Self‐replication | Transfer RNAs | trnA‐UGC, trnC‐GCA, trnD‐GUC, trnE‐UUC, trnF‐GAA, trnG‐GCC, trnG‐UCC, trnH‐GUG, trnI‐CAU, trnI‐GAU, trnK‐UUU, trnL‐CAA, trnL‐UAA, trnL‐UAG, trnfM‐CAU, trnM‐CAU, trnN‐GUU, trnP‐UGG, trnQ‐UUG, trnR‐AGC, trnR‐UCU, trnS‐GCU, trnS‐GGA, trnS‐UGA, trnT‐GGU, trnT‐UGU, trnV‐GAC, trnV‐UAC, trnW‐CCA, trnY‐GUA
Photosynthesis | Photosystem I | psaA, psaB, psaC, psaI, psaJ
Photosynthesis | Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
Photosynthesis | NADH dehydrogenase | ndhA, ndhB, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
Photosynthesis | Cytochrome b/f complex | petA, petB, petD, petG, petL, petN
Photosynthesis | ATP synthase | atpA, atpB, atpE, atpF, atpH, atpI
Photosynthesis | RuBisCO large subunit | rbcL
Other genes | Subunit of acetyl‐CoA‐carboxylase | accD
Other genes | Envelope membrane protein | cemA
Other genes | Protease | clpP
Other genes | c‐type cytochrome synthase | ccsA
Other genes | Translational initiation factor | infA
Other genes | Maturase | matK
Unknown function | Hypothetical chloroplast reading frames | ycf1, ycf2, ycf3, ycf4, ycf15

Table 2

Comparison of plastome regions for the samples whose inverted repeats were completely assembled.a

Species | LSC length (bp) | SSC length (bp) | IR length (bp) | GC content (%) | Protein‐coding genes | rRNAs | tRNAs
Allantoma decandra | 85,269 | 18,738 | 27,618 | 36.9 | 81 | 4 | 30
A. lineata | 85,119 | 18,756 | 27,635 | 36.9 | 81 | 4 | 30
Bertholletia excelsa | 85,840 | 18,950 | 27,841 | 36.4 | 81 | 4 | 30
Corythophora amapaensis | 85,861 | 18,778 | 27,638 | 36.7 | 81 | 4 | 30
C. labriculata | 85,673 | 18,759 | 27,594 | 36.7 | 81 | 4 | 30
Couratari macrosperma | 83,785 | 18,728 | 27,614 | 37.0 | 81 | 4 | 30
C. stellata | 85,547 | 18,491 | 27,576 | 36.9 | 81 | 4 | 30
Eschweilera alata | 85,056 | 18,721 | 27,635 | 36.6 | 81 | 4 | 30
E. caudiculata | 84,713 | 18,759 | 27,638 | 37.0 | 81 | 4 | 30
E. congestiflora | 84,815 | 18,167 | 27,715 | 37.1 | 81 | 4 | 30
E. integrifolia | 84,688 | 18,796 | 27,592 | 36.9 | 81 | 4 | 30
E. micrantha | 85,286 | 18,719 | 27,668 | 36.8 | 81 | 4 | 30
E. wachenheimii | 85,378 | 18,815 | 27,603 | 36.8 | 81 | 4 | 30
Lecythis pneumatophora | 85,506 | 18,845 | 27,622 | 36.7 | 81 | 4 | 30

IR = inverted repeat; LSC = large single‐copy region; SSC = small single‐copy region.

Length and GC content of the large single‐copy and small single‐copy regions in partial plastomes are estimates only.


Identification of molecular markers

Within the plastome of B. excelsa we found 23 cpSSRs, 22 in non‐coding regions and one in the ndhD coding region. We designed 19 primer pairs with an acceptable product length, annealing temperature, and GC content (Table 3). All but one of these pairs (ndhD) amplify non‐coding sequences, and no suitable primers were found for the remaining four cpSSRs. π exceeded the 95th percentile in nine 600‐bp windows (Fig. 2, Tables 4 and 5). Similarly, 13 windows exceeded the 95th percentile for LD (Fig. 2, Tables 4 and 5), indicating high phylogenetic signal. Although most of the informative windows were in non‐coding regions, two consecutive windows were positioned in the ycf1 gene. Six windows had both high π and high LD; as expected, the two measures largely agreed. Based on the π ranking of the windows, we developed primers for the following 10 regions, ordered from high to low π (Table 6): ycf1 (1), rpl16‐rps3, psbM‐trnD, ycf1 (2), ccsA‐ndhD, trnG‐psaB, petD‐rpoA, psbZ‐trnfM, trnE‐trnT, and trnT‐psbD.
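The cpSSR thresholds given in the Methods can be illustrated with a simple scanner for perfect (uninterrupted) repeats. This is a Python sketch, not Phobos. Units that are themselves repeats of a shorter unit (e.g., ATAT) are skipped, so a dinucleotide run is not reported again as a tetranucleotide. The function names are hypothetical.

```python
# Minimum number of uninterrupted copies per repeat-unit length (from the Methods).
MIN_COPIES = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}
BASES = set("ACGT")


def is_primitive(unit):
    """False if unit is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(unit)
    return not any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq):
    """Return (start, end, unit, copies) for perfect SSRs; 1-based inclusive coordinates."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_COPIES.items():
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            copies = 1
            # Count consecutive exact copies of the unit starting at position i.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies and set(unit) <= BASES and is_primitive(unit):
                hits.append((i + 1, i + copies * k, unit, copies))
                i += copies * k  # jump past the run so its rotations are not re-reported
            else:
                i += 1
    return sorted(hits)
```

For example, a run of 12 A's is reported once as a mononucleotide repeat. Its AAA and AAAA readings are rejected as non-primitive units, and a run of five AT copies falls below the dinucleotide threshold of six.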
Table 3

Primers for the amplification of simple sequence repeats in the plastome of Bertholletia excelsa. All primer pairs amplify non‐coding sequences with the exception of ndhD.a

Forward primer sequence (5′–3′)Reverse primer sequence (5′–3′)Repeat unitLocationRegionNo. of repeatsProduct size (bp)
CCAAAATCATGAACTAACCCCCAACCAAGAGGGCGTTATTGCTA396–409 trnH‐psbA 14226
TGAAGTCGTGTTGCTGAGATCTCTGTTGATAAGTTTGCCGAGGTC3686–3702 trnK intron 17197
GAGGTTTTCTCCTCGGACGGACCACTCATTAAACGAAATGCCTA5680–5691 rps16 intron 12244
GTCCACTCAGCCATCTCTCCAGCCCGGCCATAGGAATAAAAAAG9396–9407 trnS‐trnG 3297
TTTATTCCTATGGCCGGGCTTGCATTGTTTAAGAATCCATAGTTTCAA9769–9780 trnS‐trnG 12246
TTTTCCCCACACTTCCCCTCTGTCCGGTCATTTGATTTGGTA17,925–17,938 rps2‐rpoC2 14192
AAGAGAGGAGAAGTTTTTAGGCACCTTACCACTCGGCCATGTCA29,392–29,403 rpoB‐trnC 12232
GGGATGCGAGAAAGAGACTTCAAAAGTATATCTTTCTACGGGTCGAAAG34,775–34,786 trnT‐psbD 3250
TACCGGTTTTCAAGACCGGGTCACAAATGGGCATGCTGGAAAAAT38,160–38,174 trnS‐psbZ 3201
ACCCATCAATCATTCGATTCGTGAAAGATCTTTCCTTGGGGGAAAAG47,627–47,638 ycf3‐trnS 3168
No suitable primers foundNo suitable primers foundAAAT49,610–49,625 trnT‐trnL 4NA
No suitable primers foundNo suitable primers foundAATT50,016–50,027 trnT‐trnL 3NA
CCACTGAACAAGGGAGAGCCACCAAGGCAAACCCATGGAAAAAATT75,475–75,492 clpP‐psbB 3128
TGAATCACTGCTTTTCTTTGACTCTAGGCGGTTCTCGAAAGAAGAAAAAT77,155–77,169 psbB‐psbT 3183
TTCAATCTCGGGATTCTTTGAGATCGCCTGCGAAAACTTAACTA85,073–85,085 rpl16‐trnH 13246
TCGATCAATCCCTTTGCCCTCGTACTCCTCGCTCAATGAGAAAAT102,172–102,183 rps12‐trnV 3248
TGGAGCACCTAACAACGCATAGACCTCCGGGAAAAGCATGA106,208–106,219 trnL intron 12119
AGAGTAAACACAAGATACAAGGGTGTGGGTTAGGTCAATCGGGAAACTT117,345–117,359 rpl32‐trnL 3194
AGTCAACGTCAAAATTAATGAATGGTAGGTTGAACGCGAGCGATATAT117,609–117,622 rpl32‐trnL 7177
AAATAACTCCCGCGGTCCAGGCTTCTCTTGCATTACCGGGAAAT119,729–119,740 ndhD 3240
No suitable primers foundNo suitable primers foundAAT122,820–122,831 ndhG‐ndhI 4NA
No suitable primers foundNo suitable primers foundAAT122,843–122,854 ndhG‐ndhI 4NA
AACCCGCTTCAAGCCATGATAAACGGCTTATAAATTCGCAGTAATC125,271–125,282 ndhA intron 3130

NA = not applicable.

Sequences have been deposited to GenBank (BioProject SUB2740669).

Figure 2

Sliding 600‐site window analyses on the Lecythidaceae plastome alignment of 24 species showing nucleotide diversity (π) (top) and alignment site‐wise differences in log‐likelihood (LD) calculated from the chloroplast topology versus the average scores of 1000 random trees (bottom). Regions with π and LD above the 95th percentiles are indicated with dashed lines. Continuous vertical lines indicate the boundaries, from left to right, among the large single copy, the inverted repeat, and the small single copy.

Table 4

Regions of the plastome alignment (windows of 600 sites) with exceptionally high (above the 95th percentile) nucleotide diversity and/or site‐wise log‐likelihood score differences.b

Location in the alignment | Bertholletia plastome location | Closest flanking expressed region (5′) | Closest flanking expressed region (3′) | Region | π | LD
1–600 | 1–490 | trnH | psbA | LSC | | a
5,401–6,000 | 4,885–5,373 | trnK‐UUU | rps16 | LSC | | a
34,801–35,400 | 30,925–31,450 | petN | trnD‐GUC | LSC | | a
35,401–36,000 | 31,451–31,967 | psbM | trnD‐GUC | LSC | a | a
37,201–37,800 | 33,027–33,573 | trnE‐UUC | trnT‐GGU | LSC | a | a
39,601–40,200 | 34,893–35,433 | trnT‐GGU | psbD | LSC | | a
43,801–44,400 | 38,798–39,254 | psbZ | trnfM‐CAU | LSC | a | a
44,401–45,000 | 39,255–39,744 | trnfM‐CAU | psaB | LSC | a | a
61,201–61,800 | 54,771–55,275 | trnV‐UAC | atpE | LSC | | a
78,601–79,200 | 70,230–70,771 | psaJ | rps18 | LSC | | a
89,401–90,000 | 80,536–81,103 | petD | rpoA | LSC | a |
95,401–96,000 | 85,455–85,906 | rpl16 | rps3 | LSC | a |
131,401–132,000 | 119,237–119,759 | ccsA | ndhD | SSC | a |
140,401–141,000 | 127,827–128,402 | rps15 | ycf1 | SSC | | a
144,001–144,600 | 131,283–131,868 | ycf1 | ycf1 | SSC | a | a
144,601–145,200 | 131,869–132,446 | ycf1 | ycf1 | SSC | a | a

π = nucleotide diversity (see main text); LD = log‐likelihood score differences; LSC = large single copy; SSC = small single copy.

Signifies regions with high (above the 95th percentile) nucleotide diversity or site‐wise log‐likelihood score differences.

Windows lying within a coding region are those whose 5ʹ and 3ʹ closest flanking expressed regions are the same gene. No regions are reported for the inverted repeat (IR). Coordinates refer to the alignment and to the Bertholletia excelsa plastome, both arranged as in Figure 2 (LSC, then IR, then SSC).

Table 5

Nucleotide diversity and differences in log‐likelihood scores of the informative windows identified in this study and of previously proposed barcode markers

Region | π | LD
ccsA‐ndhD | 0.0258 | 247.12
matK | 0.0153 | 136.92
petD‐rpoA | 0.0246 | 260.79
petN‐trnD | 0.0228 | 361.07
psaJ‐rps18 | 0.0176 | 309.10
psbM‐trnD | 0.0292 | 330.41
psbZ‐trnfM | 0.0246 | 373.97
rbcL | 0.0105 | 95.03
rpl16‐rps3 | 0.0345 | 275.89
rpoB | 0.0097 | 120.53
rpoC1 | 0.0103 | 178.60
rps15‐ycf1 | 0.0212 | 284.57
trnE‐trnT | 0.0241 | 522.51
trnfM‐psaB | 0.0254 | 375.76
trnH‐psbA | 0.0126 | 310.47
trnK‐rps16 | 0.0164 | 350.44
trnL | 0.0106 | 192.27
trnT‐psbD | 0.0239 | 291.15
trnV‐atpE | 0.0128 | 379.77
ycf1 (1) | 0.0273 | 462.53
ycf1 (2) | 0.0469 | 313.12

π = nucleotide diversity; LD = differences in log‐likelihood scores.

Informative windows identified in this study are indicated in bold.

High values (above the 95th percentile) for π and LD are indicated in bold.

Table 6

Primer sequences designed to amplify the 10 most polymorphic Lecythidaceae plastome regions, as sorted by decreasing nucleotide diversity

Window in the alignmentπRegionForward primer sequence (5′–3′)Reverse primer sequence (5′–3′)Length (bp)a
144,103–145,4870.04691 ycf1(1)AGAACCTTTGATTATGTCTCGACGAGAGACATGCTATAAAAATAGCCCA1186
95,034–95,7410.03446 rpl16‐rps3 AGAGTTTCTTCTCATCCAGCTCCGCTTAGTGTGTGACTCGTTGG1014
35,585–36,4130.02920 psbM‐trnD CCGTTCTTTCTTTTCTATAACCTACCCACGCTGGTTCAAATCCAGCT1093
143,235–144,1020.02733 ycf1(2)TGATTCGAATCTTTTAGCATTAKAACTKCGTCGAGACATAATCAAAGGT1189
131,180–132,0540.02576 ccsA‐ndhD CCGAGTGGTTAATAATGCACGTGCTTCTCTTGCATTACCGGG1180
44,398–45,1320.02537 trnG‐psaB TCGATYCCCGCTATCCGCCGCCAATTTGATTCGATGGAGAGA883
89,032–89,6880.02464 petD‐rpoA TGGGAGTGTGTGACTTGAACTTGACCCATCCCTTTAGCCAA824
43,412–44,3970.02456 psbZ‐trnfM TCCAATTGRCTGTTTTTGCATTAATTGCCTTGAGGTCACGGGTTCAA706
37,444–38,3450.02409 trnE‐trnT AGACGATGGGGGCATACTTGCCACTTACTTTTTCTTTTGTTTGTTGA1324
38,346–40,0850.02391 trnT‐psbD GGCGTAAGTCATCGGTTCAACCCAAAGCGAAATAGGCACA1717

π = nucleotide diversity.

The product size (length) references the Bertholletia excelsa plastome.


Phylogenetics of the plastomes and the developed markers

The ML analysis of the plastome alignment for Lecythidaceae (145,487 sites) yielded a fully resolved phylogeny with high BS for all clades (Fig. 3). Of the genera represented by multiple species, Eschweilera and Lecythis Loefl. were polyphyletic, whereas Allantoma Miers, Corythophora R. Knuth, Couratari Aubl., and Gustavia L. were monophyletic. Bertholletia is monospecific, and only one species each of Couroupita Aubl., Cariniana Casar., and Grias L. was included in the analysis. Trees inferred from individual high‐π markers had a mean BS of 73 across their nodes, whereas trees inferred from two or more concatenated regions had a mean BS of 89 (Fig. 4, Appendix S1). None of the gene trees, single or concatenated (Appendix S1), recovered the topology inferred from the complete plastome matrix; that is, none had RF = 0 (Fig. 5). In general, concatenated matrices (mean RF = 6) outperformed single markers (mean RF = 13.8; Fig. 5).
Figure 3

Maximum likelihood phylogeny inferred from plastomes of 23 Neotropical Lecythidaceae. Numbers at nodes indicate bootstrap support.

Figure 4

Average bootstrap support for trees inferred from either independent or concatenated regions with high nucleotide diversity, sorted in ascending order.

Figure 5

Robinson–Foulds distances (RF) for trees inferred from either independent or concatenated regions with high nucleotide diversity, sorted in descending order. RF counts the taxon bipartitions that differ from those of the complete plastome topology, so lower values indicate better accuracy.


DISCUSSION

Genetic markers from the Lecythidaceae plastome

We present the first complete plastome for Lecythidaceae, sequenced at high depth from the Brazil nut tree (B. excelsa). We also present 23 draft plastomes representing all Lecythidoideae genera and a Paleotropical outgroup taxon. We found no significant gene losses or major rearrangements when the plastome of B. excelsa was compared with that of C. sinensis var. pubilimba (Theaceae), a relatively closely related species. We inferred a robust backbone phylogeny for Lecythidoideae from the 24 aligned plastomes. All nodes in our topology had 100% BS except for one node connecting three closely related species of Eschweilera (Fig. 3). The topology agreed with previous, weakly supported (<50% BS) Lecythidaceae phylogenies based on chloroplast and nuclear ITS sequences (Mori et al., 2007; Huang et al., 2015) in indicating that Eschweilera and Lecythis are polyphyletic. Although the polyphyly of these two genera is well supported by all available data, some inferred species‐level relationships may change with increased taxonomic sampling and the inclusion of nuclear genomic data. We measured π and, as a proxy for phylogenetic signal, LD (modified from Walker et al., 2017). These measures allowed us to evaluate the performance of specific chloroplast regions as potential phylogenetic markers. The core plant DNA barcodes matK and rbcL did not exhibit high π or LD in our analysis (Table 5). Several secondary plastome barcodes have been proposed in the literature (rpoC1, rpoB, trnL, and psbA‐trnH; Kress et al., 2005; Lahaye et al., 2008; Hollingsworth et al., 2009; Li et al., 2011). Of these, only psbA‐trnH showed high LD (Table 5), and it did not exhibit exceptionally high π. In contrast, the regions ycf1, rpl16‐rps3, psbM‐trnD, ccsA‐ndhD, trnG‐psaB, petD‐rpoA, psbZ‐trnfM, trnE‐trnT, and trnT‐psbD displayed exceptionally high π, LD, or both (Tables 4, 5). By these measures, they outperformed all of the previously proposed plant DNA barcodes. 
Phylogenetic trees inferred from concatenated marker sets (added in order of π rank) outperformed single regions in both support (BS) and accuracy (RF; Figs. 4, 5). Tree topologies from single markers deviated substantially from the complete plastome tree (mean RF = 13.8). The best‐performing concatenated matrix contained all 10 regions for which we developed primers. However, the combination of ycf1 and rpl16‐rps3 produced an average BS of ~90 (Fig. 4) with reasonable accuracy (RF = 4; Fig. 5). We conclude that these two regions, amplified in three PCRs (Table 6), are promising markers for DNA barcoding, phylogenetics, and phylogeography in Lecythidaceae. Barcoding efficiency in species‐rich clades (i.e., Eschweilera/Lecythis) might decline as more samples are added. Nevertheless, ycf1 and rpl16‐rps3 effectively distinguished three closely related species within the Eschweilera parvifolia Mart. ex DC. clade (see branch lengths in Appendix S1), suggesting that these markers may also distinguish many other closely related species. Our results agree with those of Dong et al. (2015), who proposed ycf1 as a universal barcode for land plants. The 19 cpSSR markers developed for the B. excelsa plastome, all but one in non‐coding regions, provide a useful resource for population genetic studies. Because of their fast, stepwise mutation rate relative to single‐nucleotide polymorphisms, cpSSRs can also be used for finer‐grained phylogeographic analyses (e.g., Lemes et al., 2010; Twyford et al., 2013). This may be especially useful for species that exhibit little geographic structure across parts of their ranges. Because they are maternally inherited and can be variable within populations, the cpSSRs may also be used to track the dispersal of seeds and seedlings relative to their maternal source trees. 
Given their high polymorphism and phylogenetic signal, we anticipate using the cpDNA markers presented here to study the phylogeography of widespread Lecythidaceae species such as Couratari guianensis Aubl. and Eschweilera coriacea, which range from the Amazon basin into Central America.
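The RF distances used in this discussion count the bipartitions (splits) present in one tree but not in the other. The following is a minimal Python sketch of the unnormalized RF distance between two unrooted trees given as Newick strings. Its tiny parser is an assumption made for self-containment: it supports only unquoted taxon names and skips branch lengths and support values. This section does not state whether the reported RF values were normalized. For real analyses, a tested library such as DendroPy, which appears in the reference list, is preferable.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested lists; leaves are taxon names.

    Assumptions: no quoted names or comments; whitespace is discarded;
    branch lengths and internal labels (e.g., bootstrap values) are skipped.
    """
    s = "".join(text.split()).rstrip(";")
    pos = 0

    def skip_annotation():
        # Skip an internal label and/or ":length" up to the next ',' or ')'.
        nonlocal pos
        while pos < len(s) and s[pos] not in ",)":
            pos += 1

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1
            children = [node()]
            while s[pos] == ",":
                pos += 1
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            skip_annotation()
            return children
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name = s[start:pos]
        skip_annotation()
        return name

    return node()


def collect_clusters(tree, out):
    """Return the taxa below `tree`; append each internal node's taxon set to `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    taxa = frozenset().union(*(collect_clusters(child, out) for child in tree))
    out.append(taxa)
    return taxa


def bipartitions(newick):
    """Non-trivial splits of the tree viewed as unrooted.

    Each split is stored as the side that excludes a fixed reference taxon,
    so the same split from differently rooted trees compares equal.
    """
    clusters = []
    taxa = collect_clusters(parse_newick(newick), clusters)
    ref = min(taxa)
    splits = set()
    for cluster in clusters:
        side = taxa - cluster if ref in cluster else cluster
        if 2 <= len(side) <= len(taxa) - 2:
            splits.add(side)
    return splits, taxa


def robinson_foulds(newick_a, newick_b):
    """Unnormalized RF distance: number of splits found in exactly one tree."""
    splits_a, taxa_a = bipartitions(newick_a)
    splits_b, taxa_b = bipartitions(newick_b)
    if taxa_a != taxa_b:
        raise ValueError("trees must contain the same taxa")
    return len(splits_a ^ splits_b)


if __name__ == "__main__":
    # Hypothetical five-taxon trees that differ in two clades.
    print(robinson_foulds("((A,B),(C,D),E);", "((A,C),(B,D),E);"))  # 4
```

For n taxa, fully resolved unrooted trees can differ by at most 2(n − 3) splits, so the five-taxon example above is maximally different.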

Barcoding of tropical trees

DNA barcoding of tropical trees has proved useful for several applications (Dick and Kress, 2009), including community phylogenetic analyses (Kress et al., 2009), identification of the plant species in herbivore gut contents (diet) (García‐Robledo et al., 2013), and species identification of seedlings (Gonzalez et al., 2009). DNA barcodes should discriminate well among distantly related species but less well among close relatives. For example, Kress et al. (2009) discriminated 281 of 296 tree and shrub species from Barro Colorado Island using standard DNA barcodes but could not discriminate among some congeneric species in the species‐rich genera Inga Mill. (Fabaceae), Ficus L. (Moraceae), and Piper L. (Piperaceae). Gonzalez et al. (2009) encountered similar difficulties with Eschweilera species in their study of trees and seedlings in Paracou, French Guiana. That study tested a wide range of candidate DNA barcode regions (rbcLa, rpoC1, rpoB, matK, trnL, psbA‐trnH, and ITS) but did not include the markers presented in this article.
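Studies define "discriminating" a species in different ways. The following Python sketch uses the simplest criterion, which is our assumption and not necessarily the one used by Kress et al. (2009) or Gonzalez et al. (2009): a species counts as discriminated if none of its barcode haplotypes is identical to a haplotype found in another species. The function name and the species labels in the test are hypothetical.

```python
from collections import defaultdict


def discriminated_species(records):
    """Species whose barcode haplotypes are never shared with another species.

    `records` is an iterable of (species, aligned_barcode_sequence) pairs.
    Exact-match criterion (an assumption); gaps count as ordinary characters.
    """
    species_by_haplotype = defaultdict(set)
    all_species = set()
    for species, seq in records:
        species_by_haplotype[seq.upper()].add(species)
        all_species.add(species)
    ambiguous = set()
    for members in species_by_haplotype.values():
        if len(members) > 1:
            ambiguous |= members  # every species carrying a shared haplotype
    return all_species - ambiguous
```

Dividing the size of the returned set by the number of species sampled gives a discrimination rate comparable in spirit to "281 of 296", although stricter criteria (e.g., distance thresholds or monophyly) usually give lower values.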

Limitations of plastome markers for phylogeny and species identification

These newly identified plastome markers are not free of limitations. First, plastome‐based phylogenies should be interpreted with caution, because they can disagree with nuclear markers and species trees as a result of introgression and/or incomplete lineage sorting (Rieseberg and Soltis, 1991; Sun et al., 2015; Vargas et al., 2017). Second, the same processes limit the usefulness of cpDNA for species identification. For example, cpDNA haplotypes of Nothofagus Blume, Eucalyptus L'Hér., Quercus L., Betula L., and Acer L. were structured more strongly by geographic location than by species identity because of localized introgression within these groups (Petit et al., 1993; Palme et al., 2004; Saeki et al., 2011; Premoli et al., 2012; Nevill et al., 2014; Thomson et al., 2015). To date, haplotype sharing among closely related Lecythidaceae species has not been examined at a large scale, so it is not yet possible to determine to what extent introgression or incomplete lineage sorting affects this group. We suggest that future studies using cpDNA barcodes for Neotropical Lecythidaceae sample species from several shared localities to test the extent to which haplotypes are shared among species at the same locality. Nuclear DNA markers may also be used to examine phylogenetic incongruence and to identify cases where introgression might have occurred.
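One simple way to summarize the locality-based sampling suggested above is sketched below in Python. This is an assumption on our part, not a method from this study. For each locality, it lists the cpDNA haplotypes carried by two or more species and reports the fraction of species there that share a haplotype with another species. The locality, species, and haplotype identifiers in the test are hypothetical.

```python
from collections import defaultdict


def shared_haplotypes_by_locality(records):
    """Map locality -> {haplotype: sorted species list} for shared haplotypes.

    `records` is an iterable of (locality, species, haplotype_id) tuples;
    only haplotypes carried by two or more species at a locality are kept.
    """
    species_sets = defaultdict(lambda: defaultdict(set))
    for locality, species, haplotype in records:
        species_sets[locality][haplotype].add(species)
    return {
        locality: {h: sorted(sp) for h, sp in haps.items() if len(sp) > 1}
        for locality, haps in species_sets.items()
    }


def fraction_species_sharing(records):
    """Fraction of species at each locality sharing a haplotype with another species."""
    records = list(records)
    species_at = defaultdict(set)
    for locality, species, _ in records:
        species_at[locality].add(species)
    shared = shared_haplotypes_by_locality(records)
    result = {}
    for locality, spp in species_at.items():
        sharing = set()
        for members in shared[locality].values():
            sharing.update(members)
        result[locality] = len(sharing) / len(spp)
    return result
```

High sharing fractions at sites where species co-occur, combined with species-specific haplotypes elsewhere, would be consistent with localized introgression, although incomplete lineage sorting can produce similar patterns.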

DATA ACCESSIBILITY

DNA sequences have been deposited in GenBank (accession nos. MF359935–MF359958; BioProject SUB2740669). The plastome alignment, gene alignments, trees, and R code are available at https://bitbucket.org/oscarvargash/lecythidaceae_plastomes .
Species | Voucher | No. of reads | % of ref seq^a | Mean coverage | No. of contigs | Average length of assembled contigs (bp) | Minimum contig length (bp) | Maximum contig length (bp) | N50 | GenBank accession no.
Allantoma decandra (Ducke) S. A. Mori, Ya Y. Huang & Prance | Mori 25640 | 527,449 | 99.20 | 213 | 9 | 157,957 | 1052 | 42,447 | 22,733 | MF359949
A. lineata (Mart. & O. Berg) Miers | Chevalier 10101 | 697,746 | 99.60 | 295 | 8 | 158,449 | 400 | 34,633 | 32,463 | MF359941
Barringtonia edulis Seem. | Tsou 1552 | 519,377 | 96.10 | 230 | 10 | 152,805 | 2636 | 46,991 | 32,608 | MF359956
Bertholletia excelsa Bonpl. | Mori 25637 | 1,036,874 | 100 | 646 | 14 | 160,472 | 461 | 160,472 | 160,472 | MF359948
Cariniana estrellensis (Raddi) Kuntze | Nee 52828 | 759,042 | 85 | 292 | 70 | 130,037 | 278 | 22,237 | 3803 | MF359938
Corythophora amapaensis Pires ex S. A. Mori & Prance | Mori 24148 | 690,545 | 99.60 | 302 | 4 | 159,222 | 6643 | 75,002 | 42,750 | MF359955
C. labriculata (Eyma) S. A. Mori & Prance | Mori 25518 | 606,728 | 99.60 | 260 | 5 | 158,819 | 6596 | 74,896 | 42,691 | MF359946
Couratari macrosperma A. C. Sm. | Janovec 2506 | 340,696 | 99 | 144 | 9 | 156,981 | 1107 | 45,975 | 42,740 | MF359944
C. stellata A. C. Sm. | Mori 24093 | 493,777 | 99.40 | 211 | 6 | 158,312 | 1374 | 109,344 | 109,344 | MF359936
Couroupita guianensis Aubl. | Mori 25516 | 503,417 | 96.50 | 314 | 12 | 154,792 | 1071 | 47,693 | 18,977 | MF359935
Eschweilera alata A. C. Sm. | Prévost 4607 | 851,683 | 100 | 358 | 2 | 158,981 | 61,051 | 97,930 | 97,929 | MF359940
E. caudiculata R. Knuth | Cornejo 8185 | 273,053 | 98.30 | 116 | 11 | 156,630 | 1117 | 37,154 | 21,598 | MF359957
E. integrifolia (Ruiz & Pav. ex Miers) R. Knuth | Cornejo 8211 | 440,144 | 99 | 187 | 9 | 157,206 | 1105 | 42,497 | 21,591 | MF359942
E. micrantha (O. Berg) Miers | Mori 25410 | 289,775 | 98.90 | 120 | 11 | 157,890 | 1143 | 42,551 | 18,021 | MF359958
E. pittieri R. Knuth | Cornejo 8208 | 160,625 | 97 | 166 | 8 | 154,547 | 551 | 74,876 | 24,636 | MF359954
E. wachenheimii (Benoist) Sandwith | Prévost 4252 | 367,631 | 98.90 | 151 | 11 | 157,757 | 1179 | 37,141 | 21,748 | MF359939
Grias cauliflora L. | Aguilar 7961 | 520,480 | 94.90 | 326 | 41 | 150,768 | 314 | 28,189 | 8102 | MF359952
Gustavia augusta L. | Mori 24255 | 761,640 | 95.50 | 476 | 33 | 152,601 | 358 | 28,353 | 11,657 | MF359943
G. serrata S. A. Mori | Cornejo 8184 | 534,143 | 98.90 | 334 | 10 | 157,746 | 1035 | 35,586 | 22,762 | MF359947
Lecythis ampla Miers | Cornejo 8229 | 606,518 | 94.10 | 241 | 39 | 149,464 | 348 | 28,759 | 6527 | MF359951
L. congestiflora Benoist | Molino 2019 | 1,073,567 | 96.90 | 405 | 27 | 154,882 | 302 | 21,264 | 11,580 | MF359937
L. corrugata Poit. | Mori 24265 | 544,831 | 90.70 | 243 | 50 | 143,859 | 317 | 28,741 | 4496 | MF359950
L. minor Jacq. | Tsou 1542 | 666,355 | 94.80 | 416 | 26 | 151,568 | 354 | 31,188 | 14,768 | MF359945
L. pneumatophora S. A. Mori | Mori 25728 | 690,202 | 99.50 | 301 | 4 | 158,832 | 11,782 | 75,019 | 49,184 | MF359953

^a Percentage of the sequence recovered in relation to Bertholletia excelsa.
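The N50 column in the assembly table summarizes contiguity. Sort the contigs from longest to shortest, then take the length of the contig at which the running total first reaches half of the total assembled length. The following Python sketch implements that standard definition. The contig lengths in the test are invented, and we do not know which tool the authors used to compute the values in the table.

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid rounding issues with odd totals.
        if 2 * running >= total:
            return length
```

A single-contig assembly has an N50 equal to its length. Highly fragmented assemblies, such as those with tens of contigs, tend to have N50 values far below the maximum contig length.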


1. Scott A Mori, Chi-Hua Tsou, Chi-Chih Wu, Bodil Cronholm, Arne A Anderberg. Evolution of Lecythidaceae with an emphasis on the circumscription of neotropical genera: information from combined ndhF and trnL-F sequence data. Am J Bot, 2007-03.
2. Stacia K Wyman, Robert K Jansen, Jeffrey L Boore. Automatic annotation of organellar genomes with DOGMA. Bioinformatics, 2004-06-04.
3. Jeet Sukumaran, Mark T Holder. DendroPy: a Python library for phylogenetic computing. Bioinformatics, 2010-04-25.
4. Daniel R Zerbino, Ewan Birney. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res, 2008-03-18.
5. R J Petit, A Kremer, D B Wagner. Geographic structure of chloroplast DNA polymorphisms in European oaks. Theor Appl Genet, 1993-10.
6. Andrea C Premoli, Paula Mathiasen, M Cristina Acosta, Victor A Ramos. Phylogeographically concordant chloroplast DNA divergence in sympatric Nothofagus s.s. How deep can it be? New Phytol, 2011-09-01.
7. Miao Sun, Douglas E Soltis, Pamela S Soltis, Xinyu Zhu, J Gordon Burleigh, Zhiduan Chen. Deep phylogenetic incongruence in the angiosperm clade Rosidae. Mol Phylogenet Evol, 2014-11-18.
8. Matthew Kearse, Richard Moir, Amy Wilson, Steven Stones-Havas, Matthew Cheung, Shane Sturrock, Simon Buxton, Alex Cooper, Sidney Markowitz, Chris Duran, Tobias Thierer, Bruce Ashton, Peter Meintjes, Alexei Drummond. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics, 2012-04-27.
9. Carlos García-Robledo, David L Erickson, Charles L Staines, Terry L Erwin, W John Kress. Tropical plant-herbivore networks: reconstructing species interactions using DNA barcodes. PLoS One, 2013-01-08.
10. Gaurav Sablok, G V Padma Raju, Suresh B Mudunuri, Ratna Prabha, Dhananjaya P Singh, Vesselin Baev, Galina Yahubyan, Peter J Ralph, Nicola La Porta. ChloroMitoSSRDB 2.00: more genomes, more repeats, unifying SSRs search patterns and on-the-fly repeat detection. Database (Oxford), 2015-09-27.
