Literature DB >> 27098088

Sequencing of whole plastid genomes and nuclear ribosomal DNA of Diospyros species (Ebenaceae) endemic to New Caledonia: many species, little divergence.

Barbara Turner1, Ovidiu Paun2, Jérôme Munzinger3, Mark W Chase4, Rosabelle Samuel2.   

Abstract

BACKGROUND AND AIMS: Some plant groups, especially on islands, have been shaped by strong ancestral bottlenecks and rapid, recent radiation of phenotypic characters. Single molecular markers are often not informative enough for phylogenetic reconstruction in such plant groups. Whole plastid genomes and nuclear ribosomal DNA (nrDNA) are viewed by many researchers as sources of information for phylogenetic reconstruction of groups in which expected levels of divergence in standard markers are low. Here we evaluate the usefulness of these data types to resolve phylogenetic relationships among closely related Diospyros species.
METHODS: Twenty-two closely related Diospyros species from New Caledonia were investigated using whole plastid genomes and nrDNA data from low-coverage next-generation sequencing (NGS). Phylogenetic trees were inferred using maximum parsimony, maximum likelihood and Bayesian inference on separate plastid and nrDNA and combined matrices. KEY
RESULTS: The plastid and nrDNA sequences were, singly and together, unable to provide well supported phylogenetic relationships among the closely related New Caledonian Diospyros species. In the nrDNA, a 6-fold greater percentage of parsimony-informative characters compared with plastid DNA was found, but the total number of informative sites was greater for the much larger plastid DNA genomes. Combining the plastid and nuclear data improved resolution. Plastid results showed a trend towards geographical clustering of accessions rather than following taxonomic species.
CONCLUSIONS: In plant groups in which multiple plastid markers are not sufficiently informative, an investigation at the level of the entire plastid genome may also not be sufficient for detailed phylogenetic reconstruction. Sequencing of complete plastid genomes and nrDNA repeats seems to clarify some relationships among the New Caledonian Diospyros species, but the higher percentage of parsimony-informative characters in nrDNA compared with plastid DNA did not help to resolve the phylogenetic tree because the total number of variable sites was much lower than in the entire plastid genome. The geographical clustering of the individuals against a background of overall low sequence divergence could indicate transfer of plastid genomes due to hybridization and introgression following secondary contact.
© The Author 2016. Published by Oxford University Press on behalf of the Annals of Botany Company.

Entities:  

Keywords:  Diospyros; New Caledonia; complete plastid genomes; genome skimming; island floras; next-generation sequencing; nuclear ribosomal DNA; rapid radiation

Mesh:

Substances:

Year:  2016        PMID: 27098088      PMCID: PMC4904177          DOI: 10.1093/aob/mcw060

Source DB:  PubMed          Journal:  Ann Bot        ISSN: 0305-7364            Impact factor:   4.357


INTRODUCTION

New Caledonia comprises an archipelago in the southern Pacific known for its characteristic, rich endemic flora (Lowry, 1998; Morat ). Due to its complex geological history, New Caledonia features a mosaic of soil types (Pelletier, 2006; Maurizot and Vendé-Leclerc, 2009), which, in combination with its elevational and climatic heterogeneity, results in many different habitats. One of the genera that has adapted to a wide range of these habitats is Diospyros (Ebenaceae). Diospyros is a large genus of woody dioecious plants found worldwide in the tropics and subtropics, including 31 species in New Caledonia. Previous studies based on plastid markers (Duangjai ) showed that Diospyros colonized New Caledonia at least four times via long-distance dispersal. Two of the successful dispersal events each resulted in a single species that still persists; an additional dispersal event led to a small clade comprising five species; and yet another event gave rise to a putatively rapidly radiating group of 24 endemic species. These 24 species have been shown to be highly similar genetically using low-copy nuclear and plastid markers (Duangjai ; Turner ). Most of these closely related species are morphologically and ecologically clearly differentiated, and current species delimitations (White, 1993) have been generally confirmed by analyses of amplified fragment length polymorphisms (AFLPs; Turner ) and restriction site-associated DNA sequencing (RADseq; Paun ). On New Caledonia, Diospyros species are found in many habitats, but they often grow in proximity to each other. At some localities, several species are microsympatric, which allows interspecific gene flow if reproductive isolation is still incomplete. Dating analysis based on four plastid and two low-copy nuclear DNA regions showed that the ancestors of this group of New Caledonian Diospyros species arrived in New Caledonia around 9 million years ago (Turner ). Given that Diospyros includes long-lived perennial plants, it becomes obvious that they have evolved relatively recently. Resolving the phylogenetic relationships in such a young group of rapidly radiating and potentially hybridizing taxa poses significant challenges (Glor, 2010). We test here the usefulness of next-generation sequencing (NGS)-based genome skimming to obtain phylogenetic data, in particular by sequencing whole plastomes and the full-length nuclear ribosomal DNA region (i.e. nrDNA). The plastid genome has proved useful for molecular phylogenetic investigations of plants at different taxonomic levels. In the past two decades, sequences from the plastid genome have been extensively used to infer phylogenetic relationships among plants (e.g. Chase ; Barfuss ; Duangjai ; Russell ). Uniparental inheritance, low mutation rates and high copy number are well-known features of plastid genomes and the basis for their standard usage in plant systematics. Due to the slow rate of evolution of the plastid genome, the level of variation is often low compared with nuclear DNA in general and mitochondrial markers in animals (Schaal ). The high level of conservation has prevented the development of a universally applicable single barcoding region in plants (CBOL Plant Working Group, 2009). Recently, whole-plastid genome sequencing has become affordable, and this has been employed to generate more highly resolved phylogenetic trees (e.g. Ku ; Yang ; Barrett ; Malé ). Genes coding for nuclear ribosomal RNA are found in the genome in multiple copies arranged in tandem repeats, and therefore it is feasible to obtain full-length sequences from low-coverage NGS approaches. Because of concerted evolution, these thousands of copies mostly behave as single-copy genes, and potential evidence of hybridization is normally eliminated within a few generations (Chase ). Each nrDNA repeat consists of coding and non-coding elements. In plants, the four rRNA genes are arranged in two clusters. One of these clusters comprises three genes (18S, 5·8S and 26S) separated by two internal transcribed spacers (ITSs), the external transcribed spacer (ETS) and the non-transcribed spacer (NTS). The second cluster includes the coding region of one rRNA (5S) and a spacer between the repeats. The nrDNA genes are relatively conserved but contain enough variation that they have been used for phylogenetic reconstructions at higher taxonomic levels (Maia ). The non-coding spacers, ITS and ETS, between the genes are much more variable and represent a useful source of phylogenetic information among closely related species (e.g. Devos ; Sanz ; Akhani ; Nürk ; Zhu ). Here we aimed to use whole plastid genomes as well as complete sequences of the nrDNA for phylogenetic analyses for species of the New Caledonian Diospyros to investigate whether this approach would produce an improved estimate of phylogenetic relationships in this putatively rapidly radiating group.

MATERIALS AND METHODS

Leaf material from New Caledonian Diospyros species was collected on the main island, Grande Terre, and on one smaller island in the south, Île des Pins (Fig. 1). For the widespread species, at least two representative individuals were sequenced (Table 1). In total we included here 36 individuals of New Caledonian Diospyros species (corresponding to 21 identified and one unidentified species) as well as one individual of Diospyros olen (from New Caledonia, but not closely related to the other New Caledonian Diospyros species) and one individual of Diospyros ferrea from Thailand. Diospyros olen and D. ferrea were used as outgroups. Wherever possible, we used the exact same individuals for which Sanger sequence data are available from previous studies (Turner ). However, because of poor DNA quality (unsuitable for NGS) or unavailability for some of the accessions, we had to include in this study another accession from the same population or locality.
F

Map of New Caledonia indicating the 26 sampling localities for this study. Numbered dots indicate sampling sites (see also Table 1). Dots are coloured according to sampling region (north, green; middle, blue; south, red).

T

Table of accessions including all individuals used in this study. The identification numbers of sampling localities are given in Fig. 1. Voucher codes: JMXXXX: collection number J. Munzinger; Tree No. XXXXX: Tree of New Caledonian Plant Inventory and Permanent Plot Network (NC-PIPPN, Ibanez ); KUFF, Herbarium of the Faculty of Forestry Kasertsat University Bangkok; MPU, Herbarium of the University of Montpellier; NOU, Herbarium of IRD Nouméa; P, Herbarium of the Natural History Museum Paris; WU, Herbarium of the University of Vienna

TaxonAccession numberSampling locationVoucher
D. calciphila F.WhiteBT31325MPU026746
D. cherrieri F.WhiteBT29323NOU079547
D. erudita F.WhiteBT28021WU062858
D. ferrea (Wild.) Bakh.Eb045Duangjai 106 (KUFF)
D. flavocarpa (Vieill. ex P.Parm.) F.WhiteBT1309MPU026741
BT15611MPU026737
D. glans F.WhiteBT0935Turner et al. 093 (MPU)
D. impolita F.WhiteBT1026NOU019538
D. inexplorata F.WhiteBT30824NOU005818
D. labillardierei F.WhiteBT1229NOU052188
D. minimifolia F.WhiteBT13510NOU019556
BT23317NOU019554
BT26320NOU079549, WU062872
D. olen HiernBT0011NOU052191
D. pancheri Kosterm.BT0283MPU026742
BT0313MPU026742
D. parviflora (Schltr.) Bakh.BT0414Turner et al. s.n. (NOU)
BT0905NOU2519
BT14710NOU052175
BT18713NOU031409
BT25019Tree no. 23109
BT29022NOU079550
D. perplexa F.WhiteBT0041MPU026738
D. pustulata F.WhiteBT1117NOU019572
BT14010NOU052177
BT26120NOU079544
D. revolutissima F.WhiteBT1208NOU023189
BT22116NOU084762
D. tridentata F.WhiteBT20714NOU052179
D. trisulca F.WhiteBT18513NOU031344
D. umbrosa F.WhiteBT17612JM6635 (NOU)
BT24719NOU023234
D. veillonii F.WhiteBT22717NOU019582
D. vieillardii (Hiern) Kosterm.BT0252Turner et al. s.n. (NOU)
BT1005NOU006676
BT21515NOU023242
D. yahouensis (Schltr.) Kosterm.BT23818P00057340
D. sp. Pic N’gaBT31826NOU054315
Map of New Caledonia indicating the 26 sampling localities for this study. Numbered dots indicate sampling sites (see also Table 1). Dots are coloured according to sampling region (north, green; middle, blue; south, red). Table of accessions including all individuals used in this study. The identification numbers of sampling localities are given in Fig. 1. Voucher codes: JMXXXX: collection number J. Munzinger; Tree No. XXXXX: Tree of New Caledonian Plant Inventory and Permanent Plot Network (NC-PIPPN, Ibanez ); KUFF, Herbarium of the Faculty of Forestry Kasertsat University Bangkok; MPU, Herbarium of the University of Montpellier; NOU, Herbarium of IRD Nouméa; P, Herbarium of the Natural History Museum Paris; WU, Herbarium of the University of Vienna DNA was extracted from silica gel-dried leaf material using a modified sorbitol/high-salt cetyltrimethylammonium bromide (CTAB) method (Tel-Zur ). Extracts were purified using the NucleoSpin gDNA Clean-up kit (Marcherey-Nagel, Germany), according to the manufacturer’s protocol. From each sample, 300 ng of DNA was sheared (in two cycles of 45 s with 30 s break between the two shearing runs) using an ultrasonicator (Bioruptor Pico, Diagenode, Belgium), targeting a mean fragment size of 400 bp. Library preparation was performed using the NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs, USA) according to the manufacturer’s protocol. All individuals were barcoded and pooled to reach an equal representation of each individual in the final libraries. In total, two libraries (containing 14 and 24 samples per library, respectively) were prepared for the 38 samples used. The two libraries were sequenced on an Illumina HiSeq as 100-bp paired-end reads at the VBCF (Vienna Biocenter Core Facilities, Vienna, Austria; http://www.vbcf.ac.at/facilities/next-generation-sequencing/). Demultiplexing of the raw data was performed, allowing for a maximum of one mismatch using the Picard BamIndexDecoder (included in the Picard Illumina2bam package; available from https://github.com/wtsi-npg/illumina2bam). The number and quality of raw reads obtained from each individual were evaluated with FastQC (Andrews, 2010).

Assembling and annotating plastid genomes

Reads originating from the plastid genome (pt DNA) were filtered using a multistep and iterative in-house established pipeline. First, the individual raw files were imported into the CLC GENOMIC WORKBENCH v. 6.5 (Qiagen) and trimmed by quality at P <0·05, retaining reads of at least 30 bp. Then the reads of D. ferrea were mapped on the complete plastid genome of Camellia sinensis (Theaceae, Ericales, GenBank: KC143082.1). For this initial mapping of both coding and non-coding regions (the latter comprising introns and intergenic spacers), a mismatch cost of 2 and insertion and deletion cost of 3 were used, requiring at least 80 % of a read to be at least 90 % similar to the target for each successful mapping. With these settings, 403 980 reads of D. ferrea mapped to the Camellia plastome. We re-extracted the initial paired-end read data corresponding to these reads using FastQ.filter.pl (Rodriguez-R and Konstantinidis, 2016, available at https://github.com/lmrodriguezr/enveomics/) and have assembled them de novo in the CLC GENOMIC WORKBENCH, with automatic optimization of the word and bubble sizes and updating the contigs after mapping back the reads. We obtained three contigs, which have been concatenated by aligning them to the C. sinensis reference sequence manually in the program BioEdit v. 7.2.5 (Hall, 1999). From the coverage information and by comparison with the C. sinensis reference we were able to identify both inverted repeats and confirm their presence in the plastid genome of Diospyros. Both inverted repeats were reconstructed together and duplicated to represent a complete plastid genome. The base composition of the assembled contigs was extracted from the alignments using BioEdit v. 7.2.5 (Hall, 1999). The plastid genomes of the rest of the Diospyros species were obtained in a similar way, but the initial mapping step was performed on the assembled D. ferrea genome. Finally, annotation of coding regions was performed using DOGMA (Wyman ) using only the D. vieillardii BT025 plastid genome. The circular plastid genome map (Fig. 2) was visualized with OGDRAW (Lohse ).
F

Graphic representation of the annotated plastid genome of Diospyros vieillardii.

Graphic representation of the annotated plastid genome of Diospyros vieillardii. Due to the difficulties with assembling the mitochondrial genome of plants as well as the low level of phylogenetic information in plant mtDNA (Malé ), we did not attempt to assemble the mitochondrial genome of Diospyros.

Assembly of nrDNA repeats

Reads containing sequences from the nrDNA region were collected and assembled using the program MITObim v. 1.7 (Hahn ). As initial seed, we used previously generated Sanger sequences from the ITS region of Diospyros vieillardii BT025 (B. Turner, unpubl. res.). Characteristics of assemblies such as the number of reads assembled in each contig and coverage were inspected from the MITObim output files. Assembled sequences were aligned using Muscle v. 3.8 (Edgar, 2004). The alignment was manually inspected using the program BioEdit v. 7.2.5 (Hall, 1999). The beginning and end of each coding region were estimated by comparing the Diospyros alignment with annotated sequences of Solanum lycopersicum (GenBank: AY366529, AY552528). We also extracted the 5S nrDNA region using the procedure described above. As initial seed we used the 120-bp coding fragment of 5S nrDNA sequences of Actinidia chinensis (GenBank: AF394578).

Phylogenetic analyses

For the phylogenetic analyses of the plastid sequence data, only one copy of the inverted repeat was included in the final alignment. For analyses of the nrDNA sequences, only the transcribed regions were used. Parsimony analyses including bootstrapping were performed using PAUP* v. 4b10 (Swofford, 2003). They were run using a heuristic search with stepwise addition, random sequence addition (1000 replicates) and tree bisection–reconnection. Gaps were treated as missing data. To estimate clade support, bootstrapping with 1000 replicates was performed using the same settings as above (including 1000 random replicates per bootstrap replicate). Estimation of consistency index (CI) and retention index (RI) was done with PAUP. Likelihood analysis including bootstrapping was performed using RAxML v. 8.1.3 (Stamatakis, 2014). We used the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method to optimize generalized time reversible (GTR) rate parameters, the gamma model of rate heterogeneity and 1000 rapid bootstrap inferences with a subsequent thorough maximum likelihood (ML) search. The results were visualized with FigTree 1·4 (available from http://tree.bio.ed.ac.uk/software/figtree/). We rooted the trees obtained with D. olen according to earlier results (Turner ). To reduce the file size and speed up analyses in the combined analyses, D. olen and constant positions were removed using Mesquite v. 3.01 (Maddison and Maddison, 2014). In trees conducted from the combined data set D. ferrea was used as outgroup. Parsimony and likelihood analyses were performed as described for the individual data sets. In addition, we conducted a Bayesian analysis for the combined data set using BEAST v. 1.8.1 (Drummond ). The best evolutionary models for the two subsets (ptDNA and nrDNA) were evaluated using jModeltest v. 2.1.6 (Guindon and Gascuel, 2003; Darriba ). For the plastid partition, the transversional model with equal frequencies (TVMef; Posada, 2003) showed the best fit to the data, and for the nrDNA partition Tamura and Nei’s model (TrN; Tamura and Nei, 1993) with among-site rate variation modelled with a gamma distribution (TrN+Γ) showed the best fit. Base frequencies (uniform), substitution rates among bases (gamma shape 10) and alpha (gamma shape 10) were inferred by jModeltest for each data set. For flexibility, we used a relaxed uncorrelated log-normal clock model (Drummond ). As a speciation model, we used a Yule model because the investigated group is so young that we expected a low proportion of lineage extinction (Yule, 1925; Gernhard, 2008). For further details regarding the parameters used see Supplementary Data Fig. S1. Two independent Metropolis-coupled Markov chain Monte Carlo (MCMC) analyses, each with 20 million generations, were run, sampling each 1000th generation. The initial 10 % of trees obtained from each MCMC run were removed as burn-in; the remaining trees of both runs were used to calculate the maximum clade credibility tree. For directly comparing the results of the present study with previous ones (Turner ) based on four plastid markers (atpB, rbcL, trnK-matK, trnS-trnG; 5979 bp) and two low-copy nuclear markers (ncpGS, 717 bp; PHYA, 1189 bp), a subset of those data corresponding to the individuals included in this study was analysed in the same way as described above for the Bayesian analysis. In most cases, to represent each species we used either the same accession or individuals from the same population. In the few cases where no data from individuals of the same population were available, then the geographically closest individual assigned to the same species was used. Due to these differences, results of the previous studies are not strictly comparable to those produced here, but we have compared them nonetheless due to their overall similarity. A general pattern of geographical clustering was visually observed in the resulting trees, in particular in the plastid data. Based on the geographical coordinates of samples and the distance matrix of pairwise uncorrected P-values (calculated with SplitsTree), we tested the significance of geographical clustering of the samples in the trees using a Mantel test. We estimated the correlation of geographical and genetic discontinuities in the data by performing this analysis on the plastid and the combined data sets with Isolation by Distance web service (IBDWS) (Jensen ). To investigate the relationships between populations we constructed a neighbour network based on the plastid markers, using SplitsTree4 v. 4.13.1 (Hudson and Bryant, 2006) and uncorrected P distance estimates. Based on the results from the Mantel test [Table 2 and RADseq results (Paun )], we excluded samples of D. olen, D. ferrea, D. vieillardii, D. umbrosa, D. flavocarpa, D. cherrieri and D. veillonii from this analysis to get a clearer picture of the relationships within the recently and rapidly radiated species group.
T

The extent of geographical clustering in the data, as evidenced with Mantel tests performed on IBDWS

DatasetMantel’s rSignificance
Plastid data
  Including all individuals, except the outgroups D. olen and D. ferrea0·103P = 0·09
  Excluding D. olen, D. ferrea and all D. vieillardii0·242P < 0·001
  Excluding D. olen, D. ferrea, D. vieillardii, D. umbrosa and D. flavocarpa0·383P < 0·001
  Excluding D. olen, D. ferrea, D. vieillardii, D. umbrosa, D. flavocarpa, D. cherrieri and D. veillonii0·427P < 0·001
Combined
  Including all individuals except the outgroups D. olen and D. ferrea0·079P = 0·10
  Excluding D. olen, D. ferrea and all D. vieillardii0·121P < 0·05
  Excluding D. olen, D. ferrea, D. vieillardii, D. umbrosa and D. flavocarpa0·302P < 0·001
  Excluding D. olen, D. ferrea, D. vieillardii, D. umbrosa, D. flavocarpa, D. cherrieri and D. veillonii0·374P < 0·001
The extent of geographical clustering in the data, as evidenced with Mantel tests performed on IBDWS Details of samples used here for sequencing of whole plastid genomes and nrDNA

RESULTS

After the demultiplexing step, the number of raw Illumina sequences ranged from 10 to 29 million paired-end reads per individual. Details of sample characteristics are given in Table 3.
T

Details of samples used here for sequencing of whole plastid genomes and nrDNA

Raw readsPlastid genome
Nuclear ribosomal DNA
Percentage of reads mappingCoverage (×)GC (%)Percentage of reads mappingCoverage (×)GC (%)
D. calciphila BT313195499681·3416237·370·1744459·24
D. cherrieri BT293151646583·3030937·360·1122958·52
D. erudita BT280126121183·0323637·360·1524659·36
D. ferrea Eb04599436464·8130037·340·5155958·46
D. flavocarpa BT130101295140·996237·360·2933157·07
D. flavocarpa BT156156567920·535137·370·1021958·53
D. glans BT093144363160·918137·360·1938658·48
D. impolita BT102186363981·1413137·360·0820959·26
D. inexplorata BT308162010561·0610637·360·2451159·65
D. labillardierei BT122265740120·9515637·360·1553858·6
D. minimifolia BT135266853141·1919937·360·0931458·82
D. minimifolia BT233255260860·8413437·350·1338458·24
D. minimifolia BT263161546301·9519837·350·2961659·14
D. olen BT001249666881·5123637·440·38108458·32
D. pancheri BT028275901240·8013637·350·0931659·38
D. pancheri BT031294530860·7113037·360·0936158·98
D. parviflora BT041271783160·8314037·360·0935658·54
D. parviflora BT090254329780·406337·370·1965659·34
D. parviflora BT147178873040·657137·360·2869058·69
D. parviflora BT187266489840·457437·360·2169056·6
D. parviflora BT250175888280·434737·360·3787759·02
D. parviflora BT290191873560·394737·360·2873759·08
D. perplexa BT004180855060·748237·360·3059557·43
D. pustulata BT111159584182·4624237·360·2846757·69
D. pustulata BT140170295061·8018837·360·2658059·52
D. pustulata BT261155850901·1911437·360·3774559·94
D. revolutissima BT221208675761·5019337·370·2055157·98
D. revolutissima BT120171117701·4615437·350·2146659·16
D. tridentata BT207141902920·897737·350·2135657·79
D. trisulca BT185172978160·758037·360·2971058·67
D. umbrosa BT176188566420·859837·380·4863255·92
D. umbrosa BT247288457560·5910437·360·1038559·19
D. veillonii BT227265941581·8029737·360·30105658·65
D. vieillardii BT025265957760·9115237·340·38116058·29
D. vieillardii BT100226493442·1029737·330·1132758·52
D. vieillardii BT215277100300·8013837·320·1450158·94
D. yahouensis BT238292427160·8014437·360·1557859·68
D. sp. Pic N’ga BT318185153504·6353137·360·2150059·37

Plastid genomes

We obtained between 75 262 (47× average coverage, for D. parviflora BT250) and 857 039 (531× coverage, for D. sp. Pic N’ga BT318) pairs of reads per individual that mapped to the plastid genome (for details see Table 3). The GC content of the plastid genomes of Diospyros (∼37 %) is similar to those of many other angiosperms [average ∼37 %; e.g. Ardisia (Ku ); Camellia (Yang ); Potentilla (Ferrarini ); Musa (Martin ); Zingiberales (Barrett )]. The size (∼ 157 kb) and gene order of the plastid genome of D. vieillardii (Fig. 2) is similar to that of C. sinensis (GenBank: KC143082.1). This plastid genome is the first fully sequenced plastid genome of Ebenaceae reported in the literature. The plastid matrix of Diospyros used for phylogenetic analyses (including only one of the inverted repeats) included 133 210 characters, of which 1295 variable positions were parsimony-uninformative and 384 (0·3 %) were potentially parsimony-informative. Phylogenetic reconstruction using parsimony produced 1127 equally parsimonious trees (results not shown) with a CI of 0·93 and RI of 0·87. Phylogenetic relationships between the species were in many cases not well supported, and in several cases individuals of the same species failed to form well-supported clusters. Similar results were obtained using maximum likelihood (Supplementary Data Fig. S2).

Nuclear ribosomal DNA

Between 0·08 % (D. impolita BT102) and 0·51 % (D. ferrea Eb045) of the total reads pertained to the nrDNA region (for details see Table 3). The contigs obtained had average coverages ranging from 209× (for D. impolita BT102) to 1159× (for D. vieillardii BT025). The NTS of the intergenic spacer was difficult to align and therefore excluded from further analyses. The aligned nrDNA matrix of Diospyros included 7233 characters, of which 368 variable positions were parsimony-uninformative and 141 (1·9 %) were potentially parsimony-informative. The parsimony phylogenetic reconstruction with only the nrDNA sequences produced 84 equally parsimonious trees (results not shown) with a CI of 0·66 and RI of 0·53. Phylogenetic relationships generally disagreed with results obtained from other markers, but these incongruent relationships are all weakly supported (results not shown). Several species failed to form unique groups. Similar results were obtained using maximum likelihood (Supplementary Data Fig. S3). The assembled 5S nrDNA region containing coding and non-coding parts was short (less than 500 bp) and therefore did not contain many informative characters. Phylogenetic trees based on this fragment were poorly resolved, and therefore the trees are not presented or discussed further. The 5S nrDNA region was also not included in the combined analysis.

Analyses of the combined data set

In addition to the individual analyses of the two data sets, we also combined them to determine whether this approach provides better resolution/support. The combined matrix (i.e. reduced to variable positions as explained in Materials and methods) included 1136 characters, of which 437 were potentially parsimony-informative. Phylogenetic reconstruction using parsimony produced a single most parsimonious tree of 1580 steps (Supplementary Data Fig. S4) with a CI of 0·73 and RI of 0·52. Phylogenetic relationships depicted in the combined analysis were better resolved than in the trees obtained from the individual analyses. A comparable pattern was found in the Bayesian analysis of the combined data set (Fig. 3). All three individuals of D. vieillardii clustered together and were sister to the rest of the New Caledonian endemic species group. Individuals from D. flavocarpa and D. umbrosa formed unique groups and were sister to the rest of the remaining accessions, among which D. cherrieri and D. veillonii were sister to the rest. The results from the combined analysis were also in agreement with earlier results based on plastid and nuclear markers (Turner ).
F

Phylogenetic trees resulting from the Bayesian analysis of the (A) combined data set of the present study (variable nucleotide positions in the whole plastid genome and nrDNA) and (B) combined data set of a previous study including four plastid and two low-copy nuclear markers (Turner ). The numbers indicate posterior probabilities >90. Trees are scaled to the same branch length. Samples are coloured according to sampling region (see Fig. 1).

Phylogenetic trees resulting from the Bayesian analysis of the (A) combined data set of the present study (variable nucleotide positions in the whole plastid genome and nrDNA) and (B) combined data set of a previous study including four plastid and two low-copy nuclear markers (Turner ). The numbers indicate posterior probabilities >90. Trees are scaled to the same branch length. Samples are coloured according to sampling region (see Fig. 1). There is a trend of geographical clustering visible in the Bayesian tree (Fig. 3) and in the neighbour network (Supplementary Data Fig. S5). The neighbour network (Fig. S5) clearly shows that individuals and populations from the south and the middle of New Caledonia clustered according to their sampling region. The Mantel test (Table 2) confirmed a significant geographical clustering of the genetic information (Fig. 3), in particular in the plastid data across the crown group.

DISCUSSION

Previous standard approaches to phylogenetic analysis of Diospyros species including samples from New Caledonia used nine (Duangjai ) and four (Turner ) plastid markers (alignment length >8 and 6·4 kb, respectively). They demonstrated low levels of sequence divergence among these species, indicating a fairly recent and rapid radiation. Inclusion of low-copy nuclear markers, which have been shown to be highly informative and useful for resolving phylogenetic relationships at lower taxonomic levels in some taxa [e.g. Paeonia (Tank and Sang, 2001) and Passiflora (Yockteng and Nadot, 2004)], only partly improved resolution among the New Caledonian Diospyros species (Turner ). Similar results have been observed in two genera of Cunoniaceae from New Caledonia [Spiraeanthemum (Pillon ) and Codia (Pillon )]. AFLP markers are typically used at low taxonomic level in closely related species and population analyses (e.g. Despré ; Tremetsberger ; Gaudeul ). In the case of the New Caledonian Diospyros species, we used AFLPs to evaluate species limits (Turner ), and in most cases we found congruence with the species concepts of White (1993). However, the AFLP approach was not useful in resolving phylogenetic relationships among these species (Turner ) because there were few of the individual AFLP fragments shared by two or more species. It would appear from the AFLP results that fragmentation of an original widespread population occurred in many regions of New Caledonia more or less simultaneously, resulting in genetically distinct populations that correspond to the morphologically based species delimitations of White (1993) without leaving much evidence for interspecific relationships. The phylogenetic tree obtained here based on the whole plastid genome (Fig. S2) is similar to the phylogenetic tree previously based on four plastid markers (Turner ). The combined tree of this study (Fig. 3A) is similar both in resolution and structure to the phylogenetic tree based on four plastid and two low-copy nuclear markers (Fig. 3B). Although not always represented by the same individuals, general relationships of species are in agreement. The nuclear ribosomal region included a higher percentage of parsimony-informative characters (1·9 versus 0·3 %) than the plastid DNA, which is in agreement with findings in other plant groups (Hamby and Zimmer, 1992; Doyle, 1993; Malé ). Despite this variability, nrDNA was still too short to contain enough variation (141 versus 384 informative sites) and failed to resolve phylogenetic relationships among the New Caledonian Diospyros species (Fig. S3). Our results clearly show that for this group of species, in which standard plastid and nuclear markers were not helpful for resolving the phylogenetic relationships, using the whole plastid genome does not greatly increase resolution and support. There are only a few other studies available in which whole plastid genomes have been used to resolve phylogenetic relationships at the intraspecific level among closely related species. Some studies [e.g. in Chrysobalanaceae (Malé ) and eucalypts (Myrtaceae) (Bayly )] have revealed that this approach can be useful for resolving phylogenetic relationships among genera, but they failed to resolve phylogenetic relationships among closely related species and to group together individuals of the same taxonomic species. In cases of recently radiating species groups, in particular following an extreme bottleneck associated with a long-distance dispersal event such as the arrival of Diospyros in New Caledonia, plastid genomes appear to be insufficient for inference of phylogenetic relationships. The basis of the rapid radiation of Diospyros in New Caledonia is not yet clear, but it has been speculated that it is an adaptive origin associated with different soil types (Paun ), as recently shown for the genus Geissois in Cunoniaceae (Pillon ). The individuals of D. vieillardii, D. umbrosa, D. flavocarpa, D. cherrieri and D. veillonii form a minimally isolated group in the combined analyses (Fig. 3). These species form clusters that are successively sister to the rest of the taxa, which are well supported collectively but form a highly unresolved central cluster. This unresolved central cluster is less than 6 million years old (Paun ) and could be the result of two lineages that developed in isolation and then subsequently colonized some of the same habitats. This too could be the result of simultaneous parallel divergence in different parts of New Caledonia combined with effects of local and more recent hybridization (as was indicated in the AFLP study; Turner ). Retention of ancient polymorphisms present in the original colonizing population, which probably also underwent a severe bottleneck, could not produce such a geographically structured pattern. Comparisons of the phylogenetic tree based on plastid sequences (Fig. S2) with the tree derived from RAD data [Paun (Supplementary Data Fig. S6)] showed several clusters of individuals (D. trisulca and D. parviflora from L13, D. pustulata and D. minimifolia from L20, D. pancheri and D. parviflora from L3 and 4; Table 1, Fig. 1) that occur with high statistical support in the plastid results, but are not present in the nuclear tree. These clusters consist of individuals found in the same or very nearby locations, which could indicate introgressive hybridization and transfer of plastid genomes as a relevant phenomenon (Naciri and Linder, 2015, and references therein). Geographical rather than taxonomic clustering was observed for all populations of D. minimifolia and D. parviflora that also failed to form unique groups in nuclear results [AFLP (Turner ) RAD (Paun )]. Phenomena like introgressive hybridization and transfer of plastid genomes could also explain the geographical clustering of individuals observed in the plastid data set (Figs S2 and S5; Table 2), whereas in nuclear-based data sets [AFLP (Turner ), RAD (Paun )] no such geographical clustering was observed. Similar geographical clustering for plastid results has previously been reported in other plant groups [e.g. Nothofagus (Acosta and Premoli, 2010)]. Although there are more than 140 kb of DNA sequence included in this study, it effectively corresponds to only two markers (plastid genome and rDNA region). As not all genes evolve in the same mode and at the same tempo, phylogenies based on different genes might show different phylogenetic relationships (Heled and Drummond, 2010). It is therefore important to use many phylogenetic markers for reconstruction of relationships to overcome the limitations of individual genes and produce results as close as possible to the real species tree. Phylogenetic trees based on plastid data should always be viewed as gene trees because of the typical evolutionary pathways of these organelles (Naciri and Linder, 2015). The trees presented here may hence show the evolutionary history of the particular region investigated, and may differ from the species trees. In the case of the New Caledonian Diospyros species, we consider the SNP data derived from RAD (Paun ) as the best approximation of the species tree, because it involves nearly 8500 independent markers from the nuclear genome.

Conclusions

Although New Caledonian Diospyros are morphologically and ecologically diverse, they show little genetic divergence. For these rapidly radiating Diospyros species, in which standard plastid and nuclear markers were not helpful for resolving the phylogenetic relationships, using the whole plastid genome does not greatly increase resolution and support. Plastid markers grouped accessions according to geographical provenance, which could result from local transfer of plastid genomes due to hybridization and introgression following secondary contact. We are now conducting additional nuclear genome studies (both coding and non-coding regions) to determine whether other approaches could help us determine the potential adaptive nature of this radiation, which has thus far defeated our attempts using standard and next-generation methods.

SUPPLEMENTARY DATA

Supplementary data are available online at www.aob.oxfordjournals.org and consist of the following. Figure S1: BEAST input file for the Bayesian analysis of the combined data set including all settings and priors of both data sets. Figure S2: maximum likelihood tree based on plastid sequences. Figure S3: maximum likelihood tree based on nrDNA sequences. Figure S4: single most parsimonious tree based on the combined data matrix. Figure S5: neighbour network based on the plastid data set. Individuals are coloured according to their sampling region. Figure S6: maximum likelihood tree based on RAD SNP data.
  37 in total

1.  Using AFLP to resolve phylogenetic relationships in a morphologically diversified plant species complex when nuclear and chloroplast sequences fail to reveal variability.

Authors:  Laurence Després; Ludovic Gielly; Bénédicte Redoutet; Pierre Taberlet
Journal:  Mol Phylogenet Evol       Date:  2003-05       Impact factor: 4.286

2.  Automatic annotation of organellar genomes with DOGMA.

Authors:  Stacia K Wyman; Robert K Jansen; Jeffrey L Boore
Journal:  Bioinformatics       Date:  2004-06-04       Impact factor: 6.937

3.  AFLP markers provide insights into the evolutionary relationships and diversification of New Caledonian Araucaria species (Araucariaceae).

Authors:  Myriam Gaudeul; Germinal Rouhan; Martin F Gardner; Peter M Hollingsworth
Journal:  Am J Bot       Date:  2011-12-19       Impact factor: 3.844

4.  Evidence of chloroplast capture in South American Nothofagus (subgenus Nothofagus, Nothofagaceae).

Authors:  M Cristina Acosta; Andrea C Premoli
Journal:  Mol Phylogenet Evol       Date:  2009-08-14       Impact factor: 4.286

5.  Using MODELTEST and PAUP* to select a model of nucleotide substitution.

Authors:  David Posada
Journal:  Curr Protoc Bioinformatics       Date:  2003-02

6.  Phylogenetic relationships in subfamily Tillandsioideae (Bromeliaceae) based on DNA sequence data from seven plastid regions.

Authors:  Michael H J Barfuss; Rosabelle Samuel; Walter Till; Tod F Stuessy
Journal:  Am J Bot       Date:  2005-02       Impact factor: 3.844

7.  Genome skimming by shotgun sequencing helps resolve the phylogeny of a pantropical tree family.

Authors:  Pierre-Jean G Malé; Léa Bardon; Guillaume Besnard; Eric Coissac; Frédéric Delsuc; Julien Engel; Emeline Lhuillier; Caroline Scotti-Saintagne; Alexandra Tinaut; Jérôme Chave
Journal:  Mol Ecol Resour       Date:  2014-04-02       Impact factor: 7.090

8.  Phylogenetic relationships among Passiflora species based on the glutamine synthetase nuclear gene expressed in chloroplast (ncpGS).

Authors:  Roxana Yockteng; Sophie Nadot
Journal:  Mol Phylogenet Evol       Date:  2004-04       Impact factor: 4.286

9.  Isolation by distance, web service.

Authors:  Jeffrey L Jensen; Andrew J Bohonak; Scott T Kelley
Journal:  BMC Genet       Date:  2005-03-11       Impact factor: 2.797

10.  Molecular phylogenetics of New Caledonian Diospyros (Ebenaceae) using plastid and nuclear markers.

Authors:  Barbara Turner; Jérôme Munzinger; Sutee Duangjai; Eva M Temsch; Reinhold Stockenhuber; Michael H J Barfuss; Mark W Chase; Rosabelle Samuel
Journal:  Mol Phylogenet Evol       Date:  2013-07-12       Impact factor: 4.286

View more
  6 in total

1.  Testing the Complete Plastome for Species Discrimination, Cryptic Species Discovery and Phylogenetic Resolution in Cephalotaxus (Cephalotaxaceae).

Authors:  Jie Wang; Chao-Nan Fu; Zhi-Qiong Mo; Michael Möller; Jun-Bo Yang; Zhi-Rong Zhang; De-Zhu Li; Lian-Ming Gao
Journal:  Front Plant Sci       Date:  2022-05-04       Impact factor: 6.627

2.  Comparative chloroplast genomes of eleven Schima (Theaceae) species: Insights into DNA barcoding and phylogeny.

Authors:  Xiang-Qin Yu; Bryan T Drew; Jun-Bo Yang; Lian-Ming Gao; De-Zhu Li
Journal:  PLoS One       Date:  2017-06-02       Impact factor: 3.240

3.  Our love-hate relationship with DNA barcodes, the Y2K problem, and the search for next generation barcodes.

Authors:  Jeffrey M Marcus
Journal:  AIMS Genet       Date:  2018-01-17

4.  The complete plastome sequence of Diospyros blancoi A. DC. (Ebenaceae).

Authors:  Sangjin Jo; Hoe-Won Kim; Young-Kee Kim; Se-Hwan Cheon; Ki-Joong Kim
Journal:  Mitochondrial DNA B Resour       Date:  2016-09-01       Impact factor: 0.658

5.  Beyond DNA barcoding: The unrealized potential of genome skim data in sample identification.

Authors:  Kristine Bohmann; Siavash Mirarab; Vineet Bafna; M Thomas P Gilbert
Journal:  Mol Ecol       Date:  2020-06-29       Impact factor: 6.185

6.  On the applicability of the Tubulin-Based Polymorphism (TBP) genotyping method: a comprehensive guide illustrated through the application on different genetic resources in the legume family.

Authors:  Luca Braglia; Floriana Gavazzi; Laura Morello; Silvia Gianì; Peter Nick; Diego Breviario
Journal:  Plant Methods       Date:  2020-06-12       Impact factor: 4.993

  6 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.