James A Nicholls, R Toby Pennington, Erik J M Koenen, Colin E Hughes, Jack Hearn, Lynsey Bunnefeld, Kyle G Dexter, Graham N Stone, Catherine A Kidner.
Abstract
Evolutionary radiations are prominent and pervasive across many plant lineages in diverse geographical and ecological settings; in neotropical rainforests there is growing evidence suggesting that a significant fraction of species richness is the result of recent radiations. Understanding the evolutionary trajectories and mechanisms underlying these radiations demands much greater phylogenetic resolution than is currently available for these groups. The neotropical tree genus Inga (Leguminosae) is a good example, with ~300 extant species and a crown age of 2-10 MY, yet over 6 kb of plastid and nuclear DNA sequence data gives only poor phylogenetic resolution among species. Here we explore the use of larger-scale nuclear gene data obtained through targeted enrichment to increase phylogenetic resolution within Inga. Transcriptome data from three Inga species were used to select 264 nuclear loci for targeted enrichment and sequencing. Following quality control to remove probable paralogs from these sequence data, the final dataset comprised 259,313 bases from 194 loci for 24 accessions representing 22 Inga species and an outgroup (Zygia). Bayesian phylogenies reconstructed using either all loci concatenated or a gene-tree/species-tree approach yielded highly resolved phylogenies. We used coalescent approaches to show that the same targeted enrichment data also have significant power to discriminate among alternative within-species population histories within the widespread species I. umbellifera. In either application, targeted enrichment simplifies the informatics challenge of identifying orthologous loci associated with de novo genome sequencing. We conclude that targeted enrichment provides the large volumes of phylogenetically-informative sequence data required to resolve relationships within recent plant species radiations, both at the species level and for within-species phylogeographic studies.
Keywords: Inga; hybrid capture; next-generation sequencing; phylogenomics; population genomics; radiation; targeted enrichment
Year: 2015 PMID: 26442024 PMCID: PMC4584976 DOI: 10.3389/fpls.2015.00710
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Assembly metrics for transcriptomes from three Inga species.
| Metric | Transcriptome 1 | Transcriptome 2 | Transcriptome 3 | All three combined |
| Total length of reads (bp) | 4,860,152,926 | 5,854,098,168 | 5,094,403,337 | 15,808,654,431 |
| Length of assembled transcriptome (bp) | 54,758,785 | 62,265,951 | 87,692,013 | 123,506,371 |
| Total number of contigs | 65,927 | 62,830 | 81,260 | 138,263 |
| N50 length (bp) | 1,356 | 1,668 | 1,836 | 1,615 |
| N50 number | 13,671 | 11,907 | 15,067 | 23,656 |
| Maximum contig length (bp) | 13,370 | 14,577 | 22,182 | 17,299 |
| % GC | 42.5 | 42.9 | 41.9 | 42.1 |
| % of CEGMA proteins present as complete copies | 94 | 97 | 97 | 96 |
| Average number of complete orthologs per CEG | 2.0 | 2.2 | 2.2 | 2.0 |
| % of CEGMA proteins present as partial copies | 97 | 99 | 97 | 99 |
| Average number of partial orthologs per CEG | 2.3 | 2.4 | 2.4 | 2.4 |
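The two N50 rows in the table above can be read as follows: the N50 length is the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the assembled bases, and the N50 number is how many contigs that takes. The sketch below illustrates this under the usual definition; the function name `n50_stats` and the toy contig lengths are hypothetical and are not taken from the paper, which does not say which tool produced these metrics.

```python
def n50_stats(contig_lengths):
    """Return (N50 length, N50 number) for a list of contig lengths.

    Assumes the common definition: sort contigs from longest to shortest
    and stop at the first contig where the cumulative length reaches at
    least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy example (hypothetical lengths, not data from the study).
toy_contigs = [100, 200, 300, 400, 500]
n50_length, n50_number = n50_stats(toy_contigs)
print(n50_length, n50_number)
```

In the toy case the two longest contigs (500 + 400 bp) already cover half of the 1,500 bp total, so the N50 length is 400 bp and the N50 number is 2.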
Read counts and percentage of reads that map to either the target locus set or the .
| FG82 | 736,094 | 77.66 | 5.98 |
| FG156 | 829,432 | 76.45 | 11.69 |
| FG113 | 772,882 | 76.01 | 6.70 |
| FG185 | 678,681 | 77.21 | 6.75 |
| FG_198 | 668,910 | 71.68 | 8.71 |
| FG_200 | 661,645 | 73.07 | 9.59 |
| KGD465 | 812,966 | 74.81 | 7.75 |
| FG35 | 798,644 | 77.77 | 6.93 |
| FG192 | 808,684 | 76.34 | 6.53 |
| KGD386 | 596,175 | 77.40 | 11.03 |
| KGD345 | 782,120 | 80.33 | 8.92 |
| FG89 | 777,591 | 77.54 | 8.86 |
| KGD398 | 806,046 | 80.70 | 13.15 |
| KGD355 | 532,080 | 76.34 | 8.79 |
| FG23 | 704,139 | 77.01 | 7.94 |
| FG83 | 741,408 | 78.66 | 6.47 |
| FGIntype | 717,932 | 75.02 | 6.09 |
| FG21 | 749,113 | 79.89 | 5.84 |
| KGD475 | 709,821 | 75.87 | 8.57 |
| KGD388 | 709,968 | 78.53 | 7.93 |
| BCI97 | 768,190 | 78.35 | 18.93 |
| KGD343 | 632,574 | 77.41 | 8.47 |
| FG94 | 846,133 | 77.13 | 11.71 |
| KGD110 | 693,986 | 74.93 | 5.45 |
| FG92 | 750,949 | 77.05 | 8.00 |
| BCI_103 | 705,519 | 72.08 | 11.74 |
| FG_160 | 737,371 | 71.46 | 8.51 |
| FG_180 | 695,691 | 72.99 | 9.25 |
| FG_AN | 640,046 | 71.17 | 9.01 |
| FG_AP | 648,202 | 71.37 | 7.57 |
| FG_AR | 594,861 | 73.20 | 9.54 |
| FG_AZ | 646,031 | 74.50 | 8.09 |
| FG_AAA | 531,997 | 73.58 | 9.95 |
| FG_I | 700,899 | 73.29 | 9.25 |
| KD_401 | 689,439 | 74.43 | 9.18 |
| KD_882 | 678,002 | 72.47 | 9.11 |
| KD_882_replicate | 504,504 | 61.93 | 11.06 |
| KD_1059 | 624,086 | 72.73 | 8.64 |
| KD_1316 | 618,017 | 70.93 | 7.71 |
| TAKPDC_1272 | 665,173 | 76.01 | 9.07 |
| TAKPDC_1318 | 544,312 | 78.45 | 11.36 |
| TI_52 | 722,946 | 73.43 | 13.27 |
| TI_908 | 691,496 | 71.89 | 10.57 |
| TI_990 | 596,969 | 71.28 | 11.77 |
| Yas_63659 | 618,876 | 72.72 | 10.04 |
| Zygia917 | 680,243 | 72.65 | 10.29 |
Figure 1. Proportion of variable (gray bars) and parsimony informative (white bars) sites across the 183 . Solid arrows below the x-axis indicate the percentage of variable sites within the Sanger-sequenced ITS and concatenated plastid loci; dashed arrows indicate the respective percentages of parsimony informative sites.
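The two quantities plotted in this figure can be computed per locus from an alignment. A site is variable if at least two character states occur in its column, and parsimony informative if at least two states each occur in at least two sequences. The sketch below is only an illustration of those standard definitions: the function name `site_summary`, the choice to treat `-`, `?` and `N` as missing data, and the toy alignment are all assumptions, and the paper does not state which software the authors used.

```python
from collections import Counter

# Characters treated as missing data in this sketch (an assumption).
MISSING = set("-?N")


def site_summary(sequences):
    """Return (proportion variable, proportion parsimony informative).

    `sequences` is a list of equal-length aligned strings. Other IUPAC
    ambiguity codes are counted as ordinary states here for simplicity.
    """
    if not sequences or not sequences[0]:
        raise ValueError("need a non-empty alignment")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to equal length")

    variable = informative = 0
    for column in zip(*sequences):
        states = Counter(
            base for base in (char.upper() for char in column)
            if base not in MISSING
        )
        if len(states) >= 2:
            variable += 1
            # Informative: at least two states, each seen at least twice.
            if sum(1 for n in states.values() if n >= 2) >= 2:
                informative += 1
    return variable / length, informative / length


# Toy alignment (hypothetical, not data from the study).
toy_alignment = ["ACGTA", "ACGTT", "ACCTT", "ATCTA"]
print(site_summary(toy_alignment))
```

In the toy alignment three of the five columns vary, but the second column differs in only one sequence, so only two columns are parsimony informative.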
Figure 2. Majority-rule consensus trees of 22 Inga species. (A) Analysis of concatenated next-generation data applying a single substitution model with no molecular clock; (B) majority-rule consensus cladogram of 22 Inga species based on a species tree analysis implemented in ASTRAL. Numbers next to nodes indicate bootstrap support; (C) analysis of concatenated Sanger data with gene-specific substitution models and no molecular clock; (D) species tree analysis of two loci (all plastid data and ITS) with locus-specific substitution models and relaxed clocks. Numbers next to nodes indicate posterior probability support.
Figure 3. Majority-rule consensus tree of 46 . Numbers next to nodes indicate posterior probability support. Geographic origins and chemotypes of I. umbellifera individuals are indicated, as is the technical replicate.
Figure 4. Proportion of variable (gray bars) and parsimony informative (white bars) sites at the 251 .
Figure 5. (A) Schematic of the four different population models tested using 168 loci obtained through targeted enrichment in I. umbellifera. Codes for populations in the two different test sets are: Panama, BCI; French Guiana, FG; Ecuador, EC; and Peru, PE. Diagram modified from Lohse et al. (2012). (B) The expected difference in support (ΔlnL) between a full (three-population) model and each nested model as a function of the number of loci for older (left panel) and more recent (right panel) scenarios. The two horizontal black lines show the ΔlnL needed to reject the simpler model at α = 0.05 when the simpler model has one fewer parameter (polytomy and two-population models, lower line) or two fewer parameters (panmixis, upper line).
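The caption does not say how the two horizontal threshold lines were derived. One common way to obtain such thresholds is the likelihood-ratio test, which assumes that 2ΔlnL between nested models approximately follows a chi-square distribution with degrees of freedom equal to the difference in parameter number. The sketch below works through that arithmetic under this assumption only; the function name `delta_lnl_threshold` is hypothetical, and the authors' exact procedure may differ.

```python
import math
from statistics import NormalDist


def delta_lnl_threshold(extra_parameters, alpha=0.05):
    """Return the ΔlnL needed to reject the simpler nested model.

    Assumes 2 * ΔlnL ~ chi-square with `extra_parameters` degrees of
    freedom (the standard asymptotic approximation). Only 1 and 2
    degrees of freedom are handled, using closed-form quantiles.
    """
    if extra_parameters == 1:
        # A chi-square(1) variable is the square of a standard normal.
        z = NormalDist().inv_cdf(1 - alpha / 2)
        critical = z * z
    elif extra_parameters == 2:
        # Chi-square(2) has CDF 1 - exp(-x / 2).
        critical = -2.0 * math.log(alpha)
    else:
        raise ValueError("this sketch handles only 1 or 2 extra parameters")
    return critical / 2.0


for df in (1, 2):
    print(df, round(delta_lnl_threshold(df), 2))
```

Under this approximation the thresholds are about 1.92 log-likelihood units for one fewer parameter and about 3.00 for two fewer, consistent with the caption's description of a lower and an upper line.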