Kevin Weitemier, Shannon C K Straub, Richard C Cronn, Mark Fishbein, Roswitha Schmickl, Angela McDonnell, Aaron Liston.
Abstract
PREMISE OF THE STUDY: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. • METHODS AND
Keywords: Hyb-Seq; genome skimming; nuclear loci; phylogenomics; species tree; target enrichment
Year: 2014 PMID: 25225629 PMCID: PMC4162667 DOI: 10.3732/apps.1400042
Source DB: PubMed Journal: Appl Plant Sci ISSN: 2168-0450 Impact factor: 1.936
Hyb-Seq target enrichment probe design and bioinformatics pipeline. A script combining and detailing the steps of the probe design process, Building_exon_probes.sh, is provided in the supplementary materials (Appendix S1).
| Steps | Description | Primary program or custom script |
| Match | Find genome and transcriptome sequences with 99% identity. | BLAT |
| Filter | Retain single hits of substantial length. | Part of Building_exon_probes.sh |
| Cluster | Remove isoforms and loci sharing >90% identity. | CD-HIT-EST |
| Filter | Retain loci with long exons summing to desired length. | blat_block_analyzer.py |
| Cluster | Remove exons sharing >90% identity. | CD-HIT-EST |
| Read processing | Adapter trimming and quality filtering. | Trimmomatic |
| Exon assembly | Reconstruct a sequence for each sample, for each exon. | YASRA |
| Identify assembled contigs | If contig identity is unknown, identify which targeted exon(s) it corresponds to. | BLAT |
| Sequence alignment I: Collate exons | Cluster orthologous exons across samples. | assembled_exons_to_fasta.py |
| Sequence alignment II: Perform alignment | Align homologous bases within each exon. | MAFFT |
| Concatenate exons | For each locus, concatenate the aligned exons. | catfasta2phyml.pl |
| Gene tree construction | For each locus, estimate the maximum likelihood gene tree. | RAxML |
| Species tree construction | Estimate the species tree from independent gene trees in a coalescent framework. | MP-EST |
BLAT: Kent (2002).
New scripts written for this protocol, an example data set, and any future updates are available at https://github.com/listonlab/.
CD-HIT-EST: Li and Godzik (2006).
Trimmomatic: Bolger et al. (2014).
YASRA: Ratan (2009).
Straub et al. (2011).
MAFFT: Katoh and Toh (2008).
catfasta2phyml.pl: Nylander (2011).
RAxML: Stamatakis (2006).
MP-EST: Liu et al. (2010).
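Two of the steps in the table above, the second Filter step (retain loci with long exons summing to a desired length) and the Concatenate exons step, can be sketched in a few lines of Python. This is an illustrative sketch only, not the logic of blat_block_analyzer.py or catfasta2phyml.pl: the function names, the input data structures, and the threshold values passed in the example are all hypothetical.

```python
from typing import Dict, List


def filter_loci(exon_lengths: Dict[str, List[int]],
                min_exon_len: int,
                min_locus_len: int) -> Dict[str, List[int]]:
    """Keep loci whose sufficiently long exons sum to at least min_locus_len.

    exon_lengths maps a locus name to the lengths of its exons (hypothetical
    input format). Exons shorter than min_exon_len are discarded first.
    """
    kept = {}
    for locus, lengths in exon_lengths.items():
        long_exons = [n for n in lengths if n >= min_exon_len]
        if sum(long_exons) >= min_locus_len:
            kept[locus] = long_exons
    return kept


def concatenate_exons(aligned_exons: List[Dict[str, str]],
                      samples: List[str]) -> Dict[str, str]:
    """Concatenate per-exon alignments of one locus into one sequence per sample.

    Each element of aligned_exons maps sample name -> aligned sequence.
    A sample missing from an exon alignment is padded with gap characters.
    """
    concatenated = {sample: "" for sample in samples}
    for alignment in aligned_exons:
        width = len(next(iter(alignment.values())))  # alignment length
        for sample in samples:
            concatenated[sample] += alignment.get(sample, "-" * width)
    return concatenated


# Placeholder thresholds and toy data, chosen only for illustration.
kept = filter_loci(
    {"locusA": [150, 400, 500, 90], "locusB": [130, 200], "locusC": [80, 70, 1000]},
    min_exon_len=120,
    min_locus_len=960,
)
joined = concatenate_exons(
    [{"s1": "ACGT", "s2": "AC-T"}, {"s1": "GG"}],
    samples=["s1", "s2"],
)
print(kept)
print(joined)
```

Padding absent samples with gaps keeps every concatenated sequence the same length, which is what downstream tree-building programs generally expect from an alignment.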
Voucher information for species of Asclepias and related genera used in this study.
| Species | Voucher specimen [Herbarium] | Collection locality | GPS coordinates |
| | Weitemier 12-23 [OSC] | Grant Co., Oregon, USA | 44.47970, −119.57758 |
| | Lynch 11224 [LSUS] | Barber Co., Kansas, USA | 37.3, −98.7 |
| | Lynch 10923 [LSUS] | Lassen Co., California, USA | 41.09, −121.30 |
| | Zuloaga & Morrone 7069 [OKLA] | Dist. Jujuy, Argentina | −24, −63.35 |
| | Fishbein 5596 [OKLA] | Polk Co., Florida, USA | 27.761, −81.465 |
| | Lynch 12050 [LSUS] | Apache Co., Arizona, USA | 36.7, −109.7 |
| | Fishbein 3101 [OKLA] | Mpio. Comondu, Baja California Sur, Mexico | 24.63, −112.14 |
| | Fishbein 2445 [ARIZ] | Pima Co., Arizona, USA | 31.80, −110.81 |
| | Fishbein 5137 [OKLA] | Mpio. Cuautitlán, Jalisco, Mexico | 19.561, −114.203 |
| | Fishbein 5608 [MISSA] | Franklin Co., Florida, USA | 29.916, −84.369 |
| | Fishbein 5427 [OKLA] | Cultivated | |
| | Rein 106 [OKLA] | Angelina Co., Texas, USA | 31.07995, −94.27735 |
GPS coordinates are reported to the accuracy recorded, or are derived from coarse georeferencing of the collection locality.
Success of Hyb-Seq for targeted sequencing and assembly of nuclear genes combined with genome skimming of high-copy targets in Asclepias and related species of Apocynaceae.
| Species | Reads | Quality-filtered reads | Unique, on-target, quality-filtered reads (%) | Assembly length (Mbp) | Splash zone assembly length (Mbp) | Single-copy gene exons assembled | Single-copy genes assembled | % Divergence from single-copy gene probes | % Missing data in matrix | % Completion of plastome | % Completion of nrDNA cistron |
| | 1,174,294 | 1,149,278 | 746,909 (65.0) | 3.2 | 1.6 | 3349 | 768 | 0.9 | 7.4 | 99.7 | 100 |
| | 1,943,370 | 1,804,956 | 523,477 (29.0) | 2.7 | 1.0 | 3359 | 767 | 0.8 | 3.6 | 97.8 | 98.3 |
| | 393,048 | 384,595 | 72,200 (18.8) | 1.1 | 0.5 | 2260 | 762 | 0.9 | 69.0 | 81.9 | 94.4 |
| | 1,457,860 | 1,301,608 | 397,798 (30.6) | 2.2 | 0.8 | 3313 | 768 | 1.5 | 14.7 | 98.4 | 100 |
| | 918,608 | 843,463 | 234,502 (27.8) | 2.0 | 0.8 | 3163 | 768 | 1.0 | 27.1 | 93.1 | 97.0 |
| | 664,820 | 645,580 | 139,407 (21.6) | 1.7 | 0.7 | 2978 | 768 | 0.9 | 41.8 | 90.5 | 99.4 |
| | 1,097,532 | 971,606 | 270,123 (27.8) | 2.1 | 0.9 | 3275 | 768 | 1.4 | 30.6 | 99.1 | 100 |
| | 2,482,686 | 2,295,691 | 558,822 (24.3) | 2.4 | 0.8 | 3369 | 768 | 0.9 | 2.1 | 96.0 | 100 |
| | 1,345,732 | 1,295,739 | 384,451 (29.7) | 2.4 | 0.8 | 3314 | 768 | 1.0 | 4.9 | 98.7 | 100 |
| | 1,248,940 | 1,111,909 | 310,020 (27.9) | 2.1 | 0.8 | 3208 | 768 | 0.9 | 26.7 | 95.2 | 99.7 |
| | 1,172,456 | 1,135,014 | 380,155 (33.5) | 2.6 | 1.0 | 3287 | 768 | 3.2 | 5.0 | 96.0 | 100 |
| | 418,590 | 388,064 | 208,835 (53.8) | 1.7 | 0.4 | 2718 | 757 | 4.5 | n/a | 99.4 | 100 |
| Average | 1,190,419 | 1,110,625 | 352,225 (32.5) | 2.2 | 0.8 | 3133 | 767 | 1.5 | 21.2 | 95.5 | 99.1 |
Most samples were sequenced in a single MiSeq run (11-plex 2 × 251-bp version 2 chemistry) except for A. cryptoceras and M. cynanchoides, which were each sequenced in different MiSeq runs (12-plex 2 × 251-bp version 2 chemistry and 15-plex 2 × 76-bp version 3 chemistry, respectively).
These values were calculated using the entire probe set, including single-copy gene, defense and floral development genes, and SNPs.
These estimates are lower than the true overall efficiency because of quality filtering and the removal of duplicate reads. Except for A. cryptoceras and M. cynanchoides, the libraries were made with internal barcodes, which apparently contributed to suboptimal base calling and lower quality scores, and hence to an apparently suboptimal target capture efficiency.
These estimates are based on a minimum 90% sequence identity to the A. syriaca probes and are therefore conservative, especially for C. procera and M. cynanchoides, which are expected to have higher sequence divergence.
These estimates are based on a minimum 75% sequence identity to the A. syriaca probes.
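The percentages in the "Unique, on-target, quality-filtered reads" column can be reproduced from the table itself by dividing the on-target count by the quality-filtered read count. The short sketch below does that for the twelve samples, in table row order. It also compares two ways of summarizing them: the mean of the per-sample percentages, which matches the 32.5 in the Average row, and a pooled percentage over all reads, which is somewhat lower. The function and variable names are illustrative.

```python
# (quality-filtered reads, unique on-target quality-filtered reads),
# copied from the table in row order.
rows = [
    (1149278, 746909), (1804956, 523477), (384595, 72200), (1301608, 397798),
    (843463, 234502), (645580, 139407), (971606, 270123), (2295691, 558822),
    (1295739, 384451), (1111909, 310020), (1135014, 380155), (388064, 208835),
]


def on_target_percent(quality_filtered: int, on_target: int) -> float:
    """Percentage of quality-filtered reads that are unique and on target."""
    return round(100.0 * on_target / quality_filtered, 1)


percents = [on_target_percent(q, t) for q, t in rows]

# Mean of the per-sample percentages (each sample weighted equally).
mean_of_percents = round(sum(percents) / len(percents), 1)

# Pooled percentage (each read weighted equally).
pooled_percent = round(
    100.0 * sum(t for _, t in rows) / sum(q for q, _ in rows), 1
)

print(percents)
print(mean_of_percents, pooled_percent)
```

The two summaries differ because samples with many reads and comparatively low on-target fractions pull the pooled figure down, while the per-sample mean weights every library equally.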
Fig. 1. Histogram of exon sequence divergence between the species used for probe design, Asclepias syriaca, and four other species: the most divergent species of Asclepias, A. flava; another member of Asclepiadinae (Asclepiadeae: Asclepiadoideae), Calotropis procera; a member of Gonolobinae (Asclepiadeae: Asclepiadoideae), Matelea cynanchoides; and a member of a different subfamily, Catharanthus roseus (Rauvolfioideae). Note that a maximum sequence divergence of 25% (a minimum identity of 75%) was allowed for BLAT, and that exons with >10% divergence were less likely to be observed in Calotropis and Matelea because they were less likely to be enriched by the probes, whereas the Catharanthus data were from transcriptome sequences of multiple tissues and were not subject to target enrichment bias.
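For readers who want to compute a comparable per-exon divergence from their own alignments, the sketch below calculates an uncorrected percent divergence between two aligned sequences. It is an assumption that a simple mismatch proportion over ungapped columns is an adequate stand-in here; this section does not state how the plotted values were derived, and the function name and the treatment of gaps and N as uninformative are illustrative choices.

```python
from typing import Optional


def percent_divergence(seq_a: str, seq_b: str,
                       skip_chars: str = "-N") -> Optional[float]:
    """Uncorrected percent divergence between two aligned sequences.

    Columns where either sequence has a gap or an N are ignored.
    Returns None when no comparable columns remain.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        return None
    return 100.0 * mismatches / compared


# Toy example: one mismatch in ten comparable columns.
probe_exon = "ACGTACGTAC"
sample_exon = "ACGTTCGTAC"
divergence = percent_divergence(probe_exon, sample_exon)
print(divergence)
```

Binning such values across all exons of a sample would yield a histogram of the kind described in the caption.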
Number of single-copy genes recovered for phylogenomic analysis with different data analysis pipelines.
| Species | Hyb-Seq | phyluce | phyluce with Alignreads contigs |
| | 768 | 16 | 145 |
| | 767 | 69 | 201 |
| | 762 | 10 | 23 |
| | 768 | 28 | 109 |
| | 768 | 27 | 62 |
| | 768 | 3 | 24 |
| | 768 | 8 | 38 |
| | 768 | 13 | 198 |
| | 768 | 69 | 186 |
| | 768 | 21 | 54 |
| | 768 | 84 | 203 |
| | 757 | 51 | 98 |
| Average | 767 | 33 | 112 |
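The Average row can be checked directly against the per-sample counts. The sketch below recomputes the three column means from the values in the table; the variable and function names are illustrative. Rounding is done half-up rather than with Python's built-in `round`, because the Hyb-Seq column averages exactly 766.5, which the built-in (round-half-to-even) would report as 766 rather than the 767 shown in the table.

```python
import math

# Genes recovered per sample, copied from the table in row order.
recovered = {
    "Hyb-Seq": [768, 767, 762, 768, 768, 768, 768, 768, 768, 768, 768, 757],
    "phyluce": [16, 69, 10, 28, 27, 3, 8, 13, 69, 21, 84, 51],
    "phyluce with Alignreads contigs":
        [145, 201, 23, 109, 62, 24, 38, 198, 186, 54, 203, 98],
}


def round_half_up(value: float) -> int:
    """Round a non-negative number to the nearest integer, halves going up."""
    return math.floor(value + 0.5)


averages = {
    pipeline: round_half_up(sum(counts) / len(counts))
    for pipeline, counts in recovered.items()
}
print(averages)
```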
Fig. 2. Comparison of the species tree of Asclepias based on 761 putatively single-copy loci and the whole plastome phylogeny. The MP-EST tree is shown at left, and the difference between this topology and that recovered through an analysis of the concatenated nuclear gene data set is indicated by the red arrow. Solid lines connect each species to its placement in the plastome phylogeny (right). Values near the branches are bootstrap support values (* = 100%). Colors reflect the plastid clades of Fishbein et al. (2011): temperate North America (green), unplaced (orange), highland Mexico (purple), series Incarnatae sensu Fishbein (pink), Sonoran Desert (blue), and outgroup (black).