Michael R McKain, Matthew G Johnson, Simon Uribe-Convers, Deren Eaton, Ya Yang.
Abstract
The past decade has seen a major breakthrough in our ability to easily and inexpensively sequence genome-scale data from diverse lineages. The development of high-throughput sequencing and long-read technologies has ushered in the era of phylogenomics, where hundreds to thousands of nuclear genes and whole organellar genomes are routinely used to reconstruct evolutionary relationships. As a result, understanding which options are best suited for a particular set of questions can be difficult, especially for those just starting in the field. Here, we review the most recent advances in plant phylogenomic methods and make recommendations for project-dependent best practices and considerations. We focus on the costs and benefits of different approaches in regard to the information they provide researchers and the questions they can address. We also highlight unique challenges and opportunities in plant systems, such as polyploidy, reticulate evolution, and the use of herbarium materials, identifying optimal methodologies for each. Finally, we draw attention to lingering challenges in the field of plant phylogenomics, such as reusability of data sets, and look at some up-and-coming technologies that may help propel the field even further.
Keywords: RAD‐seq; genome skimming; microfluidics; phylogenomics; sequence capture; transcriptomes
Year: 2018 PMID: 29732268 PMCID: PMC5895195 DOI: 10.1002/aps3.1038
Source DB: PubMed Journal: Appl Plant Sci ISSN: 2168-0450 Impact factor: 1.936
Comparison of cost and utility of phylogenomic methods for plants.^a
| Aspect of method | Sanger‐based methods | Microfluidic PCR | Restriction enzyme–based methods | Genome skimming | Target enrichment | Transcriptome |
|---|---|---|---|---|---|---|
| Upfront investment | Designing and optimizing PCR primers. Timeframe: weeks; cost: $50–100 | Some genomic data (e.g., shotgun libraries). | Optional: test alternative restriction digestions for optimal range of fragment sizes. Timeframe: weeks; cost: $100–1000 | None | Transcriptome and/or genomes from closely related organisms. Timeframe: months; cost: $100–1000 | Freezers and liquid nitrogen containers; logistics for tissue collecting. Timeframe: weeks to months; cost: $1000–10,000 |
| Tissue for sampling: herbarium, silica preserved, flash‐frozen, living | All four types, but reduced success from low‐yield herbarium tissue extractions | All four types, but reduced success from low‐yield herbarium tissue extractions | All four types, but reduced success from low‐yield herbarium tissue extractions | All four types, but potentially reduced success from low‐yield herbarium tissue extractions. See Saeidi et al. | All four types, but reduced success from low‐yield herbarium tissue extractions | Flash‐frozen or living tissue preserved in RNA |
| Sequence information type | Coding region, short introns, and short intergenic spacers | Coding region, introns, and short intergenic spacers | Anonymous or reference‐mapped short‐length loci | Organellar, some nuclear | Coding region and flanking intron | Coding region |
| Cost per extraction + library prep + sequencing (varies depending on platform) | $1–5 for standard DNA extraction + $0 + $3 for a single read of 800–1000 bp | $1–5 for standard DNA extraction + $0 + $0.40 per microfluidic reaction (~$800 per 48 × 48 plate) | $1–5 for standard DNA extraction + $5–50 + $1200–1800 (HiSeq sequencing 48–384 samples) | $1–5 for standard DNA extraction + $25–150 + $50 (assume 1.5–2 Gb of data per sample) | For 96 samples: $200 for probes, $1–5 for standard DNA extraction + $16 (library in 1/3 volumes) + $1800 for sequencing (MiSeq 2 × 300) | $5–15 for RNA extraction + $50–150 + $200 (assume 25 million reads per transcriptome) |
| Assembly and cleaning data | Easy | Easy | Relatively easy | Moderately computationally intensive | Moderately computationally intensive | Computationally intensive |
| Ability to resolve reticulation/hybridization/introgression | Yes | Yes, potential to recover single alleles from nuclear loci | Yes, significant power to test for genome‐wide or localized admixture | Sometimes, potentially identify hybridization but only if it is biparentally inherited | Yes, can extract alleles if long reads are used | Yes |
| Ability to infer polyploidy | Yes, but needs time‐consuming cloning | Yes, potential to recover single alleles from nuclear loci | Sometimes, can detect polyploidy from read depths, but low potential to separate paralogs | No | Sometimes, can detect abundance of paralogous sequences | Sometimes, if polyploidy event is old enough that homeologs can be separated during transcriptome assembly |
| Best use | First pass; when maximizing the number of samples is the priority | Closely related species; studies that need specific loci and complete data matrices | Shallow phylogenetic scale with aim to sample many individuals | Deep or shallow phylogenetic scale; detecting parental heritage; potential genome diversity | Deep or shallow phylogenetic scale for up to a few samples per species | When detecting genome duplication and gene family evolution are of interest beyond reconstructing species relationships |
| Reusability of data | Yes, if using the same loci | Yes, fully reusable within focus group using same loci, limitations with increase in phylogenetic distance | Sometimes, reusable for studies within same study system, not reusable between distant clades | Yes, fully reusable | Sometimes, partially reusable across studies if same loci are targeted | Yes, fully reusable |
^a Costs are given in U.S. dollars (US$) as of 2018.
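The cost row of the table combines a per-sample extraction cost, a per-sample library preparation cost, and a sequencing cost. As a rough illustration, the sketch below turns the restriction enzyme–based entry into a per-sample range. It assumes that the $1200–1800 sequencing figure is the price of one run shared equally by all 48–384 pooled samples; that reading is an assumption about the table, and the function name `per_sample_cost` is hypothetical.

```python
def per_sample_cost(extraction, library_prep, sequencing_run, samples_per_run):
    """Per-sample cost in US$ when one sequencing run is shared equally.

    Assumption (not stated in the table): the sequencing figure is the
    price of a whole run, split evenly across the pooled samples.
    """
    if samples_per_run <= 0:
        raise ValueError("samples_per_run must be positive")
    return extraction + library_prep + sequencing_run / samples_per_run


# Restriction enzyme-based row: $1-5 extraction, $5-50 library prep,
# $1200-1800 per run, 48-384 samples per run.
# Cheapest case: lowest prices, most samples sharing the run.
low = per_sample_cost(1, 5, 1200, 384)
# Most expensive case: highest prices, fewest samples sharing the run.
high = per_sample_cost(5, 50, 1800, 48)

print(f"Restriction enzyme-based, per sample: ${low:.2f} to ${high:.2f}")
```

Under this assumption the number of samples pooled per run has a larger effect on the per-sample total than the extraction cost, which is one reason multiplexing depth matters when comparing methods.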
Recommendations for data sharing and archiving
| Archive format/platform | Best practice | Information to include | Special considerations |
|---|---|---|---|
| Vouchers | Deposit specimen with appropriate characteristics to identify to species level in a herbarium | GPS point, locality data, collector, collection number | Permits often required. Special permission for living collections (e.g., botanical gardens, arboreta) |
| NCBI Sequence Read Archive (SRA) | Submit all raw read data | Taxon (as specific as possible); voucher information; tissue type; methods for collection, extraction, and library preparation; read type (paired or single end) | Submit biological replicates separately if sampling for an RNA‐Seq experiment. Link populations and accessions together in a BioProject |
| Dryad | Provide details to reproduce results including commands, scripts, program versions, and log files. Major steps in data analysis should be included. Provide final data sets from which major conclusions are drawn | Cleaned and assembled reads; intermediate and final analysis files; parameters for analyses; scripts as used in associated analyses; details not presented in manuscript but necessary to replicate results | Links to GitHub, Bitbucket, or other online repositories for updated versions of scripts; simply stating “custom scripts” is not acceptable. Provide documentation of code used and parameters, such as a Jupyter Notebook |