
Will Benchtop Sequencers Resolve the Sequencing Trade-off in Plant Genetics?

Alex D Twyford

Keywords:  Illumina MiniSeq; benchtop sequencers; genomics; next generation sequencing; parentage analysis; phylogenetics; population genetics

Year:  2016        PMID: 27092154      PMCID: PMC4822345          DOI: 10.3389/fpls.2016.00433

Source DB:  PubMed          Journal:  Front Plant Sci        ISSN: 1664-462X            Impact factor:   5.753


An important experimental design consideration in plant genetics is the trade-off between the number of individuals and the number of loci that can be genotyped (Davey et al., 2011). For any given study, an investigator must choose how to partition research effort and resources, with the generation of many loci usually coming at the expense of many individuals, and vice versa. For example, for parentage and paternity analysis it is usually more important to sample many individuals (e.g., Andrew et al., 2013), while for comparative genome evolution the emphasis is firmly placed on recovering more loci (Figure 1). This trade-off still exists despite the plummeting costs of sequence data, with researchers having to decide the number of individuals feasible for a given sequencing strategy, and how the libraries will be multiplexed across lanes of a next generation sequencing (NGS) platform (Shen et al., 2011).
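The trade-off described above can be made concrete with a little arithmetic. The Python sketch below assumes a fixed read budget split perfectly evenly across individuals and loci, which real libraries never achieve; the function names, the depth threshold of 30 reads, and the sample sizes are illustrative assumptions, while the 8 million read budget is the lower MiniSeq figure quoted later in this article.

```python
def mean_depth(total_reads: int, n_individuals: int, n_loci: int) -> float:
    """Mean reads per locus per individual, assuming perfectly even multiplexing."""
    if n_individuals < 1 or n_loci < 1:
        raise ValueError("need at least one individual and one locus")
    return total_reads / (n_individuals * n_loci)


def max_loci(total_reads: int, n_individuals: int, target_depth: int) -> int:
    """Largest number of loci that still reaches target_depth in every individual."""
    if n_individuals < 1 or target_depth < 1:
        raise ValueError("need at least one individual and a positive depth")
    return total_reads // (n_individuals * target_depth)


RUN_READS = 8_000_000  # lower MiniSeq read count quoted in this article
TARGET_DEPTH = 30      # assumed depth threshold, for illustration only

# Ten times more individuals leaves room for roughly ten times fewer loci.
for individuals in (96, 960):
    print(individuals, "individuals:", max_loci(RUN_READS, individuals, TARGET_DEPTH), "loci")
```

Under these assumptions, 96 individuals leave room for 2777 loci at the target depth, whereas 960 individuals leave room for only 277.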
Figure 1

Diagrammatic representation of the trade-off between number of loci and number of individuals in some typical plant genetic studies. The scale of sequencing for pilot studies is also indicated.

NGS is well-suited to studies requiring large amounts of sequence data for few individuals, such as de novo genome assembly, or large-scale genome resequencing projects (e.g., Brandvain et al., 2014). At the other extreme, high-throughput sequencing is also ideal for single-locus studies of environmental variation, where universal primers are used to amplify a diverse mix of template DNA representing thousands of individuals (e.g., Shokralla et al., 2012). The sequencing trade-off space traditionally least well-served by NGS is where tens or hundreds of loci need to be generated for many individuals. While Restriction-site Associated DNA (RAD) sequencing and genotyping-by-sequencing (GBS, Elshire et al., 2011) partly fill this gap, there are many applications in population genetics, phylogenetics, DNA barcoding, and parentage analysis where a standard multiplexed RAD library run on a high-throughput sequencer would provide an excessive number of loci or unnecessarily high depth-of-coverage. Therefore, researchers wanting a modest number of loci would be more likely either to consider SNP chips, which can be costly to develop and may produce data with ascertainment bias (Albrechtsen et al., 2010), or to continue using conventional markers such as microsatellites or Sanger sequencing of individual loci.

The uptake of NGS in small to medium-scale studies may be set to increase with the recent announcement of a new benchtop sequencing platform, the Illumina MiniSeq (http://www.illumina.com/miniseq). This sequencer has two key benefits over its larger cousins such as the Illumina HiSeq. Firstly, the MiniSeq fills a gap at the low-read production end of the market, generating 1.8–7.5 Gb of data [8–50 million (M) reads].
These data have a low error rate (>80% of bases >Q30), and the platform offers some flexibility over read length configuration [36, 50, 75, 150 bp single end (SE) or paired end (PE) sequences]. Secondly, the MiniSeq is the first Illumina platform designed for smaller research institutions or individual laboratories. The instrument itself costs around $50,000, has a small footprint, a relatively short run time, and the capacity to sequence a single sample (rather than the need to fill multiple lanes of a larger flow cell). As such, this sequencer may enable users to avoid the queues and administration associated with large sequencing centers, and open up in-house genomics for the first time.

The MiniSeq joins a number of other NGS platforms capable of relatively small sequence runs (e.g., 400 Mb–15 Gb), such as the Illumina MiSeq and ThermoFisher's Ion Torrent, and third generation technologies such as Oxford Nanopore and Pacific Biosciences real-time sequencers (for full comparison see http://www.molecularecologist.com/next-gen-fieldguide-2016/). The MiniSeq's small footprint and low upfront cost make it a more attractive option for lab ownership than the MiSeq, and it also boasts the lowest reagent costs for small Illumina sequencing runs (MiniSeq mid-output reagents $550 per run). However, the MiniSeq offers no cost benefits for higher-output runs, and has a shorter maximum read length than the MiSeq (150 bp as opposed to 300 bp). Ion Torrent systems such as the Ion S5 are another low-output benchtop alternative to the MiniSeq, and their fast run times make them the platform of preference for clinical diagnostics. Ion Torrent has not been widely used for non-model genomics (though see Recknagel et al., 2015), likely due to some sequence biases, moderate error rates, and difficulty reading homopolymer regions, particularly with early release platforms (Loman et al., 2012; Quail et al., 2012; Salipante et al., 2014).
Third generation sequencing options include Oxford Nanopore's MinION (Mikheyev and Tin, 2014) and Pacific Biosciences real-time sequencers (Jiao et al., 2013). While their long sequence reads (>5 Kb) make them extremely useful for de novo assembly of small genomes, and for scaffolding non-model genomes (English et al., 2012), they have not been widely adopted for other research applications due to their high costs, error rates, and currently limited (but growing) number of bioinformatic pipelines.

The potential applications of low-output benchtop sequencers, such as the MiniSeq, are huge. The first important use would be in replacing panels of PCR-based markers in studies relying on modest numbers of loci. In phylogenetics, multiplexed tagged amplicons could be sequenced with sufficient sequencing depth, but at lower cost and without the redundancy of higher-output platforms. For nuclear loci, this approach removes the time-consuming stage of cloning, and can provide directly phased sequences (O'Neill et al., 2013). Similarly, targeted enrichment studies such as those using hybridization-based probes are ideal for low-output sequencers, as sequencing effort is focused on a small subsection of the genome (e.g., Stull et al., 2013). In mating system studies, GBS libraries prepared with an infrequent cutting enzyme could be a time- and cost-effective way to generate a modest number of loci in many progeny derived from many seed families, leading to accurate estimates of outcrossing (Koelling et al., 2012). In all these cases, the output of the MiniSeq is optimized for a part of the sequencing trade-off where many other platforms are not.

The second main use would be in genomic studies where few individuals need to be sequenced.
MiniSeq runs would be suitable for sequencing small plant genomes (e.g., >50X coverage of the 135 Mb Arabidopsis thaliana genome), or for characterizing features such as GC-content, transposon composition (Sveinsson et al., 2013), and genome size (Simpson, 2014) of non-model species. This output range could also be useful for multiplexed low coverage genome resequencing (“genome skimming,” Straub et al., 2012), which is proving a popular route for complete plastid assembly (e.g., Jackman et al., 2016). The low sequence run cost would also make the platform ideal for marker discovery and developing microsatellite primers (Zalapa et al., 2012).

The third use would be for pilot studies testing new sample assays and for validating libraries constructed from difficult samples. Low-output sequencing runs would be extremely valuable to verify the number of tags and the sequencing coverage in test RAD libraries. Similarly, targeted enrichment strategies could be tested at low coverage to check the efficacy of the enrichment and the proportion of off-bait targets. This information can then be used to pick the depth of coverage for large-scale sequencing efforts, with the same Illumina-compatible libraries being transferable across sequencing platforms. For validating samples, low-output sequencing runs could be used to assess the number of informative reads and the extent of sample contamination in dietary or environmental samples (e.g., Willerslev et al., 2014). In studies using degraded herbarium samples, the extent of C → T/G → A miscoding lesions caused by DNA degradation (Staats et al., 2011) could be assessed. This is particularly important because such damage may not be captured by other quality control metrics, such as those produced by the Agilent TapeStation or Bioanalyzer. In all these cases, the small datasets would be able to address issues that would otherwise only come to light with greater sequencing effort.
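The coverage figure quoted above for Arabidopsis thaliana follows from dividing sequencing output by genome size. The Python sketch below uses the 7.5 Gb upper output and the 135 Mb genome size given in this article; the function name, the assumption that every base is usable and evenly distributed, and the 24-sample genome skimming example are illustrative assumptions rather than figures from the source.

```python
def expected_coverage(output_bases: float, genome_size_bases: float, n_samples: int = 1) -> float:
    """Mean depth per sample, assuming all bases are usable and split evenly across samples."""
    if genome_size_bases <= 0 or n_samples < 1:
        raise ValueError("genome size must be positive and n_samples at least 1")
    return output_bases / (genome_size_bases * n_samples)


MAX_OUTPUT_BASES = 7.5e9    # upper MiniSeq output quoted in this article
ARABIDOPSIS_GENOME = 135e6  # genome size quoted in this article

# A single genome on a full high-output run.
single = expected_coverage(MAX_OUTPUT_BASES, ARABIDOPSIS_GENOME)
print(f"single sample: {single:.1f}X")

# Hypothetical genome skimming design: 24 samples multiplexed on the same run.
skim = expected_coverage(MAX_OUTPUT_BASES, ARABIDOPSIS_GENOME, n_samples=24)
print(f"24 multiplexed samples: {skim:.1f}X each")
```

The single-sample case gives about 55.6X, consistent with the ">50X" stated in the text, while the hypothetical 24-sample design gives roughly 2.3X per sample, which is in the low-coverage range associated with genome skimming.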
NGS is providing a number of important solutions to the sequencing trade-off in plant genetic studies, with benchtop sequencers such as the Illumina MiniSeq potentially facilitating day-to-day low-output sequencing. However, the success of such platforms is far from guaranteed. The most cost-effective sequencing comes from high-output platforms such as the Illumina HiSeq 4000, and highly multiplexed libraries or pools of individuals (Pool-seq, Schlötterer et al., 2014) run on such systems have the lowest per-megabase costs. Therefore, current high-output systems may continue to meet most researchers' needs, leaving only a small gap in the market for benchtop platforms. Another issue is the methodological challenges and costs associated with preparing NGS libraries (often $30–100/sample) and the bioinformatics involved in calling reliable variant sites; for some small-scale studies where these platforms could be useful, these burdens may mean that conventional markers remain the more practical choice. A final concern is whether research groups want to own and run their own sequencer, when technical assistance is available at larger sequencing hubs. As such, while the MiniSeq has great potential on paper, whether it really resolves the sequencing trade-off at the low-output end of the market remains to be seen.
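The point about library preparation costs can be illustrated with a rough budget. The Python sketch below combines the $550 mid-output reagent cost and the $30–100 per-sample library preparation range quoted in this article; the 96-sample project size and the function name are assumptions for illustration, and the estimate deliberately excludes the instrument itself, staff time, and bioinformatics.

```python
def project_cost(n_samples: int, prep_cost_per_sample: float,
                 run_cost: float = 550.0, n_runs: int = 1) -> float:
    """Reagent-plus-library cost in dollars; ignores instrument, labor, and analysis costs."""
    if n_samples < 1 or n_runs < 1:
        raise ValueError("need at least one sample and one run")
    return n_samples * prep_cost_per_sample + n_runs * run_cost


N_SAMPLES = 96  # assumed project size, for illustration only

low = project_cost(N_SAMPLES, 30.0)    # cheapest library prep quoted in the text
high = project_cost(N_SAMPLES, 100.0)  # most expensive library prep quoted in the text

# Share of the budget spent on library preparation rather than sequencing reagents.
prep_share_low = N_SAMPLES * 30.0 / low
prep_share_high = N_SAMPLES * 100.0 / high

print(f"total: ${low:,.0f} to ${high:,.0f}")
print(f"library prep share: {prep_share_low:.0%} to {prep_share_high:.0%}")
```

Under these assumptions the project costs between $3,430 and $10,150, with library preparation accounting for well over four fifths of the total, so a cheap sequencing run does not by itself make a small study inexpensive.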

Author contributions

The author confirms being the sole contributor of this work and approved it for publication.

Conflict of interest statement

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References (10 of 25 shown)

1.  Performance comparison of benchtop high-throughput sequencing platforms.

Authors:  Nicholas J Loman; Raju V Misra; Timothy J Dallman; Chrystala Constantinidou; Saheer E Gharbia; John Wain; Mark J Pallen
Journal:  Nat Biotechnol       Date:  2012-05       Impact factor: 54.908

2.  Next-generation sequencing technologies for environmental DNA research. (Review)

Authors:  Shadi Shokralla; Jennifer L Spall; Joel F Gibson; Mehrdad Hajibabaei
Journal:  Mol Ecol       Date:  2012-04       Impact factor: 6.185

3.  Genome-wide genetic marker discovery and genotyping using next-generation sequencing. (Review)

Authors:  John W Davey; Paul A Hohenlohe; Paul D Etter; Jason Q Boone; Julian M Catchen; Mark L Blaxter
Journal:  Nat Rev Genet       Date:  2011-06-17       Impact factor: 53.242

4.  A road map for molecular ecology. (Review)

Authors:  Rose L Andrew; Louis Bernatchez; Aurélie Bonin; C Alex Buerkle; Bryan C Carstens; Brent C Emerson; Dany Garant; Tatiana Giraud; Nolan C Kane; Sean M Rogers; Jon Slate; Harry Smith; Victoria L Sork; Graham N Stone; Timothy H Vines; Lisette Waits; Alex Widmer; Loren H Rieseberg
Journal:  Mol Ecol       Date:  2013-05       Impact factor: 6.185

5.  Performance comparison of Illumina and ion torrent next-generation sequencing platforms for 16S rRNA-based bacterial community profiling.

Authors:  Stephen J Salipante; Toana Kawashima; Christopher Rosenthal; Daniel R Hoogestraat; Lisa A Cummings; Dhruba J Sengupta; Timothy T Harkins; Brad T Cookson; Noah G Hoffman
Journal:  Appl Environ Microbiol       Date:  2014-09-26       Impact factor: 4.792

6.  Double-digest RAD sequencing using Ion Proton semiconductor platform (ddRADseq-ion) with nonmodel organisms.

Authors:  Hans Recknagel; Arne Jacobs; Pawel Herzyk; Kathryn R Elmer
Journal:  Mol Ecol Resour       Date:  2015-04-06       Impact factor: 7.090

7.  A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers.

Authors:  Michael A Quail; Miriam Smith; Paul Coupland; Thomas D Otto; Simon R Harris; Thomas R Connor; Anna Bertoni; Harold P Swerdlow; Yong Gu
Journal:  BMC Genomics       Date:  2012-07-24       Impact factor: 3.969

8.  DNA damage in plant herbarium tissue.

Authors:  Martijn Staats; Argelia Cuenca; James E Richardson; Ria Vrielink-van Ginkel; Gitte Petersen; Ole Seberg; Freek T Bakker
Journal:  PLoS One       Date:  2011-12-05       Impact factor: 3.240

9.  Exploring genome characteristics and sequence quality without a reference.

Authors:  Jared T Simpson
Journal:  Bioinformatics       Date:  2014-01-17       Impact factor: 6.937

10.  Sequencing pools of individuals - mining genome-wide polymorphism data without big funding. (Review)

Authors:  Christian Schlötterer; Raymond Tobler; Robert Kofler; Viola Nolte
Journal:  Nat Rev Genet       Date:  2014-09-23       Impact factor: 53.242
