Jasin Hodzic1, Lejla Gurbeta1,2, Enisa Omanovic-Miklicanin1,3, Almir Badnjevic1,2,4. 1. Department of Genetics and Bioengineering, International Burch University, Sarajevo, Bosnia and Herzegovina. 2. Verlab Ltd, Sarajevo, Bosnia and Herzegovina. 3. Faculty of Agriculture and Food Science, University of Sarajevo, Bosnia and Herzegovina. 4. Technical Faculty Bihac, University of Bihac, Bosnia and Herzegovina.
Abstract
INTRODUCTION: Major advancements in DNA sequencing methods introduced in the first decade of the new millennium initiated a rapid expansion of sequencing studies, which yielded a tremendous amount of DNA sequence data, including whole sequenced genomes of various species, plants among them. A set of novel sequencing platforms, often collectively called "next-generation sequencing" (NGS), completely transformed the life sciences by allowing extensive throughput while greatly reducing the time, labor and cost of any sequencing endeavor.
PURPOSE: The purpose of this paper is to present an overview of the NGS platforms used to produce the current compendium of published draft plant genomes, namely Roche/454, ABI/SOLiD and Illumina/Solexa, and to determine the platform most frequently used for whole genome sequencing of plants, in light of genotyping of the immortelle plant.
MATERIALS AND METHODS: 45 papers were selected (presenting 47 plant genome draft sequences), and the sequencing techniques and NGS platforms (Roche/454, ABI/SOLiD and Illumina/Solexa) utilized in the selected papers were determined. Subsequently, the frequency of usage of each platform or combination of platforms was calculated.
RESULTS: Illumina/Solexa platforms were used as the sole sequencing tool in 40.42% of published genomes and in combination with other platforms in an additional 48.94%. Roche/454 platforms followed; they were used in combination with the traditional Sanger sequencing method (10.64%) and never as a sole tool. ABI/SOLiD was used only in combination with Illumina/Solexa and Roche/454, in 4.25% of publications.
CONCLUSIONS: Illumina/Solexa platforms are by far the most preferred by researchers, most probably because of their lower sequencing costs. Taking into consideration the current economic situation in the Balkans region, Illumina/Solexa is the best (if not the only) platform choice if sequencing of the immortelle plant (Helichrysum arenarium) is to be performed by researchers in this region.
Keywords:
base sequence; high-throughput nucleotide sequencing; plant genome
1. INTRODUCTION
The beginning of the new millennium was marked by yet another achievement in genetics, specifically plant genomics: the sequencing of Arabidopsis thaliana. While this breakthrough laid the foundations for numerous developments in plant genomics and fostered a better understanding of the plant genome, sequencing methodology remained a major obstacle in this field. Namely, the sequence was obtained through the conventional Sanger method, which is rather simple but requires extensive time, labor and funding when sequencing a whole genome. This remained the case until 2004, when 454 Life Sciences marketed a parallelized version of pyrosequencing that revolutionized sequencing technology, albeit still requiring prior amplification of the sample (1).
The year 2006 was marked by the emergence of Illumina, a company which introduced a sequencing-by-synthesis approach that remains a staple of whole genome sequencing today (2). All sequencing approaches that followed the Sanger method are now collectively called next-generation sequencing (NGS), and are known for allowing extensive throughput while greatly reducing the time, labor and cost of any sequencing endeavor (3).
NGS indeed triggered a revolution in the life sciences, marked by numerous publications of genomes of different organisms, even though NGS platforms initially struggled with obstacles such as large genome size, high GC content and homopolymer runs. Different strategies for overcoming these obstacles were eventually developed, and various sequencing methods were introduced or further refined, creating a scientific trend that continues today. These high-throughput sequencing methods fostered numerous valuable discoveries in molecular and evolutionary biology, medicine, forensics, agriculture and other fields.
When it comes to the application of NGS to plant species, the broadest and most prominent use is whole genome sequencing (WGS), which aims to reveal the full sequence of plant genomes, their genetic make-up, and the genetic background of traits desirable in agricultural production. WGS is especially useful for producing draft genomes of plants that are sequenced for the first time (3). Researchers are predominantly interested in staple cereal, vegetable and fruit species, because of the economic gain such research may bring. However, many species of interest have extremely large and complex genomes, which makes de novo sequencing of the whole genome labor-intensive and occasionally completely impracticable. For this reason, researchers seeking specific plant genome data turn to alternatives such as sequencing only one or several chromosomes of interest, transcriptome sequencing, or exome sequencing (4). Nevertheless, there are currently several dozen published draft plant genomes, including a number of plants of great agricultural and industrial significance (Table 1).
Table 1
Published Draft Genomes of several Agriculturally and Industrially Significant Plants
Apart from the direct use of various cultivars for the production of food and beverages, animal feed, fabric, ropes etc., pharmaceutics and cosmetics are significant areas of industrial production that involve numerous plants whose genomes have not yet been elucidated. One such plant is immortelle (Helichrysum arenarium), a perennial plant widespread on the Adriatic coast of Croatia, Bosnia and Herzegovina, and Montenegro (5, 6, 7). Owing to the high market value of immortelle essential oil and the low investment required for its production, agricultural production of this plant has expanded rapidly in these regions, especially in inland Herzegovina, where producers were able to achieve average annual revenues of 19,715.58 BAM (approximately 10,000 EUR) per 1 ha of immortelle (8). Elucidating the genome of this plant, with the aim of better understanding its desirable traits for agricultural production, would undoubtedly have a great impact on the economy of the region.
This paper focuses on comparing and contrasting the three commercial technologies most commonly applied for WGS of plants, namely Roche/454, ABI/SOLiD and Illumina/Solexa, and on presenting the application areas of each in terms of WGS of plant genomes, in light of genotyping of the immortelle plant.
2. MATERIALS AND METHODS
This article is descriptive in character and presents a systematic review of the literature on the sequencing technologies utilized in sequencing plant genomes. For the purpose of this study, we examined the published draft genomes of plants. Since the publicly available data cover only research published up to 2014, more recent research was not included in the study. The criteria for paper selection were: a) the paper must introduce a draft genome of a plant species; b) the paper must be in English; c) the sequencing method utilized is disclosed in a freely available form; d) the sequencing method is Roche/454, ABI/SOLiD, Illumina/Solexa or any combination of those (combinations with the traditional Sanger method were also included); e) the results must indicate the size of the obtained draft genome.
Applying these criteria, 45 papers (9-53) were selected (presenting 47 plant genome draft sequences), and the frequency of application of each NGS platform was determined. The papers and corresponding plant genomes are listed in Appendix 1. The frequencies were subsequently related to the technical characteristics and cost of sequencing, in order to deduce the best approach to sequencing immortelle (Helichrysum arenarium). For comparison purposes, technical properties (Table 2) and sequencing cost data (Table 3) obtained from Liu et al. (2) were used. The most frequently used platforms or platform combinations were then analyzed in terms of the size of the genomes obtained through their application.
Table 2
Technical properties of the three sequencing platforms
Table 3
Cost of the three sequencing platforms
3. RESULTS
The papers included in the study were first examined in terms of the NGS platform utilized (or combination of platforms, including combinations with the traditional Sanger method), as shown in Table 4 and Figure 1 below.
Table 4
Frequency of used NGS platforms among draft genomes of plants published until 2014
Figure 1
Most frequently used NGS platforms, or platform combinations (including combinations with Sanger sequencing) in draft plant genome publications until 2014. Illumina/Solexa (I/S) has been solely utilized for sequencing the largest number of genomes, a total of 19. (I/S=Illumina/Solexa; R/454=Roche/454; A/S=ABI/SOLiD; SS=Sanger sequencing)
Illumina/Solexa is by far the most frequently used platform (across different models of the technology), either as the sole NGS platform (in 19 studies, or 40.42%) or in combination with other platforms and techniques such as Roche/454, ABI/SOLiD and Sanger sequencing, for a total of 42 draft plant genomes published using the Illumina/Solexa platform to some extent. Roche/454 was utilized, in combination with Sanger sequencing and the other two NGS platforms, in 25.53% of the published plant genome sequencing projects. Interestingly, none of the examined papers involves sole usage of the Roche/454 or ABI/SOLiD platforms. ABI/SOLiD was used in only two publications, in combination with Roche/454 and Illumina/Solexa, and has by far the lowest overall inclusion in plant genome sequencing projects, 4.25%.
The results demonstrate that combining different platforms is a very common practice: in 59.58% of the published studies, the sequence was obtained by combining several (up to three) different sequencing approaches. Combining platforms appears to be driven by practical and infrastructural reasons and is employed across a wide range of genome sizes; nevertheless, it undoubtedly helps to mitigate some of the shortcomings that each platform has on its own. Although the focus of this study is on NGS platforms, it is important to point out that a significant portion of sequencing projects still involve the traditional Sanger sequencing approach. We found that a total of 21 studies utilized Sanger sequencing, always in combination with NGS platforms. Accordingly, 44.69% of published draft plant genomes have been elucidated using Sanger sequencing to some extent.
Genome size is another parameter taken into consideration when selecting sequencing techniques for a research project. Table 5 and Figure 2 present the most frequently used NGS platforms (or platform combinations) in terms of the size of the genomes obtained through their usage.
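The usage frequencies quoted in this section follow from dividing each count by the 47 draft genomes. A minimal sketch of that calculation is given here, using only counts stated in the text; the dictionary labels and the `share` helper are illustrative names, not part of the original study.

```python
# Counts as stated in the text of this paper; 47 draft genomes in total.
TOTAL_GENOMES = 47

counts = {
    "Illumina/Solexa only": 19,
    "Illumina/Solexa (any use)": 42,
    "Roche/454 + Sanger only": 5,   # figure quoted in the Discussion
    "ABI/SOLiD (any use)": 2,
    "Sanger (any use)": 21,
}

# Derived here rather than stated directly: genomes where Illumina/Solexa
# was combined with other platforms or techniques.
counts["Illumina/Solexa combined"] = (
    counts["Illumina/Solexa (any use)"] - counts["Illumina/Solexa only"]
)


def share(count, total=TOTAL_GENOMES):
    """Return count as a percentage of total."""
    return 100.0 * count / total


percentages = {label: share(n) for label, n in counts.items()}

if __name__ == "__main__":
    for label, pct in percentages.items():
        print(f"{label:28s} {counts[label]:2d}/{TOTAL_GENOMES}  {pct:6.2f}%")
```

The computed shares agree with the reported percentages to within 0.01 percentage point; differences in the last digit come down to how the values were rounded.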
Table 5
Sizes of plant genomes published using different NGS platforms (or combinations of platforms)
Figure 2
Size and number of genomes obtained using different NGS platforms. Illumina/Solexa (I/S) has been solely utilized for sequencing the broadest range of genome sizes
Figure 2 clearly demonstrates that genome size, although an important consideration when assembling the sequence (54), does not determine the researchers' choice of sequencing platform. The Illumina/Solexa platform (marked dark blue in Figure 2), being the most frequently used, is also used on the broadest range of genome sizes, from as low as 200-299 megabases (Mb) to the massive 17,000 Mb genome of bread wheat. Next in breadth of application in terms of the size of sequenced genomes is the combination of traditional Sanger sequencing, Roche/454 and Illumina/Solexa, with a range of 0-99 to 1000-1999 Mb. Overall, no particular trend can be detected when comparing genome size with the sequencing platform researchers opt for.
Since the genome size of Helichrysum arenarium had not been reported in the literature as of the end of 2016, we rely on the 2014 study of several other species of the genus Helichrysum published by Azizi et al. Since the species displayed various degrees of polyploidy, the genome size range for the genus Helichrysum can, based on the available data, be confined to roughly 8,000-18,000 Mb. Based on this estimate and on the results of our study of NGS platform usage in plant genome sequencing, Illumina/Solexa is the most viable choice, especially considering the financial weight of such a sequencing project. Referring back to Table 3, the calculated price range for the estimated Helichrysum arenarium genome size range is 560-1,260 USD for a single run using the Illumina/Solexa platform, 1,040-2,340 USD using ABI/SOLiD and 80,000-180,000 USD using Roche/454. However, additional run costs, including the necessary chemicals, utensils, labor etc., must be taken into consideration.
Although the results of this paper also indicate that combining different platforms is a widespread practice, which makes the above price estimates somewhat incomplete, this does not change the fact that sole usage of Illumina/Solexa is by far the most affordable approach. This conclusion is further supported by the fact that Illumina/Solexa is the most widespread sequencing platform, meaning that the majority of research institutions most likely possess a device manufactured by this company. This also makes Illumina/Solexa platforms the most readily accessible to researchers who do not have a sequencer at their disposal in the institutions they are affiliated with.
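The single-run price ranges above are the estimated genome size range multiplied by a per-megabase sequencing cost. The sketch below reproduces that arithmetic. The per-megabase rates are back-calculated from the quoted ranges (for example, 560 USD / 8,000 Mb = 0.07 USD/Mb) and are assumed, not confirmed, to match Table 3; the names `COST_PER_MB_USD` and `run_cost_range` are illustrative.

```python
# Per-megabase rates back-calculated from the price ranges quoted in the text;
# ASSUMED to correspond to the Table 3 values taken from Liu et al. (2).
COST_PER_MB_USD = {
    "Illumina/Solexa": 0.07,
    "ABI/SOLiD": 0.13,
    "Roche/454": 10.0,
}

# Estimated genome size range for the genus, in megabases, as given in the text.
GENOME_SIZE_RANGE_MB = (8_000, 18_000)


def run_cost_range(cost_per_mb, size_range_mb=GENOME_SIZE_RANGE_MB):
    """Return the (low, high) sequencing cost in USD for a genome size range.

    Genome size is treated as the number of megabases sequenced, with no
    coverage multiplier and no reagent or labor costs, as in the text.
    """
    low_mb, high_mb = size_range_mb
    return (round(low_mb * cost_per_mb, 2), round(high_mb * cost_per_mb, 2))


estimates = {
    platform: run_cost_range(rate) for platform, rate in COST_PER_MB_USD.items()
}

if __name__ == "__main__":
    for platform, (low, high) in estimates.items():
        print(f"{platform:16s} {low:>10,.0f} - {high:>10,.0f} USD")
```

Because the estimate scales linearly with genome size, a measured genome size for H. arenarium could be substituted for the range to narrow the figures.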
4. DISCUSSION
Examining technical properties can help in selecting a suitable sequencing platform. Since longer reads are preferable for accurate assembly and for resolving repetitive sequences, such as those found in many plants of interest, the Sanger method would be the most suitable. However, it is avoided because of its high cost, time and labor requirements. Sanger sequencing is often used in combination with NGS platforms, for library sequencing, for sequencing genome portions that are improperly sequenced by NGS, for subsequent proofreading of certain genome portions etc. Turktas et al. suggest that Roche/454 technology, offering the longest read length and highest speed among NGS platforms, appears to be the method of choice for plant WGS if the total sequencing cost is not considered (3). This study shows, however, that this is not the case in practice, and that Illumina/Solexa platforms are by far the most preferred by researchers, either on their own or in combination with other platforms. In fact, Roche/454 was used only in combination with Sanger sequencing on 5 projects, and never as a sole sequencing tool, presumably because of its high cost. Another indicator that cost is the key consideration is the fact that researchers favored Illumina/Solexa platforms across the widest range of genome sizes (Table 5, Figure 2). The failure rates of Illumina/Solexa platforms are generally compensated for by their deep coverage, although this is not sufficient to avoid gaps when a repetitive sequence is longer than the read length. Schatz et al. suggest using paired-end sequencing (54), and Illumina/Solexa platforms are all capable of performing it. This may be another reason why researchers favor Illumina/Solexa to the extent presented in this study.
Additionally, it has been reported on several occasions that in recent years the Illumina sequencing platform has been the most successful in terms of market share and prevalence, to the point of near monopoly, which is consistent with it being the most popular choice of scientists performing plant genome sequencing (55, 56). Although Azizi et al. reported genome sizes of several species of the genus Helichrysum (56), there are no exact data regarding the size of the H. arenarium genome. Hence, we relied on an estimated size range when calculating the estimated price range of a single sequencing run using the Illumina/Solexa platform. Clearly, karyotype and genome size analysis of H. arenarium would provide valuable information prior to sequencing and reduce the risk of project failure for financial reasons.
5. CONCLUSION
Among 47 published draft plant genomes, 19 were obtained through sole usage of Illumina/Solexa platforms (40.42%), while an additional 23 sequences were obtained by combining Illumina/Solexa with other platforms and techniques (48.94%). Usage of Illumina/Solexa platforms also encompasses nearly the entire range of sizes of published genomes. Since Illumina/Solexa platforms are the most affordable in terms of sequencing cost, cost appears to be the key determinant in nearly all published sequencing projects. Taking into consideration the current economic situation in the Balkans region, as well as the estimated range for the size of the immortelle genome, Illumina/Solexa is the best (if not the only) platform choice if sequencing of the immortelle plant (Helichrysum arenarium) is to be performed by researchers in this region.