The next generation becomes the now generation.

Diego A Martinez¹, Mary Anne Nelson.

Substances: DNA, Fungal

Year: 2010 PMID: 20386747 PMCID: PMC2851573 DOI: 10.1371/journal.pgen.1000906

Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917


In recent years, several so-called next-generation DNA sequencing platforms have begun to challenge the well-established Sanger sequencing method. In two important ways, cost and speed, these next-gen technologies provide improvements over Sanger sequencing. Several technical drawbacks (short read length, lack of paired end reads, and quality problems, particularly with homonucleotide stretches [1]), however, render assembly difficult and limit the use of post-Sanger sequencing. These obstacles limited the effective use of next-generation sequencing to the sequencing of prokaryotes [2], the resequencing of individuals [3], and transcriptomics studies, recently termed RNA-Seq [4], and effectively precluded de novo eukaryotic sequencing. Realizing the shortcomings of next-generation technology, manufacturers have continued to improve the read length and have recently implemented paired end methods. Capitalizing on these improvements, the publication by Nowrousian et al. describes the team's success in completely bypassing Sanger sequencing to produce a de novo assembly (to draft quality) of a complete genome, that of the filamentous fungus Sordaria macrospora [5], using Solexa sequencing-by-synthesis and 454 pyrosequencing. The technical merits of this publication make it an excellent starting point for future genome sequencing using post-Sanger platforms.

The assembly phase has been a particular sticking point for de novo genome sequencing in eukaryotes, as the complexity of the genomes makes it difficult to correctly place short reads. By sequencing to high depth (nearly 100 times the length of the genome), the authors were able to pull the assembly together in large pieces (contigs) and obtain a reasonable N50 of 117 kb (the N50 is the length of the shortest contig in the set of longest contigs that together cover 50% of the genome). The authors also experimented with different levels of coverage and different combinations of reads to produce assemblies of various qualities. They determined that the depth to which S. macrospora was sequenced may not be necessary, and that closing gaps with 454 reads resulted in a large improvement. Interestingly, this is similar to the blend of long- and short-insert libraries that was used for the whole genome shotgun version of the human genome project [6]. By leveraging the short inexpensive Solexa reads for the bulk of the genome, the longer 454 reads can add valuable contig order and orienting information and vastly improve quality while dramatically reducing the associated cost. Nowrousian et al. [5] have provided the assembly statistics for various depths and platforms, paving the way for future studies using high throughput sequencing.

The researchers also showed that post-Sanger sequencing technologies can be used to reliably assemble difficult areas of the genome. One region of the genome, that which controls nonself recognition, could have been a particularly troublesome stumbling block. Anastomosis is a process by which hyphae, the thread-like projections of filamentous fungi, fuse and bring genetically distinct nuclei into contact. Fungi from the same species with different het (heterokaryon incompatibility) loci will fuse, but the resulting heterokaryotic cells are subject to either severely restricted growth or cell death. This process has benefits that the authors describe briefly. Although incompatibility has never been observed in S. macrospora, the investigators report that the genome contains apparent heterokaryon incompatibility genes, with the twist that the region is inverted and contains duplications of key genes near the ends of the inversion. Such a duplication might be difficult to resolve with short Solexa data and even the longer 454 reads. However, the authors used polymerase chain reaction (PCR) to amplify across the boundaries of the inverted and duplicated region, and end-sequenced the PCR products to confirm the genome structure predicted by the genome assembler Velvet [7]. Given this demonstrated success in resolving a difficult region containing duplicate genes, researchers and physicians can consider the previously unfeasible next-gen sequencing technologies when deciding whether to sequence an entire genome.

The quality of sequence produced, and the ability to compare the Sanger and post-Sanger sequence scores, were additional sticking points to relying completely on the lower cost next-gen technologies. On this front, Nowrousian's team gave us a glimpse of the error rate and how it compares to that of Sanger sequencing by choosing several possible frame shifts in predicted coding regions for resequencing. The outcome of this investigation, although based on a small (21 kb total) sample, shows that the next-gen technologies can achieve error rates similar to those of Sanger sequencing. This leaves no obvious reason to use any Sanger sequencing for future whole genome sequencing projects.
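The N50 statistic quoted above for the S. macrospora assembly can be made concrete with a small calculation. The following is a minimal sketch, not the authors' pipeline: the function name `n50` and the toy contig lengths are hypothetical, and the summed contig length is used as a stand-in for the genome size (when a separate genome size estimate is used instead, the statistic is usually called NG50).

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until
    they cover at least half of the total assembled length; the length
    of the contig that crosses that threshold is the N50.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2  # assumption: assembly size stands in for genome size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in kb (illustrative only, not from the paper).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70: 80 + 70 = 150 kb is half of the 300 kb total
```

A larger N50 means that fewer, longer contigs account for half of the assembly, which is why the value is commonly reported as a measure of contiguity.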

Beer, Wine, and Advancements in Science and Technology

The selection of an organism to sequence in this venture was critical, and a wise choice was made. Fungi, as the authors mention, are not only important to broad areas from ecology and agriculture to medicine and biotechnology, but are also important test platforms due to several characteristics of the genomes inherent to the fungal kingdom. Such traits were important in selecting the yeast Saccharomyces cerevisiae as the first sequenced eukaryote, a fungus only distantly related to the filamentous S. macrospora. Similar attributes are of value here, chiefly low-repeat content (critical for clean assemblies) and manageable size (the S. macrospora genome is approximately 40 Mb). The low-repeat content in the genome of S. macrospora is possibly due to the effect of repeat-induced point mutation or RIP [8], which has been well documented in the closely related Neurospora crassa [9]. The authors suggest that RIP might have been active at some point in the evolutionary history of S. macrospora, but that the species may no longer have an active RIP process. Still, by some mechanism S. macrospora is able to keep repeat elements low in copy number. In addition, haploid genomes are much more easily assembled because of a lack of allelic heterozygosity. It remains to be seen how amenable large, diploid genomes will be to assembly using similar technologies. S. macrospora was an excellent candidate for this next-gen sequencing effort for one other key reason. The close relation to N. crassa offers both a good companion for comparative genomics and a verification of assembly quality, as large sections of the genomes were known to be similar enough to align extensively [10]. This relationship was also used to pull the assembled fragments together and produce a very clean high-quality assembly with few scaffolds (152 in total).
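To give a sense of scale for the figures mentioned in the text (a genome of approximately 40 Mb sequenced to nearly 100 times its length), the arithmetic can be sketched as follows. This is a rough illustration under stated assumptions: the function names are hypothetical, the round numbers are taken from the text rather than from the paper's exact totals, and the 36 bp read length is an assumed value for short reads, not one reported here.

```python
def bases_needed(genome_size_bp, depth):
    """Total sequenced bases required to reach a given average depth."""
    return genome_size_bp * depth


def reads_needed(genome_size_bp, depth, read_length_bp):
    """Number of reads of a fixed length needed for that depth (rounded up)."""
    total = bases_needed(genome_size_bp, depth)
    return (total + read_length_bp - 1) // read_length_bp  # integer ceiling


GENOME_SIZE_BP = 40_000_000  # approximately 40 Mb, as stated in the text
DEPTH = 100                  # "nearly 100 times the length of the genome"
READ_LENGTH_BP = 36          # assumption: illustrative short-read length

print(bases_needed(GENOME_SIZE_BP, DEPTH))                  # prints 4000000000
print(reads_needed(GENOME_SIZE_BP, DEPTH, READ_LENGTH_BP))  # prints 111111112
```

On these assumptions, roughly 4 Gb of raw sequence stands behind a 40 Mb draft, which is part of why a small genome makes such a project manageable for a single lab.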

Terabyte Is the New Gigabyte

Now that any academic department or perhaps even lab around the world can sequence a draft quality genome inexpensively, the amount of sequence data will predictably explode. While the number of genomes sequenced to date is more than one thousand if we count both eukaryotic and prokaryotic projects (Figures 1 and 2) [11], this advancement opens the door to an exponential expansion in the number of available genomes. Can we handle it? The National Center for Biotechnology Information (NCBI) currently deals well with several strains of the same species, but are we ready for individuals of the same strain? Technical hurdles to individual sequencing (the need for multiple copies of the same genome to fragment) remain for single-celled organisms, but for fungi and other eukaryotes with small genomes this is a likely next level of study. Clearly the expected flood of data and the potential for finding answers to biological questions on this new level make it imperative to develop robust tools for referencing and storing sequence information on an individual-by-individual basis, perhaps doing away with the current system of using a single reference genome.

Figure 1

Number of genomes entered into GenBank by year as of September 2009.

Adapted from http://www.genomesonline.org/ [11].

Figure 2

Number of projects per phylogenetic group as of September 2009.

Adapted from http://www.genomesonline.org/ [11].

At least for the fungal research community, the quality, cost, and speed of next-gen sequencing technologies are now such that we can sequence at will and add to the rapidly growing list of available fungal genomes, as shown in Figure 2. This may be the case for mammalian genomes as well, as suggested in a recent publication (the giant panda [12]). Still, we have not yet attained the “1,000-dollar genome” widely thought to be necessary for broad medical use in diagnosis and selection of treatments [13].

What is the new next-gen sequencing? One answer to this question might come from Pacific Biosciences Corporation. In a recent publication [14], it appears they are able to detect the addition of a nucleotide to a growing strand of DNA by the polymerase enzyme. This “real-time” sequencing technology may be the next point in the race for fast and inexpensive whole-genome sequencing. Additional companies such as Complete Genomics and Ion Torrent Systems are unveiling new instruments and techniques, and it is likely the speed with which data are produced will continue to increase while the costs will decrease. Until then, we will have plenty of data to sift through while we wait.

14 in total

1. Anticipating the 1,000 dollar genome.

Authors: Elaine R Mardis
Journal: Genome Biol Date: 2006 Impact factor: 13.583

2. Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

Authors: Daniel R Zerbino; Ewan Birney
Journal: Genome Res Date: 2008-03-18 Impact factor: 9.043

3. The complete genome of an individual by massively parallel DNA sequencing.

Authors: David A Wheeler; Maithreyan Srinivasan; Michael Egholm; Yufeng Shen; Lei Chen; Amy McGuire; Wen He; Yi-Ju Chen; Vinod Makhijani; G Thomas Roth; Xavier Gomes; Karrie Tartaro; Faheem Niazi; Cynthia L Turcotte; Gerard P Irzyk; James R Lupski; Craig Chinault; Xing-zhi Song; Yue Liu; Ye Yuan; Lynne Nazareth; Xiang Qin; Donna M Muzny; Marcel Margulies; George M Weinstock; Richard A Gibbs; Jonathan M Rothberg
Journal: Nature Date: 2008-04-17 Impact factor: 49.962

4. The genome sequence of the filamentous fungus Neurospora crassa.

Authors: James E Galagan; Sarah E Calvo; Katherine A Borkovich; Eric U Selker; Nick D Read; David Jaffe; William FitzHugh; Li-Jun Ma; Serge Smirnov; Seth Purcell; Bushra Rehman; Timothy Elkins; Reinhard Engels; Shunguang Wang; Cydney B Nielsen; Jonathan Butler; Matthew Endrizzi; Dayong Qui; Peter Ianakiev; Deborah Bell-Pedersen; Mary Anne Nelson; Margaret Werner-Washburne; Claude P Selitrennikoff; John A Kinsey; Edward L Braun; Alex Zelter; Ulrich Schulte; Gregory O Kothe; Gregory Jedd; Werner Mewes; Chuck Staben; Edward Marcotte; David Greenberg; Alice Roy; Karen Foley; Jerome Naylor; Nicole Stange-Thomann; Robert Barrett; Sante Gnerre; Michael Kamal; Manolis Kamvysselis; Evan Mauceli; Cord Bielke; Stephen Rudd; Dmitrij Frishman; Svetlana Krystofova; Carolyn Rasmussen; Robert L Metzenberg; David D Perkins; Scott Kroken; Carlo Cogoni; Giuseppe Macino; David Catcheside; Weixi Li; Robert J Pratt; Stephen A Osmani; Colin P C DeSouza; Louise Glass; Marc J Orbach; J Andrew Berglund; Rodger Voelker; Oded Yarden; Michael Plamann; Stephan Seiler; Jay Dunlap; Alan Radford; Rodolfo Aramayo; Donald O Natvig; Lisa A Alex; Gertrud Mannhaupt; Daniel J Ebbole; Michael Freitag; Ian Paulsen; Matthew S Sachs; Eric S Lander; Chad Nusbaum; Bruce Birren
Journal: Nature Date: 2003-04-24 Impact factor: 49.962

Review 5. RIP: the evolutionary cost of genome defense.

Authors: James E Galagan; Eric U Selker
Journal: Trends Genet Date: 2004-09 Impact factor: 11.639

6. Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

Authors: Minou Nowrousian; Christian Würtz; Stefanie Pöggeler; Ulrich Kück
Journal: Fungal Genet Biol Date: 2004-03 Impact factor: 3.495

7. The sequence of the human genome.

Authors: J C Venter; M D Adams; E W Myers; P W Li; R J Mural; G G Sutton; H O Smith; M Yandell; C A Evans; R A Holt; J D Gocayne; P Amanatides; R M Ballew; D H Huson; J R Wortman; Q Zhang; C D Kodira; X H Zheng; L Chen; M Skupski; G Subramanian; P D Thomas; J Zhang; G L Gabor Miklos; C Nelson; S Broder; A G Clark; J Nadeau; V A McKusick; N Zinder; A J Levine; R J Roberts; M Simon; C Slayman; M Hunkapiller; R Bolanos; A Delcher; I Dew; D Fasulo; M Flanigan; L Florea; A Halpern; S Hannenhalli; S Kravitz; S Levy; C Mobarry; K Reinert; K Remington; J Abu-Threideh; E Beasley; K Biddick; V Bonazzi; R Brandon; M Cargill; I Chandramouliswaran; R Charlab; K Chaturvedi; Z Deng; V Di Francesco; P Dunn; K Eilbeck; C Evangelista; A E Gabrielian; W Gan; W Ge; F Gong; Z Gu; P Guan; T J Heiman; M E Higgins; R R Ji; Z Ke; K A Ketchum; Z Lai; Y Lei; Z Li; J Li; Y Liang; X Lin; F Lu; G V Merkulov; N Milshina; H M Moore; A K Naik; V A Narayan; B Neelam; D Nusskern; D B Rusch; S Salzberg; W Shao; B Shue; J Sun; Z Wang; A Wang; X Wang; J Wang; M Wei; R Wides; C Xiao; C Yan; A Yao; J Ye; M Zhan; W Zhang; H Zhang; Q Zhao; L Zheng; F Zhong; W Zhong; S Zhu; S Zhao; D Gilbert; S Baumhueter; G Spier; C Carter; A Cravchik; T Woodage; F Ali; H An; A Awe; D Baldwin; H Baden; M Barnstead; I Barrow; K Beeson; D Busam; A Carver; A Center; M L Cheng; L Curry; S Danaher; L Davenport; R Desilets; S Dietz; K Dodson; L Doup; S Ferriera; N Garg; A Gluecksmann; B Hart; J Haynes; C Haynes; C Heiner; S Hladun; D Hostin; J Houck; T Howland; C Ibegwam; J Johnson; F Kalush; L Kline; S Koduru; A Love; F Mann; D May; S McCawley; T McIntosh; I McMullen; M Moy; L Moy; B Murphy; K Nelson; C Pfannkoch; E Pratts; V Puri; H Qureshi; M Reardon; R Rodriguez; Y H Rogers; D Romblad; B Ruhfel; R Scott; C Sitter; M Smallwood; E Stewart; R Strong; E Suh; R Thomas; N N Tint; S Tse; C Vech; G Wang; J Wetter; S Williams; M Williams; S Windsor; E Winn-Deen; K Wolfe; J Zaveri; K Zaveri; J F Abril; R Guigó; M J Campbell; K V Sjolander; B Karlak; A Kejariwal; H Mi; B Lazareva; T Hatton; A Narechania; K Diemer; A Muruganujan; N Guo; S Sato; V Bafna; S Istrail; R Lippert; R Schwartz; B Walenz; S Yooseph; D Allen; A Basu; J Baxendale; L Blick; M Caminha; J Carnes-Stine; P Caulk; Y H Chiang; M Coyne; C Dahlke; A Deslattes Mays; M Dombroski; M Donnelly; D Ely; S Esparham; C Fosler; H Gire; S Glanowski; K Glasser; A Glodek; M Gorokhov; K Graham; B Gropman; M Harris; J Heil; S Henderson; J Hoover; D Jennings; C Jordan; J Jordan; J Kasha; L Kagan; C Kraft; A Levitsky; M Lewis; X Liu; J Lopez; D Ma; W Majoros; J McDaniel; S Murphy; M Newman; T Nguyen; N Nguyen; M Nodell; S Pan; J Peck; M Peterson; W Rowe; R Sanders; J Scott; M Simpson; T Smith; A Sprague; T Stockwell; R Turner; E Venter; M Wang; M Wen; D Wu; M Wu; A Xia; A Zandieh; X Zhu
Journal: Science Date: 2001-02-16 Impact factor: 47.728

8. High-precision, whole-genome sequencing of laboratory strains facilitates genetic studies.

Authors: Anjana Srivatsan; Yi Han; Jianlan Peng; Ashley K Tehranchi; Richard Gibbs; Jue D Wang; Rui Chen
Journal: PLoS Genet Date: 2008-08-01 Impact factor: 5.917

9. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata.

Authors: Konstantinos Liolios; Konstantinos Mavromatis; Nektarios Tavernarakis; Nikos C Kyrpides
Journal: Nucleic Acids Res Date: 2007-11-02 Impact factor: 16.971

10. Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution.

Authors: Brian T Wilhelm; Samuel Marguerat; Stephen Watt; Falk Schubert; Valerie Wood; Ian Goodhead; Christopher J Penkett; Jane Rogers; Jürg Bähler
Journal: Nature Date: 2008-05-18 Impact factor: 49.962
