
Education in the genomics era: Generating high-quality genome assemblies in university courses.

Stefan Prost^1,2, Sven Winter^3,4, Jordi De Raad^1,3, Raphael T F Coimbra^3,4, Magnus Wolf^3,4, Maria A Nilsson^1,4, Malte Petersen^1, Deepak K Gupta^1, Tilman Schell^1, Fritjof Lammers^1,3,4, Axel Janke^1,3,4.

Abstract

Recent advances in genome sequencing technologies have simplified the generation of genome data and reduced the costs for genome assemblies, even for complex genomes like those of vertebrates. More practically oriented genomic courses can prepare university students for the increasing importance of genomic data used in biological and medical research. Low-cost third-generation sequencing technology, along with publicly available data, can be used to teach students how to process genomic data, assemble full chromosome-level genomes, and publish the results in peer-reviewed journals, or preprint servers. Here we outline experiences gained from 2 master's-level courses and discuss practical considerations for teaching hands-on genome assembly courses.


Keywords: MinION; Oxford Nanopore Technologies; genome assembly; teaching; university education

Year: 2020; PMID: 32491162; PMCID: PMC7268781; DOI: 10.1093/gigascience/giaa058

Source DB: PubMed; Journal: GigaScience; ISSN: 2047-217X; Impact factor: 6.524

Background

The number of published genome assemblies has increased exponentially since the publication of the human genome in 2001 [1]. Back then, large international consortia and vast amounts of funding were required to complete this task. Today even small research groups can generate high-quality genome assemblies up to full chromosome level. An important step in this progression was the advent of “third-generation” sequencing with the release of Pacific Biosciences’ and Oxford Nanopore Technologies’ sequencing platforms. These technologies perform real-time sequencing of long, single DNA molecules and require no amplification to increase the sequencing detection signal strength [2]. Their adoption has drastically improved the quality of generated genome assemblies and substantially decreased costs, thus enabling even small research groups to realise high-quality genome assemblies. The ongoing “genomic revolution” has increased the need for practical university courses on genome assembly and genomics in general.

One of these third-generation sequencing platforms is Oxford Nanopore Technologies’ MinION. It is a USB drive–sized sequencer that has gained in popularity over the past 5 years owing to its potential to generate very long reads, its relatively low cost, and its portability [3, 4]. Sequencing is performed by measuring changes in ionic current as a single-stranded DNA molecule passes through a “nanopore” in the device's biological membrane [5]. Its portability, ease of use, and relatively low cost make it an effective teaching tool in classroom settings [6, 7], as well as in the field [8]. Studies on the educational use of the MinION device have mainly focused on methods such as DNA barcoding or metabarcoding, or genome sequencing of bacteria or bacteriophages. These approaches have the advantage that they are easy to conduct, do not require large servers for data processing, and are relatively inexpensive.

Today, however, the combination of (i) nanopore-based sequencing, (ii) new, efficient bioinformatic assembly pipelines, and (iii) (publicly available) short-read data offers the possibility of generating chromosome-level assemblies, even of complex vertebrate genomes, as part of university courses. Here we report our experience and discuss practical aspects of teaching hands-on vertebrate genome assembly courses at the master's level.

Practical Courses in the Genomics Era

Over the past 2 years, we have taught 2 master's-level courses, each 6 weeks long, focusing on the assembly and analysis of vertebrate genomes and on the required theoretical and biological background. In the first part of each course, students gained practical experience in extracting high molecular weight DNA (hmwDNA) and preparing sequencing libraries, and then sequenced the genomic DNA on the MinION device (see Table 1 and Supplementary Material 1). In the second part, these data, in combination with either publicly available or previously generated short-read data, were used to generate highly continuous vertebrate genome assemblies (see Table 1 and Supplementary Material 1). The inclusion of dedicated sessions on the basics of laboratory work and bioinformatics enables the active participation of students from a variety of disciplines (such as biology or bioinformatics), even without prior knowledge in these fields. We recommend keeping the number of students below 15 to enable direct interactions. This is especially important for students who have little or no bioinformatics or laboratory experience and might otherwise struggle to keep up with the course.

Table 1: Example of a course outline

Part 1—Laboratory Processing and Basic Training
Week 1	Theory: Introduction to genome sequencing techniques and analyses
	Hands-on: Laboratory work (laboratory safety guidelines, DNA isolation, quality assessment, nanopore library preparation, sequencing)
	Background lecture series: Molecular evolution
Week 2	Hands-on: Introduction to the Bash command line environment
	Hands-on: Introduction to base-calling and quality assessment of sequencing data
	Background lecture series: Molecular evolution
Part 2—Genome Assembly, Annotation, and Downstream Analyses
Week 3	Hands-on: Genome assembly (long-read assembly) and polishing (long-read/short-read)
	Hands-on: Transcriptome assembly
	Background lecture series: Molecular evolution
Week 4	Hands-on: Assembly quality assessment (assembly statistics, BUSCO)
	Hands-on: Scaffolding to chromosome level using Hi-C data
	Hands-on: Genome annotation (repetitive elements, genes)
	Background lecture series: Molecular evolution
	Seminar: Current research examples in vertebrate genomics
Week 5	Hands-on: Introduction to comparative analyses, e.g., methods of phylogenetic tree reconstruction or population genomics
	Background lecture series: Molecular evolution
Week 6	Theory: Introduction to scientific writing
	Theory: Overview of the scientific publishing process (from manuscript writing to publication)
	Hands-on: Manuscript writing
	General quality assessment session

Detailed information can be found in Supplementary Material 1.
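
The "assembly statistics" session in Week 4 of the outline typically includes contiguity measures such as N50. The following is a minimal sketch of how such statistics could be computed in a classroom exercise; it assumes the contig lengths have already been extracted from an assembly FASTA file, and the function names `n50` and `assembly_stats` are illustrative rather than part of any tool used in the courses.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def assembly_stats(lengths):
    """Summarise an assembly from its contig lengths (in base pairs)."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest_bp": max(lengths),
        "n50_bp": n50(lengths),
    }


if __name__ == "__main__":
    # Toy contig lengths, not real data.
    toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(assembly_stats(toy_lengths))
```

For the toy lengths above, the total is 54 and the three longest contigs (10, 9, 8) sum to 27, so the N50 is 8. Comparing this value before and after scaffolding gives students a concrete sense of how contiguity improves.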

The selection of a species for a MinION-based genome assembly course should be based on a few characteristics: (i) the availability of relatively fresh material for the extraction of hmwDNA; (ii) prior testing of its ability to be sequenced on a MinION because some taxa cannot effectively be sequenced on a MinION, probably owing to the presence of biological molecules in the DNA extraction that interfere with the sequencing process; (iii) genome size because this dictates how much data and computational resources are needed for a successful assembly; (iv) the interest of the community in the species; and (v) availability of short-read data for polishing or chromosome-level scaffolding. In these courses, we have focused on teleost fish genomes for which we have established hmwDNA extraction and MinION sequencing protocols [9]. Among vertebrates, many teleost fish species have relatively small genomes (∼400–700 Mb), and low coverage (20–30×) of long-read data is usually sufficient to generate high-quality genome assemblies for these. We highly recommend using available databases such as the Animal Genome Size Database [http://genomesize.com/] to look up genome size estimations for a target species during course planning. Alternatively, short-read data or flow cytometry can be used to estimate genome sizes. There are a variety of genome assemblies available online (databases: NCBI Genome [www.ncbi.nlm.nih.gov/genome], DNA Zoo [www.dnazoo.org], GigaDB [www.gigadb.org], etc.) that are based on short-read libraries. These would benefit from more continuous assemblies with long-read data to allow for more in-depth analyses such as on genome architecture evolution or speciation. 
Because NCBI and other genome databases require all the accompanying raw read data to be deposited on their SRA database (https://www.ncbi.nlm.nih.gov/sra), these reads could also be used for genome polishing during the course. Even though individual read error rates (5–25%, reviewed in [2, 3]) for the MinION have decreased over recent years, it is still recommended to polish the resulting genome assemblies using highly accurate (0.1–1% error rate, reviewed in [2]) short-read data. To produce chromosome-level genome assemblies, so-called proximity-ligation sequencing data are needed. Several companies offer kits to generate these. However, the library construction is complicated and usually requires 2 full days, so we do not recommend generating these during the course. Alternatively, public data platforms such as DNA Zoo offer useful resources for proximity-ligation data. Furthermore, these databases include numerous species for which high-quality, continuous assemblies are unavailable. The development of time- and resource-effective genome assembly tools, such as wtdbg2 [10], allows students to generate genome assemblies of vertebrate genomes within hours, even on small servers. Depending on the genome size and amount of read data, it might even be possible to assemble the genomes on a desktop computer or laptop. It is advantageous to teach the students first how to run these tools on a subset of the data and then to have them process the complete data in smaller groups. This way, relatively time-intensive steps can be processed overnight, and the results checked and discussed with the students the following day. Subsequently, they can be introduced to post-assembly steps such as repeat or gene annotation. The generation and assembly of genomic data as part of university courses also makes it possible to involve students in the publishing process of peer-reviewed scientific publications. 
To achieve this, we allocated time in the curriculum for scientific writing and publishing. The students were given the task of drafting a genome announcement paper under the supervision of the course trainers, which was then submitted to bioRxiv (https://www.biorxiv.org/) (see, e.g., [9]) and a peer-reviewed scientific journal. This way, students are involved in every step of the process, from generating the genomic data to assembly and annotation and to publishing the scientific article. They not only learn how to write and publish scientific manuscripts but also contribute to peer-reviewed scientific articles very early in their scientific careers. This keeps motivation high because they have the opportunity to work on new data and obtain meaningful results, in contrast to analysing the simplified teaching datasets often used as classroom examples. Collaborations with different research groups can help to find scientifically interesting species to sequence, which will ensure that publication of the genome assembly and annotation is of general interest for the research community.
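
For course planning, the genome sizes and coverage values quoted above translate directly into a sequencing target, because the required long-read yield is roughly genome size multiplied by desired coverage. The sketch below works through that arithmetic for the teleost range given in the text (∼400–700 Mb at 20–30×). The function names are hypothetical, and the calculation ignores read-length distribution and data lost to quality filtering, so it should be read as a lower bound.

```python
def required_yield_bp(genome_size_mb, coverage):
    """Bases of long-read data needed for a given genome size (in Mb)
    and target coverage, ignoring any filtering losses."""
    return int(genome_size_mb * 1_000_000 * coverage)


def yield_range_gb(size_range_mb, coverage_range):
    """Lowest and highest required yield in gigabases for a range of
    genome sizes (Mb) and a range of target coverages."""
    low = required_yield_bp(size_range_mb[0], coverage_range[0]) / 1e9
    high = required_yield_bp(size_range_mb[1], coverage_range[1]) / 1e9
    return low, high


if __name__ == "__main__":
    # Teleost genome size and coverage ranges quoted in the text.
    low, high = yield_range_gb((400, 700), (20, 30))
    print(f"Target yield: {low:.0f}-{high:.0f} Gb")  # prints "Target yield: 8-21 Gb"
```

Under these assumptions, a course would need between about 8 and 21 Gb of MinION data, depending on the species and the coverage aimed for. The same calculation can be used to decide how large a data subset to hand out for the initial practice runs.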

Conclusions

Here we show that recent advances in portable sequencing technology, ever-decreasing sequencing costs, the development of computationally efficient tools, and the increasing availability of publicly accessible read data can be used for practical teaching of genome assembly and genomics within the framework of university master's courses. Selecting species with smaller genomes or relying more heavily on available data may also enable universities in low-income areas and countries to organize genome assembly courses. Practical training that focuses on newly sequenced or improved genome assemblies will further enable students to gain experience publishing scientific studies early on in their careers.

Additional Files

Supplementary Material 1: Course structure and teaching goals

Abbreviations

BUSCO: Benchmarking Universal Single-Copy Orthologs; hmwDNA: high molecular weight DNA; Mb: megabase pairs; NCBI: National Center for Biotechnology Information; SRA: Sequence Read Archive; USB: Universal Serial Bus.

References

1. Initial sequencing and analysis of the human genome.

Authors: E S Lander; L M Linton; B Birren; C Nusbaum; M C Zody; J Baldwin; K Devon; K Dewar; M Doyle; W FitzHugh; R Funke; D Gage; K Harris; A Heaford; J Howland; L Kann; J Lehoczky; R LeVine; P McEwan; K McKernan; J Meldrim; J P Mesirov; C Miranda; W Morris; J Naylor; C Raymond; M Rosetti; R Santos; A Sheridan; C Sougnez; Y Stange-Thomann; N Stojanovic; A Subramanian; D Wyman; J Rogers; J Sulston; R Ainscough; S Beck; D Bentley; J Burton; C Clee; N Carter; A Coulson; R Deadman; P Deloukas; A Dunham; I Dunham; R Durbin; L French; D Grafham; S Gregory; T Hubbard; S Humphray; A Hunt; M Jones; C Lloyd; A McMurray; L Matthews; S Mercer; S Milne; J C Mullikin; A Mungall; R Plumb; M Ross; R Shownkeen; S Sims; R H Waterston; R K Wilson; L W Hillier; J D McPherson; M A Marra; E R Mardis; L A Fulton; A T Chinwalla; K H Pepin; W R Gish; S L Chissoe; M C Wendl; K D Delehaunty; T L Miner; A Delehaunty; J B Kramer; L L Cook; R S Fulton; D L Johnson; P J Minx; S W Clifton; T Hawkins; E Branscomb; P Predki; P Richardson; S Wenning; T Slezak; N Doggett; J F Cheng; A Olsen; S Lucas; C Elkin; E Uberbacher; M Frazier; R A Gibbs; D M Muzny; S E Scherer; J B Bouck; E J Sodergren; K C Worley; C M Rives; J H Gorrell; M L Metzker; S L Naylor; R S Kucherlapati; D L Nelson; G M Weinstock; Y Sakaki; A Fujiyama; M Hattori; T Yada; A Toyoda; T Itoh; C Kawagoe; H Watanabe; Y Totoki; T Taylor; J Weissenbach; R Heilig; W Saurin; F Artiguenave; P Brottier; T Bruls; E Pelletier; C Robert; P Wincker; D R Smith; L Doucette-Stamm; M Rubenfield; K Weinstock; H M Lee; J Dubois; A Rosenthal; M Platzer; G Nyakatura; S Taudien; A Rump; H Yang; J Yu; J Wang; G Huang; J Gu; L Hood; L Rowen; A Madan; S Qin; R W Davis; N A Federspiel; A P Abola; M J Proctor; R M Myers; J Schmutz; M Dickson; J Grimwood; D R Cox; M V Olson; R Kaul; C Raymond; N Shimizu; K Kawasaki; S Minoshima; G A Evans; M Athanasiou; R Schultz; B A Roe; F Chen; H Pan; J Ramser; H Lehrach; R Reinhardt; W R McCombie; M de la Bastide; N Dedhia; H Blöcker; K Hornischer; G Nordsiek; R Agarwala; L Aravind; J A Bailey; A Bateman; S Batzoglou; E Birney; P Bork; D G Brown; C B Burge; L Cerutti; H C Chen; D Church; M Clamp; R R Copley; T Doerks; S R Eddy; E E Eichler; T S Furey; J Galagan; J G Gilbert; C Harmon; Y Hayashizaki; D Haussler; H Hermjakob; K Hokamp; W Jang; L S Johnson; T A Jones; S Kasif; A Kaspryzk; S Kennedy; W J Kent; P Kitts; E V Koonin; I Korf; D Kulp; D Lancet; T M Lowe; A McLysaght; T Mikkelsen; J V Moran; N Mulder; V J Pollara; C P Ponting; G Schuler; J Schultz; G Slater; A F Smit; E Stupka; J Szustakowki; D Thierry-Mieg; J Thierry-Mieg; L Wagner; J Wallis; R Wheeler; A Williams; Y I Wolf; K H Wolfe; S P Yang; R F Yeh; F Collins; M S Guyer; J Peterson; A Felsenfeld; K A Wetterstrand; A Patrinos; M J Morgan; P de Jong; J J Catanese; K Osoegawa; H Shizuya; S Choi; Y J Chen; J Szustakowki
Journal: Nature Date: 2001-02-15 Impact factor: 49.962

Review 2. Coming of age: ten years of next-generation sequencing technologies.

Authors: Sara Goodwin; John D McPherson; W Richard McCombie
Journal: Nat Rev Genet Date: 2016-05-17 Impact factor: 53.242

3. Using mobile sequencers in an academic classroom.

Authors: Sophie Zaaijer; Yaniv Erlich
Journal: Elife Date: 2016-04-07 Impact factor: 8.140

4. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community.

Authors: Miten Jain; Hugh E Olsen; Benedict Paten; Mark Akeson
Journal: Genome Biol Date: 2016-11-25 Impact factor: 13.583

5. Nanopore sequencing and assembly of a human genome with ultra-long reads.

Authors: Miten Jain; Sergey Koren; Karen H Miga; Josh Quick; Arthur C Rand; Thomas A Sasani; John R Tyson; Andrew D Beggs; Alexander T Dilthey; Ian T Fiddes; Sunir Malla; Hannah Marriott; Tom Nieto; Justin O'Grady; Hugh E Olsen; Brent S Pedersen; Arang Rhie; Hollian Richardson; Aaron R Quinlan; Terrance P Snutch; Louise Tee; Benedict Paten; Adam M Phillippy; Jared T Simpson; Nicholas J Loman; Matthew Loose
Journal: Nat Biotechnol Date: 2018-01-29 Impact factor: 54.908

6. Improving the Chromosome-Level Genome Assembly of the Siamese Fighting Fish (Betta splendens) in a University Master's Course.

Authors: Stefan Prost; Malte Petersen; Martin Grethlein; Sarah Joy Hahn; Nina Kuschik-Maczollek; Martyna Ewa Olesiuk; Jan-Olaf Reschke; Tamara Elke Schmey; Caroline Zimmer; Deepak K Gupta; Tilman Schell; Raphael Coimbra; Jordi De Raad; Fritjof Lammers; Sven Winter; Axel Janke
Journal: G3 (Bethesda) Date: 2020-07-07 Impact factor: 3.154

7. Fast and accurate long-read assembly with wtdbg2.

Authors: Jue Ruan; Heng Li
Journal: Nat Methods Date: 2019-12-09 Impact factor: 28.547

8. Portable sequencing as a teaching tool in conservation and biodiversity research.

Authors: Mrinalini Watsa; Gideon A Erkenswick; Aaron Pomerantz; Stefan Prost
Journal: PLoS Biol Date: 2020-04-16 Impact factor: 8.029

9. An educational guide for nanopore sequencing in the classroom.

Authors: Alex N Salazar; Franklin L Nobrega; Christine Anyansi; Cristian Aparicio-Maldonado; Ana Rita Costa; Anna C Haagsma; Anwar Hiralal; Ahmed Mahfouz; Rebecca E McKenzie; Teunke van Rossum; Stan J J Brouns; Thomas Abeel
Journal: PLoS Comput Biol Date: 2020-01-23 Impact factor: 4.475

Review 10. Genetic Biomonitoring and Biodiversity Assessment Using Portable Sequencing Technologies: Current Uses and Future Directions.

Authors: Henrik Krehenwinkel; Aaron Pomerantz; Stefan Prost
Journal: Genes (Basel) Date: 2019-10-29 Impact factor: 4.096
