
Draft genome of tule elk Cervus canadensis nannodes.

Jessica E Mizzi¹, Zachary T Lounsberry², C Titus Brown³, Benjamin N Sacks⁴.

Abstract

This paper presents the first draft genome of the tule elk (Cervus elaphus nannodes), a subspecies native to California that underwent an extreme genetic bottleneck in the late 1800s. The genome was generated from Illumina HiSeq 3000 whole genome sequencing of four individuals, resulting in an assembly of 2.395 billion base pairs (Gbp) across 602,862 contigs of at least 500 bp, with N50 = 6,885 bp. This genome provides a resource to facilitate future genomic research on elk and other cervids.


Keywords: Cervus elaphus nannodes; genome draft; mammalian genome assembly; tule elk

Year: 2017 PMID: 29333237 PMCID: PMC5747339 DOI: 10.12688/f1000research.12636.2

Source DB: PubMed Journal: F1000Res ISSN: 2046-1402

Introduction

At the initiation of this project, no genome assembly existed for any member of the deer family (Cervidae). We therefore sought to generate the first such assembly for the tule elk (Cervus canadensis nannodes). We note that after we completed our project and submitted the initial draft of this manuscript, a full assembly of red deer (Cervus elaphus hippelaphus) became available online [1]. This paper presents the first de novo genomic draft of the tule elk. This California-endemic elk subspecies underwent a major genetic bottleneck when its numbers were reduced to as few as three individuals in the 1870s [2, 3]. Although their numbers have increased to >5,000 today [4], the historical bottleneck nevertheless left its mark on the elk’s genome, rendering it more homozygous than those of other elk subspecies. Our motivation for generating a genomic resource for the tule elk was to create a reference for identifying single nucleotide polymorphisms (SNPs) to develop assays to monitor elk population abundance and for related population genetic applications. Due to the relatively low coverage generated in this work (40X overall with an average of 10X coverage from each individual), we used the MEGAHIT metagenome assembler, which has been found to perform well on low-quality or low-coverage DNA sequencing data from bacteria [5].

Methods

Sample collection and library prep

Elk were selected from four geographically distinct populations across northern California to maximize genomic diversity (San Luis Reservoir, California Valley, American Canyon, and the San Luis National Wildlife Refuge [4]). Skin biopsies were obtained by the California Department of Fish and Wildlife as part of their elk management activities [4]. We extracted genomic DNA from the skin biopsies using Qiagen DNeasy blood & tissue kits (QIAGEN Inc., Valencia, CA), according to the manufacturer’s instructions. The DNA was then fragmented via sonication using a Bioruptor (Diagenode, Denville, NJ) to 300 to 400 base pairs (bp) prior to adapter ligation. After verification of the fragment size range using agarose gel electrophoresis, the NEBNext® Ultra™ DNA Library Prep Kit for Illumina® (New England Biolabs, Inc., Ipswich, MA) was used to ligate Illumina adapters. Multiplexed libraries were prepared using NEBNext Multiplex Oligos for Illumina (New England Biolabs) to individually barcode each of the four elk. Barcodes were annealed using low-cycle polymerase chain reactions during library preparation. To assess library quality, trace analysis was performed using a Bioanalyzer 2100 (Agilent, Santa Clara, CA) and fluorometric DNA quantitation of libraries was performed using a Qubit fluorometer (Invitrogen, Carlsbad, CA). After library quality control, the four samples were pooled in equimolar concentrations and submitted for paired-end sequencing. Samples were sequenced on an Illumina HiSeq 3000 at the DNA Technologies and Expression Analysis Core of the UC Davis Genome Center.

Bioinformatics processing

Sequencing quality of the demultiplexed reads was evaluated using FastQC v0.11.3 (RRID:SCR_014583) [6]. The Illumina TruSeq3-PE sequencing adapters were removed using Trimmomatic v0.30 (RRID:SCR_011848) [7] with the ILLUMINACLIP parameter set to TruSeq3-PE.fa:2:40:15. The TruSeq3-PE.fa sequence was downloaded from https://anonscm.debian.org/cgit/debian-med/trimmomatic.git/plain/adapters/TruSeq3-PE.fa. LEADING and TRAILING parameters were set to 2, resulting in the removal of bases with a quality score of 2 or less on the phred33 quality scale. The SLIDINGWINDOW parameter of 4:2 was used to clip reads once the quality score fell below 2 within the window. The MINLEN parameter, set to 25, dropped any reads that fell below that length due to quality trimming. The demultiplexed, quality-filtered reads were interleaved using the interleave-reads.py script in khmer v2.0 (RRID:SCR_001156) [8]. The assembly was performed using MEGAHIT v1.0.5 [9] on the interleaved, quality-filtered reads. Assembly statistics were computed using QUAST v3.0 (RRID:SCR_001228) [10]. All code used is publicly available at https://github.com/dib-lab/2017-tule-elk/.
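As an illustration of the trimming settings described above, the following minimal sketch assembles a Trimmomatic paired-end command line from those parameters. It is not the authors' script (which is in the repository linked above). The jar path, the input file names, and the output naming scheme are hypothetical. The order of the trimming steps simply follows the order in which the text lists them, and the minimum-length filter is written with Trimmomatic's step name, MINLEN.

```python
from typing import List


def trimmomatic_pe_args(
    fwd_in: str,
    rev_in: str,
    out_prefix: str,
    jar: str = "trimmomatic-0.30.jar",   # hypothetical path to the Trimmomatic jar
    adapters: str = "TruSeq3-PE.fa",     # adapter file named in the text
) -> List[str]:
    """Build an argument list for Trimmomatic in paired-end (PE) mode.

    Output file names are an assumption made for this sketch; PE mode
    expects paired and unpaired outputs for each of the two input files.
    """
    return [
        "java", "-jar", jar, "PE", "-phred33",
        fwd_in, rev_in,
        f"{out_prefix}_R1.paired.fq.gz", f"{out_prefix}_R1.unpaired.fq.gz",
        f"{out_prefix}_R2.paired.fq.gz", f"{out_prefix}_R2.unpaired.fq.gz",
        f"ILLUMINACLIP:{adapters}:2:40:15",  # adapter clipping thresholds from the text
        "LEADING:2",                         # trim low-quality bases at the read start
        "TRAILING:2",                        # trim low-quality bases at the read end
        "SLIDINGWINDOW:4:2",                 # 4-base window, required quality 2
        "MINLEN:25",                         # drop reads shorter than 25 bp after trimming
    ]


if __name__ == "__main__":
    # Hypothetical file names for one of the four barcoded individuals.
    args = trimmomatic_pe_args("elk1_R1.fastq.gz", "elk1_R2.fastq.gz", "elk1")
    print(" ".join(args))
```

The function only builds the argument list; passing it to `subprocess.run(args, check=True)` would execute it, assuming Java and the jar are available locally.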

Results

We obtained 377,980,276 demultiplexed 150 bp paired-end raw reads containing a total of 113.394 Gbp of sequence, of which 99.830 Gbp (88%) had quality scores ≥ Q30 (average quality score = 37.2). This corresponds to approximately 40X coverage of the approximately 3 Gbp tule elk genome. Sequence assembly resulted in a total genome sequence size of 2.395 Gbp. Reads were assembled into 602,862 contiguous sequences ("contigs") averaging 3,973 bp in length with a minimum contig length of 201 bp. The G+C content of the genome was 41.55%. The N50 was 6,885 bp and the maximum contig length was 72,391 bp. Additional assembly statistics are available in Table 1. No contigs (e.g. under a certain size or likely to reflect repeats) were removed from the assembly.
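The sequencing yield and coverage figures above can be checked with simple arithmetic. The sketch below assumes that the reported read count refers to read pairs, because 377,980,276 × 2 × 150 bp reproduces the stated 113.394 Gbp. It also assumes that the pooled reads are split evenly among the four individuals, which equimolar pooling only approximates. The function names are illustrative.

```python
def sequencing_summary(
    read_pairs: int = 377_980_276,   # reported read count, interpreted as pairs
    read_length_bp: int = 150,
    q30_bases: float = 99.830e9,     # bases with quality >= Q30, from the text
    genome_size_bp: float = 3.0e9,   # approximate genome size, from the text
    n_individuals: int = 4,
) -> dict:
    """Recompute total yield, Q30 fraction, and raw coverage."""
    total_bases = read_pairs * 2 * read_length_bp
    coverage = total_bases / genome_size_bp
    return {
        "total_bases": total_bases,
        "q30_fraction": q30_bases / total_bases,
        "coverage": coverage,
        # Even split across individuals is an assumption of this sketch.
        "coverage_per_individual": coverage / n_individuals,
    }


def mean_contig_length(
    total_length_bp: int = 2_395_105_945,
    n_contigs: int = 602_862,
) -> float:
    """Average contig length from the assembly totals."""
    return total_length_bp / n_contigs


if __name__ == "__main__":
    summary = sequencing_summary()
    print(f"Total bases: {summary['total_bases']:,}")
    print(f"Q30 fraction: {summary['q30_fraction']:.1%}")
    print(f"Raw coverage: {summary['coverage']:.1f}X")
    print(f"Per individual: {summary['coverage_per_individual']:.1f}X")
    print(f"Mean contig length: {mean_contig_length():.0f} bp")
```

Under these assumptions the raw yield is 113,394,082,800 bp, or about 37.8X of a 3 Gbp genome (about 9.4X per individual), consistent with the rounded 40X and 10X figures, and the mean contig length rounds to the reported 3,973 bp.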

Table 1.

Quality metrics on tule elk (Cervus canadensis nannodes) assembly, as generated with QUAST v3.0.

Metric	Tule elk assembly
# contigs (≥ 200 bp)	1,367,218
# contigs (≥ 500 bp)	602,862
# contigs (≥ 1000 bp)	460,702
# contigs (≥ 5000 bp)	160,229
# contigs (≥ 10000 bp)	51,790
# contigs (≥ 25000 bp)	2,606
# contigs (≥ 50000 bp)	36
Total length (≥ 200 bp)	2,607,088,486
Total length (≥ 1000 bp)	2,295,163,580
Total length (≥ 5000 bp)	1,531,314,985
Total length (≥ 10000 bp)	771,863,493
Total length (≥ 25000 bp)	80,157,993
Total length (≥ 50000 bp)	2,056,962
Largest contig	72,391
Total length	2,395,105,945
GC	41.55%
N50	6,885
N75	3,646
L50	103,346
L75	222,107
# N's per 100 kbp	0
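The N50, N75, L50, and L75 values in Table 1 follow the usual contiguity definitions: Nx is the length of the shortest contig in the smallest set of longest contigs that together cover at least x% of the assembly, and Lx is the number of contigs in that set. The following sketch computes these values for a toy list of contig lengths. It illustrates the definition only and is not QUAST's implementation.

```python
from typing import Iterable, Tuple


def nx_lx(lengths: Iterable[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx, Lx) for a collection of contig lengths.

    fraction=0.5 gives (N50, L50); fraction=0.75 gives (N75, L75).
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    target = fraction * sum(ordered)
    running_total = 0
    # Walk from the longest contig down until the target share is covered.
    for count, length in enumerate(ordered, start=1):
        running_total += length
        if running_total >= target:
            return length, count
    # Not reachable for a non-empty list with fraction <= 1.
    raise RuntimeError("target length was never reached")


if __name__ == "__main__":
    toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # made-up lengths, total 300
    print("N50, L50:", nx_lx(toy_contigs, 0.5))
    print("N75, L75:", nx_lx(toy_contigs, 0.75))
```

For the toy list, the two longest contigs (80 + 70 = 150) cover half of the 300 bp total, so N50 = 70 and L50 = 2; four contigs are needed to reach 75%, so N75 = 40 and L75 = 4. Read the same way, the table indicates that the 103,346 longest contigs, each at least 6,885 bp, account for half of the assembled length.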

This genome can serve as the basis for further genomic work on tule elk and other cervids, such as the development of a SNP assay to track elk population movement across increasingly developed northern Californian terrain. Furthermore, it is one of the first whole genome assemblies available from the family Cervidae, providing a useful interim reference genome for bioinformatic analyses on other deer and elk species.

Data availability

The data referenced by this article are under copyright with the following copyright statement: Copyright: © 2017 Mizzi JE et al. Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication). Raw reads are available in the SRA under the BioProject ID PRJNA345218. The genome draft is available at https://doi.org/10.6084/m9.figshare.5382565.v1 [11]. The code used in this study has been archived at http://doi.org/10.5281/zenodo.887935 [12].

Referee reports and author responses

I'm happy with the changes made, no further comments. I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The authors describe the generation of a draft assembly for tule elk in the style of a brief genome announcement. For SNP detection and primer design this assembly is fine. It could e.g. be used in combination with Genotyping by Sequencing on additional individuals. Materials and methods are sound and provided in full. However, a quick search of NCBI's taxonomy resource reveals that since June 2017 there is a genome assembly for red deer available: https://www.ncbi.nlm.nih.gov/genome/10790. The authors therefore cannot claim to present the first whole genome assembly from the family Cervidae. Please change that statement.

Suggested further improvements:

Results: I would have liked to see a figure for the total amount of sequence after filtering as a simple way of showing how good or bad the sequence run was. Table 1's readability would be improved by getting all figures to align right. I'd also recommend adding another assembly metric to look at the gene content, either using something like BUSCO or by mapping the refseq sequences of a related, well annotated species (e.g. cattle) against the draft genome.

Methods, Sample collection and library prep: I see that each individual has two tissue samples. The authors entered a sample ID into the 'tissue' field of NCBI's BioSample database. I'd recommend removing this and adding the animal ID in the 'isolate' field. Please expand the entries in the 'isolation source' field. It says e.g. "Am. Cyn", which probably means American Canyon.

Bioinformatics processing: Checking the code, I believe the statement "LEADING, TRAILING, and SLIDING parameters were set to 2" should read "LEADING and TRAILING parameters were set to 2".

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Thank you for your review of this paper. Version 2 has been edited to reflect the presence of the red deer genome and a citation to that genome has been made. Table 1 has been reformatted for readability. The changes you’ve requested to the NCBI BioSample entry have been made. The trimmomatic code in the Bioinformatics Processing section has been edited to remove the erroneous “SLIDING” parameter. We’ve added text to the first sentence of the results section that describes the quality of sequence data in terms of standard quality scores. We opted not to provide details on the gene content relative to a related genome as we felt this could be done more comprehensively in the future once the red deer genome has been published and peer-reviewed.

This article describes the generation of a draft genome (40X coverage from 4 animals) of the tule elk (Cervus elaphus nannodes). The research methods are fairly standard for the Illumina sequencing used. At 602,862 contigs, the genome is very preliminary and will require quite a bit of additional work in order for it to be applicable to a wide range of applications. The report basically falls into a category of a genome announcement. I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Thank you for your review of this paper.

7 in total

1. Development and Characterization of 15 Polymorphic Dinucleotide Microsatellite Markers for Tule Elk Using HiSeq3000.

Authors: Benjamin N Sacks; Zachary T Lounsberry; Tatyana Kalani; Erin P Meredith; Cristen Langner
Journal: J Hered Date: 2016-09-16 Impact factor: 2.645

2. QUAST: quality assessment tool for genome assemblies.

Authors: Alexey Gurevich; Vladislav Saveliev; Nikolay Vyahhi; Glenn Tesler
Journal: Bioinformatics Date: 2013-02-19 Impact factor: 6.937

3. MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices.

Authors: Dinghua Li; Ruibang Luo; Chi-Man Liu; Chi-Ming Leung; Hing-Fung Ting; Kunihiko Sadakane; Hiroshi Yamashita; Tak-Wah Lam
Journal: Methods Date: 2016-03-21 Impact factor: 3.608

4. Improving ancient DNA genome assembly.

Authors: Alexander Seitz; Kay Nieselt
Journal: PeerJ Date: 2017-04-05 Impact factor: 2.984

5. The khmer software package: enabling efficient nucleotide sequence analysis.

Authors: Michael R Crusoe; Hussien F Alameldin; Sherine Awad; Elmar Boucher; Adam Caldwell; Reed Cartwright; Amanda Charbonneau; Bede Constantinides; Greg Edvenson; Scott Fay; Jacob Fenton; Thomas Fenzl; Jordan Fish; Leonor Garcia-Gutierrez; Phillip Garland; Jonathan Gluck; Iván González; Sarah Guermond; Jiarong Guo; Aditi Gupta; Joshua R Herr; Adina Howe; Alex Hyer; Andreas Härpfer; Luiz Irber; Rhys Kidd; David Lin; Justin Lippi; Tamer Mansour; Pamela McA'Nulty; Eric McDonald; Jessica Mizzi; Kevin D Murray; Joshua R Nahum; Kaben Nanlohy; Alexander Johan Nederbragt; Humberto Ortiz-Zuazaga; Jeramia Ory; Jason Pell; Charles Pepe-Ranney; Zachary N Russ; Erich Schwarz; Camille Scott; Josiah Seaman; Scott Sievert; Jared Simpson; Connor T Skennerton; James Spencer; Ramakrishnan Srinivasan; Daniel Standage; James A Stapleton; Susan R Steinman; Joe Stein; Benjamin Taylor; Will Trimble; Heather L Wiencko; Michael Wright; Brian Wyss; Qingpeng Zhang; En Zyme; C Titus Brown
Journal: F1000Res Date: 2015-09-25

6. Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors: Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal: Bioinformatics Date: 2014-04-01 Impact factor: 6.937

7. Draft genome of tule elk Cervus canadensis nannodes.

Authors: Jessica E Mizzi; Zachary T Lounsberry; C Titus Brown; Benjamin N Sacks
Journal: F1000Res Date: 2017-09-15

