Felix D Guerrero, Kylie G Bendele, Noushin Ghaffari, Joseph Guhlin, Kristene R Gedye, Kevin E Lawrence, Peter K Dearden, Thomas W R Harrop, Allen C G Heath, Yanni Lun, Richard P Metz, Pete Teel, Adalberto Perez de Leon, Patrick J Biggs, William E Pomroy, Charles D Johnson, Philip D Blood, Stanley E Bellgard, Daniel M Tompkins.
Abstract
The longhorned tick, Haemaphysalis longicornis, feeds on a wide range of bird and mammalian hosts, including cattle, deer, sheep, goats, humans, and horses. This tick is known to transmit a number of pathogens that cause tick-borne diseases, and it was the vector of a recent serious outbreak of oriental theileriosis in New Zealand. A New Zealand-USA consortium was established to sequence, assemble, and annotate the genome of this tick, using ticks obtained from New Zealand's North Island. In New Zealand, the tick is considered exclusively parthenogenetic, and this trait was deemed useful for genome assembly. Very high molecular weight genomic DNA was sequenced on the Illumina HiSeq4000 and the long-read PacBio Sequel platforms. Twenty-eight SMRT cells produced a total of 21.3 million reads, which were assembled with Canu on a reserved supercomputer node with access to 12 TB of RAM, running continuously for over 24 days. The final assembly consisted of 34,211 contigs with an average contig length of 215,205 bp. The quality of the annotated genome was assessed by BUSCO analysis, which provides quantitative measures of the quality of an assembled genome. Over 95% of the BUSCO gene set was found in the assembled genome. Only 48 of the 1066 BUSCO genes were missing and only 9 were fragmented. The raw sequencing reads and the assembled contigs/scaffolds are archived at the National Center for Biotechnology Information.
Keywords: Cattle tick; Genome annotation; Pac Bio de novo assembly; Tick genome
Year: 2019 PMID: 31656838 PMCID: PMC6806438 DOI: 10.1016/j.dib.2019.104602
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Statistics of the H. longicornis sequence reads.
| Statistic | Value |
|---|---|
| Total SMRT cells | 28 |
| Total Subreads | 21,309,718 |
| Overall Subread Mean length | 11,671 bp |
| Total bp | 248,705,718,800 |
| Genome coverage | 83 X |
| Subread N50 | 9141 bp |
| Maximum SMRT cell Mean Subread length | 13,705 bp |
| Minimum SMRT cell Mean Subread length | 8739 bp |
These are reads ultimately used in the genome assembly.
Genome coverage is based on an estimated genome size of 3.0 Gb.
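The coverage and mean subread length in the table above follow directly from the total base count. The sketch below reproduces that arithmetic; the constant and function names are illustrative choices, not part of the original dataset, and the 3.0 Gb genome size is the estimate stated in the table note rather than a measured value.

```python
# Values copied from the read-statistics table.
TOTAL_BP = 248_705_718_800
TOTAL_SUBREADS = 21_309_718
# Assumption: the 3.0 Gb genome size estimate given in the table note.
ESTIMATED_GENOME_SIZE_BP = 3_000_000_000


def coverage(total_bp: int, genome_size_bp: int) -> float:
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bp / genome_size_bp


# Mean subread length is total bases divided by the number of subreads.
mean_subread_length = TOTAL_BP / TOTAL_SUBREADS
depth = coverage(TOTAL_BP, ESTIMATED_GENOME_SIZE_BP)

print(f"Mean subread length: {mean_subread_length:,.0f} bp")
print(f"Coverage: {depth:.0f}X")
```

Because the depth depends on the assumed genome size, a different size estimate would change the reported coverage proportionally.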
General features of the H. longicornis genome assembly.
| Feature | Value |
|---|---|
| Contig/scaffold count | 34,211 |
| Mean contig/scaffold length | 215,205 bp |
| Contig N50 | 515,769 bp (n = 3395) |
| Contig N90 | 85,735 bp (n = 17,595) |
| Largest contig | 8,678,875 bp |
| Total Length | 7,362,387,268 bp |
| GC Content | 47.5% |
| Total BUSCO groups searched | 1066 |
| Number of BUSCO complete and single copy (% of total) | 171 (16%) |
| Number of BUSCO complete and duplicated (% of total) | 841 (79%) |
| Number of BUSCO fragmented (% of total) | 8 (1%) |
| Number of BUSCO missing (% of total) | 46 (4%) |
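For readers unfamiliar with the contiguity and completeness metrics in the table above, the following sketch shows how an N50 or N90 value (with the contig count "n" reported alongside it) is conventionally computed, and how the BUSCO counts translate into percentages. The function name `nx` and the toy contig lengths are hypothetical; the real contig lengths are not reproduced here, so only the BUSCO counts come from the table.

```python
from typing import Sequence, Tuple


def nx(lengths: Sequence[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx length, contig count n).

    Contigs are sorted longest first; Nx is the length of the contig at which
    the running total first reaches `fraction` of the assembly length, and n
    is the number of contigs needed to get there.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise AssertionError("unreachable for 0 < fraction <= 1")


# Toy data (hypothetical), total length 300.
toy_contigs = [100, 80, 60, 40, 20]
print(nx(toy_contigs, 0.5))  # N50 of the toy data
print(nx(toy_contigs, 0.9))  # N90 of the toy data

# BUSCO counts copied from the assembly table.
busco = {
    "complete_single": 171,
    "complete_duplicated": 841,
    "fragmented": 8,
    "missing": 46,
}
total_groups = 1066

complete = busco["complete_single"] + busco["complete_duplicated"]
complete_pct = 100 * complete / total_groups
found_pct = 100 * (complete + busco["fragmented"]) / total_groups
print(f"Complete: {complete_pct:.1f}%  Complete or fragmented: {found_pct:.1f}%")
```

The four BUSCO categories in the table sum to the 1066 groups searched, which is a quick consistency check worth running on any BUSCO summary.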
RepeatModeler analysis of the H. longicornis genome assembly.
| Element ID | Number | Total Length (bp) | Percent of genome |
|---|---|---|---|
| SINEs | 567,935 | 139,600,732 | 1.9 |
| ALUs | 0 | 0 | 0 |
| MIRs | 0 | 0 | 0 |
| LINEs | 1,123,711 | 811,749,916 | 11.03 |
| LINE1 | 46,225 | 34,076,225 | 0.46 |
| LINE2 | 75,136 | 40,371,988 | 0.55 |
| L3/CR1 | 147,929 | 97,491,849 | 1.32 |
| LTR elements | 360,333 | 313,740,113 | 4.26 |
| ERVL | 0 | 0 | 0 |
| ERVL-MaLRs | 0 | 0 | 0 |
| ERV class I | 1469 | 67,439 | 0 |
| ERV class II | 4841 | 2,961,479 | 0.04 |
| DNA elements | 500,717 | 184,699,582 | 2.51 |
| hAT-Charlie | 22,723 | 6,351,868 | 0.09 |
| TcMar-Tigger | 64,096 | 21,409,145 | 0.29 |
| Unclassified | 6,206,150 | 1,961,583,750 | 26.64 |
| Total interspersed repeats | 8,758,846 | 3,411,374,093 | 46.34 |
| Small RNA | 505,644 | 114,045,117 | 1.55 |
| Satellites | 47,195 | 29,509,026 | 0.40 |
| Simple repeats | 1,629,716 | 133,804,303 | 1.82 |
| Low complexity | 86,281 | 4,409,140 | 0.06 |
Most repeats fragmented by insertions or deletions have been counted as one element.
Number of bases masked was 3,610,450,238 (49.04%).
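The "Percent of genome" column can be checked against the total assembly length reported in the assembly table. The sketch below assumes, as the numbers suggest, that each percentage is the element length divided by the total assembly length of 7,362,387,268 bp, and that the "Total interspersed repeats" row is the sum of the five top-level classes; the variable names are illustrative.

```python
# Total assembly length from the assembly features table.
ASSEMBLY_LENGTH_BP = 7_362_387_268

# Top-level interspersed repeat classes (total length in bp) from the repeat table.
interspersed_classes = {
    "SINEs": 139_600_732,
    "LINEs": 811_749_916,
    "LTR elements": 313_740_113,
    "DNA elements": 184_699_582,
    "Unclassified": 1_961_583_750,
}
# Masked base count from the table note.
MASKED_BP = 3_610_450_238


def percent_of_genome(length_bp: int) -> float:
    """Share of the assembly occupied by a feature, in percent."""
    return 100 * length_bp / ASSEMBLY_LENGTH_BP


for name, length_bp in interspersed_classes.items():
    print(f"{name}: {percent_of_genome(length_bp):.2f}%")

interspersed_bp = sum(interspersed_classes.values())
interspersed_pct = percent_of_genome(interspersed_bp)
masked_pct = percent_of_genome(MASKED_BP)
print(f"Total interspersed repeats: {interspersed_bp:,} bp ({interspersed_pct:.2f}%)")
print(f"Bases masked: {masked_pct:.2f}%")
```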
Specifications Table
| Biology | |
| Genomics | |
| Assembled genome sequences and tables displaying sequencing, assembly, and repeats analysis statistics | |
| Long-read sequencing of very high molecular weight genomic DNA using Pacific Biosciences Sequel and Illumina HiSeq4000 | |
| Pacific Biosciences raw data in bam format, Illumina HiSeq4000 raw data in fastq format | |
| The expected large genome size of this tick necessitated the usage of long read sequencing technology and a genomic DNA isolation technique capable of purifying very high molecular weight DNA. The parthenogenetic nature of New Zealand populations of | |
| Eggs from New Zealand-collected | |
| Institution: | |
| Repository name: | |
This assembled genome is the highest quality tick genome publicly available. Researchers studying arachnid and tick genomics, arachnid evolution, and comparative genomics will find the assembled genome valuable. The dataset can be used to study parthenogenesis-related genes, because this tick reproduces exclusively by parthenogenesis in New Zealand. Developers of novel tick control technologies for this and other tick species will also find this genome useful.