João P Marques1,2,3, Fernando A Seixas1,2,3, Liliana Farelo1, Colin M Callahan4, Jeffrey M Good4,5, W Ian Montgomery6, Neil Reid6, Paulo C Alves1,2,5, Pierre Boursot3, José Melo-Ferreira1,2.
Abstract
Hares (genus Lepus) provide clear examples of repeated and often massive introgressive hybridization and striking local adaptations. Genomic studies on this group have so far relied on comparisons to the European rabbit (Oryctolagus cuniculus) reference genome. Here, we report the first de novo draft reference genome for a hare species, the mountain hare (Lepus timidus), and evaluate the efficacy of whole-genome re-sequencing analyses using the new reference versus using the rabbit reference genome. The genome was assembled using the ALLPATHS-LG protocol with a combination of overlapping pair and mate-pair Illumina sequencing (77x coverage). The assembly contained 32,294 scaffolds with a total length of 2.7 Gb and a scaffold N50 of 3.4 Mb. Re-scaffolding based on the rabbit reference reduced the total number of scaffolds to 4,205 with a scaffold N50 of 117 Mb. A correspondence was found between 22 of these hare scaffolds and the rabbit chromosomes, based on gene content and direct alignment. We annotated 24,578 protein-coding genes by combining ab initio predictions, homology search, and transcriptome data, of which 683 were solely derived from hare-specific transcriptome data. The hare reference genome is therefore a new resource to discover and investigate hare-specific variation. Similar estimates of heterozygosity and inferred demographic history profiles were obtained when mapping hare whole-genome re-sequencing data to the new hare draft genome or to alternative references based on the rabbit genome. Our results validate previous reference-based strategies and suggest that the chromosome-scale hare draft genome should enable chromosome-wide analyses and genome scans on hares.
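Scaffold N50, used in the abstract to compare assembly stages, is the length L such that scaffolds of length L or longer together contain at least half of the total assembly length. A minimal Python sketch of this standard definition follows; the function name `n50` and the toy lengths are illustrative and are not taken from the hare assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted in descending order and accumulated; the N50 is the
    length at which the running total first reaches half the assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-negative lengths; kept for typing


# Toy example (hypothetical lengths): total = 100, half = 50.
toy = [40, 20, 15, 10, 10, 5]
print(n50(toy))  # 40 + 20 = 60 >= 50, so N50 is 20
```

By construction, N50 can never exceed the length of the largest scaffold, and re-scaffolding raises it by merging sequences into fewer, longer units.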
Keywords: Lagomorpha; Leporids; annotation; de novo assembly; hares; whole-genome sequencing
Year: 2020 PMID: 31834364 PMCID: PMC6951464 DOI: 10.1093/gbe/evz273
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Summary Statistics of the L. timidus Genome Assembly, Curation and Annotation
| Statistic | Value |
|---|---|
| Raw reads | 3,396,221,990 |
| Clean reads | 2,084,055,186 (61%) |
| Raw de novo assembly (ALL-PATHS) | |
| Number of scaffolds | 30,212 |
| Largest (bp) | 4,384,173 |
| Total length (bp) | 2,741,083,031 |
| Contig N50 (bp) | 11,800 |
| Scaffold N50 (bp) | 347,000 |
| GC content (%) | 43.54 |
| Post assembly curation (REAPR) | |
| Number of scaffolds | 70,707 |
| Largest (bp) | 1,921,307 |
| Total length (bp) | 2,662,248,649 |
| N50 (bp) | 156,456 |
| GC content (%) | 43.54 |
| Scaffolding (SSPACE >1 kb) | |
| Number of scaffolds | 32,294 |
| Largest (bp) | 3,358,433 |
| Total length (bp) | 2,703,715,767 |
| N50 (bp) | 3,358,433 |
| GC content (%) | 43.54 |
| Reference-based scaffolder | |
| Number of scaffolds | 4,205 (83% of the total length on the 22 scaffolds corresponding to the 22 rabbit chromosomes) |
| Largest (bp) | 194,083,885 |
| Total length (bp) | 2,703,257,129 |
| N50 (bp) | 117,222,600 |
| GC content (%) | 43.40 |
| | |
| Haploid chromosome number | 22 (constrained by the rabbit reference) |
| Estimated genome length | 2.75 Gb |
| Sequencing coverage | 77x |
| Total assembly length | 2.70 Gb |
| Gaps | 18.8% |
| Repeats | 28.0% |
| BUSCO genes present (estimation of genome completeness) | 3,793 (92%) |
| Ab-initio gene prediction | 29,238 genes |
| Gene space (exons, introns, etc.) | 196.6 Mb (7.3% of assembly) |
| Gene length (median) | 5.8 kb |
| Gene fragmentation | 137,913 exons |
| Exon space | 26.6 Mb (1.0% of the assembly) |
| Exon length (median) | 193 bp |
| Homology-based gene annotation (details in | 24,578 proteins |
| UniProt/Swiss-Prot | 23,375 |
| | 18,678 |
| | 15,418 (683 hare transcriptome specific) |
| Functional annotation | |
| Pfam domains | 28,803 (5,068 unique) |
| Gene ontology (GO) | 14,530 (1,382 unique) |
| InterPro | 25,395 (4,740 unique) |
| KEGG | 748 (391 unique) |
| Reactome | 5,066 (1,050 unique) |
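Several percentages in the table are simple ratios of other table entries. The short Python sketch below recomputes three of them as a consistency check. It assumes that the gene-space and exon-space percentages are taken over the final reference-based scaffolder length (2,703,257,129 bp); the table does not state which total length was used.

```python
# Values copied from the summary table (L. timidus assembly).
RAW_READS = 3_396_221_990
CLEAN_READS = 2_084_055_186
# Assumption: percentages below use the final reference-based scaffolder length.
ASSEMBLY_BP = 2_703_257_129
GENE_SPACE_BP = 196.6e6  # gene space (exons, introns, etc.)
EXON_SPACE_BP = 26.6e6   # exon space


def percent(part: float, whole: float) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


clean_pct = percent(CLEAN_READS, RAW_READS)
gene_pct = percent(GENE_SPACE_BP, ASSEMBLY_BP)
exon_pct = percent(EXON_SPACE_BP, ASSEMBLY_BP)

print(f"Clean reads: {clean_pct:.0f}% of raw reads")
print(f"Gene space:  {gene_pct:.1f}% of assembly")
print(f"Exon space:  {exon_pct:.1f}% of assembly")
```

With these inputs the script reproduces the 61%, 7.3%, and 1.0% reported in the table; using the SSPACE total length (2,703,715,767 bp) instead gives the same rounded values.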
Fig. 1. PSMC inference of demographic profiles. (a) Lepus timidus demographic profile inferred using different reference genomes: the L. timidus re-scaffolded genome, the L. timidus assembled genome (prior to re-scaffolding), the hare pseudo-reference genome, and the European rabbit reference genome. (b) PSMC inference of the demographic profiles of two European hare species, L. timidus and L. granatensis, using the L. timidus re-scaffolded genome (replicating the analysis by Seixas et al. 2018, which used a hare pseudo-reference). Bold lines represent the full run for each species, and thin lines represent 50 bootstrap replicates based on randomly resampled fragments. A mutation rate of 2.8 × 10⁻⁹ substitutions per site per generation and a generation time of two years were assumed. Inflection points are denoted by gray vertical bars, as in Seixas et al. (2018).
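The assumed mutation rate μ and generation time g are what convert PSMC's scaled output into years and effective population sizes. The sketch below follows the scaling used by the psmc_plot.pl utility distributed with PSMC (N0 = θ0 / (4μs), time in years = 2·N0·t·g, Ne = λ·N0), assuming a bin size s of 100 bp (that utility's default, not stated in this record). The class and function names and the example θ0, t, and λ values are hypothetical and are not outputs of this study.

```python
from dataclasses import dataclass

MU = 2.8e-9     # substitutions per site per generation (from the Fig. 1 caption)
GEN_TIME = 2.0  # years per generation (from the Fig. 1 caption)
BIN_SIZE = 100  # bp per PSMC bin; assumed psmc_plot.pl default, not stated here


@dataclass
class PsmcInterval:
    """One hypothetical PSMC time interval in scaled units."""
    t_scaled: float  # interval start time, in units of 2*N0 generations
    lam: float       # population size relative to N0 (lambda_k)


def rescale(theta0: float, intervals: list[PsmcInterval],
            mu: float = MU, gen_time: float = GEN_TIME,
            bin_size: int = BIN_SIZE) -> list[tuple[float, float]]:
    """Convert scaled PSMC estimates to (years ago, effective size) pairs."""
    n0 = theta0 / (4.0 * mu * bin_size)  # theta0 is per bin, so divide by s
    return [(iv.t_scaled * 2.0 * n0 * gen_time, iv.lam * n0)
            for iv in intervals]


# Hypothetical inputs chosen so that N0 is 100,000.
example = [PsmcInterval(0.1, 1.5), PsmcInterval(0.5, 2.0)]
for years, ne in rescale(0.112, example):
    print(f"{years:>10,.0f} years ago  Ne ~ {ne:,.0f}")
```

Because N0 scales as 1/μ, halving the assumed mutation rate would double both the inferred times and the population sizes, which is why the caption states μ and g explicitly.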