Aryn P Wilder, Olga Dudchenko, Caitlin Curry, Marisa Korody, Sheela P Turbek, Mark Daly, Ann Misuraca, Gaojianyong Wang, Ruqayya Khan, David Weisz, Julie Fronczek, Erez Lieberman Aiden, Marlys L Houck, Debra M Shier, Oliver A Ryder, Cynthia C Steiner.
Abstract
High-quality reference genomes are fundamental tools for understanding population history, and can provide estimates of genetic and demographic parameters relevant to the conservation of biodiversity. The federally endangered Pacific pocket mouse (PPM), which persists in three small, isolated populations in southern California, is a promising model for studying how demographic history shapes genetic diversity, and how diversity in turn may influence extinction risk. To facilitate these studies in PPM, we combined PacBio HiFi long reads with Omni-C and Hi-C data to generate a de novo genome assembly, and annotated the genome using RNAseq. The assembly comprised 28 chromosome-length scaffolds (N50 = 72.6 MB) and the complete mitochondrial genome, and included a long heterochromatic region on chromosome 18 not represented in the previously available short-read assembly. Heterozygosity was highly variable across the genome of the reference individual, with 18% of windows falling in runs of homozygosity (ROH) >1 MB, and nearly 9% in tracts spanning >5 MB. Yet outside of ROH, heterozygosity was relatively high (0.0027), and historical Ne estimates were large. These patterns of genetic variation suggest recent inbreeding in a formerly large population. Currently the most contiguous assembly for a heteromyid rodent, this reference genome provides insight into the past and recent demographic history of the population, and will be a critical tool for management and future studies of outbreeding depression, inbreeding depression, and genetic load.
Keywords: HiFi; chromosome conformation capture; effective population size; mitochondrial genome; runs of homozygosity
Year: 2022 PMID: 35894178 PMCID: PMC9348616 DOI: 10.1093/gbe/evac122
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 4.065
Genome Assembly Statistics Comparing the Final Assembly (HiFi + Omni-C + Hi-C) to the Previous DISCOVAR Assembly
| | Final Assembly | DISCOVAR Assembly |
|---|---|---|
| Number of scaffolds | 6,180 | 2,409,818 |
| Total size of scaffolds (bp) | 2,212,099,196 | 2,601,695,796 |
| Longest scaffold (bp) | 163,161,067 | 625,731 |
| N50 scaffold length (bp) | 72,679,016 | 24,714 |
| L50 scaffold count | 11 | 23,202 |
| N50 contig length (bp) | 7,389,774 | 17,686 |
| L50 contig count | 73 | 34,664 |
| GC content | 41.98% | 41.84% |
| Complete BUSCOs | 246 (96.4%) | 183 (71.7%) |
| Complete & Single-Copy BUSCOs | 239 (93.7%) | 175 (68.6%) |
| Complete & Duplicated BUSCOs | 7 (2.7%) | 8 (3.1%) |
| Fragmented BUSCOs | 3 (1.2%) | 46 (18.0%) |
| Missing BUSCOs | 6 (2.4%) | 26 (10.3%) |
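
The contiguity metrics in the table follow the usual definitions: N50 is the length of the sequence at which the cumulative length, counted from longest to shortest, first reaches half of the total assembly size, and L50 is the number of sequences needed to reach that point. The Python sketch below illustrates the calculation; the function name `n50_l50` and the toy lengths are illustrative assumptions and are not taken from the authors' pipeline.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of scaffold or contig lengths.

    N50: length of the sequence at which the running total, taken from
         longest to shortest, first reaches half of the summed length.
    L50: how many sequences were needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    # Only reached when no lengths were supplied.
    raise ValueError("lengths must contain at least one value")


# Toy example (hypothetical lengths, not the PPM assembly):
# total = 30, half = 15; 10 + 8 = 18 reaches half at the second sequence.
toy_lengths = [10, 8, 6, 4, 2]
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)  # prints: 8 2
```

Applied to the scaffold lengths of the final assembly, this kind of calculation is what yields a scaffold N50 of 72,679,016 bp at an L50 of 11, meaning the 11 longest scaffolds already cover half of the assembly.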
Fig. 1. Chromosome-length scaffolds of the PPM genome assembled from HiFi, Omni-C, and Hi-C data. (A) Repeat content in 500 KB windows. Warmer colors show higher repeat content. (B) Mapping depth of short reads from the same individual. (C) Heatmap showing, on the scale from white to red, the frequency of 3D contact between any two loci of the PPM genome as measured by Hi-C sequencing. The same Hi-C data were used for scaffolding the genome to chromosome-length. Twenty-eight squares along the diagonal represent chromosome territories. An interactive version of this figure is available at https://tinyurl.com/25vywofz (Robinson ). (D and E) Synteny between the PPM genome and two kangaroo rats, D. ordii (D) and D. spectabilis (E). Chromosome-length scaffolds of the PPM genome are colored, and scaffolds/contigs of the kangaroo rat genomes are black. Lines between the genomes show alignments >500 bp. Bars above the PPM scaffolds show the location of scaffolds >100 KB from the previous PPM genome assembly. The heterochromatic region of chromosome 18, absent from the larger scaffolds of the previous assembly, is highlighted.
Fig. 2. The distribution of heterozygosity across the PPM genome reflects recent inbreeding. (A) Mean heterozygosity in 500 KB windows (points), sliding mean heterozygosity (lines), and runs of homozygosity (ROH > 1 MB; bars at top). (B) Total lengths of ROH in each 1 MB bin (for 1 MB < ROH < 67 MB). Hatched lines and corresponding labels show the approximate number of generations since inbreeding for ROH of a given size range, assuming 100 MB tracts are inherited per meiosis. (C) Ne over time, showing population decline at the beginning of the last glacial period followed by stable Ne ∼ 50,000. Lighter lines are replicate PSMC runs, and the dark line shows the mean of all replicates.
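
The link between ROH length and generations since inbreeding in panel B can be sketched with a common approximation: a tract inherited identical by descent from an ancestor g generations back has an expected length of about 1/(2g) Morgans. The caption's assumption of 100 MB inherited per meiosis is read here as 100 MB per Morgan. Whether the authors used exactly this formula is an assumption; the function name and parameter names below are hypothetical.

```python
def generations_since_inbreeding(roh_length_mb: float,
                                 mb_per_morgan: float = 100.0) -> float:
    """Approximate generations back to the shared ancestor for one ROH.

    Uses the approximation E[length] = 1 / (2 * g) Morgans, rearranged to
    g = mb_per_morgan / (2 * roh_length_mb).

    mb_per_morgan = 100 reflects the caption's stated assumption; the
    formula itself is a standard approximation, not confirmed as the
    authors' exact method.
    """
    if roh_length_mb <= 0:
        raise ValueError("roh_length_mb must be positive")
    return mb_per_morgan / (2.0 * roh_length_mb)


# ROH size thresholds mentioned in the abstract and caption.
for length_mb in (1.0, 5.0):
    g = generations_since_inbreeding(length_mb)
    print(f"ROH of {length_mb:g} MB -> about {g:g} generations")
```

Under these assumptions, ROH longer than 5 MB point to shared ancestry within roughly the last 10 generations, and ROH longer than 1 MB to within roughly the last 50, consistent with the caption's reading that long tracts reflect recent inbreeding.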