Literature DB >> 36108314

Chromosome-level Genome of the Muskrat (Ondatra zibethicus).

Haimeng Li^1,2,3, Minhui Shi^1,3, Qing Wang^1,3, Tian Xia⁴, Sunil Kumar Sahu³, Yu Zhang⁵, Jiangang Wang³, Tianfeng Li⁵, Yue Ma², Tianlu Liu⁵, Huan Liu^2,3,6, Tianming Lan^2,3, Suying Bai⁵.

Abstract

The muskrat (Ondatra zibethicus) is a semi-aquatic rodent species with ecological, economic, and medicinal importance. Here, we present an improved genome assembly, which is the first high-quality chromosome-level genome of the muskrat with high completeness and contiguity assembled using single-tube long fragment read, BGISEQ, and Hi-C sequencing technologies. The genome size of the final assembly was 2.63 Gb with 27 pseudochromosomes. The length of scaffold N50 reached 80.25 Mb with a Benchmarking Universal Single-Copy Ortholog score of 91.3%. We identified a 66.98 Mb X chromosome and a 1.14-Mb Y-linked genome region, and these sex-linked regions were validated by resequencing 32 extra male individuals. We predicted 19,396 protein-coding genes, among which 19,395 (99.99%) were functionally annotated. The expanded gene families in the muskrat genome were found to be enriched in several organic synthesis- and metabolism-related Gene Ontology terms, suggesting the likely genomic basis for the production and secretion of musk. This chromosome-level genome represents a valuable resource for improving our understanding of muskrat ecology and musk secretion.

Entities: Chemical

Keywords: chromosome-level genome; musk; muskrat; sex chromosome

Mesh：

Year: 2022 PMID： 36108314 PMCID： PMC9539402 DOI： 10.1093/gbe/evac138

Source DB: PubMed Journal: Genome Biol Evol ISSN： 1759-6653 Impact factor: 4.065

This is the first high-quality chromosome-level muskrat genome with high genome contiguity, completeness, and genome annotation, and it provides a useful genomic resource for genome-wide screening related to the genomic basis of musk production and secretion as well as the ecological management of the muskrat as an invasive species.

Introduction

The muskrat (Ondatra zibethicus; Rodentia: Cricetidae) is a medium-sized rodent that is the only species in Ondatra. It is also known as the water rat because it has adapted to live semi-aquatic lifestyle, inhabiting wetlands, ponds, coastal areas, lakes, river banks, and estuaries (Schuster et al. 2021). The muskrat is a highly adaptable species that is native to North America and Canada but has been introduced to Europe, Asia, South America, and Australia (Gintarė Skyrienė and Paulauskas 2012). In China, muskrats were first found in the Heilongjiang border region in the 1950s after they were introduced from the former Soviet Union (Zhang et al. 2020). Although O. zibethicus is considered an invasive species in Europe and Asia, including France, Germany, Poland, Russian, Mongolia, etc. and is thought to be harmful to local ecosystems (Gintarė Skyrienė and Paulauskas 2012), it is also known to have positive effects that help protect ecosystems. In wetland ecosystems, the muskrat is an influential herbivore that strongly affects aquatic vegetation, whereas muskrats are also prey for several carnivores (Ward et al. 2021). Therefore, O. zibethicus is an important ecohydrological indicator species (Ward et al. 2021), and increases and decreases in its population are closely related to the changes in floodplains (Ward et al. 2019). Despite the ecological significance of the species, the genomic background of O. zibethicus is poorly characterized; thus, obtaining the O. zibethicus genome will be important for elucidating the genetic mechanisms underlying the species’ distinct biological characteristics. The muskrat has economic and medicinal value related to its meat and fur but especially its musk (Liu et al. 2019). Male muskrats possess a pair of specialized scent glands between the skin and muscles near their tail that produce a yellowish substance similar to the musk secreted by musk deer (van Dorp et al. 1973; Li et al. 2017). Indeed, the common name of the muskrat is derived from its musk secretion (Cao et al. 2015). The components of muskrat musk are reportedly similar to those of musk deer musk, with the key components including l-muscone and some macrocyclic compounds, such as civetone, cyclododecanone, cyclopentadecanone, fatty acids, esters, and sterol compounds (van Dorp et al. 1973; Li and Song 1994; Lee et al. 2019). Musk deer musk is an essential component of Woohwangcheongsimwon, which is used to prevent and treat stroke, palpitations, hypertension, unconsciousness, and convulsions (Kim et al. 2008). However, the trade of musk deer musk is now prohibited according to the Convention on International Trade in Endangered Species of Wild Fauna and Flora (Lee et al. 2019). Muskrat musk is an ideal substitute for musk deer musk and would be easily obtained because muskrats are easy to manage and breed. Musk of muskrat can be used to treat stroke, swelling, and abscesses because it relieves pain, reduces inflammation, and activates blood (Lee et al. 2019). The scent gland of the muskrat exhibits seasonal changes that are closely related to its reproduction. From March to October, the volume of the scent gland increases substantially and a large amount of musk is secreted. However, the scent gland starts to atrophy and is replaced by adipose tissue from October; consequently, musk is not secreted from October to March the next year (Li et al. 2017). However, the genomic basis for the seasonal changes in the muskrat scent gland is not yet clarified. Here, we assembled the first chromosome-scale genome of the muskrat using single-tube long fragment read (stLFR; Wang et al. 2019) and Hi-C (Belton et al. 2012) technologies. Our assembly shows improved contiguity compared with that of a genome published previously (Zhou et al. 2020). In particular, we identified sex-linked genome regions, which may be closely related to the seasonal changes in muskrat reproductive activities. This improved chromosome-scale genome represents a valuable resource for improving our understanding of muskrat ecology and musk secretion.

Results and Discussion

Chromosome-level Genome Assembly

The estimated genome size of O. zibethicus was 2.69 Gb based on the frequency of 21-mer using short BGISEQ reads (supplementary fig. S1, Supplementary Material online). First, we generated a 2.71-Gb genome with a scaffold N50 of 5.07 Mb using 212.90 Gb of stLFR sequencing data (table 1). Subsequently, 542.59 Gb of Hi-C sequencing data was used for concatenating the primary scaffolds in a chromosome-level assembly. The final genome assembly was 2.63 Gb with 2.33 Gb genome regions assigned to 27 pseudochromosomes, which is consistent with a karyotypic study (2n = 54 (table 1, supplementary table S1 and fig. S2, Supplementary Material online; Pizzimenti 1971). We also identified a 66.98-Mb X chromosome and 1.14 Mb Y-linked regions in this genome by screening sex-linked genes across scaffolds (fig. 1). Both the X chromosome and Y-linked regions were validated using 32 additional male individuals with lower sequencing depth than that of the autosomes (fig. 1). The identification of sex-linked genomic regions provides a basic resource for exploring musk secretion in male muskrats. In total, 91.3% of 9,926 mammalian genes were complete in the muskrat genome according to Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis (supplementary table S2, Supplementary Material online). In addition, 99.66% and 98.97% of reads and BGISEQ short reads and Hi-C reads, respectively, could be mapped to the genome assembly in this study. Taken together, these findings indicate that our improved genome assembly of muskrat is high quality, contiguous, and complete at the chromosome level.

Table 1

Genome Assembly and Annotation Data Related to the Muskrat Genome Assembled in This Study

Item	Category	Number
Sequencing data	stLFR (Gb)	212.90
	WGS (Gb)	130.28
	Hi-C (Gb)	542.59
	Resequencing (32 individuals) (Gb)	1379.49
	RNA-seq (Gb)	105.29
Assembly (stLFR)	Estimated genome size (Gb)	2.69
	Assembled genome size (Gb)	2.71
	Karyotype	2n = 54
	Contig N50 (Kb)	56.15
	Longest scaffold (Mb)	34.52
Assembly (Hi-C)	Assembled genome size (Gb)	2.63
	Scaffold N50 (Mb)	80.25
	Longest scaffold (Mb)	196.46
Annotation	GC content (%)	37.8
	Repeat sequences (%)	34.32
	Number of protein-coding genes	19,396
	Number of functional annotated genes	19,395
	Average gene length (Kb)	31.92
	Average exon length (bp)	181.71
	Average intron length (Kb)	3.90
	Average exon per gene	8.78

Fig. 1.

Genome landscape of the muskrat genome, comparative genomics analysis, and enrichment analysis of expanded gene families. (A) Overview of the chromosome-scale genome of the muskrat. (1) The 27 chromosomes; (2) read depth mapped to the genome; (3) GC content; (4) repeat density; and (5) gene density. (B) Identification of Ylinked regions and the X pseudochromosome. The sequencing depth of the sex-linked genome regions is nearly half that of the autosomes. (C) Divergence time estimation and the inference of expanded/contracted gene families. Green and red numbers on each node represent the number of expanded and contracted gene families, respectively. (D) Significantly enriched GO terms in the muskrat genome.

Genome Annotation

In total, 904.51 Mb repetitive elements were identified in our assembled O. zibethicus genome, accounting for 34.37% of the final genome size (supplementary table S3, Supplementary Material online). Long interspersed nuclear elements were the most dominant repeat element (450.22 Mb), followed by LTRs (169.68 Mb), DNA (3.88 Mb), and short interspersed nuclear elements (126.24 Mb; supplementary tables S4 and S5, Supplementary Material online). All repetitive elements were masked for gene annotation. We predicted 19,396 protein-coding genes with high confidence by combining evidence from homology-based prediction, transcriptome alignment, and ab initio prediction (supplementary table S6, Supplementary Material online). The average gene length, exon length, and intron length were 31.92 kb, 181.71 bp, and 3.89 kb, respectively, which is consistent with other closely related animals (table 1, supplementary fig. S3, Supplementary Material online). BUSCO analysis showed that 90.0% and 1.3% of complete and fragmented BUSCO genes were identified, respectively, indicative of a high-quality gene set (supplementary table S2, Supplementary Material online). In at least one of the five databases used in this study (supplementary fig. S4, Supplementary Material online), 19,395 (99.99%) genes were functionally annotated (table 1, supplementary table S6, Supplementary Material online). Additionally, 775 miRNAs, 3685 tRNAs, 618 rRNAs, and 1559 snRNAs were predicted in the O. zibethicus genome (supplementary table S7, Supplementary Material online).

Phylogenetic Analysis and Gene Family Evolution

We performed a comparative genomic analysis between the muskrat and 11 other species and identified 6,182 single-copy genes shared by these species (supplementary table S8, Supplementary Material online). A phylogenetic tree was constructed using these genes, with divergence times calculated between each pair of species. We found that the muskrat is sister to a clade of the common ancestor of Microtus ochrogaster and Arvicola amphibius with a divergence time of 10.8 Ma, which is much later than the divergence time between the muskrat and mouse (fig. 1). Through a comparison with the common ancestor of the muskrat and mouse, we identified 147 expanded gene families, including 1,191 genes, in the muskrat genome (fig. 1). We performed Gene Ontology (GO) enrichment analysis of these expanded gene families, which showed that they were significantly enriched in 158 GO terms, especially those related to organic synthesis and metabolism, including peptide biosynthetic process (GO:0043043; P = 9.24E − 144), peptide metabolic process (GO:0006518; P = 9.62E − 126), macromolecule biosynthetic process (GO:0009059; P = 1.79E − 96), macromolecule metabolic process (GO:0043170; P = 8.55E − 46), organic substance biosynthetic process (GO:1901576; P = 1.74E − 73), organic substance metabolic process (GO:0071704; P = 5.49E − 30), aromatic compound biosynthetic process (GO:0019438; P = 1.47E − 17), heterocycle biosynthetic process (GO:0018130; P = 4.50E − 17), organic cyclic compound biosynthetic process (GO:1901362; P = 2.10E − 16), and organic cyclic compound metabolic process (GO:1901360; P = 1.11E − 17). These significantly enriched GO terms may represent the genomic basis for musk secretion in muskrats. In Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis, we found 85 significantly enriched pathways, including one reproduction-related pathway, four immune-related pathways, and other pathways related to biological characteristics (supplementary table S9, Supplementary Material online).

Materials and Methods

Sample Collection

A male O. zibethicus individual was collected from Heilongjiang Harbin Xinke Farm, China for genome assembly. Lung, kidney, muscle, heart, prostate, and scent gland samples were collected from this individual for RNA sequencing. The muscle sample was selected for stLFR sequencing. The liver sample was selected for Hi-C sequencing. We also collected muscle samples from 32 male muskrats from Heilongjiang Harbin Xinke Farm, China for whole-genome resequencing. Sample collection and the related experiments were approved by the Institutional Review Board of BGI (BGI-IRB E21011). All procedures were conducted according to the guidelines of BGI-IRB.

DNA and RNA Isolation, Library Preparation, and Genome Sequencing

We isolated high-molecular-weight DNA according to the protocol described by Wang et al. (2019), and one stLFR co-barcoding DNA library was constructed using an MGIEasy stLFR Library Prep Kit (MGI, China). The libraries were then sequenced using a BGISEQ-500 sequencer. TRlzol reagent (Invitrogen, USA) was used for total RNA extraction according to the manufacturer's instructions. RNA integrity, purity, and quantity were evaluated using a Qubit 3.0 Fluorometer (Life Technologies, USA) and an Agilent 2100 Bioanalyzer System (Agilent, USA). cDNA libraries were reverse-transcribed using 200–400 bp RNA fragments. Total genomic DNA was extracted using a DNeasy Blood and Tissue Kit (Qiagen, USA). The restriction endonuclease MboI was used for Hi-C library preparation, and these libraries were subjected to paired-end sequencing via a BGISEQ-500 sequencer (MGI).

Genome Survey

Jellyfish (v 2.2.6; Marcais and Kingsford 2012) was used to calculate the occurrence of k-mers with short reads prior to genome assembly. In total, 173,452,189,911 k-mers (K = 21) were identified, and the peak k-mer depth was 42 (supplementary fig. S1, Supplementary Material online). Results from Jellyfish were inputted into GCE (v1.0.2) to estimate genome size, repeat content, and heterozygosity (Liu et al. 2013).

Genome Assembly and Assessment

Supernova (v2.1.1; Weisenfeld et al. 2017) was used with its default parameters and stLFR sequencing data to assemble the primary genome. GapCloser (v1.12-r6) and redundans (v0.14a) were used for gap filling and redundancy removal, respectively. Burrows–Wheeler Aligner (BWA, v0.7.17) was used with its with default parameters for mapping Hi-C reads to the initial genome assembly (Li 2013; Pryszcz and Gabaldón 2016; Weisenfeld et al. 2017). Hi-C data quality control was performed via Juicer (v1.5.7; Durand et al. 2016), and 3d-DNA pipeline (v180922) was used to assign contigs at the chromosome level (Durand et al. 2016). The completeness of the gene set and genome were evaluated using BUSCO (v5.2.2; Simão et al. 2015) analysis with the mammalia_odb10 database. Finally, the BGISEQ short reads and Hi-C reads were mapped to our assembled genome using BWA mem with its default parameters to calculate the mapping rate. First, we used long terminal repeat finder (v1.0.6; Xu and Wang 2007), MITE-hunter (v4.07; Han and Wessler 2010), and RepeatModeler2 (v2.0.1; Flynn et al. 2020) to identify de novo repeat motifs. These repeats were then added to the RepBase in RepeatMasker (v4.1.1; Chen, 2004) as known elements for the identification of transposable elements. Next, we used de novo–, RNA-seq–, and homology-based methods to predict protein-coding genes. The repeat-masked genome was used for de novo gene prediction via GlimmerHMM (v3.0.1), Augustus (v3.0.3), and SNAP (v11/29/2013) (Korf 2004; Majoros et al. 2004; Stanke et al. 2004). The protein sequences of M. ochrogaster, Homo sapiens, Rattus norvegicus, Mus musculus, and Canis lupus familiaris were used for homology-based gene prediction. The final nonredundant gene set was generated using the MAKER pipeline (v3.01.03) (Campbell et al. 2014) by combining homology, de novo, and RNA-seq supported genes. The completeness of the gene set was evaluated via BUSCO analysis with the mammalia_odb10 database.

Phylogenetic and Gene Family Analysis

Homologous genes were identified by performing all-to-all BLASTP with proteins from each of the 12 species using the parameter “-evalue 1e-5.” The identified single-copy genes were then used to construct the phylogenetic tree according to the following procedures: (1) multiple amino acid sequence alignments were performed for a single-copy gene orthogroup using MAFFT (Katoh and Standley 2013; v.7.310); (2) PAL2NAL (v14; Suyama et al. 2006) was used to convert the aligned amino acid sequences to DNA sequence alignments; (3) gaps were removed using trimal (v1.4.1; Capella-Gutierrez et al. 2009); (4) a best-fit substitution model was calculated using ModelFinder (Kalyaanamoorthy et al. 2017); and (5) concatenated super genes were used to construct a maximum-likelihood phylogenetic tree via IQTREE (v1.6.12; Nguyen et al. 2015). Gene families were then identified using Treefam (v1.4; Li et al. 2006). Expanded and contracted gene families were detected using CAFÉ (v4.2.1; De Bie et al. 2006). KEGG and GO enrichment analyses were performed on the expanded gene families with all annotated genes used as the background, and Fisher's exact test was used to improve the accuracy of the conducted χ2 test. Finally, the Benjamini–Hochberg method (Peng et al. 2017) was used to generate adjusted P-values.

Whole-Genome Sequence Alignment

Whole-genome resequencing data from 32 individuals were mapped to our assembled genome using the BWA mem method with its default parameters (Li 2013). The mapping rate and sequencing depth were calculated using SAMtools (v0.1.19; Li et al. 2009) and BamDeal (v0.24; https://github.com/BGI-shenzhen/BamDeal), respectively. Click here for additional data file.

35 in total

1. AUGUSTUS: a web server for gene finding in eukaryotes.

Authors: Mario Stanke; Rasmus Steinkamp; Stephan Waack; Burkhard Morgenstern
Journal: Nucleic Acids Res Date: 2004-07-01 Impact factor: 16.971

2. Multiple confidence intervals for selected parameters adjusted for the false coverage rate in monotone dose-response microarray experiments.

Authors: Jianan Peng; Wei Liu; Frank Bretz; Ziv Shkedy
Journal: Biom J Date: 2016-12-26 Impact factor: 2.207

3. LIst of karyotypes of mammals from the northern plains region.

Authors: J J Pizzimenti
Journal: Trans Kans Acad Sci Date: 1971

4. Genome Annotation and Curation Using MAKER and MAKER-P.

Authors: Michael S Campbell; Carson Holt; Barry Moore; Mark Yandell
Journal: Curr Protoc Bioinformatics Date: 2014-12-12

5. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences.

Authors: Yujun Han; Susan R Wessler
Journal: Nucleic Acids Res Date: 2010-09-29 Impact factor: 16.971

6. TreeFam: a curated database of phylogenetic trees of animal gene families.

Authors: Heng Li; Avril Coghlan; Jue Ruan; Lachlan James Coin; Jean-Karim Hériché; Lara Osmotherly; Ruiqiang Li; Tao Liu; Zhang Zhang; Lars Bolund; Gane Ka-Shu Wong; Weimou Zheng; Paramvir Dehal; Jun Wang; Richard Durbin
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

7. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies.

Authors: Lam-Tung Nguyen; Heiko A Schmidt; Arndt von Haeseler; Bui Quang Minh
Journal: Mol Biol Evol Date: 2014-11-03 Impact factor: 16.240

8. Efficient and unique cobarcoding of second-generation sequencing reads from long DNA molecules enabling cost-effective and accurate sequencing, haplotyping, and de novo assembly.

Authors: Ou Wang; Robert Chin; Xiaofang Cheng; Michelle Ka Yan Wu; Qing Mao; Jingbo Tang; Yuhui Sun; Radoje Drmanac; Brock A Peters; Ellis Anderson; Han K Lam; Dan Chen; Yujun Zhou; Linying Wang; Fei Fan; Yan Zou; Yinlong Xie; Rebecca Yu Zhang; Snezana Drmanac; Darlene Nguyen; Chongjun Xu; Christian Villarosa; Scott Gablenz; Nina Barua; Staci Nguyen; Wenlan Tian; Jia Sophie Liu; Jingwan Wang; Xiao Liu; Xiaojuan Qi; Ao Chen; He Wang; Yuliang Dong; Wenwei Zhang; Andrei Alexeev; Huanming Yang; Jian Wang; Karsten Kristiansen; Xun Xu
Journal: Genome Res Date: 2019-04-02 Impact factor: 9.043

9. Evolutionary status of the invasive muskrat Ondatra zibethicus revealed by complete mitochondrial genome.

Authors: Lina Zhang; Han Zhang; Yan Hua
Journal: Mitochondrial DNA B Resour Date: 2020-02-03 Impact factor: 0.658

10. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons.

Authors: Zhao Xu; Hao Wang
Journal: Nucleic Acids Res Date: 2007-05-07 Impact factor: 16.971