Literature DB >> 35804506

Genomic Survey and Microsatellite Marker Investigation of Patagonian Moray Cod (Muraenolepis orangiensis).

Eunkyung Choi1, Seung Jae Lee1, Euna Jo1,2, Jinmu Kim1, Steven J Parker3, Jeong-Hoon Kim2, Hyun Park1.   

Abstract

The Muraenolepididae family of fishes, known as eel cods, inhabits continental slopes and shelves in the Southern Hemisphere. This family belongs to the Gadiformes order, which constitutes one of the most important commercial fish resources worldwide, but the classification of the fish species in this order is ambiguous because it is only based on the morphological and habitat characteristics of the fishes. Here, the genome of Patagonian moray cod was sequenced using the Illumina HiSeq platform, and screened for microsatellite motifs. The genome was predicted to be 748.97 Mb, with a heterozygosity rate of 0.768%, via K-mer analysis (K = 25). The genome assembly showed that the total size of scaffolds was 711.92 Mb and the N50 scaffold length was 1522 bp. Additionally, 4,447,517 microsatellite motifs were identified from the genome survey assembly, and the most abundant motif type was found to be AC/GT. In summary, these data may facilitate the identification of molecular markers in Patagonian moray cod, which would be a good basis for further whole-genome sequencing with long read sequencing technology and chromosome conformation capture technology, as well as population genetics.

Entities:  

Keywords:  Illumina; Muraenolepis orangiensis; Patagonian moray cod; SSR; microsatellite

Year:  2022        PMID: 35804506      PMCID: PMC9265078          DOI: 10.3390/ani12131608

Source DB:  PubMed          Journal:  Animals (Basel)        ISSN: 2076-2615            Impact factor:   3.231


1. Introduction

The Muraenolepididae family of fishes, known as eel cods, belongs to the Gadiformes order and inhabits the cold–temperate waters of continental slopes and shelves in the Southern Hemisphere. They dwell near the bottom of the ocean, at the midwater depth, or near the surface. The family has the following morphological characteristics: the dorsal, anal, and caudal fins are all joined; the chin has a barbel; and there are no teeth on the head of some. The family consists of two genera—Notomuraenobathys (Balushkin and Pirodina, 2010) and Muraenolepis (Günther, 1880) [1,2]. The genus of Muraenolepis is regularly caught around South Georgia and the South Sandwich Islands [3], and has been reported to be preyed on by the Patagonian toothfish (Dissostichus eleginoides) [4]. Muraenolepis comprises seven species, and five of these species have been mostly described recently (Muraenolepis marmorata, Günther 1880; Patagonian moray cod, Vaillant 1888; Muraenolepis andriashevi, Balushkin & Prirodina 2005; Muraenolepis trunovi, Balushkin & Prirodina 2006; Muraenolepis kuderskii, Balushkin & Prirodina 2007; Muraenolepis pacifica, Prirodina & Balushkin 2007; and Muraenolepis evseenkoi, Balushkin & Prirodina 2010) [5,6,7,8,9,10]. Gadiformes fishes, such as cod and hake, are among the most important commercial fishes worldwide. Despite their commercial importance, the taxonomic classification of these fishes is still far from clear, because it is based on morphological data [11]. Likewise, Muraenolepis fishes have been classified based on their morphological and habitat characteristics; various views have been reported taxonomically, including Gadiformes [5,12]. To overcome the limitations of the morphology-based taxonomic classification of Muraenolepis fishes, a study has analyzed the mitochondrial 16S and COI sequences of these fish species, and thereby confirmed that Muraenolepis microps is a junior synonym of M. marmorata [13]. Morphological characteristics are taxonomically very important. However, there are species that are difficult to classify accurately based on morphological characteristics alone, so genome-level information is needed to supplement this. In particular, the applicability of microsatellites (also known as simple sequence repeats (SSRs) in differentiating between fish species has been validated [14], and the complete mitochondrial genome of a Muraenolepis fish species has been sequenced [15]. In this study, we assessed the genomic characteristics of Patagonian moray cod via a genomic survey based on next-generation sequencing (NGS) and then identified SSRs that can be used as markers for taxonomic classification. The data from this study may facilitate further genomic characterization of Patagonian moray cod.

2. Materials and Methods

2.1. Sample Preparation and Sequencing

A sample was collected from the Ross Sea (77°05′ S, 170°30′ E on CCAMLR Subarea 88.1), Antarctica, and then transferred to a freezer at −80 °C. The muscle tissue was dissected from the frozen sample for DNA extraction. The traditional phenol–chloroform method was conducted to obtain the DNA. The quality and quantity of the DNA were assessed using a fragment analyzer (Agilent Technologies, Santa Clara, CA, USA) and a Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA), respectively. Then, 1 µg DNA was sheared into 350 bp fragments, using a Covaris S2 system (Covaris, Woburn, MA, USA). DNA library preparation was performed using TruSeq DNA PCR-Free (Illumina, San Diego, CA, USA) according to the manufacturer’s protocol. The library quality was assessed using a Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA), and then the Illumina Novaseq 6000 platform with 2 × 150 bp format (Illumina, San Diego, CA, USA) was used for sequencing.

2.2. Data Analysis for Genome Survey and Microsatellite Identification

The quality (Q) values (Q20 and Q30) were estimated from the primary data using seqtk version 1.3 (Available online: https://github.com/lh3/seqtk (accessed on 20 January 2020). All of the clean reads were used to conduct the K-mer analysis to estimate the genome size. Jellyfish version 2.1.4 was used for the K-mer analysis with K-values of 17, 19, and 25. Based on the K-mer data, GenomeScope was used to determine the genome size, repeat contents, and heterozygosity rate [16,17]. The draft de novo genome assembly was carried out using Maryland Super-Read Celera Assembler (MaSuRCA version 3.3.4) [18]. For the genome-wide microsatellite investigation, the QDD pipeline (version 3.1.2) [19] was used. The microsatellite repeats were investigated for their lengths and nucleotide repeats (from mononucleotide repeats to hexanucleotide repeats). The analysis was conducted in three steps, with the parameters of -contig 1, -make_cons 0, and -contig 1. The final output was used to select the best primer pairs for the microsatellite repeats, and a total of 83 primer pairs were selected according to the following parameters of the QDD pipeline (version 3.1.2): forward and reverse flanking regions between the SSR and the primer sequences ≥ 20 bp, a maximum primer alignment score of 5, the high-quality primer design defined by the QDD pipeline, and ≥7 motif repeats [20]. Among these primer pairs, 25 pairs were randomly selected for validation via PCR, performed in a Thermal Cycler Dice® Touch (Takara Bio, Shiga, Japan). Each PCR tube (20 µL volume) contained 5 µL genomic DNA (20 ng/µL), 1 µL (10 pmole/L) of each forward and reverse primer, 10 µL of 2 × EmeraldAmp PCR Master Mix (Takara Bio, Kusatsu, Japan), and 3 µL of double-distilled water. The PCR conditions were as follows: 2 min at 94 °C, followed by 35 cycles of 94 °C for 30 s, 60 °C for 30 s, and 70 °C for 1 min, and the last extension at 72 °C for 10 min. The PCR products were visualized via 4% agarose gel electrophoresis with a 20 bp DNA ladder (Takara Bio, Kusatsu, Japan).

3. Results and Discussion

3.1. Genome Size Estimation and Genome Assembly

A total of 54.14 Gb of raw data were generated using the paired-end library method in the Illumina NovaSeq platform. The Q20 and Q30 values were 93.3% and 87.2%, respectively (Table 1). Q20 and Q30 values ≥ 90% and ≥85%, respectively, indicate accurate sequencing [21]. Therefore, the accuracy of sequencing for the Patagonian moray cod was high. In addition, the GC content was 49.5% (Table 1). These data were used to estimate the genome size via K-mer analysis. Based on the 25-mer distribution, the predicted genome size was approximately 748.9 Mb, and the heterozygous rate was estimated at 0.768%. The duplication rate was 1.18%, and the highest frequency was nearly 40× coverage (Table 2 and Figure 1). MaSuRCA was used to perform a genome assembly (Table 3). We obtained 661,719 scaffolds, and the total size of the scaffolds was 711,920,928 bp. The longest scaffold size was 67,330 bp, and the number of scaffolds longer than 1 Kb was 211,863. The N50 scaffold length was 1522 bp, and the GC content was 45.7%. These genome survey results can serve as preliminary data for further whole-genome studies with advanced sequencing technologies using long read sequencing technology and chromosome conformation capture technology to achieve a more thorough assembly.
Table 1

Statistics of the genome-sequencing data.

Raw Data (bp)Q20 (%)Q30 (%)GC Content (%)
54,142,458,22693.387.249.5
Table 2

Genome size estimation via K-mer analysis.

Genome Size (bp)Heterozygosity (%)Duplication Ratio (%)
17-mer709,066,7080.821.3
19-mer723,179,5220.8321.23
25-mer748,978,6870.7681.18
Figure 1

K-mer analysis (K = 25).

Table 3

Statistics of the assembled genomic sequences.

MaSuRCA
Number of scaffolds661,719
Total size of scaffolds711,920,928
Longest scaffolds67,330
Number of scaffolds > 1K nt211,863
Number of scaffolds > 10K nt1199
N50 scaffold length1522
L50 scaffold count120,833
GC content (%)45.7
Len, estimated total genome length; Uniq, unique portion of the genome (not repetitive); Het, heterozygosity rate; Kcov, mean K-mer coverage for the heterozygous bases; Err, error rate; Dup, duplication rate. The blue bars and black line show the observed K-mer distribution and the modeled distribution without the error K-mers (indicated by the red line), respectively, up to a maximum K-mer coverage specified in the model (indicated by the yellow line).

3.2. Identification of Microsatellite Motifs

A total of 4,447,517 microsatellite motifs were identified from the genome survey assembly of Patagonian moray cod. The types of the motifs were dinucleotide, trinucleotide, tetranucleotide, and pentanucleotide, with the fractions of 77.40% (3,430,720), 17.89% (793,122), 3.97% (175,791), and 0.74% (21,801), respectively (Table 4). The most frequent dinucleotide motif was found to be AC/GT (61.21%), followed by AG/CT (24.80%), AT/AT (13.83%), and CG/CG (0.16%). The most abundant trinucleotide motif was found to be AGG/CCT (29.04%), followed by AAT/ATT (18.99%), ACC/GGT (14.34%), AAC/GTT (9.62%), AAG/CTT (9.03%), ATC/GAT (6.49%), AGC/GCT (5.24%), ACT/AGT (3.89%), CCG/CGG (2.69%), and ACG/CGT (0.67%). Regarding the tetranucleotide and pentanucleotide motifs, the most frequent motifs were ACAG/CTGT (27.78%) and AGAGG/CCTCT (32.37%), respectively (Table 4). To compare the microsatellite motifs of Patagonian moray cod with other species, the microsatellite motif analysis results of Trematomus loennbergii and Pogonophryne albipinna were used [22,23]. When comparing the number of microsatellites, Patagonian moray cod had the most SSRs, and there was no significant difference in the number of SSRs between T. loennbergii and P. albipinna. When comparing the repeat type, it was confirmed that the ratio of trinucleotides was slightly higher in Patagonian moray cod than in other species. Additionally, unlike other species, Patagonian moray cod did not have a hexanucleotide type (Table S1). Based on the microsatellite motif analysis and the parameters of the QDD pipeline, 83 primer pairs were selected (Table S2). To ensure the usability of these microsatellite markers, 25 primer pairs were randomly selected and used for PCR with the genomic DNA of Patagonian moray cod. Consequently, 15 primer pairs were observed to yield only a single band (Figure 2).
Table 4

Distribution of the microsatellite motifs.

Repeat Motif Number of Repeats
567891011–20>20Total
Dinucleotide (3,430,720)
AC/GT329,788224,159178,326154,725144,969134,928710,095223,0512,100,041
AG/CT230,644141,95596,28767,78850,97939,120158,77165,210850,754
AT/AT119,67876,69551,22138,04231,33124,631122,87510,068474,541
CG/CG3286135942715614115005384
Trinucleotide (793,122)
AGG/CCT64,59841,23430,10421,65315,24410,40942,3224735230,299
AAT/ATT40,27326,13219,05114,81012,857999926,1601321150,603
ACC/GGT37,00928,55319,32212,185665942885583102113,701
AAC/GTT31,66018,56411,204568734761592370840476,295
AAG/CTT18,42712,484792860324720336011,275738871,614
ATC/GAT14,45110,7716815477135752427790479551,509
AGC/GCT16,94594395433367621411205243332441,596
ACT/AGT951060224239268621891570406656430,846
CCG/CGG10,871542820731244695401616021,328
ACG/CGT263715136402521235311305331
Tetranucleotide (175,791)
ACAG/CTGT12,24585745599347231552310769251043,557
AGGG/CCCT10,965669144202849217618281754030,683
ACGC/GCGT19521653885758607757376178611,159
AAAG/CTTT25321485113152740738025878029851
AAAC/GTTT36922538131986137125825409293
AAAT/ATTT3812190782247628598520898009
ACTC/GAGT1846149985345931982818342157853
ACAT/ATGT2236120792056537548114811717436
AAGG/CCTT247411446073041781366552425740
ATCC/GGAT26431129527344110128473775431
Others14,62476694304251717481238413254736,779
Pentanucleotide (32,801)
AGAGG/CCTCT209010727334676063652378337744
AATGT/ACATT9765543332258212902191
AATCT/AGATT10443922441837242901986
AACAC/GTGTT89735827014969154501803
AGGGG/CCCCT96841916111315 001676
AATAT/ATATT4313192011161296018001436
Others2932173573969426821250907089
Total 1,008,087 649,482 459,698 350,482 291,381 243,972 1,126,900 317,515 4,447,517
Figure 2

PCR products corresponding to the microsatellites. M is the 20 bp DNA marker and 4% agarose gel electrophoresis was used. The number is primer pairs order.

With the advance of NGS technology and bioinformatics tools, genome survey sequencing and K-mer analysis have developed rapidly, assisting with the prediction of genome size and characteristics in non-model species without basic genomic information. According to a flow cytometry-based approach, previous studies have estimated that the genome sizes of several species living in Antarctica were 0.7–1.4 Gb [24]. Recent studies using NGS technology estimated the genome sizes of Antarctic fishes at 0.6–1 Gb [25,26,27,28,29]. In this study, the genome size of Patagonian moray cod was 0.7 Gb and it was within the range of the previously reported genome sizes of Antarctic fishes. Therefore, the genome size of Patagonian moray cod is an acceptable result. For the genome assembly, if the heterozygosity rate is more than 0.5%, it is difficult to assemble. The heterozygosity rate of Patagonian moray cod was 0.7%. It might have been affected by the use of short scaffolds based on short read sequencing technology. In addition, the GC content is also one of the factors affecting sequence bias; if the GC content is more than 65% or less than 25%, sequence bias may influence the genome assembly quality [30,31,32]. The GC content of Patagonian moray cod was 45.7%, and this was within the range from 25% to 65%, which was a normal range. Therefore, the assembly result of this study might be of acceptable range, and the GC content of Patagonian moray cod would be not affected by genome sequence quality. This is the first report of a genome survey of Patagonian moray cod, and these data provide a preliminary understanding of the genome characteristics of Patagonian moray cod. However, a further study with long read sequencing technology and chromosomal-level scaffolding technology is needed to obtain more high-quality genome information for Patagonian moray cod. For identifying the microsatellite markers, using genome survey data has cost- and time-effective advantages compared with the traditional microsatellite marker development method. In this study, the most frequent motifs were dinucleotide repeat motifs, except for the mononucleotide repeat motif, and the less abundant dinucleotide repeat motifs were CG/CG. This result was comparable with other previous studies on microsatellite repeats using Danio rerio, Oreochromis latipes, and O. niloticus [33], and this may be because cytosine is methylated with thymidine [34]. The tendency of the overall motif frequency was quite similar to other studies with fishes [35,36]. This data may be useful in identifying the microsatellite markers of Patagonian moray cod. Moreover, if further studies using various Muraenolepis populations are conducted, more meaningful data for molecular marker development would be obtained.

4. Conclusions

In the present study, the genome of Patagonian moray cod was assembled, and the microsatellite motifs were characterized. Therefore, the genome size of Patagonian moray cod was estimated at 748.9 Mb, based on the K-mer analysis. In addition, SSR analysis yielded 4,447,517 SSRs, and the most abundant repeat motif was found to be a dinucleotide, with the most frequent dinucleotide motif of AC/GT. These genomic data may be useful for verifying taxonomical classifications, and it would be basic information for developing novel molecular markers in various populations. However, further studies, such as a more complete genome assembly at the chromosomal level, and validation experiments using various Muraenolepis populations, would be needed.
  25 in total

1.  The MaSuRCA genome assembler.

Authors:  Aleksey V Zimin; Guillaume Marçais; Daniela Puiu; Michael Roberts; Steven L Salzberg; James A Yorke
Journal:  Bioinformatics       Date:  2013-08-29       Impact factor: 6.937

2.  QDD version 3.1: a user-friendly computer program for microsatellite selection and primer design revisited: experimental validation of variables determining genotyping success rate.

Authors:  Emese Meglécz; Nicolas Pech; André Gilles; Vincent Dubut; Pascal Hingamp; Aurélie Trilles; Rémi Grenier; Jean-François Martin
Journal:  Mol Ecol Resour       Date:  2014-05-26       Impact factor: 7.090

3.  Genome enablement of the notothenioidei: genome size estimates from 11 species and BAC libraries from 2 representative taxa.

Authors:  H William Detrich; Andrew Stuart; Michael Schoenborn; Sandra K Parker; Barbara A Methé; Chris T Amemiya
Journal:  J Exp Zool B Mol Dev Evol       Date:  2010-07-15       Impact factor: 2.656

4.  Genome sequence and genetic diversity of European ash trees.

Authors:  Elizabeth S A Sollars; Andrea L Harper; Laura J Kelly; Christine M Sambles; Ricardo H Ramirez-Gonzalez; David Swarbreck; Gemy Kaithakottil; Endymion D Cooper; Cristobal Uauy; Lenka Havlickova; Gemma Worswick; David J Studholme; Jasmin Zohren; Deborah L Salmon; Bernardo J Clavijo; Yi Li; Zhesi He; Alison Fellgett; Lea Vig McKinney; Lene Rostgaard Nielsen; Gerry C Douglas; Erik Dahl Kjær; J Allan Downie; David Boshier; Steve Lee; Jo Clark; Murray Grant; Ian Bancroft; Mario Caccamo; Richard J A Buggs
Journal:  Nature       Date:  2016-12-26       Impact factor: 49.962

5.  Analysis of CpG suppression in methylated and nonmethylated species.

Authors:  D F Schorderet; S M Gartler
Journal:  Proc Natl Acad Sci U S A       Date:  1992-02-01       Impact factor: 11.205

6.  Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries.

Authors:  Daniel Aird; Michael G Ross; Wei-Sheng Chen; Maxwell Danielsson; Timothy Fennell; Carsten Russ; David B Jaffe; Chad Nusbaum; Andreas Gnirke
Journal:  Genome Biol       Date:  2011-02-21       Impact factor: 13.583

7.  Genome survey and SSR analysis of Apocynum venetum.

Authors:  Guo-Qi Li; Li-Xiao Song; Chang-Qing Jin; Miao Li; Shi-Pei Gong; Ya-Fang Wang
Journal:  Biosci Rep       Date:  2019-06-25       Impact factor: 3.840

8.  The genomic basis for colonizing the freezing Southern Ocean revealed by Antarctic toothfish and Patagonian robalo genomes.

Authors:  Liangbiao Chen; Ying Lu; Wenhao Li; Yandong Ren; Mengchao Yu; Shouwen Jiang; Yanxia Fu; Jian Wang; Sihua Peng; Kevin T Bilyk; Katherine R Murphy; Xuan Zhuang; Mathias Hune; Wanying Zhai; Wen Wang; Qianghua Xu; Chi-Hing Christina Cheng
Journal:  Gigascience       Date:  2019-04-01       Impact factor: 6.524

9.  Chromosomal assembly of the Antarctic toothfish ( Dissostichus mawsoni) genome using third-generation DNA sequencing and Hi-C technology.

Authors:  Seung Jae Lee; Jeong-Hoon Kim; Euna Jo; Eunkyung Choi; Jinmu Kim; Seok-Gwan Choi; Sangdeok Chung; Hyun-Woo Kim; Hyun Park
Journal:  Zool Res       Date:  2020-12-01

10.  Whole genome survey analysis and microsatellite motif identification of Sebastiscus marmoratus.

Authors:  Sheng-Yong Xu; Na Song; Shi-Jun Xiao; Tian-Xiang Gao
Journal:  Biosci Rep       Date:  2020-02-28       Impact factor: 3.840

View more
  1 in total

1.  Whole-Genome Survey and Microsatellite Marker Detection of Antarctic Crocodile Icefish, Chionobathyscus dewitti.

Authors:  Jinmu Kim; Seung-Jae Lee; Euna Jo; Eunkyung Choi; Minjoo Cho; Soyun Choi; Jeong-Hoon Kim; Hyun Park
Journal:  Animals (Basel)       Date:  2022-09-28       Impact factor: 3.231

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.