Draft genome of the Chinese mitten crab, Eriocheir sinensis.

Linsheng Song¹, Chao Bian², Yongju Luo³, Lingling Wang⁴, Xinxin You², Jia Li², Ying Qiu², Xingyu Ma⁵, Zhifei Zhu⁵, Liang Ma⁶, Zhaogen Wang⁶, Ying Lei⁶, Jun Qiang⁷, Hongxia Li⁷, Juhua Yu⁷, Alex Wong⁸, Junmin Xu⁹, Qiong Shi⁹, Pao Xu⁷.

Abstract

BACKGROUND: The Chinese mitten crab, Eriocheir sinensis, is one of the most studied and economically important crustaceans in China. Its transition from a swimming to a crawling method of movement during early development, anadromous migration during growth, and catadromous migration during breeding have been attractive features for research. However, knowledge of the underlying molecular mechanisms that regulate these processes is still very limited.
FINDINGS: A total of 258.8 gigabases (Gb) of raw reads from whole-genome sequencing of the crab were generated on the Illumina HiSeq 2000 platform. The final genome assembly (1.12 Gb), about 67.5 % of the estimated genome size (1.66 Gb), is composed of 17,553 scaffolds (>2 kb) with an N50 of 224 kb. We identified 14,436 genes using AUGUSTUS, of which 7,549 were shown to have significant supporting evidence using the GLEAN pipeline. This gene number is greater than that reported for the horseshoe crab (5,775), and the annotation completeness, as evaluated by CEGMA, reached 66.9 %.
CONCLUSIONS: We report the first genome sequencing, assembly, and annotation of the Chinese mitten crab. The assembled draft genome will provide a valuable resource for the study of essential developmental processes and genetic determination of important traits of the Chinese mitten crab, and also for investigating crustacean evolution.

Keywords: Annotation; Assembly; Crab genome; Genomics

Year: 2016 PMID: 26823974 PMCID: PMC4730596 DOI: 10.1186/s13742-016-0112-y

Source DB: PubMed Journal: GigaScience ISSN: 2047-217X Impact factor: 6.524

Data description

Genomic DNA was extracted from muscle tissue of a single female crab (Eriocheir sinensis; NCBI Taxonomy ID: 95602) that had undergone three generations of inbreeding and was obtained from a local farm in Panjin, Liaoning Province, China. We used a whole-genome shotgun sequencing strategy and constructed short-insert libraries (170, 250, 500 and 800 bp) and long-insert libraries (2, 5 and 10 kb) following the standard protocol provided by Illumina (San Diego, USA). Paired-end sequencing was performed on the Illumina HiSeq 2000 system. In total, we generated 258.8 Gb of raw reads from all constructed libraries.

We extracted clean reads of the short-insert libraries (500 or 800 bp) to estimate the crab genome size by k-mer frequency distribution analysis [1]. A k-mer is a subsequence of K consecutive nucleotides, obtained by sliding a window of length K along a sequencing read one base at a time. We set the k-mer length to 17 bp; thus, a clean read of L bp contains (L - 17 + 1) k-mers. The frequency of each k-mer can be counted from the sequencing reads. Typically, when k-mer frequencies are plotted against sequencing depth, they follow a Poisson distribution in any given dataset. The genome size (G) can be deduced from the formula:

$$G = \frac{N \times (L - 17 + 1)}{K_{depth}}$$

where N is the total number of reads, L is the read length, and K_depth is the k-mer frequency that occurs more often than any other (the peak of the distribution). In our calculations, N was 789,326,187 and K_depth was 40; therefore, the crab genome size was estimated to be 1.66 Gb.

For whole-genome assembly, we employed Platanus [2] with optimized parameters (-k 27, -m 200) to construct contigs and initial scaffolds. All reads were mapped onto the contigs, and the paired-end information was then used to link contigs into scaffolds in a stepwise manner. Some intra-scaffold gaps were filled by local software using read pairs in which one end mapped uniquely to a contig and the other end was located within a gap.
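The k-mer-based estimate described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline: it assumes the relation G = N × (L - K + 1) / K_depth implied by the definitions in the text, and it assumes a read length of 100 bp, which is not stated in the text and is chosen here only because it reproduces the reported 1.66 Gb. The function names are hypothetical.

```python
from typing import Iterator


def iter_kmers(read: str, k: int) -> Iterator[str]:
    """Yield every k-mer of a read by sliding a window one base at a time."""
    for start in range(len(read) - k + 1):
        yield read[start:start + k]


def kmers_per_read(read_length: int, k: int) -> int:
    """Number of k-mers in a read of `read_length` bases: L - k + 1."""
    return max(read_length - k + 1, 0)


def estimate_genome_size(num_reads: int, read_length: int, k: int, peak_depth: int) -> float:
    """Genome size G = N * (L - k + 1) / K_depth, in base pairs."""
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    return num_reads * kmers_per_read(read_length, k) / peak_depth


# N, K and K_depth are the values reported in the text.
N_READS = 789_326_187
K = 17
K_DEPTH = 40
# ASSUMPTION: 100 bp reads. The read length is not given in the text.
READ_LENGTH = 100

genome_size_bp = estimate_genome_size(N_READS, READ_LENGTH, K, K_DEPTH)
print(f"Estimated genome size: {genome_size_bp / 1e9:.2f} Gb")
```

Under the 100 bp assumption each read contributes 84 17-mers, and the estimate rounds to 1.66 Gb, in agreement with the value reported above.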
The final genome assembly of the Chinese mitten crab is 1.12 Gb in total length, which is about 67.5 % of the estimated genome size. The contig N50 size (i.e., 50 % of the assembled genome is in fragments of this length or longer) is 6.02 kb, and the scaffold (>2 kb) N50 is 224 kb.

We constructed a de novo repeat library using RepeatModeler (Version 1.04, default parameters) and LTR_FINDER [3]. To identify known and de novo transposable elements (TEs), we ran RepeatMasker (Version 3.2.9) [4] against the Repbase TE library [5] (Version 14.04) and the de novo repeat library. In addition, we used RepeatProteinMask (Version 3.2.2), implemented in RepeatMasker, to detect TE-related proteins. We also predicted tandem repeats using Tandem Repeats Finder [6, 7] (Version 4.04) with parameters set to “Match = 2, Mismatch = 7, Delta = 7, PM = 80, PI = 10, Minscore = 50, and MaxPeriod = 2000”. Overall, repeat sequences occupy approximately 50.4 % of the crab genome. Long interspersed elements, which occupy 19.0 % of the crab genome, are the predominant type of repeat sequence.

Subsequently, we performed gene annotation in four major steps.

(1) Homology-based gene prediction: We aligned Homo sapiens, Crassostrea gigas, Caenorhabditis elegans, Drosophila melanogaster and Daphnia pulex proteins (Ensembl release 75) to the crab genome using TblastN with an E-value ≤ 1E-5, and then used GeneWise2.2.0 [7] for precise spliced alignment and gene structure prediction. Short genes (<150 bp) and genes with premature termination or frameshifts were removed.

(2) Ab initio prediction: The crab genome sequence was repeat-masked, and 1500 full-length genes randomly selected from the homology-based gene sets were used to train the model parameters for AUGUSTUS2.5 [8]. We then ran AUGUSTUS2.5 and GENSCAN1.0 [9] for de novo prediction on the repeat-masked genome sequence. Short genes were discarded using the same filter threshold that was used for homology prediction.

(3) Gene structure identification using transcriptome reads: We mapped the mixed RNA reads (from hepatopancreas tissue taken at four molting stages) reported in Huang’s study [10] to the crab genome using TopHat1.2 [11]. We then sorted and merged the TopHat mapping results and applied Cufflinks [12] to identify gene structures to assist gene annotation.

(4) Gene set integration: All of the above gene sets were merged into a comprehensive, non-redundant gene set using GLEAN [13]. We obtained a final gene set containing 7,549 genes (Table 1), which is more than the gene number (5,775) identified for horseshoe crab [14]. The CEGMA [15] evaluation indicated an annotation completeness of 66.9 % (166 of 248 core eukaryotic genes were aligned).
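The summary statistics quoted in this section (N50, the assembled fraction of the estimated genome, and the CEGMA completeness) follow from simple arithmetic. The sketch below is illustrative only: the `n50` helper is a hypothetical name, it implements the standard definition given in the text, and the toy length list is made up rather than taken from the assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length at which sequences of that length or
    longer together cover at least half of the total assembled length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-empty input


# Toy, made-up scaffold lengths (NOT data from the crab assembly).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print("Toy N50:", n50(toy_lengths))

# Ratios reported in the text, recomputed from the quoted numbers.
assembled_fraction = 1.12 / 1.66   # assembly size / estimated genome size (Gb)
cegma_completeness = 166 / 248     # aligned core genes / total core genes
print(f"Assembled fraction: {assembled_fraction:.1%}")
print(f"CEGMA completeness: {cegma_completeness:.1%}")
```

For the toy list the total length is 300, so the running sum first reaches half (150) at the second-longest sequence. The two ratios round to 67.5 % and 66.9 %, matching the figures quoted in the text.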

Table 1

Summary of genome annotations

Method	Gene set	Number	Average transcript length (bp)	Average coding sequence length (bp)	Average exons per gene	Average exon length (bp)	Average intron length (bp)
De novo	AUGUSTUS	14,436	10,104	1,195	4.97	240	2,245
De novo	GENSCAN	29,097	13,045	1,022	5.01	203	2,995
Homolog	H. sapiens	5,646	4,752	922	3.74	246	1,398
	C. gigas	9,470	3,067	641	2.69	238	1,432
	C. elegans	3,142	3,913	819	3.27	250	1,361
	D. melanogaster	4,369	6,178	981	4.31	227	1,571
	D. pulex	14,183	2,887	628	2.48	252	1,521
Transcriptome		14,123	11,161	2,223	6.83	325	1,532
GLEAN		7,549	12,742	1,470	6.36	230	2,101

In summary, we report the first genome sequencing, assembly, and annotation of the Chinese mitten crab. The draft genome will provide a valuable resource for studying essential developmental processes in the Chinese mitten crab, investigating crustacean evolution, and improving the molecular breeding of this economically important species.

Availability of supporting data

Supporting data are available in the GigaDB database [16], and the raw data were deposited under BioProject accession PRJNA305216.

15 in total

1. GeneWise and Genomewise.

Authors: Ewan Birney; Michele Clamp; Richard Durbin
Journal: Genome Res Date: 2004-05 Impact factor: 9.043

Review 2. Repbase Update, a database of eukaryotic repetitive elements.

Authors: J Jurka; V V Kapitonov; A Pavlicek; P Klonowski; O Kohany; J Walichiewicz
Journal: Cytogenet Genome Res Date: 2005 Impact factor: 1.636

3. Tandem repeats finder: a program to analyze DNA sequences.

Authors: G Benson
Journal: Nucleic Acids Res Date: 1999-01-15 Impact factor: 16.971

4. Prediction of complete gene structures in human genomic DNA.

Authors: C Burge; S Karlin
Journal: J Mol Biol Date: 1997-04-25 Impact factor: 5.469

5. The sequence and de novo assembly of the giant panda genome.

Authors: Ruiqiang Li; Wei Fan; Geng Tian; Hongmei Zhu; Lin He; Jing Cai; Quanfei Huang; Qingle Cai; Bo Li; Yinqi Bai; Zhihe Zhang; Yaping Zhang; Wen Wang; Jun Li; Fuwen Wei; Heng Li; Min Jian; Jianwen Li; Zhaolei Zhang; Rasmus Nielsen; Dawei Li; Wanjun Gu; Zhentao Yang; Zhaoling Xuan; Oliver A Ryder; Frederick Chi-Ching Leung; Yan Zhou; Jianjun Cao; Xiao Sun; Yonggui Fu; Xiaodong Fang; Xiaosen Guo; Bo Wang; Rong Hou; Fujun Shen; Bo Mu; Peixiang Ni; Runmao Lin; Wubin Qian; Guodong Wang; Chang Yu; Wenhui Nie; Jinhuan Wang; Zhigang Wu; Huiqing Liang; Jiumeng Min; Qi Wu; Shifeng Cheng; Jue Ruan; Mingwei Wang; Zhongbin Shi; Ming Wen; Binghang Liu; Xiaoli Ren; Huisong Zheng; Dong Dong; Kathleen Cook; Gao Shan; Hao Zhang; Carolin Kosiol; Xueying Xie; Zuhong Lu; Hancheng Zheng; Yingrui Li; Cynthia C Steiner; Tommy Tsan-Yuk Lam; Siyuan Lin; Qinghui Zhang; Guoqing Li; Jing Tian; Timing Gong; Hongde Liu; Dejin Zhang; Lin Fang; Chen Ye; Juanbin Zhang; Wenbo Hu; Anlong Xu; Yuanyuan Ren; Guojie Zhang; Michael W Bruford; Qibin Li; Lijia Ma; Yiran Guo; Na An; Yujie Hu; Yang Zheng; Yongyong Shi; Zhiqiang Li; Qing Liu; Yanling Chen; Jing Zhao; Ning Qu; Shancen Zhao; Feng Tian; Xiaoling Wang; Haiyin Wang; Lizhi Xu; Xiao Liu; Tomas Vinar; Yajun Wang; Tak-Wah Lam; Siu-Ming Yiu; Shiping Liu; Hemin Zhang; Desheng Li; Yan Huang; Xia Wang; Guohua Yang; Zhi Jiang; Junyi Wang; Nan Qin; Li Li; Jingxiang Li; Lars Bolund; Karsten Kristiansen; Gane Ka-Shu Wong; Maynard Olson; Xiuqing Zhang; Songgang Li; Huanming Yang; Jian Wang; Jun Wang
Journal: Nature Date: 2009-12-13 Impact factor: 49.962

6. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.

Authors: Cole Trapnell; Brian A Williams; Geo Pertea; Ali Mortazavi; Gordon Kwan; Marijke J van Baren; Steven L Salzberg; Barbara J Wold; Lior Pachter
Journal: Nat Biotechnol Date: 2010-05-02 Impact factor: 54.908

7. AUGUSTUS: ab initio prediction of alternative transcripts.

Authors: Mario Stanke; Oliver Keller; Irfan Gunduz; Alec Hayes; Stephan Waack; Burkhard Morgenstern
Journal: Nucleic Acids Res Date: 2006-07-01 Impact factor: 16.971

8. Joint assembly and genetic mapping of the Atlantic horseshoe crab genome reveals ancient whole genome duplication.

Authors: Carlos W Nossa; Paul Havlak; Jia-Xing Yue; Jie Lv; Kimberly Y Vincent; H Jane Brockmann; Nicholas H Putnam
Journal: Gigascience Date: 2014-05-14 Impact factor: 6.524

9. TopHat: discovering splice junctions with RNA-Seq.

Authors: Cole Trapnell; Lior Pachter; Steven L Salzberg
Journal: Bioinformatics Date: 2009-03-16 Impact factor: 6.937

10. Draft genome of the Chinese mitten crab, Eriocheir sinensis.

Authors: Linsheng Song; Chao Bian; Yongju Luo; Lingling Wang; Xinxin You; Jia Li; Ying Qiu; Xingyu Ma; Zhifei Zhu; Liang Ma; Zhaogen Wang; Ying Lei; Jun Qiang; Hongxia Li; Juhua Yu; Alex Wong; Junmin Xu; Qiong Shi; Pao Xu
Journal: Gigascience Date: 2016-01-28 Impact factor: 6.524
