Yong Yan Liao1,2, Peng Wei Xu3, Kit Yue Kwan1,2, Zhi Yun Ma3, Huai Yi Fang1,2, Jun Yang Xu3, Peng Liang Wang1,2, Shao Yu Yang1,2, Shang Bo Xie3, Shu Qing Xu1,2, Dan Qian3, Wei Feng Li1,2, Li Rong Bai1,2, Da Jie Zhou3, Yan Qiu Zhang1,2, Juan Lei1,2, Ke Liu1,2, Fan Li1,2, Jian Li1,2, Peng Zhu1,2, Yu Jun Wang1,2, Hai Ping Wu1,2, You Hou Xu1,2, Hu Huang1,2, Chi Zhang3, Jin Xia Liu1,2, Jun Feng Han1,2.
Abstract
Chinese horseshoe crabs (Tachypleus tridentatus), ancient marine arthropods dating back to the mid-Palaeozoic Era, have provided valuable resources for the detection of bacterial or fungal contamination. However, excessive exploitation for Tachypleus amoebocyte lysate has dramatically reduced the population of Chinese horseshoe crabs. We therefore present the sequencing, assembly and annotation of the T. tridentatus genome, with the hope of understanding the genomic features of this living fossil and assisting scientists with the protection of this endangered species. The final assembly has a total size of 1.943 Gb, covering 90.23% of the estimated genome size. Transcriptomes of three larval stages were constructed to investigate candidate genes involved in larval development and to validate the annotation. The completeness of the genome and of the gene models was estimated by BUSCO, reaching 96.2% and 95.4%, respectively. The synonymous substitution distribution of paralogues revealed that T. tridentatus had undergone two rounds of whole-genome duplication. All genomic and transcriptome data have been deposited in public databases, ready to be used by researchers working on horseshoe crabs.
Year: 2019 PMID: 30806641 PMCID: PMC6390705 DOI: 10.1038/sdata.2019.29
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. The morphology of T. tridentatus.
Sequencing libraries and data yields for whole-genome shotgun sequencing.
| Library type | Lanes | Platform | Read length (bp) | Insert size (bp) | Raw bases (Gb) | Clean bases (Gb) |
|---|---|---|---|---|---|---|
| PE150 | 1 | HiSeq 4000 | 150 | 270 | 42.25 | 39.4 |
| PE125 | 1 | HiSeq 2500 | 125 | 300 | 79.72 | 58.9 |
| PE125 | 1 | HiSeq 2500 | 125 | 500 | 78.01 | 59.58 |
| PE125 | 1 | HiSeq 2500 | 125 | 800 | 59.34 | 44.79 |
| MP49 | 3 | HiSeq 4000 | 49 | 2000 | 50.04 | 18.3 |
| MP49 | 4 | HiSeq 4000 | 49 | 5000 | 53.65 | 20.41 |
| MP49 | 2 | HiSeq 4000 | 49 | 10000 | 61.28 | 14.29 |
| MP49 | 3 | HiSeq 4000 | 49 | 20000 | 66.15 | 9.63 |
| MP49 | 3 | HiSeq 4000 | 49 | 40000 | 63.77 | 8.91 |
| Total | 19 | | | | 554.21 | 274.21 |

Note: PE: paired-end, MP: mate pair. The raw reads were filtered using SOAPnuke.
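
The totals in the table can be checked, and the share of bases that survived filtering compared per library, with a short script. This is a minimal sketch: the values are transcribed from the table above, while the names `LIBRARIES`, `totals` and `retention` are illustrative and not part of the authors' pipeline.

```python
# (library type, insert size in bp, raw bases in Gb, clean bases in Gb),
# transcribed from the sequencing table above.
LIBRARIES = [
    ("PE150", 270, 42.25, 39.40),
    ("PE125", 300, 79.72, 58.90),
    ("PE125", 500, 78.01, 59.58),
    ("PE125", 800, 59.34, 44.79),
    ("MP49", 2000, 50.04, 18.30),
    ("MP49", 5000, 53.65, 20.41),
    ("MP49", 10000, 61.28, 14.29),
    ("MP49", 20000, 66.15, 9.63),
    ("MP49", 40000, 63.77, 8.91),
]


def totals(libraries):
    """Return (raw Gb, clean Gb) summed over all libraries, rounded to 2 decimals."""
    raw = sum(raw_gb for _, _, raw_gb, _ in libraries)
    clean = sum(clean_gb for _, _, _, clean_gb in libraries)
    return round(raw, 2), round(clean, 2)


def retention(libraries):
    """Map (library type, insert size) to the fraction of raw bases kept after filtering."""
    return {
        (name, insert): clean_gb / raw_gb
        for name, insert, raw_gb, clean_gb in libraries
    }


if __name__ == "__main__":
    print("raw, clean totals (Gb):", totals(LIBRARIES))
    for (name, insert), fraction in retention(LIBRARIES).items():
        print(f"{name} insert {insert} bp: {fraction:.1%} of raw bases retained")
```

The per-library fractions show how much more strongly the mate-pair libraries were reduced by filtering than the paired-end libraries.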
RNA-Seq data yields of three larval stages.
| Accession | Larval stage | Sample | Raw read numbers (Mb) | Clean read numbers (Mb) | Clean reads Q20 (%) |
|---|---|---|---|---|---|
| SRR7239295 | pre-trilobite | TaL-1-1 | 67.87 | 66.14 | 96.79 |
| SRR7239308 | pre-trilobite | TaL-1-2 | 76.92 | 66.13 | 96.58 |
| SRR7239307 | trilobite | TaL-2-1 | 76.92 | 65.18 | 97.01 |
| SRR7239306 | trilobite | TaL-2-2 | 79.18 | 65.64 | 96.81 |
| SRR7239305 | post-trilobite | TaL-3-1 | 67.87 | 65.24 | 97.12 |
| SRR7239304 | post-trilobite | TaL-3-2 | 67.87 | 65.24 | 98.92 |
| Total | | | 436.63 | 393.57 | |

Note: The three larval developmental stages were collected according to Sekiguchi’s definition: “pre-trilobite”, “trilobite” and “post-trilobite”. Each stage has two biological replicates. The raw reads were filtered using SOAPnuke.
Figure 2. K-mer distribution used for the estimation of genome size.
The distribution was determined with KMERFREQ_AR using a k-mer size of 17.
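
A common way to turn a k-mer depth histogram into a genome size estimate is to divide the total number of counted k-mers by the depth at the main peak. The sketch below illustrates that general idea on a toy histogram; it is not a reimplementation of KMERFREQ_AR, and the `min_depth` cutoff for discarding likely error k-mers, the function names and the toy data are all assumptions made for illustration. The second helper only back-calculates the estimated genome size implied by the abstract (1.943 Gb covering 90.23%).

```python
def estimate_genome_size(histogram, min_depth=3):
    """Estimate genome size as (total k-mers) / (peak depth).

    histogram maps k-mer depth to the number of distinct k-mers seen at that
    depth. Depths below min_depth are treated as sequencing errors and ignored
    (an assumed, simplistic cutoff). Returns (estimated size, peak depth).
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)  # depth with the most distinct k-mers
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth, peak_depth


def implied_estimated_size(assembly_gb, fraction_covered):
    """Back-calculate the estimated genome size from assembly size and coverage."""
    return assembly_gb / fraction_covered


# Toy histogram: an error peak at low depth and a main peak centred on depth 20.
TOY_HISTOGRAM = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 500, 21: 300, 22: 100}

if __name__ == "__main__":
    print(estimate_genome_size(TOY_HISTOGRAM))
    print(f"{implied_estimated_size(1.943, 0.9023):.3f} Gb")
```

Real histograms from heterozygous or repeat-rich genomes have additional peaks, so dedicated tools fit a model to the curve rather than relying on a single peak.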
Statistics of the T. tridentatus and L. polyphemus genome assemblies.
| Statistic | T. tridentatus contig number | T. tridentatus contig size (bp) | T. tridentatus scaffold number | T. tridentatus scaffold size (bp) | L. polyphemus¹ contig number | L. polyphemus¹ contig size (bp) | L. polyphemus¹ scaffold number | L. polyphemus¹ scaffold size (bp) |
|---|---|---|---|---|---|---|---|---|
| Minimum length | | 4 | | 41 | | 200 | | 200 |
| Maximum length | | 1,165,240 | | 18,230,544 | | 133,547 | | 5,191,289 |
| Total number | 736,826 | | 671,877 | | 469,510 | | 286,793 | |
| N50 | 9,200 | 52,179 | 186 | 2,761,313 | 41,759 | 11,441 | 1,712 | 254,089 |
| Total size | | 1,912,885,564 | | 1,942,936,674 | | 1,705,786,612 | | 1,828,271,751 |
| >=1 kb | 96,184 | 1,761,273,298 | 45,307 | 1,797,749,641 | 222,804 | 1,592,186,184 | 50,802 | 1,722,826,540 |
| >=2 kb | 62,482 | 1,715,053,174 | 15,879 | 1,757,753,852 | 165,672 | 1,511,153,688 | 24,654 | 1,687,761,575 |
| >10 kb | 34,272 | 1,584,470,853 | 2,573 | 1,708,312,084 | 49,849 | 939,425,146 | 13,197 | 1,641,492,664 |
| >=100 kb | 3,139 | 528,273,461 | 1,150 | 1,666,730,451 | 9 | 996,662 | 4,168 | 1,307,731,143 |
| >=1 Mb | 3 | 3,275,645 | 446 | 1,399,495,467 | 0 | 0 | 147 | 215,133,244 |
| Gap size | | | | 30,051,110 | | | | 122,485,139 |

Note: Statistics of the genome assemblies. Contig lengths are the assembly lengths excluding ‘N’ bases. 1: PRJNA20489, submitted by Washington University (WashU).
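
For readers unfamiliar with the N50 row: the N50 size is the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size, and the accompanying number is how many sequences were needed to get there. The sketch below applies the standard definition to a toy list of lengths; the function name `n50` and the toy data are illustrative, and this is not the authors' script. The last lines check that the reported T. tridentatus gap size equals the scaffold total minus the contig total from the table.

```python
def n50(lengths):
    """Return (N50 length, number of sequences needed to reach half the total).

    Sequences are visited from longest to shortest; the N50 length is the
    length of the sequence at which the running sum first reaches half of the
    total assembly size.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy assembly: total length 300, so half is reached after 80 + 70 = 150.
TOY_LENGTHS = [80, 70, 50, 40, 30, 20, 10]

# Values transcribed from the table above (T. tridentatus, in bp).
SCAFFOLD_TOTAL = 1_942_936_674
CONTIG_TOTAL = 1_912_885_564
GAP_SIZE = SCAFFOLD_TOTAL - CONTIG_TOTAL

if __name__ == "__main__":
    print(n50(TOY_LENGTHS))
    print(GAP_SIZE)
```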
Figure 3. The synonymous substitution (Ks) distributions of the paralogues and orthologues of the Chinese and Atlantic horseshoe crabs.
The ‘*’ indicates the peak derived from the whole-genome duplication event, whereas the ‘#’ indicates the peak derived from small-scale duplication events (SSDs, tandem duplications) in the Chinese horseshoe crab.