Xiao Liu, Chao Li, Min Chen, Bo Liu, Xiaojun Yan, Junhao Ning, Bin Ma, Guilong Liu, Zhaoshan Zhong, Yanglei Jia, Qiong Shi, Chunde Wang.
Abstract
The two subspecies of Atlantic bay scallop (Argopecten irradians), A. i. irradians and A. i. concentricus, are economically important aquaculture species in northern and southern China, respectively. Here, we performed whole-genome sequencing, assembly, and gene annotation and produced draft genomes for both subspecies. In total, 253.17 and 272.97 gigabases (Gb) of raw reads were generated from the Illumina HiSeq and PacBio platforms for A. i. irradians and A. i. concentricus, respectively. Draft genomes of 835.7 Mb and 874.82 Mb were assembled for the two subspecies, accounting for 83.9% and 89.79% of their estimated genome sizes, respectively. The contig N50 and scaffold N50 were 78.54 kb and 1.53 Mb for the A. i. irradians genome, and 63.73 kb and 1.25 Mb for the A. i. concentricus genome. Moreover, 26,777 and 25,979 protein-coding genes were predicted for A. i. irradians and A. i. concentricus, respectively. These genome assemblies lay a foundation for future theoretical studies and provide guidance for practical scallop breeding.
Year: 2020 PMID: 32251283 PMCID: PMC7090048 DOI: 10.1038/s41597-020-0441-7
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Pictures of the representative bay scallop in China. (a) The northern subspecies (A. i. irradians). (b) The southern subspecies (A. i. concentricus).
Summary of the genome assemblies and annotations for both subspecies.
| Genome assembly | A. i. irradians | A. i. concentricus |
|---|---|---|
| Contig N50 size (kb) | 78.54 | 63.73 |
| Scaffold N50 size (Mb) | 1.53 | 1.25 |
| Estimated genome size (Mb) | 996.07 | 974.3 |
| Assembled genome size (Mb) | 835.7 | 874.82 |
| Genome coverage for Illumina reads (×) | 254.17 | 259.6 |
| Genome coverage for PacBio reads (×) | 20.15 | 20.57 |
| The longest scaffold (bp) | 8,652,007 | 5,002,087 |
| Protein-coding gene number | 26,777 | 25,979 |
| Average transcript length (kb) | 11.86 | 12.17 |
| Average CDS length (bp) | 1,443.63 | 1,460.6 |
| Average intron length (bp) | 1,704.92 | 1,722.22 |
| Average exon length (bp) | 203.09 | 202.42 |
| Average exons per gene | 7.11 | 7.22 |
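
The percentages quoted in the abstract (83.9% and 89.79%) can be reproduced by dividing each assembled genome size by the corresponding estimated size. The following is a minimal Python sketch of that arithmetic using only values from the table above; the dictionary and function names are illustrative and do not come from the paper.

```python
# Sizes in Mb, copied from the summary table above.
ASSEMBLIES = {
    "A. i. irradians": {"estimated_mb": 996.07, "assembled_mb": 835.7},
    "A. i. concentricus": {"estimated_mb": 974.3, "assembled_mb": 874.82},
}


def assembled_fraction_percent(assembled_mb: float, estimated_mb: float) -> float:
    """Return the assembled size as a percentage of the estimated genome size."""
    return 100.0 * assembled_mb / estimated_mb


if __name__ == "__main__":
    for name, sizes in ASSEMBLIES.items():
        pct = assembled_fraction_percent(sizes["assembled_mb"], sizes["estimated_mb"])
        print(f"{name}: {pct:.2f}% of the estimated genome assembled")
```

Because the table reports sizes rounded to at most two decimals, the recomputed values agree with the published percentages only to that precision.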
Prediction of repeat elements in the two genome assemblies of bay scallop.
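| Type | Repeat size (bp), A. i. irradians | Repeat size (bp), A. i. concentricus | % of genome, A. i. irradians | % of genome, A. i. concentricus |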
|---|---|---|---|---|
| TRF | 126,153,959 | 135,900,220 | 15.10 | 15.53 |
| RepeatMasker | 309,417,572 | 326,918,089 | 37.02 | 37.37 |
| RepeatProteinMask | 31,422,581 | 30,821,540 | 3.76 | 3.52 |
| Total | 389,681,429 | 412,788,948 | 46.63 | 47.19 |
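
The "% of genome" columns appear to be the repeat size divided by the assembled genome size (835.7 Mb and 874.82 Mb); that denominator is an assumption inferred from the numbers, not stated in the table. A short Python sketch that checks this reading against the A. i. irradians and A. i. concentricus columns, with illustrative names:

```python
# Assembled genome sizes in bp (assumed denominator, from the assembly summary).
ASSEMBLED_BP = {"Aii": 835.7e6, "Aic": 874.82e6}

# Repeat sizes in bp, copied from the repeat table above.
REPEAT_BP = {
    "TRF": {"Aii": 126_153_959, "Aic": 135_900_220},
    "RepeatMasker": {"Aii": 309_417_572, "Aic": 326_918_089},
    "RepeatProteinMask": {"Aii": 31_422_581, "Aic": 30_821_540},
    "Total": {"Aii": 389_681_429, "Aic": 412_788_948},
}


def repeat_percent(method: str, subspecies: str) -> float:
    """Repeat content of one method as a percentage of the assembled genome."""
    return 100.0 * REPEAT_BP[method][subspecies] / ASSEMBLED_BP[subspecies]


if __name__ == "__main__":
    for method in REPEAT_BP:
        aii = repeat_percent(method, "Aii")
        aic = repeat_percent(method, "Aic")
        print(f"{method:18s} Aii {aii:6.2f}%   Aic {aic:6.2f}%")
```

Note that the "Total" row is smaller than the sum of the three methods in both subspecies, which is consistent with the methods annotating overlapping regions that are counted once in the total.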
Fig. 2 Comparative genome analysis between the bay scallops and the other 19 species. (a) Orthologue clustering analysis of the protein-coding genes in the bay scallop genomes. The horizontal axis shows 19 species and the vertical axis shows the corresponding number of genes. Pink represents the number of single-copy gene families, yellow represents the number of multiple-copy gene families, dark yellow represents the number of unique gene families of the corresponding species, and green represents the number of other gene families not mentioned above. (b) Venn diagram showing the shared and unique gene families among the five compared species. The total number of gene families in each unique or shared region is indicated. Abbreviations of the species are as follows: Aic, A. i. concentricus; Aii, A. i. irradians; Aca, A. californica; Bfl, B. floridae; Bpl, B. platifrons; Cel, Caenorhabditis elegans; Cgi, C. gigas; Cte, C. teleta; Dme, D. melanogaster; Hdi, H. discus; Hro, H. robusta; Hsa, H. sapiens; Lan, Lingula anatina; Lgi, L. gigantea; Mph, M. philippinarum; Obi, O. bimaculoides; Pfu, P. fucata; Pye, P. yessoensis; Tca, T. castaneum.
Fig. 3 Phylogenetic position of the sequenced species. The phylogenetic tree was constructed from a dataset of 107 single-copy orthologues using RAxML. Clade support was assessed by bootstrapping with 1,000 alignment replicates. (a) The phylogenetic tree reconstructed using RAxML with the LG + G + I + F model. The tree is drawn to scale, with branch lengths proportional to the number of amino acid substitutions. Bootstrap values are presented above the nodes. (b) Species divergence times estimated using MCMCTree in the PAML package with the parameters ‘-model 0 -rootage 1200 -clock 3’. Red nodes in the phylogenetic tree represent the reference divergence times that were used to calibrate the divergence dates of the examined species.
| Measurement(s) | DNA • genome • RNA • transcriptome • sequence_assembly • sequence feature annotation |
| Technology Type(s) | DNA sequencing assay • RNA sequencing • sequence assembly process • sequence annotation |
| Sample Characteristic - Organism | Argopecten irradians |