Ronghua Li, Michaël Bekaert, Luning Wu, Changkao Mu, Weiwei Song, Herve Migaud, Chunlin Wang.
Abstract
The marine gastropod Hemifusus tuba is served as a luxury food in Asian countries and is used in traditional Chinese medicine to treat lumbago and deafness. The lack of genomic data on H. tuba is a barrier to aquaculture development, and the functional characteristics of potential bioactive molecules are poorly understood. In the present study, we used high-throughput sequencing technologies to generate the first transcriptomic database of H. tuba. The de novo assembly identified a total of 76,306 transcripts with an average length of 824.6 nt, of which 75,620 (99.1%) were annotated. A total of 41 unique conopeptides were retrieved from 44 unigenes, containing six cysteine frameworks belonging to four superfamilies. Duplication of mature regions and alternative splicing were also found in some of the conopeptides. In addition, simple sequence repeat (SSR) detection identified 14,000 unigenes containing 20,735 SSRs, among which 23 polymorphic SSRs were screened. Thirteen of these markers could be amplified in Hemifusus ternatanus and seven in Rapana venosa. This study provides the first report of conopeptide genes in Buccinidae, as well as genomic resources for further drug development, gene discovery and population resource studies of this species.
Keywords: Hemifusus tuba; conotoxin; simple sequence repeats; transcriptome
Year: 2019 PMID: 31405144 PMCID: PMC6722550 DOI: 10.3390/md17080466
Source DB: PubMed Journal: Mar Drugs ISSN: 1660-3397 Impact factor: 5.118
Summary statistics of sequencing and assembly of H. tuba transcriptome.
| Category | Number/Length |
|---|---|
| Total number of raw PE reads | 33,546,714 |
| Maximum read length (nt) | 90 |
| Pre-process PE reads | 22,892,498 |
| Cleaned PE reads | 21,397,329 |
| Clean bases | 1.9 Gb |
| Transcripts generated (raw) | 329,633 |
| Percentage of reads assembled (raw transcripts) | 82.9% |
| Transcripts (filtered) | 76,306 |
| Percentage of reads assembled (filtered transcripts) | 54.5% |
| GC content | 52.9% |
| Maximum transcript length (bp) | 17,498 |
| Minimum transcript length (bp) | 300 |
| Transcripts > 500 bp | 44,171 |
| Transcripts > 1 kb | 17,188 |
| Transcripts > 10 kb | 56 |
| N50 length (bp) | 1014 |
| Mean length (bp) | 824.6 |
| Unigenes | 61,575 |
| Unigene N50 length (bp) | 865 * |
| Unigene mean length (bp) | 744.2 * |
* based on the longest transcript for each unigene.
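The N50 and mean length rows above follow standard definitions: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. The sketch below is a minimal illustration of that definition, not the pipeline used in the study; the function names and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half
        # of the total assembled bases is the N50.
        if running >= half_total:
            return length


def mean_length(lengths):
    """Return the arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)


# Toy lengths (hypothetical, not taken from the H. tuba assembly).
toy_lengths = [300, 500, 800, 1000, 1400]
toy_n50 = n50(toy_lengths)
toy_mean = mean_length(toy_lengths)
```

For unigene-level values such as those marked with an asterisk, the same functions would be applied to the longest transcript of each unigene only.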
Figure 1. H. tuba transcript assessments. (A) Length distribution of the assembled H. tuba transcripts. Clean reads for H. tuba were assembled and resulted in 76,306 transcripts. (B) BUSCO assessment (Metazoa database; number of BUSCO, 978).
Summary of annotation results for H. tuba unigenes using a range of databases.
| Database | Number annotated |
|---|---|
| PfamA | 60,116 |
| InterPro * | 38,711 |
| SwissProt | 41,468 |
| KEGG | 64,235 |
| GO | 42,819 |
| All databases | 26,388 |
| Total (at least one database) | 75,620 |
* InterPro covers 12 databases (CATH-Gene3D, CDD, HAMAP, PANTHER, PIRSF, PRINTS, ProDom, PROSITE (patterns and profiles), SFLD, SMART, SUPERFAMILY, TIGRFAMs).
Figure 2. A five-way Venn diagram. The figure shows the unique and overlapping transcripts with predicted protein sequence similarity to one or more databases (details in Table 2).
Figure 3. Level 2 gene ontology (GO) annotations of the assembled transcripts.
Figure 4. Conopeptides summary. (A) Conopeptide tree based on the alignment of the 73 classified peptides. The Cys framework is reported on the outer section of the tree. Every conopeptide in a multi-domain protein was included in the analysis. Domains were aligned with the respective domains of duplicated genes. (B) Structure of conotoxin proteins. Types 1, 2 and 3 were complete transcripts, while types 4 to 11 were truncated transcripts or incompletely characterised proteins.
Summary of the cysteine framework distribution for the conopeptide and unique conopeptide sequences (details in Supplementary Table S2).
| Cysteine Framework | Conopeptide | Unique Conopeptide |
|---|---|---|
|  | 9 | 5 |
| NoCys | 2 | 1 |
| I or XXIV | 1 | 1 |
| VIII | 7 | 5 |
| XIV | 3 | 1 |
| XXII | 3 | 1 |
| IX | 48 | 27 |
Distribution of the perfect SSR motifs in the H. tuba transcriptome.
| SSR Type | SSR Number | Unigenes Number | Occurrence (%) | Total (%) |
|---|---|---|---|---|
| Di-nucleotide | 6957 | 5167 | 11.3 | 33.6 |
| Tri-nucleotide | 11,654 | 8418 | 19.0 | 56.2 |
| Tetra-nucleotide | 1812 | 1358 | 3.0 | 8.7 |
| Penta-nucleotide | 278 | 232 | 0.5 | 1.3 |
| Hexa-nucleotide | 16 | 15 | <0.1 | 0.1 |
| Total | 20,735 | 14,000 | 33.7 | 100.0 |
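A perfect SSR is an uninterrupted tandem repeat of a short motif, here two to six nucleotides long. The following sketch shows one way such repeats can be located with regular expressions. It is an illustration only: the record does not state which tool or thresholds were used, so the minimum repeat counts in `MIN_REPEATS` and all function names are assumptions.

```python
import re

# ASSUMPTION: minimum repeat counts per motif length. These thresholds
# are illustrative and are not taken from the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_perfect_ssrs(seq):
    """Return (motif_length, motif, repeat_count, start) for each perfect SSR."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs such as "AA" or "ACAC", which are repeats of
            # shorter units rather than true k-nucleotide motifs.
            if is_primitive(motif):
                hits.append((k, motif, len(match.group(0)) // k, match.start()))
    return hits


# Hypothetical test sequence: (AC)7, (ATG)5 and a mononucleotide run.
example = "GGT" + "AC" * 7 + "TTC" + "ATG" * 5 + "C" + "A" * 12
ssrs = find_perfect_ssrs(example)
```

With these assumed thresholds, the example reports the di-nucleotide and tri-nucleotide repeats and ignores the mononucleotide run, since motif lengths below two are not searched and non-primitive motifs are filtered out.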
Characterisation of successful cross-species amplification of microsatellite loci in two different whelk species, H. ternatanus (n = 16) and R. venosa (n = 20). * loci present in the three species.
| Species | Locus | Size Range (bp) | NA | HO | HE |
|---|---|---|---|---|---|
| H. ternatanus | HT4 | 211-219 | 4 | 1.000 | 0.736 |
| H. ternatanus | HT10 | 209-218 | 4 | 1.000 | 0.690 |
| H. ternatanus | HT20 | 179-189 | 6 | 1.000 | 0.762 |
| H. ternatanus | HT22 | 138-148 | 6 | 1.000 | 0.782 |
| H. ternatanus | HT24 | 212-216 | 3 | 0.250 | 0.232 |
| H. ternatanus | HT25 * | 168-180 | 7 | 1.000 | 0.867 |
| H. ternatanus | HT27 | 123-137 | 2 | 0.563 | 0.466 |
| H. ternatanus | HT28 * | 122-128 | 4 | 1.000 | 0.651 |
| H. ternatanus | HT29 | 132-152 | 10 | 1.000 | 0.891 |
| H. ternatanus | HT32 | 249-259 | 5 | 0.875 | 0.718 |
| H. ternatanus | HT35 | 155-159 | 3 | 0.688 | 0.599 |
| H. ternatanus | HT36 * | 249-261 | 6 | 1.000 | 0.835 |
| H. ternatanus | HT39 | 141-147 | 4 | 1.000 | 0.736 |
| R. venosa | HT15 | 126-136 | 6 | 1.000 | 0.794 |
| R. venosa | HT23 | 254-262 | 5 | 0.950 | 0.676 |
| R. venosa | HT25 * | 168-182 | 8 | 1.000 | 0.876 |
| R. venosa | HT28 * | 120-124 | 3 | 1.000 | 0.559 |
| R. venosa | HT31 | 117-125 | 5 | 1.000 | 0.788 |
| R. venosa | HT36 * | 245-251 | 4 | 1.000 | 0.740 |
| R. venosa | HT37 | 280-290 | 6 | 1.000 | 0.781 |
NA, observed number of alleles; HO, observed heterozygosity; HE, expected heterozygosity.
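The three statistics defined above can be computed directly from diploid genotypes at a locus. The sketch below uses the textbook definitions: NA is the count of distinct alleles, HO is the fraction of heterozygous individuals, and HE is one minus the sum of squared allele frequencies. The record does not say which estimator or software produced the table values, so the small-sample correction 2n/(2n - 1) is returned separately as an assumption, and the genotypes shown are hypothetical.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Return (NA, HO, HE, HE_unbiased) for a list of (allele, allele) pairs."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")

    # HO: proportion of individuals carrying two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n

    # Allele frequencies over all 2n gene copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    copies = 2 * n

    # HE: probability that two randomly drawn gene copies differ.
    h_exp = 1.0 - sum((c / copies) ** 2 for c in counts.values())

    # ASSUMPTION: small-sample correction; whether the table values
    # include it is not stated in the record.
    h_exp_unbiased = h_exp * copies / (copies - 1)

    return len(counts), h_obs, h_exp, h_exp_unbiased


# Hypothetical genotypes (allele sizes in bp) for four individuals.
toy_genotypes = [(211, 213), (211, 211), (213, 215), (211, 213)]
n_alleles, h_o, h_e, h_e_unbiased = locus_diversity(toy_genotypes)
```

Because HO is a count of heterozygotes divided by the sample size, the tabulated values are multiples of 1/n: for example, 0.250 with n = 16 corresponds to 4 heterozygous individuals, and 0.950 with n = 20 corresponds to 19.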