Ajit Kumar Patra, Oksung Chung, Ji Yong Yoo, Min Seop Kim, Moon Geun Yoon, Jeong-Hyeon Choi, Youngik Yang.
Abstract
Crustacean amphipods are important trophic links between primary producers and higher consumers. Although most amphipods occur in or around aquatic environments, the family Talitridae is the only family found in terrestrial and semi-terrestrial habitats. The sand-hopper Trinorchestia longiramus is a talitrid species often found on the sandy beaches of South Korea. In this study, we present the first draft genome assembly and annotation of this species. We generated ~380.3 Gb of sequencing data, which were assembled into a 0.89 Gb draft genome. Annotation analysis estimated 26,080 protein-coding genes, with 89.9% genome completeness. Comparison with other amphipods showed that T. longiramus has 327 unique orthologous gene clusters, many of which are expanded gene families responsible for cellular transport of toxic substances, homeostatic processes, and ionic and osmotic stress tolerance. This first talitrid genome will be useful for further understanding the mechanisms of adaptation in terrestrial environments and the effects of heavy metal toxicity, as well as for studies of comparative genomic variation across amphipods.
Year: 2020 PMID: 32152293 PMCID: PMC7062882 DOI: 10.1038/s41597-020-0424-8
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Sequence libraries and data yield from Illumina DNA and RNA sequencing.
| | Library type | Insert size (bp) | Read length (bp) | Raw bases (Gb) | Raw reads | SRA accession |
|---|---|---|---|---|---|---|
| DNA | Paired-end (PE) | 350 | 251 | 37.616 | 149,863,175 | SRR9098167 |
| | | 350 | 251 | 37.616 | 149,863,175 | SRR9098167 |
| | | 350 | 251 | 36.788 | 146,564,297 | SRR9098168 |
| | | 350 | 251 | 36.788 | 146,564,297 | SRR9098168 |
| | Total | | | 148.808 | 592,854,944 | |
| | Mate-pair (MP) | 3 K | 101 | 28.942 | 286,552,798 | SRR9098169 |
| | | 3 K | 101 | 28.942 | 286,552,798 | SRR9098169 |
| | | 5 K | 101 | 29.710 | 294,156,030 | SRR9098170 |
| | | 5 K | 101 | 29.710 | 294,156,030 | SRR9098170 |
| | | 8 K | 101 | 27.904 | 276,279,897 | SRR9098171 |
| | | 8 K | 101 | 27.904 | 276,279,897 | SRR9098171 |
| | | 10 K | 101 | 29.173 | 288,841,613 | SRR9098172 |
| | | 10 K | 101 | 29.173 | 288,841,613 | SRR9098172 |
| | Total | | | 231.458 | 2,291,660,676 | |
| RNA | PE | 140 | 101 | 6.204 | 61,429,733 | SRR9112990 |
| | | 140 | 101 | 6.204 | 61,429,733 | SRR9112990 |
| | Total | | | 12.408 | 122,859,466 | |
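
The totals in the Illumina table can be re-derived from the per-library rows. The sketch below assumes that the two identical rows listed for each SRA accession are the two read files of a pair, so each library is counted twice; the source does not state this explicitly, but it is the reading under which the totals add up. The names `LIBRARIES`, `ROWS_PER_LIBRARY`, and `totals` are illustrative.

```python
from decimal import Decimal

# Raw bases (Gb, kept as strings for exact decimal arithmetic) and raw reads
# for ONE row of each library, copied from the Illumina table.
LIBRARIES = {
    "DNA PE": [("37.616", 149_863_175), ("36.788", 146_564_297)],
    "DNA MP": [("28.942", 286_552_798), ("29.710", 294_156_030),
               ("27.904", 276_279_897), ("29.173", 288_841_613)],
    "RNA PE": [("6.204", 61_429_733)],
}

# Assumption: every library contributes two identical rows (one per read file).
ROWS_PER_LIBRARY = 2


def totals(entries):
    """Return (total Gb, total reads) for one library type."""
    bases = sum(Decimal(gb) for gb, _ in entries) * ROWS_PER_LIBRARY
    reads = sum(n for _, n in entries) * ROWS_PER_LIBRARY
    return bases, reads


summary = {name: totals(entries) for name, entries in LIBRARIES.items()}

# Combined DNA yield, for comparison with the ~380.3 Gb quoted in the abstract.
dna_total_gb = summary["DNA PE"][0] + summary["DNA MP"][0]

for name, (gb, reads) in summary.items():
    print(f"{name}: {gb} Gb, {reads:,} reads")
print(f"DNA total: {dna_total_gb} Gb")
```

Under that assumption the paired-end and mate-pair DNA totals sum to 380.266 Gb, which is consistent with the ~380.3 Gb figure in the abstract.
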
Sequencing libraries and data yields from PacBio RNA sequencing.
| Library size (Kb) | Average read length (bp) | Raw bases (Gb) | Raw reads | Polished high-quality isoforms | SRA accession |
|---|---|---|---|---|---|
| 1–2 | 1,238 | 0.027 | 21,522 | 72,517 | SRR9112991 |
| | 2,070 | 0.219 | 105,671 | | |
| 2–3 | 2,209 | 0.070 | 31,546 | | |
| | 2,522 | 0.251 | 99,339 | | |
| 3–6 | 2,810 | 0.029 | 10,278 | | |
| | 3,656 | 0.302 | 82,504 | | |
| Total | 2,418 | 0.896 | 350,860 | | |
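
As a rough cross-check of the PacBio table, raw bases per row can be approximated as average read length times read count. This is only a sketch: the average lengths are themselves rounded, so the products are approximate, and the variable names are illustrative.

```python
# (average read length in bp, raw reads) for each row of the PacBio table.
PACBIO_ROWS = [
    (1_238, 21_522),
    (2_070, 105_671),
    (2_209, 31_546),
    (2_522, 99_339),
    (2_810, 10_278),
    (3_656, 82_504),
]

# Approximate bases per row, in Gb, rounded to the table's three decimals.
row_gb = [round(length * reads / 1e9, 3) for length, reads in PACBIO_ROWS]

total_reads = sum(reads for _, reads in PACBIO_ROWS)
total_bases = sum(length * reads for length, reads in PACBIO_ROWS)
total_gb = round(total_bases / 1e9, 3)

# Read-weighted mean length versus the plain mean of the six row averages.
weighted_mean_length = total_bases / total_reads
unweighted_mean_length = sum(length for length, _ in PACBIO_ROWS) / len(PACBIO_ROWS)

print(row_gb, total_reads, total_gb)
print(round(weighted_mean_length, 1), unweighted_mean_length)
```

The per-row products reproduce the listed raw-base values, and their unrounded sum gives the 0.896 Gb total (summing the already rounded row values would give 0.898). The 2,418 bp in the Total row corresponds to the unweighted mean of the six row averages (2,417.5); the read-weighted mean is closer to 2,554 bp.
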
Fig. 1. Genome size estimation by k-mer distribution.
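
One common way to turn a k-mer depth histogram into a genome size estimate is to divide the total number of k-mer occurrences by the depth of the main peak. The sketch below illustrates that idea only; the source does not specify the exact procedure used, the function name and the `min_depth` cutoff are hypothetical, and the histogram is toy data rather than data from this genome.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and ignored
    (an assumption of this sketch, not a value from the source).
    """
    usable = {d: c for d, c in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers: the main (homozygous) peak.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences contributed by the retained depths.
    total_kmers = sum(d * c for d, c in usable.items())
    return peak_depth, total_kmers / peak_depth


# Toy histogram: an error tail at low depth and one peak around depth 20.
toy_histogram = {
    1: 5_000_000, 2: 800_000, 3: 100_000,
    18: 40_000, 19: 90_000, 20: 120_000, 21: 90_000, 22: 40_000,
}

peak, size = estimate_genome_size(toy_histogram)
print(peak, size)
```

Real histograms usually need more care (heterozygous peaks, repeats at high depth), so this should be read as a starting point rather than a full method.
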
Fig. 2. T. longiramus genome assembly and gene prediction workflow.
Statistics of the T. longiramus genome assembly.
| | Platanus | SSPACE | Final |
|---|---|---|---|
| Scaffolds | 1,025,695 | 30,899 | 30,897 |
| Scaffolds (>1000) | 63,362 | 30,899 | 30,897 |
| Total Length | 1,022,727,337 | 886,386,416 | 886,359,443 |
| Total Length (>1000) | 828,517,177 | 886,386,416 | 886,359,443 |
| Maximum length | 1,019,543 | 1,680,077 | 1,680,077 |
| N50 | 74,013 | 120,570 | 120,570 |
| Gap | 16,045,251 | 73,899,800 | 73,869,646 |
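
For readers unfamiliar with the N50 statistic in the assembly table, the sketch below computes it for a list of scaffold lengths: the length of the scaffold at which the running sum of lengths, taken from longest to shortest, first reaches half of the total. The scaffold lengths are toy values, since per-scaffold lengths are not given here. The second part derives the gap share of the final assembly from the table, assuming the "Gap" row is a count of gap bases in bp.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold where the running sum covers half the total.
        if 2 * running >= total:
            return length


# Toy scaffold lengths (illustrative, not from the assembly).
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)

# "Final" column of the assembly table.
FINAL_TOTAL_LENGTH = 886_359_443
FINAL_GAP = 73_869_646  # assumed to be gap bases in bp

ungapped_length = FINAL_TOTAL_LENGTH - FINAL_GAP
gap_percent = 100 * FINAL_GAP / FINAL_TOTAL_LENGTH

print(toy_n50, ungapped_length, round(gap_percent, 2))
```

With these numbers, gaps account for roughly 8.3% of the final assembly length.
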
Statistics of repetitive elements.
| Repeat type | Total (bp) | % of genome |
|---|---|---|
| DNA | 45,354,677 | 5.12 |
| LINE | 23,869,606 | 2.70 |
| LTR | 11,269,516 | 1.27 |
| Low_complexity | 1,202,626 | 0.14 |
| SINE | 163,811 | 0.02 |
| Satellite | 308,670 | 0.03 |
| Simple_repeat | 10,854,020 | 1.22 |
| TandemRepeat | 54,776,419 | 6.18 |
| Unknown | 48,880,228 | 5.51 |
| Unspecified | 397,465 | 0.04 |
| Total | 180,352,209 | 20.35 |
Statistics of predicted protein-coding genes.
| | Number | Average transcript length (bp) | Average CDS length (bp) | Average intron length (bp) |
|---|---|---|---|---|
| | 23,985 | 8,060.4 | 242.1 | 1,616.3 |
| Homology | 9,913 | 7,836.5 | 200.3 | 1,744.8 |
| Merged | 26,080 | 7,720.7 | 242.9 | 1,744.8 |
BUSCO assessment of genome assembly and gene prediction.
| Genome assembly | # Scaffolds | BUSCO (Arthropoda) |
|---|---|---|
| Platanus | 63,362 | C:86.0%[S:84.3%,D:1.7%],F:6.3%,M:7.7%,n:1066 |
| SSPACE | 30,899 | C:88.3%[S:86.8%,D:1.5%],F:4.5%,M:7.2%,n:1066 |
| Final | 30,897 | C:88.3%[S:86.8%,D:1.5%],F:4.5%,M:7.2%,n:1066 |
| Final (gene prediction) | 26,080 genes | C:89.9%[S:85.3%,D:4.6%],F:6.6%,M:3.5%,n:1066 |
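
The BUSCO column uses the tool's compact summary notation: C (complete), split into S (single-copy) and D (duplicated), then F (fragmented), M (missing), and n (number of BUSCO groups searched). The sketch below parses such a string and checks that S + D matches C and that C + F + M adds up to 100%. The function names and the tolerance are illustrative choices, not part of BUSCO.

```python
import re

# Pattern for strings such as "C:86.0%[S:84.3%,D:1.7%],F:6.3%,M:7.7%,n:1066".
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Parse a BUSCO short summary into a dict of percentages plus n."""
    match = BUSCO_RE.fullmatch(summary.strip())
    if match is None:
        raise ValueError(f"not a BUSCO short summary: {summary!r}")
    fields = {k: float(v) for k, v in match.groupdict().items() if k != "n"}
    fields["n"] = int(match["n"])
    return fields


def is_consistent(fields, tol=0.05):
    """Check S + D == C and C + F + M == 100, allowing for rounding."""
    return (abs(fields["S"] + fields["D"] - fields["C"]) <= tol
            and abs(fields["C"] + fields["F"] + fields["M"] - 100) <= tol)


# Summary strings copied from the BUSCO table.
SUMMARIES = [
    "C:86.0%[S:84.3%,D:1.7%],F:6.3%,M:7.7%,n:1066",
    "C:88.3%[S:86.8%,D:1.5%],F:4.5%,M:7.2%,n:1066",
    "C:89.9%[S:85.3%,D:4.6%],F:6.6%,M:3.5%,n:1066",
]

parsed = [parse_busco(s) for s in SUMMARIES]
for fields in parsed:
    print(fields, is_consistent(fields))
```

All three distinct summaries in the table pass both checks.
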
Fig. 3. Comparison of orthologous genes. (a) Gene family expansion and contraction in arthropod species. Numbers designate the gene families that have expanded (green) and contracted (red) after the split from the common ancestor. Divergence time is scaled in millions of years. (b) A Venn diagram of unique and shared orthologous gene clusters in T. longiramus, P. hawaiensis, and H. azteca.
| Measurement(s) | DNA • RNA • sequence_assembly • sequence feature annotation |
| Technology Type(s) | DNA sequencing • RNA sequencing • sequence assembly process • sequence annotation |
| Sample Characteristic - Organism | Trinorchestia longiramus |
A list of software and parameters used for genome analysis.
| Software | Version | Parameters/Commands |
|---|---|---|
| FLASH | 1.2.11 | default |
| JELLYFISH | 2.2.6 | -C -m 17 |
| Platanus trim | 1.0.7 | platanus_trim (for PE reads), platanus_internal_trim (for MP reads) |
| Platanus | 1.2.4 | step-1: assemble -m 2048, step-2: scaffold, step-3: gap_close |
| SSPACE Standard | 3.0 | default |
| DIAMOND | 0.9.24 | default |
| MEGAN | 6.15.2 | default |
| QUAST | 4.5 | default |
| BUSCO | 3.0.2 | -l arthropoda_odb9 |
| RepeatMasker | 4.0.7 | -e ncbi -pa 4 |
| RepeatModeler | 1.0.10 | -engine ncbi -pa 4 |
| LSC | 2.0 | default |
| GMAP | 2018-07-04 | -B 5 |
| derive-gene-models-from-PacBio.pl | | default |
| TransDecoder | 3.0.1 | step-1: TransDecoder.LongOrfs, step-2: TransDecoder.Predict |
| TopHat | 2.1.1 | --microexon-search --mate-std-dev 26 --mate-inner-dist 38 --min-intron-length 30 --min-coverage-intron 30 --min-segment-intron 30 |
| GenBlastA | 1.0.4 | -p T -e 1e-5 -g T -f F -a 0.5 -d 100000 -r 100 -c 0.01 -s -100 |
| Exonerate | 2.2.0 | --model protein2genome --percent 30 --showvulgar no --showalignment yes --showquerygff no --showtargetgff yes --targetchunkid 1 --targetchunktotal 100 |
| BRAKER | 2.0 | --species = |
| InterProScan | 5.16-55.0 | -appl HAMAP,ProDom,PRINTS,Pfam,TIGRFAM,SUPERFAMILY,ProSitePatterns,ProSiteProfiles -goterms -iprlookup |
| OrthoMCL | 2.0.9 | -I 1.5 |
| MUSCLE | 3.8.31 | default |
| ETE | 3.1.1 | trimal -gappyout |
| RAxML | 8.2.10 | -m PROTGAMMAJTT |
| MEGA | 7.00 | megacc |
| CAFE | 4.0 | default |
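
The JELLYFISH row in the software table lists `-C -m 17`, which requests canonical counting of 17-mers: a k-mer and its reverse complement are counted as the same entry. The sketch below shows that idea in plain Python on a toy sequence. It illustrates the counting scheme only and is not how Jellyfish is implemented; the function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID_BASES = set("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_canonical_kmers(sequence, k=17):
    """Count canonical k-mers in a DNA sequence, skipping k-mers with non-ACGT bases."""
    sequence = sequence.upper()
    counts = Counter()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if set(kmer) <= _VALID_BASES:
            counts[canonical(kmer)] += 1
    return counts


# Toy example with k=3 so the result is easy to follow by hand.
toy_counts = count_canonical_kmers("ACGTT", k=3)
print(dict(toy_counts))
```

In the toy sequence, `ACG` and `CGT` are reverse complements of each other and collapse into one entry, which is why canonical counting is the usual choice for unstranded genomic reads.
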