Haiping Liu1, Qiyong Liu1, Zhiqiang Chen2, Yanchao Liu1, Chaowei Zhou1, Qiqi Liang2, Caixia Ma2, Jianshe Zhou1, Yingzi Pan1, Meiqun Chen1, Wenkai Jiang2, Shijun Xiao3, Zhenbo Mou1.
Abstract
Background: Mechanisms of high-altitude adaptation have attracted widespread interest among evolutionary biologists. Several genome-wide studies have been carried out on vertebrates endemic to Tibet, including mammals, birds, and amphibians. However, little is known about the adaptive evolution of highland fishes. Glyptosternon maculatum (Regan 1905), also known as Regan or barkley, is endemic to the Tibetan Plateau and belongs to the family Sisoridae, order Siluriformes (catfishes). The species lives at elevations of roughly 2,800 m to 4,200 m. A high-quality reference genome of G. maculatum therefore provides an opportunity to investigate the mechanisms of high-altitude adaptation in fishes.

Findings: To obtain a high-quality reference genome sequence of G. maculatum, we combined Pacific Biosciences single-molecule real-time sequencing, Illumina paired-end sequencing, 10X Genomics linked reads, and BioNano optical mapping. In total, 603.99 Gb of sequencing data were generated. The assembled genome was about 662.34 Mb, with scaffold and contig N50 sizes of 20.90 Mb and 993.67 kb, respectively, and captured 83% complete and 3.9% partial vertebrate Benchmarking Universal Single-Copy Orthologs. Repetitive elements account for 35.88% of the genome, and 22,066 protein-coding genes were predicted, of which 91.7% have been functionally annotated.

Conclusions: We present the first comprehensive de novo genome of G. maculatum. This genetic resource is fundamental for investigating the origin of G. maculatum and will improve our understanding of high-altitude adaptation in fishes. The assembled genome can also serve as a reference for future population genetic studies of G. maculatum.
Year: 2018 PMID: 30124856 PMCID: PMC6136493 DOI: 10.1093/gigascience/giy104
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: (a) The appearance of G. maculatum. (b) The liver of G. maculatum is divided into two parts: one lies outside the abdominal cavity (attaching liver) and is connected to the other, which lies inside the cavity (main liver). (c) Sampling location (red triangle) of G. maculatum for sequencing. Schematic drawings (ventral view) of G. maculatum are reproduced from Zhang [9].
Sequencing data used for the G. maculatum genome assembly
| Libraries | Insert size | Raw data (Gb) | Clean data (Gb) | Read length (bp) | Sequence coverage (X) |
|---|---|---|---|---|---|
| Illumina reads | 250 bp | 148.16 | 147.16 | 150 | 191 |
| PacBio reads | 20 kb | 106.32 | 106.05 | 11,745 | 145.2 |
| 10X Genomics | 500 bp | 157.21 | 157.02 | 150 | 203.5 |
| BioNano | – | 192.30 | 191.30 | – | 248 |
| Total | – | 603.99 | 601.53 | – | 787.7 |
The coverage was calculated using an estimated genome size of 771.19 Mb.
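The coverage arithmetic can be sketched as follows. This is a minimal illustration, assuming coverage is the clean data volume divided by the estimated genome size (the record does not state whether raw or clean data were used); the function name `coverage` is hypothetical. Under this assumption the Illumina, 10X Genomics, and BioNano rows are reproduced to within rounding, whereas the PacBio row comes out near 137.5X rather than the tabulated 145.2X, so that row may rest on a different basis.

```python
# Minimal sketch: sequence coverage = data volume / estimated genome size.
# Assumption (not stated in the record): clean data are used, and 1 Gb = 1,000 Mb.

GENOME_SIZE_MB = 771.19  # estimated genome size quoted in the table footnote


def coverage(data_gb: float, genome_size_mb: float = GENOME_SIZE_MB) -> float:
    """Return fold coverage for a data volume given in Gb."""
    return data_gb * 1000.0 / genome_size_mb


# Clean data volumes (Gb) copied from the sequencing table.
clean_data_gb = {
    "Illumina reads": 147.16,
    "PacBio reads": 106.05,
    "10X Genomics": 157.02,
    "BioNano": 191.30,
}

for library, gigabases in clean_data_gb.items():
    print(f"{library}: {coverage(gigabases):.1f}X")
```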
Assembly statistics of G. maculatum
| Statistic | Contig length (bp)^a | Scaffold length (bp) | Contig number^a | Scaffold number |
|---|---|---|---|---|
| Total | 637,133,884 | 662,339,741 | 3,281 | 531 |
| Max | 5,772,991 | 47,179,384 | – | – |
| Number ≥ 2,000 bp | – | – | 3,161 | 531 |
| N50 | 993,673 | 20,902,354 | 161 | 11 |
| N60 | 668,112 | 17,328,106 | 239 | 14 |
| N70 | 418,057 | 12,288,896 | 359 | 19 |
| N80 | 211,596 | 6,320,921 | 575 | 27 |
| N90 | 77,392 | 1,017,220 | 1,067 | 50 |
^a Contig after scaffolding.
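For readers unfamiliar with the Nx rows, the conventional definition can be sketched as below: sort the sequences from longest to shortest and accumulate lengths until the running sum reaches x% of the total; the length of the sequence that crosses the threshold is Nx, and the number of sequences needed is the matching count. This is a minimal sketch of that standard definition, assumed here to be what the table reports; the function name `nx` and the toy lengths are hypothetical and are not data from the assembly.

```python
# Minimal sketch of the conventional Nx / Lx contiguity statistics.
# The function name and the toy input below are illustrative assumptions.

def nx(lengths, percent=50):
    """Return (Nx length, number of sequences needed) for the given percent."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count
    # Unreachable for percent <= 100, kept for completeness.
    return ordered[-1], len(ordered)


toy_lengths = [80, 70, 50, 40, 30, 20, 10]  # hypothetical contig lengths, total 300
print(nx(toy_lengths, 50))
print(nx(toy_lengths, 90))
```

Read this way, the N50 row of the table says that the 161 longest contigs (or the 11 longest scaffolds) together cover at least half of the assembly.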
General statistics of predicted protein-coding genes
| Gene set | | Number | Average transcript length (bp) | Average coding sequence length (bp) | Average exons per gene | Average exon length (bp) | Average intron length (bp) |
|---|---|---|---|---|---|---|---|
| | Augustus | 14,910 | 9,534 | 1,241 | 6.93 | 179 | 1,399 |
| | GlimmerHMM | 73,320 | 7,896 | 574 | 3.87 | 148 | 2,551 |
| | SNAP | 43,247 | 15,950 | 847 | 6.04 | 140 | 2,996 |
| | Geneid | 23,523 | 16,924 | 1,323 | 6.29 | 210 | 2,948 |
| | Genscan | 24,037 | 19,024 | 1,514 | 8.14 | 186 | 2,451 |
| | | 32,364 | 6,413 | 1,142 | 5.12 | 223 | 1,279 |
| | | 27,208 | 6,326 | 1,252 | 5.36 | 234 | 1,165 |
| | | 30,336 | 5,615 | 1,048 | 4.87 | 215 | 1,181 |
| | | 19,458 | 9,935 | 1,507 | 7.58 | 199 | 1,280 |
| Homolog | | 16,090 | 10,844 | 1,432 | 7.83 | 183 | 1,379 |
| | | 23,120 | 8,191 | 1,225 | 6.12 | 200 | 1,362 |
| | | 16,164 | 10,803 | 1,417 | 7.74 | 183 | 1,392 |
| | | 37,610 | 6,704 | 1,155 | 5.22 | 221 | 1,315 |
| RNA-seq | PASA | 97,309 | 9,419 | 1,201 | 7.09 | 169 | 1,348 |
| | Cufflinks | 92,180 | 19,478 | 4,707 | 10.13 | 465 | 1,618 |
| EvidenceModeler | | 25,365 | 11,517 | 1,323 | 7.66 | 173 | 1,531 |
| PASA-update* | | 38,086 | 13,009 | 1,521 | 8.79 | 173 | 1,475 |
| Final set* | | 22,066 | 12,913 | 1,458 | 8.48 | 172 | 1,531 |
Figure 2: Divergence time estimated between G. maculatum and other species.