Jian Gao1,2,3, Qiye Li2,3,4, Zongji Wang2,3,5, Yang Zhou2,3, Paolo Martelli6, Fang Li2,3, Zijun Xiong2,3,4, Jian Wang3,7, Huanming Yang3,7, Guojie Zhang2,3,4,8.
Abstract
The Chinese crocodile lizard, Shinisaurus crocodilurus, is the only living representative of the monotypic family Shinisauridae under the order Squamata. It is an obligate semi-aquatic, viviparous, diurnal species restricted to specific portions of mountainous locations in southwestern China and northeastern Vietnam. Over the past several decades, illegal poaching and habitat disruption have caused a rapid decline in its population size; this unique reptile is now endangered and has been listed in Appendix II of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) since 1990. A proposal to uplist it to Appendix I was passed at the Seventeenth meeting of the CITES Conference of the Parties in 2016. To promote the conservation of this species, we sequenced the genome of a male Chinese crocodile lizard using a whole-genome shotgun strategy on the Illumina HiSeq 2000 platform. In total, we generated ∼291 Gb of raw sequencing data (×149 depth) from 13 libraries with insert sizes ranging from 250 bp to 40 kb. After filtering out polymerase chain reaction duplicates and low-quality reads, ∼137 Gb of clean data (×70 depth) were retained for genome assembly. We obtained a draft genome assembly with a total length of 2.24 Gb and a scaffold N50 of 1.47 Mb. The assembled genome was predicted to contain 20 150 protein-coding genes and up to 1114 Mb (49.6%) of repetitive elements. The genomic resource of the Chinese crocodile lizard will contribute to deciphering the biology of this organism and provide an essential tool for conservation efforts. It is also a valuable resource for future studies of squamate evolution.
Keywords: Chinese crocodile lizard; Shinisaurus crocodilurus; annotation; genome assembly; sequencing
Year: 2017 PMID: 28595343 PMCID: PMC5569961 DOI: 10.1093/gigascience/gix041
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Example of a Chinese crocodile lizard, Shinisaurus crocodilurus (image from Wong Sai Lok).
Statistics of the Chinese crocodile lizard genome sequencing
| Insert size (bp) | Libraries | Read length (bp) | Raw total data (Gb) | Raw sequence coverage (×) | Raw physical coverage (×) | Clean total data (Gb) | Clean sequence coverage (×) | Clean physical coverage (×) |
|---|---|---|---|---|---|---|---|---|
| 250 | 1 | 150 | 54.16 | 27.78 | 23.15 | 41.99 | 21.53 | 20.32 |
| 500 | 1 | 150 | 54.67 | 28.04 | 46.73 | 39.27 | 20.14 | 39.5 |
| 800 | 1 | 150 | 15.68 | 8.04 | 21.45 | 11.82 | 6.06 | 18.31 |
| 2000 | 2 | 49 | 34.93 | 17.92 | 365.62 | 13.99 | 7.18 | 146.48 |
| 5000 | 2 | 49 | 34.35 | 17.62 | 899.1 | 13.60 | 6.97 | 355.83 |
| 10 000 | 2 | 49 | 33.74 | 17.3 | 1765.7 | 9.48 | 4.86 | 496.14 |
| 20 000 | 2 | 49 | 30.64 | 15.71 | 3207.13 | 3.25 | 1.66 | 340.01 |
| 40 000 | 2 | 49 | 32.68 | 16.76 | 6842.73 | 3.33 | 1.71 | 697.58 |
| Total | 13 | - | 290.85 | 149.17 | 13 171.61 | 136.73 | 70.12 | 2114.17 |
Coverage calculation was based on the estimated genome size of 1.95 Gb. Sequence coverage is the average number of times a base is read, while physical coverage is the average number of times a base is spanned by mate-paired reads.
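The coverage definitions in the note above can be checked with a short calculation. The following is a minimal sketch, not the authors' pipeline: sequence coverage is taken as total data divided by the estimated genome size, as the note states, while the physical-coverage relation (sequence coverage × insert size / (2 × read length)) is an assumption for paired reads that happens to reproduce the raw-data columns to within rounding. The function names are hypothetical.

```python
GENOME_SIZE_GB = 1.95  # estimated genome size reported in the table note


def sequence_coverage(total_data_gb, genome_size_gb=GENOME_SIZE_GB):
    """Average number of times a base is read: sequenced bases / genome size."""
    return total_data_gb / genome_size_gb


def physical_coverage(seq_cov, insert_size_bp, read_length_bp):
    """Assumed relation (not stated in the source) for paired reads.

    Each pair contributes 2 * read_length sequenced bases but spans
    insert_size bases of the genome.
    """
    return seq_cov * insert_size_bp / (2 * read_length_bp)


# (insert size in bp, read length in bp, raw total data in Gb) from the table
raw_rows = [(250, 150, 54.16), (500, 150, 54.67), (2000, 49, 34.93)]
for insert, read_len, gigabases in raw_rows:
    cov = sequence_coverage(gigabases)
    phys = physical_coverage(cov, insert, read_len)
    print(insert, round(cov, 2), round(phys, 2))
```

Only raw-data rows are used here because the read length after quality filtering is not given, so the same relation cannot be applied directly to the clean-data columns.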
Statistics of 17-mer analysis
| Genome | Kmer | Kmer number | Peak depth | Estimated genome size (bp) | Used base (bp) |
|---|---|---|---|---|---|
| Shinisaurus crocodilurus | 17 | 68 234 898 814 | 35 | 1 949 568 537 | 78 251 030 750 |
The genome size was estimated according to the following formula: genome size = (Kmer number)/(Peak depth).
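As a quick check of this formula, the sketch below plugs in the 17-mer count and peak depth from the table. The function name is hypothetical, and the code only restates the arithmetic; it does not reproduce the k-mer counting itself.

```python
def estimate_genome_size(kmer_number, peak_depth):
    """Genome size estimate: total k-mer count divided by the peak k-mer depth."""
    return kmer_number / peak_depth


size_bp = estimate_genome_size(68_234_898_814, 35)
print(f"{size_bp / 1e9:.2f} Gb")  # prints "1.95 Gb"
```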
Figure 2: 17-mer depth distribution. The 17-mer analysis was performed using the 250, 500, and 800 bp short-insert libraries. The peak depth was ×35. The total number of 17-mers in this subset was 68 234 898 814. The genome size was estimated to be 1.95 Gb according to the following formula: genome size = (Kmer number)/(Peak depth).
Comparison of genome assembly and gene number for 15 reptiles with published genomes
| Common name | Sequencing platform | Sequence coverage (×) | Assembled genome size (Gb) | Contig N50 (kb) | Scaffold N50 (kb) | Gap ratio (%) | Gene number | Reference |
|---|---|---|---|---|---|---|---|---|
| American alligator | NGS | 156.0 | 2.17 | 7.0 | 509 | 2.09 | 23 323 | [ |
| Chinese alligator | NGS | 109.0 | 2.30 | 23.4 | 2188 | 3.17 | 22 200 | [ |
| Green anole lizard | Sanger | 6.0 | 1.78 | 79.9 | 4033 | 4.49 | 17 472 | [ |
| Western painted turtle | Sanger + NGS | 18.0 | 2.59 | 11.9 | 5212 | 7.64 | 21 796 | [ |
| Green sea turtle | NGS | 82.3 | 2.24 | 20.4 | 3778 | 4.33 | 19 633 | [ |
| Saltwater crocodile | NGS | 74.0 | 2.12 | 32.8 | 205 | 5.30 | 13 321 | [ |
| Five-pacer viper | NGS | 114.2 | 1.47 | 22.4 | 2122 | 5.29 | 21 194 | [ |
| Leopard gecko | NGS | 135.8 | 2.02 | 20.0 | 664 | 1.76 | 24 755 | [ |
| Indian gharial | NGS | 81.0 | 2.88 | 14.2 | 127 | 2.22 | 14 043 | [ |
| Japanese gecko | NGS | 131.3 | 2.55 | 21.1 | 685 | 3.54 | 22 487 | [ |
| King cobra | NGS | 28.0 | 1.66 | 4.0 | 226 | 13.5 | 18 579 | [ |
| Soft-shell turtle | NGS | 105.6 | 2.21 | 21.9 | 3331 | 4.35 | 23 649 | [ |
| Australian dragon lizard | NGS | 179.1 | 1.82 | 31.3 | 2290 | 3.78 | 19 406 | [ |
| Burmese python | NGS | 20.0 | 1.44 | 10.7 | 208 | 3.52 | 25 385 | [ |
| Chinese crocodile lizard | NGS | 149 | 2.24 | 11.7 | 1470 | 7.98 | 20 150 | |
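The contig and scaffold N50 columns above follow the usual assembly-statistics definition: the length L such that sequences of length at least L together contain at least half of the assembled bases. A minimal sketch of that calculation is given below; the function name and the toy lengths are illustrative and are not taken from the crocodile lizard assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; N50 is the length of
    the sequence at which the running total first reaches half of the
    total assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("N50 is undefined for an empty assembly")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example: total length is 290, so half is 145.
# 80 alone is below 145; 80 + 70 = 150 reaches it, so N50 is 70.
print(n50([80, 70, 50, 40, 30, 20]))
```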
The percentages of complete, fragmented, and missing genes out of the 2586 expected vertebrata genes in 15 reptile genomes based on the BUSCO assessment
| Common name | Complete single-copy (%) | Complete duplicated (%) | Fragmented (%) | Missing (%) |
|---|---|---|---|---|
| American alligator | 95.0 | 0.6 | 3.1 | 1.3 |
| Chinese alligator | 94.4 | 0.7 | 3.2 | 1.7 |
| Green anole lizard | 88.1 | 0.8 | 5.6 | 5.5 |
| Green sea turtle | 93.9 | 0.8 | 3.7 | 1.6 |
| Western painted turtle | 75.5 | 0.8 | 3.3 | 20.4 |
| Saltwater crocodile | 94.1 | 0.6 | 2.1 | 3.2 |
| Five-pacer viper | 94.5 | 0.6 | 2.4 | 2.5 |
| Leopard gecko | 94.0 | 1.2 | 3.3 | 1.5 |
| Indian gharial | 85.2 | 0.5 | 11.6 | 2.7 |
| Japanese gecko | 89.8 | 1.1 | 6.3 | 2.8 |
| King cobra | 86.6 | 0.6 | 8.6 | 4.2 |
| Soft-shell turtle | 93.5 | 0.5 | 3.8 | 2.2 |
| Australian dragon lizard | 94.3 | 0.6 | 3.1 | 2.0 |
| Burmese python | 91.0 | 0.7 | 5.4 | 2.9 |
| Chinese crocodile lizard | 91.6 | 0.9 | 4.8 | 2.7 |
The statistics of repeats annotated by different methods in the Chinese crocodile lizard genome
| Method | Total repeat length (bp) | Percentage of genome |
|---|---|---|
| TRF | 35 995 906 | 1.74 |
| Repeatmasker | 199 442 776 | 9.65 |
| Proteinmask | 164 914 070 | 7.98 |
| RepeatModeler | 938 017 292 | 41.79 |
| LTR_FINDER | 235 204 092 | 10.48 |
| Total | 1 113 900 339 | 49.62 |
Breakdown of repeat content for 5 reptile genomes estimated by RepeatMasker
| Repeat type | The Burmese python (%) | The king cobra (%) | The green anole lizard (%) | The Australian dragon lizard (%) | The Chinese crocodile lizard (%) |
|---|---|---|---|---|---|
| DNA | 3.45 | 3.49 | 8.71 | 3.26 | 3.80 |
| LINE | 8.57 | 10.55 | 12.19 | 10.93 | 10.20 |
| SINE | 1.60 | 2.09 | 5.11 | 3.14 | 2.72 |
| LTR | 0.85 | 1.75 | 2.94 | 0.92 | 1.52 |
| Unknown | 12.61 | 12.87 | 7.49 | 16.23 | 23.95 |
| Total | 31.82 | 35.22 | 33.82 | 35.93 | 41.79 |
Number and percentage of genes with functional annotation
| Database | Number | Percentage (%) |
|---|---|---|
| SwissProt | 18 817 | 93.38 |
| TrEMBL | 9675 | 48.01 |
| InterPro | 17 589 | 87.29 |
| KEGG | 15 791 | 78.37 |
| GO | 14 518 | 72.05 |
| Combined | 20 010 | 99.31 |
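The percentages in this table appear to be the annotated gene counts divided by the 20 150 predicted protein-coding genes reported in the abstract. The sketch below reproduces them under that assumption; the denominator is inferred rather than stated alongside the table, and the function name is hypothetical.

```python
TOTAL_GENES = 20_150  # predicted protein-coding genes (from the abstract)

# Annotated gene counts per database, copied from the table
annotated = {
    "SwissProt": 18_817,
    "TrEMBL": 9_675,
    "InterPro": 17_589,
    "KEGG": 15_791,
    "GO": 14_518,
    "Combined": 20_010,
}


def percent_annotated(count, total=TOTAL_GENES):
    """Share of predicted genes with an annotation, as a percentage."""
    return 100 * count / total


for database, count in annotated.items():
    print(f"{database}: {percent_annotated(count):.2f}%")
```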