Bo Song1, Shifeng Cheng2, Yanbo Sun3, Xiao Zhong1, Jieqiong Jin3, Rui Guan1, Robert W Murphy4, Jing Che3, Yaping Zhang5, Xin Liu1. 1. BGI-Shenzhen, Shenzhen, 518083 China. 2. BGI-Shenzhen, Shenzhen, 518083 China ; HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory, The Computer Science Department, The University of Hong Kong, Hong Kong, China. 3. State Key Laboratory of Genetic Resources and Evolution, and Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223 China. 4. State Key Laboratory of Genetic Resources and Evolution, and Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223 China ; Centre for Biodiversity and Conservation Biology, Royal Ontario Museum, 100 Queen's Park, Toronto, Ont. M5S 2C6 Canada. 5. State Key Laboratory of Genetic Resources and Evolution, and Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223 China ; Laboratory for Conservation and Utilization of Bio-resource, Yunnan University, Kunming, 650091 China.
Abstract
BACKGROUND: Transition from a lizard-like to a snake-like body form is one of the most important transformations in reptilian evolution. The increasing number of sequenced reptilian genomes is enabling a deeper understanding of vertebrate evolution, although the genetic basis of the loss of limbs in reptiles remains enigmatic. Here we report genome sequencing, assembly, and annotation for the Asian glass lizard Ophisaurus gracilis, a limbless lizard species with an elongated snake-like body form. Addition of this species to the genome repository will provide an excellent resource for studying the genetic basis of limb loss and trunk elongation. FINDINGS: O. gracilis genome sequencing using the Illumina HiSeq2000 platform resulted in 274.20 Gbp of raw data that was filtered and assembled to a final size of 1.78 Gbp, comprising 6,717 scaffolds with N50 = 1.27 Mbp. Based on the k-mer estimated genome size of 1.71 Gbp, the assembly appears to be nearly 100% complete. A total of 19,513 protein-coding genes were predicted, and 884.06 Mbp of repeat sequences (approximately half of the genome) were annotated. The draft genome of O. gracilis has similar characteristics to both lizard and snake genomes. CONCLUSIONS: We report the first genome of a lizard from the family Anguidae, O. gracilis. This supplements currently available genetic and genomic resources for amniote vertebrates, representing a major increase in comparative genome data available for squamate reptiles in particular.
Ophisaurus gracilis genomic DNA was extracted from the tail of a single male lizard collected from the Tibetan Plateau and used to construct seven paired-end Illumina libraries with insert sizes ranging from 180 bp to 20 kbp. To construct the small-insert libraries (180, 500, and 800 bp), DNA was sheared to the target size range using a Covaris S2 (Covaris, Woburn, MA, USA) and ligated to adaptors. For the long-insert libraries (2, 5, 10, and 20 kbp), DNA was fragmented using a HydroShear system (Digilab, Marlborough, MA, USA). Sheared fragments were biotin-labelled at the ends, and fragments of the desired size were gel purified. A second round of fragmentation was then conducted before adaptor ligation. All libraries were sequenced on an Illumina HiSeq2000 Genome Analyzer (Illumina, San Diego, CA, USA), with read lengths of 100 bp for the short-insert libraries (180–800 bp) and 90 bp for the long-insert libraries (2–20 kbp).

A total of 274.20 Gbp of raw data was generated, from which 147.08 Gbp of ‘clean’ data was obtained after removal of duplicates, contaminated reads (reads with adaptor sequences), low-quality reads (those in which more than 60% of bases for short-insert libraries, or more than 80% of bases for long-insert libraries, had Solexa quality scores (Phred64) below 7), and reads with more than 10% ‘N’ bases. The O. gracilis genome size was estimated to be approximately 1.71 Gbp using a k-mer-based approach [1]. Based on this estimate, the clean data correspond to approximately 86-fold coverage of the O. gracilis genome.

Genome assembly (contig and scaffold construction) and gap closure were performed on the high-quality reads using the SOAPdenovo package with default parameters, except that the k-mer size was set to 63 [2]. The final assembly had a total length of 1.78 Gbp, comprising 6,715 scaffolds assembled from 135,863 contigs, with the longest scaffold being 6.68 Mbp. The N50 sizes for contigs and scaffolds were 23.41 kbp and 1.27 Mbp, respectively.
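The read-filtering thresholds and the fold-coverage figure described above can be illustrated with a short Python sketch. This is not the authors' pipeline: the function names and the "short"/"long" library labels are hypothetical, duplicate and adaptor removal are omitted, and the only values taken from the text are the Phred64 offset, the quality cutoff of 7, the 60%/80% low-quality limits, the 10% ‘N’ limit, and the 147.08 Gbp and 1.71 Gbp figures.

```python
# Illustrative sketch only; names below are hypothetical, not from the paper.
PHRED_OFFSET = 64            # Phred64 encoding, as stated in the text
MIN_QUALITY = 7              # bases scoring below this count as low quality
MAX_LOW_QUALITY_FRACTION = {"short": 0.60, "long": 0.80}  # per library type
MAX_N_FRACTION = 0.10        # reads with more than 10% 'N' bases are dropped


def low_quality_fraction(quality_string):
    """Fraction of bases whose decoded quality score is below MIN_QUALITY."""
    scores = [ord(ch) - PHRED_OFFSET for ch in quality_string]
    return sum(score < MIN_QUALITY for score in scores) / len(scores)


def n_fraction(sequence):
    """Fraction of bases in the read that are 'N'."""
    return sequence.upper().count("N") / len(sequence)


def passes_filters(sequence, quality_string, library_type):
    """Apply the 'N' and low-quality criteria to a single read."""
    if len(sequence) == 0 or len(sequence) != len(quality_string):
        return False
    if n_fraction(sequence) > MAX_N_FRACTION:
        return False
    limit = MAX_LOW_QUALITY_FRACTION[library_type]
    return low_quality_fraction(quality_string) <= limit


def fold_coverage(clean_gbp, genome_size_gbp):
    """Sequencing depth as clean data divided by estimated genome size."""
    return clean_gbp / genome_size_gbp


if __name__ == "__main__":
    print(passes_filters("ACGTACGTAC", "hhhhhhhhhh", "short"))
    print(f"{fold_coverage(147.08, 1.71):.1f}-fold coverage")
```

With the reported values, the clean data divided by the k-mer-based genome size estimate gives a depth of roughly 86-fold, in line with the figure quoted in the text.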
Given the genome size estimate of 1.71 Gbp, the final assembly (1.78 Gbp) probably covers the complete genome, although its length is likely a slight overestimate owing to possible overlaps between some scaffolds and/or misassembly of some heterozygous alleles. Completeness of the assembly was supported by the successful mapping of up to 97% of reads from the short-insert libraries. Collectively, these data indicate that almost complete O. gracilis genome coverage was obtained.

Protein-coding genes were predicted and annotated by a combination of homology searching and de novo prediction using AUGUSTUS [3]. To search for homologous gene models, the genome assembly was queried against a database containing protein sequences and gene transcripts from three other squamate reptile species (Anolis carolinensis, Ophiophagus hannah, and Python molurus bivittatus) and four other tetrapod vertebrates (Gallus gallus, Homo sapiens, Taeniopygia guttata, and Xenopus tropicalis). This resulted in the identification of 19,513 protein-coding genes in the O. gracilis assembly, with an average of seven introns per gene. Gene length ranged from 137 to 96,389 bp, with an average of 1,506 bp; the average exon and intron lengths were 186 and 3,809 bp, respectively (Table 1).
Table 1. Global statistics of the O. gracilis genome

| Statistic | Value |
| --- | --- |
| Size (Gb) | 1.71 |
| Scaffold number | 6,715 |
| Scaffold N50 (Mb) | 1.27 |
| Gene number | 19,513 |
| Average gene length (bp) | 1,506 |
| Average intron number | 7 |
| Average intron length (bp) | 3,809 |
| Average exon length (bp) | 186 |
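For readers unfamiliar with the N50 statistic reported for contigs and scaffolds, the following Python sketch shows how it is conventionally computed from a list of sequence lengths. It is a generic illustration, not the routine used by the assembly software, and the function name and toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    toy_scaffolds = [2, 3, 4, 5, 6]  # arbitrary toy lengths, not real data
    print(n50(toy_scaffolds))
```

Applied to the scaffold lengths of an assembly, this procedure would yield the kind of value reported here as the scaffold N50.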
Genomic repeat elements in the O. gracilis genome assembly were also identified and annotated. RepeatMasker version 3.2.7 [4] was used to search for repeat elements using the RepBase library (version 16.10) [5]. We also constructed a de novo repeat sequence database for the O. gracilis genome using LTR_FINDER [6] and RepeatModeler [7], and used this library to identify additional repeat elements with RepeatMasker. Together, the two annotation approaches identified 884.06 Mbp of the O. gracilis genome as repetitive. Repeat annotations accounted for approximately 49.63% of the genome assembly, which is markedly higher than estimates for other squamate reptiles: the anole lizard (~30.4%) [8] and both available snake genomes, the python (~27.60%) [9] and the cobra (~31.28%) [10]. The repeat landscape of O. gracilis consists mostly of retrotransposons, including long interspersed elements (LINEs), short interspersed elements (SINEs), and long terminal repeat (LTR) elements. LINEs were the most abundant class, occupying 37.65% of the genome, while SINEs and LTR elements comprised 1.80% and 6.44%, respectively (Table 2). DNA transposons were comparatively rare, forming only 3.19% of the genome.
Table 2. Summary of mobile element types

| Type | Length (kb) | Percentage of genome (%) |
| --- | --- | --- |
| DNA | 56,874 | 3.19 |
| LINE | 670,619 | 37.65 |
| SINE | 32,019 | 1.80 |
| LTR | 114,739 | 6.44 |
| Other | 177 | 0.01 |
| Unknown | 160,545 | 9.01 |
| Total | 884,057 | 49.63 |
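The percentages in Table 2 can be cross-checked against the reported lengths with a few lines of Python. The sketch below uses only the values from the table; the variable and function names are hypothetical. It derives the assembly size implied by the Total row and recomputes each class percentage from it. Note that the per-class lengths sum to more than the Total row; one plausible explanation, which the text does not state, is that annotations from different classes overlap and are counted once in the total.

```python
# Values copied from Table 2 (lengths in kb); names are illustrative only.
REPEAT_LENGTHS_KB = {
    "DNA": 56_874,
    "LINE": 670_619,
    "SINE": 32_019,
    "LTR": 114_739,
    "Other": 177,
    "Unknown": 160_545,
}
TOTAL_REPEAT_KB = 884_057
TOTAL_REPEAT_PERCENT = 49.63

# Assembly size implied by the Total row: length / (percentage / 100).
IMPLIED_ASSEMBLY_KB = TOTAL_REPEAT_KB / (TOTAL_REPEAT_PERCENT / 100)


def percent_of_assembly(length_kb):
    """Express a repeat length as a percentage of the implied assembly size."""
    return 100 * length_kb / IMPLIED_ASSEMBLY_KB


if __name__ == "__main__":
    print(f"Implied assembly size: {IMPLIED_ASSEMBLY_KB / 1e6:.2f} Gbp")
    for repeat_type, length_kb in REPEAT_LENGTHS_KB.items():
        print(f"{repeat_type}: {percent_of_assembly(length_kb):.2f}%")
    class_sum_kb = sum(REPEAT_LENGTHS_KB.values())
    print(f"Sum of classes: {class_sum_kb} kb (Total row: {TOTAL_REPEAT_KB} kb)")
```

The implied assembly size agrees with the 1.78 Gbp reported for the final assembly, and each recomputed percentage rounds to the value given in the table.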
In summary, we report the first annotated anguid lizard genome sequence assembly, supplementing the existing amniote genome resources, in which squamate reptile sequences are sparsely represented. Despite the distant phylogenetic relationship between the two lineages [11], the morphology of the Asian glass lizard O. gracilis is highly convergent with that of snakes, including the lack of limbs and an elongated body. We therefore expect the genome of this species to be particularly useful for future comparative genomic analyses aimed at identifying the molecular basis of limb loss and body form evolution in squamate reptiles, and in vertebrates in general.
Availability of supporting data
Supporting data are available in the GigaScience repository, GigaDB [12], and the raw data are available under accession SRP052050.