Arthur Georges, Qiye Li, Jinmin Lian, Denis O'Meally, Janine Deakin, Zongji Wang, Pei Zhang, Matthew Fujita, Hardip R Patel, Clare E Holleley, Yang Zhou, Xiuwen Zhang, Kazumi Matsubara, Paul Waters, Jennifer A Marshall Graves, Stephen D Sarre, Guojie Zhang.
Abstract
BACKGROUND: The lizards of the family Agamidae are one of the most prominent elements of the Australian reptile fauna. Here, we present a genomic resource built on the basis of a wild-caught male ZZ central bearded dragon Pogona vitticeps.
Keywords: Agamidae; Squamata; Next-generation sequencing; Central bearded dragon; Dragon lizard; Pogona vitticeps
Year: 2015 PMID: 26421146 PMCID: PMC4585809 DOI: 10.1186/s13742-015-0085-2
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Summary of sequencing data derived from paired-end sequencing of 13 insert libraries using an Illumina HiSeq 2000 platform
| Insert size (bp) | Accession numbers | Number of libraries | Raw read length (bp) | Raw data (Gbp) | Raw average read depth (X) | Raw physical coverage (X) | Filtered read length (bp) | Filtered data (Gbp) | Filtered average read depth (X) | Filtered physical coverage (X) |
|---|---|---|---|---|---|---|---|---|---|---|
| 250 | ERR409943, ERR409944 | 1 | 150 | 55.17 | 31.17 | 25.97 | 125 | 42.49 | 24.01 | 24.00 |
| 500 | ERR409945, ERR409946 | 1 | 150 | 34.32 | 19.39 | 32.32 | 125 | 23.66 | 13.37 | 26.72 |
| 800 | ERR409947 | 1 | 150 | 46.28 | 26.15 | 69.72 | 125 | 32.2 | 18.19 | 60.63 |
| 2,000 | ERR440173, ERR409948 | 2 | 49 | 38.39 | 21.69 | 442.64 | 49 | 18.19 | 10.28 | 209.73 |
| 5,000 | ERR409949 | 1 | 49 | 17.48 | 9.88 | 503.95 | 49 | 6.56 | 3.71 | 188.99 |
| 6,000 | ERR409950 | 1 | 49 | 17.43 | 9.85 | 603.01 | 49 | 6.01 | 3.4 | 208.00 |
| 10,000 | ERR409951, ERR409952 | 2 | 49 | 34.94 | 19.74 | 2,014.60 | 49 | 7.89 | 4.46 | 455.00 |
| 20,000 | ERR409953, ERR409954 | 2 | 49 | 38.53 | 21.77 | 4,443.48 | 49 | 6.63 | 3.75 | 764.38 |
| 40,000 | ERR409955, ERR409956 | 2 | 49 | 34.4 | 19.44 | 7,932.30 | 49 | 2.75 | 1.55 | 633.29 |
| Total | | 13 | | 316.94 | 179.06 | 16,067.99 | | 146.38 | 82.7 | 2,570.74 |
Read depth was calculated on the basis of a genome size of 1.77 Gbp. Average read depth, number of times on average a particular base is included in a read. Physical coverage, the number of times on average a particular base is spanned by a paired read
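The two definitions in this footnote can be checked against the table with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the physical-coverage formula (number of read pairs times nominal insert size, divided by genome size) is an assumption that happens to reproduce the tabulated values, not a formula stated in the source.

```python
def average_read_depth(data_gbp: float, genome_size_gbp: float) -> float:
    """Average number of reads covering a base: sequenced bases / genome size."""
    return data_gbp / genome_size_gbp


def physical_coverage(data_gbp: float, read_length_bp: int,
                      insert_size_bp: int, genome_size_gbp: float) -> float:
    """Average number of read pairs spanning a base.

    Assumption: every pair contributes two reads of equal length and spans
    exactly the nominal insert size.
    """
    read_pairs = data_gbp * 1e9 / (2 * read_length_bp)
    return read_pairs * insert_size_bp / (genome_size_gbp * 1e9)


GENOME_SIZE_GBP = 1.77  # genome size given in the table footnote

# Raw data for the 250 bp insert library: 55.17 Gbp of 150 bp reads.
depth_250 = average_read_depth(55.17, GENOME_SIZE_GBP)
coverage_250 = physical_coverage(55.17, 150, 250, GENOME_SIZE_GBP)
print(f"{depth_250:.2f}X read depth, {coverage_250:.2f}X physical coverage")
```

Under these assumptions the 250 bp library gives 31.17X read depth and 25.97X physical coverage, matching the first row of the table. The same calculation shows why the long-insert libraries contribute little read depth but very high physical coverage.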
Fig. 1 K-mer spectrum for the genome sequence of a male Pogona vitticeps (ZZ). Sequencing DNA derived from the short-insert libraries (250, 500, 800 bp) yielded 98.35 Gbases of clean data in the form of 125 bp reads, which generated 76.89 × 10^9 17-mer sequences. The solid line shows the k-mer spectrum (percentage frequency against k-mer copy number). The second mode (copy number 48.5) represents homozygous single-copy sequence, whereas the first mode (24.5), at half the copy number of the second, represents heterozygous single-copy sequence. Heterozygosity is high, which complicated assembly.
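As a rough illustration of how a k-mer spectrum of this kind is built, the sketch below counts every k-mer in a set of reads and then tallies how many distinct k-mers occur at each copy number. It is a toy version under stated assumptions: the function name and reads are hypothetical, reverse complements are not collapsed, and sequencing errors are not filtered, all of which production k-mer counters typically handle.

```python
from collections import Counter


def kmer_spectrum(reads, k=17):
    """Return {copy number: number of distinct k-mers seen that many times}."""
    kmer_counts = Counter()
    for read in reads:
        # A read of length L contributes L - k + 1 overlapping k-mers.
        for start in range(len(read) - k + 1):
            kmer_counts[read[start:start + k]] += 1
    return Counter(kmer_counts.values())


# Hypothetical toy reads; k is shortened to 3 so the example stays readable.
toy_reads = ["ACGTAC", "CGTACG"]
spectrum = kmer_spectrum(toy_reads, k=3)
for copy_number in sorted(spectrum):
    print(copy_number, spectrum[copy_number])
```

In a diploid genome, k-mers covering a heterozygous site are carried by only one of the two haplotypes, so they accumulate at roughly half the copy number of homozygous k-mers. This is why the spectrum in the figure has two modes.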
Statistics for the assembly contigs and scaffolds (after gap filling)
| Statistic | Contig size (bp) | Contig number | Scaffold size (bp) | Scaffold number |
|---|---|---|---|---|
| N90 | 4,850 | 63,958 | 200,992 | 1,095 |
| N80 | 12,159 | 42,491 | 670,865 | 644 |
| N70 | 18,332 | 30,884 | 1,149,567 | 441 |
| N60 | 24,540 | 22,654 | 1,671,674 | 311 |
| N50 | 31,298 | 16,344 | 2,290,546 | 219 |
| Longest | 295,776 | | 14,681,335 | |
| Total size | 1,747,524,961 | | 1,816,115,349 | |
| ≥100 bp | | 636,524 | | 545,300 |
| ≥2 kbp | | 79,002 | | 4,356 |
| Gap ratio | 0 % | | 3.78 % | |
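The N-statistics in this table pair a length with a count: Nx is the length of the sequence at which the cumulative length, taken from longest to shortest, first reaches x % of the total assembly size, and the count is how many sequences that takes. A minimal sketch of that definition follows; the function name and the toy lengths are hypothetical and are not taken from the assembly.

```python
def nx_statistic(lengths, fraction):
    """Return (length, count) for Nx.

    Sequences are sorted longest first; the result is the length of the
    sequence at which the running total first reaches `fraction` of the
    summed length, together with the number of sequences used.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running_total = 0
    for count, length in enumerate(ordered, start=1):
        running_total += length
        if running_total >= threshold:
            return length, count
    raise ValueError("lengths must be non-empty and fraction within (0, 1]")


toy_lengths = [100, 80, 60, 40, 20]  # hypothetical contig lengths in bp
n50 = nx_statistic(toy_lengths, 0.5)
n90 = nx_statistic(toy_lengths, 0.9)
print("N50:", n50, "N90:", n90)
```

Read this way, the scaffold N50 row says that the 219 longest scaffolds, each at least 2,290,546 bp long, together contain half of the assembled sequence.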
Number of predicted genes with RNA-seq signals
| Specimen ID (tissue ID) | Accession number | Tissue | Genotype | Phenotype | RPKM >0 (number) | RPKM >0 (%) | RPKM >1 (number) | RPKM >1 (%) | RPKM >5 (number) | RPKM >5 (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1003347859 (AA45100) | ERR753524 | Brain | ZZ | Intersex | 17,049 | 87.85 | 14,403 | 74.22 | 11,244 | 57.94 |
| 1003338787 (AA60463) | ERR753525 | Brain | ZZ | Male | 16,934 | 87.26 | 14,467 | 74.55 | 11,359 | 58.53 |
| 1003348364 (AA60435) | ERR753526 | Brain | ZW | Female | 17,121 | 88.23 | 14,526 | 74.85 | 11,474 | 59.13 |
| 1003347859 (AA45100) | ERR753527 | Testes | ZZ | Intersex | 16,874 | 86.95 | 13,874 | 71.49 | 10,784 | 55.57 |
| 1003347859 (AA45100) | ERR753528 | Ovary | ZZ | Intersex | 16,827 | 86.71 | 12,952 | 66.74 | 10,421 | 53.7 |
| 1003338787 (AA60463) | ERR753529 | Testes | ZZ | Male | 17,963 | 92.56 | 14,951 | 77.04 | 11,311 | 58.29 |
| 1003348364 (AA60435) | ERR753530 | Ovary | ZW | Female | 17,188 | 88.57 | 13,634 | 70.26 | 10,946 | 56.41 |
| Combined | | | | | 18,833 | 97.05 | 17,646 | 90.93 | 15,974 | 82.31 |
Gene expression levels were measured as RPKM (reads per kilobase of gene per million mapped reads). Ratios are based on a total of 19,406 annotated protein-coding genes
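The RPKM definition and the ratio columns can be illustrated with a small calculation. This is a sketch, not the authors' pipeline: the function names and the example gene are hypothetical, and only the final ratio uses figures from the table.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return read_count / (gene_length_bp / 1_000) / (total_mapped_reads / 1_000_000)


def expressed_ratio(rpkm_values, threshold, total_genes):
    """Count genes with RPKM above `threshold`; also return the percentage."""
    expressed = sum(1 for value in rpkm_values if value > threshold)
    return expressed, 100 * expressed / total_genes


# Hypothetical gene: 500 reads on a 2,000 bp gene in a library of 20 million mapped reads.
example_rpkm = rpkm(500, 2_000, 20_000_000)

# First brain sample in the table: 17,049 of 19,406 annotated genes have RPKM > 0.
ratio_brain = 100 * 17_049 / 19_406
print(f"RPKM = {example_rpkm}, ratio = {ratio_brain:.2f} %")
```

Dividing by gene length and by library size is what makes the thresholds (RPKM >0, >1, >5) comparable across genes of different lengths and across the seven libraries.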
The statistics for repeats in the P. vitticeps genome annotated by different methods
| Program | Total repeat length (bp) | Percentage of genome |
|---|---|---|
| Tandem Repeats Finder | 59,773,950 | 3.42 |
| RepeatMasker | 174,011,206 | 9.96 |
| Proteinmask | 157,050,977 | 8.99 |
| RepeatModeler | 592,771,829 | 33.92 |
| LTR Finder | 65,464,996 | 3.75 |
| Total | 689,687,572 | 39.47 |
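The percentage column appears to be each repeat length divided by the total contig size of the assembly (1,747,524,961 bp). That denominator is an inference from the numbers, not something the table states, and the sketch below simply checks it.

```python
# Assumed denominator: total contig size reported in the assembly statistics.
ASSEMBLY_SIZE_BP = 1_747_524_961

repeat_lengths_bp = {
    "Tandem Repeats Finder": 59_773_950,
    "RepeatMasker": 174_011_206,
    "Proteinmask": 157_050_977,
    "RepeatModeler": 592_771_829,
    "LTR Finder": 65_464_996,
    "Total": 689_687_572,
}

percentages = {
    program: round(100 * length / ASSEMBLY_SIZE_BP, 2)
    for program, length in repeat_lengths_bp.items()
}

for program, percent in percentages.items():
    print(f"{program}: {percent} %")
```

The per-program lengths sum to more than the total because the annotations overlap: the same stretch of sequence can be reported by several programs but is counted once in the total.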
Breakdown of repeat content of the Pogona vitticeps genome derived from RepeatMasker analysis
| Category | Repbase TEs: length (bp) | Repbase TEs: % of genome | TE proteins: length (bp) | TE proteins: % of genome | De novo: length (bp) | De novo: % of genome | Combined TEs: length (bp) | Combined TEs: % of genome |
|---|---|---|---|---|---|---|---|---|
| DNA | 25,035,683 | 1.43 | 6,450,126 | 0.37 | 56,943,252 | 3.26 | 70,663,766 | 4.04 |
| LINE | 124,676,466 | 7.13 | 132,747,210 | 7.60 | 191,015,014 | 10.93 | 213,508,152 | 12.22 |
| SINE | 20,281,741 | 1.16 | - | 0.00 | 54,941,907 | 3.14 | 57,180,364 | 3.27 |
| LTR | 7,613,766 | 0.44 | 17,931,338 | 1.03 | 16,104,019 | 0.92 | 28,021,391 | 1.60 |
| Other | 24,327 | 0.00 | - | 0.00 | - | 0.00 | 24,327 | 0.00 |
| Unknown | 761,119 | 0.04 | - | 0.00 | 283,563,847 | 16.23 | 284,276,315 | 16.27 |
| Total | 174,011,206 | 9.96 | 157,050,977 | 8.99 | 627,828,869 | 35.93 | 657,625,603 | 37.63 |
Abbreviations: LINE long interspersed nuclear element, LTR long terminal repeat, SINE short interspersed nuclear element, TE transposable element
Characteristics of predicted protein-coding genes in the Pogona vitticeps assembly and comparison with Anolis carolinensis, Gallus gallus and Homo sapiens
| Gene set | Total | Intact ORF | Single exon gene | Gene length (bp) | mRNA length (bp) | Exons per gene | Exon length (bp) | Intron length (bp) |
|---|---|---|---|---|---|---|---|---|
| Homolog | 16,009 | 2,583 | 1,668 | 23,021 | 1,524 | 8.57 | 178 | 2,839 |
| | 12,727 | 2,068 | 1,509 | 27,608 | 1,558 | 9.06 | 172 | 3,232 |
| | 13,544 | 2,456 | 1,250 | 32,551 | 1,699 | 9.75 | 174 | 3,528 |
| Combined | 18,033 | 3,263 | 2,180 | 26,631 | 1,577 | 8.93 | 177 | 3,160 |
| | 32,110 | 32,110 | 6,767 | 14,109 | 1,125 | 6.07 | 185 | 2,561 |
| Transcriptome | 22,986 | 14,555 | 2,951 | 12,511 | 1,214 | 6.99 | 174 | 1,885 |
| Merged | 19,406 | 12,172 | 1,999 | 26,215 | 1,642 | 9.24 | 178 | 2,984 |
| Other species | 17,805 | 4,280 | 1,372 | 23,469 | 1,526 | 9.55 | 160 | 2,566 |
| | 16,736 | 7,777 | 1,684 | 21,314 | 1,438 | 9.35 | 154 | 2,379 |
| | 21,849 | 20,905 | 2,602 | 46,301 | 1,635 | 9.44 | 173 | 5,293 |
Except for the columns headed Total, Intact ORF and Single exon gene, the values presented are means.
Abbreviation: ORF open reading frame
Comparison of mean GC content for available tetrapod genomes
| Organism | Genome version | Mean GC | SD |
|---|---|---|---|
| | pvi1.1.Jan.2013 | 0.418 | 0.037 |
| | pvi1.1.Jan.2013 | 0.445 | 0.050 |
| | pvi1.1.Jan.2013 | 0.409 | 0.029 |
| | pvi1.1.Jan.2013 | 0.469 | 0.037 |
| | JGI_4.2 | 0.398 | 0.038 |
| | AnoCar2 | 0.403 | 0.032 |
| | CanFam3.1 | 0.413 | 0.069 |
| | GRCm38 | 0.417 | 0.046 |
| | Galgal4 | 0.416 | 0.059 |
| | croc_sub2 | 0.442 | 0.050 |
| | PelSin_1.0 | 0.441 | 0.053 |
| | ChrPicBel3.0.1 | 0.437 | 0.055 |
| | python_5.0 | 0.396 | 0.042 |
| | GCA_000516915.1 (NCBI) | 0.386 | 0.040 |
Abbreviation: SD standard deviation
Fig. 2 Distribution of GC content in 5 kbp windows for a range of vertebrates including Pogona vitticeps
Fig. 3 Variation in GC content among windows for various genome sequences with increasing window size (5, 10, 20, 40, 80, 160, and 320 kb windows). The relationship for Pogona vitticeps is disaggregated into macrochromosomes, microchromosomes and the Z sex chromosome for comparison. The X axis is on a natural logarithm scale. Pogona macrochromosomes share the lack of isochore structure reported for the Anolis genome.
Fig. 4 Analysis of GC content in Pogona vitticeps. a, Distribution of GC content in all chromosomes, macrochromosomes, microchromosomes and the Z chromosome, calculated with non-overlapping 5-kb sliding windows; b, GC content of various components of the genome, in comparison with the average GC content for macrochromosomes (red line), microchromosomes (green line) and the Z chromosome (blue line); c, GC content of the macrochromosomes, microchromosomes and Z chromosome broken down for various components of the genome.
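The windowed GC analysis behind these figures and the mean and SD table can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function name is hypothetical, ambiguous bases (N) are excluded from the denominator, trailing partial windows are dropped, and the population standard deviation is used, whereas the source does not say how any of these were handled.

```python
from statistics import mean, pstdev


def gc_by_window(sequence, window_size=5_000):
    """GC fraction of each complete, non-overlapping window.

    Windows containing no A, C, G or T (for example runs of N) are skipped.
    """
    fractions = []
    for start in range(0, len(sequence) - window_size + 1, window_size):
        window = sequence[start:start + window_size].upper()
        gc = window.count("G") + window.count("C")
        at = window.count("A") + window.count("T")
        if gc + at:
            fractions.append(gc / (gc + at))
    return fractions


# Hypothetical toy sequence with a 4 bp window so the result is easy to follow.
toy_sequence = "GGCC" + "ATAT" + "GCAT" + "NNNN" + "GC"
windows = gc_by_window(toy_sequence, window_size=4)
print(windows, mean(windows), pstdev(windows))
```

Repeating the calculation with progressively larger windows (5 to 320 kb) and plotting the standard deviation against window size gives the kind of comparison shown in Fig. 3: in a genome with strong isochore structure, the variation among windows declines more slowly as the windows grow.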
Comparison of sequencing platform, assembler, and assembly statistics for the reptiles for which a genome sequence is available
| | Bearded dragon | Burmese python | King cobra | Saltwater crocodile | Chicken | Green anole | American alligator | Gharial | Chinese softshell turtle | Green sea turtle | Western painted turtle |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Assembler | SOAPdenovo | SOAPdenovo | CLC NGS Cell (version 2011) | AllPaths (version R41313) | Celera Assembler (version 5.4) | Arachne (version 3.0.0) | AllPaths (version R41313)^a | SOAPdenovo | SOAPdenovo | SOAPdenovo | Newbler |
| Sequence method | Illumina HiSeq 2000 | Illumina GAIIx & HiSeq 2000, Roche 454 | Illumina HiSeq | Illumina GAII & HiSeq 2000 | Sanger, Roche 454 | Sanger | Illumina GAII & HiSeq 2000 | Illumina GAII | Illumina HiSeq 2000 | Illumina HiSeq 2000 | Roche 454, Illumina, Sanger |
| Average read depth | 85.5X | 20X | 28X | 74X | 12X | 7.1X | 68X | 109X | 105.6X | 110X | 15X |
| Genome size (Gbp) | 1.77 | 1.44 | 1.36–1.59 | 2.12 | 1.20 | 2.17 | 2.88 | 2.21 | 2.24 | 2.6 | |
| Total bases in contigs (excluding unknown bases, Ns) | 1,747,541,145 | 1,384,532,810 | 1,380,486,984 | 2,088,185,434 | 1,032,841,023 | 1,701,336,547 | 2,129,643,287 | 2,198,585,703 | 2,106,622,020 | 2,110,365,500 | 2,173,204,098 |
| Total bases in scaffolds | 1,816,115,349 | 1,435,035,089 | 1.66 Gbp | 2,120,573,303 | 1,046,932,099 | 1,799,143,587 | 2,174,259,888 | 2,270,567,745 | 2,202,483,752 | 2,208,410,377 | 2,365,766,571 |
| No. of scaffolds (>100 bp) | 543,500 | 39,113 | - | 23,365 | 16,847 | 6,645 | 14,645 | 9,317 | 19,904 | 140,023 | 78,631 |
| N50 scaffold (kbp) | 2,291 | 214 | 226 | 204 | 12,877 | 4,033 | 509 | 2,188 | 3,351 | 3,864 | 6,606 |
| No. of contigs (>100 bp) | 636,524 | 274,244 | 816,633 | 112,407 | 27,041 | 41,986 | 114,159 | 177,282 | 205,380 | 274,367 | 262,326 |
| N50 contig (kbp) | 31.2 | 10.7 | 5.2 | 32.7 | 279 | 79.9 | 36 | 23.4 | 22.0 | 29.2 | 21.3 |
| Repeat content (%) | 39.5 | 31.8 | 35.2 | 37.5 | 9.4 | 34.4 | 37.7 | 37.6 | 42.47 | 37.35 | 9.82 |
| No. protein-coding genes | 19,406 | 17,262 | - | 13,321 | 15,508 | 17,472 | 23,323 | 14,043 | 19,327 | 19,633 | - |
Information is taken from the NCBI database (http://www.ncbi.nlm.nih.gov/assembly), with additional data from the primary papers in which the findings were originally published. ^a Manual scaffolding
Fig. 5 Comparisons of gene parameters among Pogona vitticeps, Gallus gallus, Python bivittatus, Anolis carolinensis, and Pelodiscus sinensis genomes