Walter L Eckalbar, Elizabeth D Hutchins, Glenn J Markov, April N Allen, Jason J Corneveaux, Kerstin Lindblad-Toh, Federica Di Palma, Jessica Alföldi, Matthew J Huentelman, Kenro Kusumi.
Abstract
BACKGROUND: The green anole lizard, Anolis carolinensis, is a key species for both laboratory and field-based studies of evolutionary genetics, development, neurobiology, physiology, behavior, and ecology. As the first non-avian reptilian genome sequenced, A. carolinensis is also a prime reptilian model for comparison with other vertebrate genomes. The public databases of Ensembl and NCBI have provided a first-generation gene annotation of the anole genome that relies primarily on sequence conservation with related species. A second-generation annotation based on tissue-specific transcriptomes would provide a valuable resource for molecular studies.
Year: 2013 PMID: 23343042 PMCID: PMC3561122 DOI: 10.1186/1471-2164-14-49
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Overview of transcript assembly for A. carolinensis based on RNA-Seq data from 14 adult and embryonic tissues and deposited EST sequence data
| Embryo-28 somite stage | 52,548,024 | 83,627 | 81,032 | 22,670 |
| Embryo-38 somite stage | 55,048,179 | 99,578 | 95,753 | 24,595 |
| Regenerating tail tip | 122,099,352 | 92,275 | 88,150 | 22,278 |
| Regenerating tail base | 31,721,054 | 78,005 | 73,516 | 24,897 |
| Original tail | 109,404,060 | 96,450 | 91,601 | 20,240 |
| Adrenal | 55,858,836 | 110,349 | 101,449 | 20,482 |
| Brain | 32,518,977 | 203,519 | 192,407 | 33,912 |
| Dewlap skin | 31,785,178 | 81,598 | 76,866 | 25,853 |
| Embryos (pooled) | 59,681,427 | 118,949 | 110,124 | 19,969 |
| Heart | 34,068,834 | 154,255 | 144,617 | 26,582 |
| Liver | 50,782,350 | 89,010 | 81,441 | 21,549 |
| Lung | 48,723,049 | 272,071 | 255,035 | 37,985 |
| Ovary | 35,139,647 | 80,306 | 75,807 | 26,827 |
| Skeletal Muscle | 42,707,477 | 75,006 | 69,250 | 18,857 |
| Brain | 19,139 | 5,631 | 9,991 | 1,715 |
| Dewlap skin | 19,809 | 5,453 | 10,180 | 2,216 |
| Embryo | 38,923 | 8,714 | 9,991 | 4,158 |
| Mixed Organ | 19,863 | 5,657 | 9,327 | 2,053 |
| Ovary | 19,410 | 5,467 | 7,394 | 3,737 |
| Regenerating Tail | 19,851 | 6,751 | 11,064 | 6,757 |
| Testis | 19,807 | 4,261 | 8,677 | 2,594 |
Figure 1. A. Diagram of the bioinformatic pipeline for the reannotation. B. Venn diagram illustrating the sources of data for the A. carolinensis reannotation. Ab initio, algorithm-based gene predictions using Augustus and SNAP [26-28]. RefSeq, alignments of zebrafish, Xenopus frog, chicken, mouse, and human protein and available vertebrate transcripts to the Anocar2.0 genome assembly. NCBI/Ensembl, combined data of A. carolinensis genome annotations from NCBI ref_Anocar2.0 and Ensembl Build 65. RNA-Seq, transcriptomic data from analysis of 14 adult and embryonic tissues.
Comparison of ASU, NCBI and Ensembl gene annotations of the A. carolinensis genome
| Feature | ASU | NCBI | Ensembl |
| Annotated genes | 22,962 | 15,645 | 17,792 |
| Annotated transcript isoforms | 59,373 | 16,533 | 18,939 |
| Annotated isoforms/gene | 2.59 | 1.06 | 1.06 |
| All transcript isoforms | 59,373 | 16,533 | 18,939 |
| Transcripts with start & stop codons | 53,401 | 14,667 | 4,170 |
| Transcripts missing start or stop codon | 5,972 | 1,866 | 14,769 |
| Single exon transcripts | 2,070 | 983 | 364 |
| Transcript N50 length | 5,355 | 2,364 | 2,037 |
| Average coding sequence length | 1,964 | 1,701 | 1,531 |
| Total number of exons | 229,204 | 156,742 | 174,545 |
| Exons with start codon | 29,677 | 13,512 | 5,971 |
| Exons without start or stop codon | 168,367 | 128,486 | 158,935 |
| Exons with stop codon | 29,727 | 13,779 | 9,278 |
| Exons/annotated transcript | 12.05 | 10.11 | 9.62 |
| Average exon length | 170 | 170 | 160 |
| Total exon length | 38,902,806 | 26,658,387 | 27,910,718 |
| Total transcripts with 3'UTR | 34,926 | 5,861 | 0 |
| Average length of transcripts with 3'UTR | 1,168 | 456 | 0 |
| Total 3'UTR sequence length | 40,798,794 | 2,674,388 | 0 |
| Total transcripts with 5'UTR | 46,782 | 6,168 | 0 |
| Average length of transcripts with 5'UTR | 244 | 86 | 0 |
| Total 5'UTR sequence length | 11,422,626 | 527,454 | 0 |
| Total number of introns | 192,418 | 141,362 | 155,949 |
| Average intron length | 4,525 | 4,463 | 2,553 |
| Total intron sequence length | 870,771,088 | 630,937,171 | 398,124,572 |
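Several rows of the comparison table are ratios of other rows, so they can be cross-checked directly. The following is a minimal sketch of that arithmetic, not part of the authors' pipeline. It assumes the column order ASU, NCBI, Ensembl (inferred from the N50 values quoted in the Figure 2 legend) and uses hypothetical key names such as `exon_bp` for the table rows.

```python
# Totals copied from the comparison table. Column order (ASU, NCBI, Ensembl)
# is an assumption inferred from the N50 values in the Figure 2 legend.
# The dictionary keys are illustrative names, not identifiers from the paper.
table = {
    "ASU": dict(genes=22_962, isoforms=59_373, complete=53_401, partial=5_972,
                exons=229_204, exon_bp=38_902_806,
                introns=192_418, intron_bp=870_771_088),
    "NCBI": dict(genes=15_645, isoforms=16_533, complete=14_667, partial=1_866,
                 exons=156_742, exon_bp=26_658_387,
                 introns=141_362, intron_bp=630_937_171),
    "Ensembl": dict(genes=17_792, isoforms=18_939, complete=4_170, partial=14_769,
                    exons=174_545, exon_bp=27_910_718,
                    introns=155_949, intron_bp=398_124_572),
}


def derived(row):
    """Recompute the ratio rows of the table from the total rows."""
    return {
        # Annotated isoforms/gene
        "isoforms_per_gene": round(row["isoforms"] / row["genes"], 2),
        # Average exon length = total exon length / total number of exons
        "avg_exon_bp": round(row["exon_bp"] / row["exons"]),
        # Average intron length = total intron length / total number of introns
        "avg_intron_bp": round(row["intron_bp"] / row["introns"]),
        # Transcripts with and without complete start/stop codons
        # should add up to the total number of isoforms.
        "codon_rows_sum_to_total":
            row["complete"] + row["partial"] == row["isoforms"],
    }


for name, row in table.items():
    print(name, derived(row))
```

Under these assumptions the recomputed isoforms per gene, average exon length, and average intron length agree with the published rows for all three annotations. The exons per annotated transcript row is not a simple quotient of the totals (exons shared between isoforms are presumably counted once in the total), so it is left out of the check.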
Figure 2. Increased N50 transcript length and number of predicted transcripts in the ASU annotation. A. The distribution of transcript lengths is shown for the ASU, NCBI and Ensembl genome annotations. The ASU annotation transcript N50 length of 5,355 bp is greater than the values for the first-generation annotations from Ensembl (2,037 bp) and NCBI (2,364 bp). B. A box plot showing the median (horizontal line), the boundaries of the 25th and 75th percentiles (box), and the range for the ASU, NCBI, and Ensembl predicted transcripts. C. The Notch ligand dll1 is an example of a gene whose annotation has been markedly improved in the ASU annotation.
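The transcript N50 length quoted in the legend can be illustrated with a short sketch. It assumes the conventional definition of N50 (the length L such that transcripts of length L or longer contain at least half of the total bases); the record does not state which tool or exact definition the authors used, and the function name `n50` and the toy input are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Conventional definition (an assumption here): sort lengths from longest
    to shortest and return the length at which the running total first
    reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("n50() requires at least one length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy transcript lengths in bp, for illustration only (not data from the paper).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```

Because N50 weights each transcript by its length, a small number of long transcripts raises it more than the same number of short ones, which is why it is reported alongside the median shown in panel B.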
Genes that are unique to the ASU annotation and have vertebrate orthologues
| Annotated genes | 2,928 |
| Annotated transcript isoforms | 3,612 |
| Annotated isoforms/gene | 1.23 |
| All transcript isoforms | 3,612 |
| Transcripts with start & stop codons | 2,698 |
| Transcripts missing start or stop codon | 914 |
| Single exon transcripts | 301 |
| Transcript N50 length | 2,157 |
| Average coding sequence length | 1,182 |
| Total number of exons | 18,921 |
| Exons with start codon | 2,468 |
| Exons without start or stop codon | 13,901 |
| Exons with stop codon | 2,300 |
| Exons/annotated transcript | 6.35 |
| Average exon length | 188 |
| Total exon length | 3,569,265 |
| Total transcripts with 3'UTR | 1,323 |
| Average length of transcripts with 3'UTR | 761.2 |
| Total 3'UTR sequence length | 1,007,040 |
| Total transcripts with 5'UTR | 1,816 |
| Average length of transcripts with 5'UTR | 238.7 |
| Total 5'UTR sequence length | 433,533 |
| Total number of introns | 15,835 |
| Average intron length | 5,304 |
| Total intron sequence length | 83,999,254 |