Alicja Pacholewska, Michaela Drögemüller, Jolanta Klukowska-Rötzler, Simone Lanz, Eman Hamza, Emmanouil T Dermitzakis, Eliane Marti, Vincent Gerber, Tosso Leeb, Vidhya Jagannathan.
Abstract
Complete transcriptomic data at high resolution are available only for a few model organisms with medical importance. The gene structures of non-model organisms are mostly computationally predicted based on comparative genomics with other species. As a result, more than half of the horse gene models are known only by projection. Experimental data supporting these gene models are scarce. Moreover, most of the annotated equine genes are single-transcript genes. With RNA sequencing (RNA-seq), experimental validation of predicted transcriptomes has become feasible at reasonable cost. To improve the horse genome annotation, we performed RNA-seq on 561 samples of peripheral blood mononuclear cells (PBMCs) derived from 85 Warmblood horses. The mapped sequencing reads were used to build a new transcriptome assembly. The new assembly revealed many alternative isoforms associated with known genes or with those predicted by the Ensembl and/or Gnomon pipelines. We also identified 7,531 transcripts not associated with any horse gene annotated in public databases. Of these, 3,280 transcripts did not have a homologous match to any sequence deposited in the NCBI EST database, suggesting horse specificity. The unknown transcripts were categorized as coding or noncoding based on predicted coding potential scores. Among them, 230 transcripts had a high coding potential score, at least 2 exons, and an open reading frame of at least 300 nt. We experimentally validated 9 new equine coding transcripts using RT-PCR and Sanger sequencing. Our results provide valuable detailed information on many transcripts yet to be annotated in the horse genome.
Year: 2015 PMID: 25790166 PMCID: PMC4366165 DOI: 10.1371/journal.pone.0122011
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
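The abstract names three criteria for the 230 candidate coding transcripts: a high coding potential score, at least 2 exons, and an open reading frame of at least 300 nt. The sketch below shows one way such a filter could look. It is not the authors' pipeline: the `Transcript` record, the `min_score` cutoff of 0.0, the forward-strand-only ORF scan, and the choice to count the stop codon in the ORF length are all assumptions made for illustration.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_nt(seq: str) -> int:
    """Length in nt of the longest ATG-to-stop ORF on the forward strand.

    Assumption: the length includes the start and the stop codon, and only
    the three forward reading frames are scanned.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


@dataclass
class Transcript:
    """Hypothetical record for one assembled transcript."""
    name: str
    sequence: str
    n_exons: int
    coding_score: float  # output of some coding-potential predictor


def is_coding_candidate(t: Transcript,
                        min_score: float = 0.0,  # hypothetical cutoff
                        min_exons: int = 2,      # from the abstract
                        min_orf_nt: int = 300    # from the abstract
                        ) -> bool:
    """Apply the three criteria named in the abstract."""
    return (t.coding_score > min_score
            and t.n_exons >= min_exons
            and longest_orf_nt(t.sequence) >= min_orf_nt)


# Toy sequences: one 303-nt ORF (ATG + 99 codons + stop) and one 33-nt ORF.
long_orf = "ATG" + "GCC" * 99 + "TAA"
short_orf = "ATG" + "GCC" * 9 + "TAA"
```

A multi-exon transcript carrying `long_orf` with a positive score passes the filter. The same sequence in a single-exon transcript, or `short_orf` in any transcript, does not.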
Fig 1. Workflow of the analysis.
The principal steps of the analysis and the format of the output files are given.
Fig 2. Distribution of exons and transcripts identified in the new transcriptome assembly.
The number of exons and transcripts is shown on the y-axis on the left; the length of chromosomes in base pairs is shown on the y-axis on the right.
Fig 3. Number of transcripts.
The number of transcripts that mapped to known or predicted gene models annotated in the Ensembl/NCBI databases, and the number assembled into unknown transcripts.
Statistics on the horse PBMCs transcriptome assembly.
| Class | # transcripts | # genes | # single-exon transcripts |
|---|---|---|---|
| Unknown | 7,531 | 6,006 | 5,281 |
| Complete match to annotated | 59,103 | 26,493 | 7,705 |
| Novel isoforms | 129,309 | 11,717 | 0 |
| Other | 89,595 | 12,321 | 418 |
| TOTAL | 285,538 | 42,602 | 13,404 |
a. The numbers of transcripts identified in the assembly are given after filtering out transcripts with expression values below 0.01 FPKM per sample. The classes are defined according to the Cufflinks manual [21]:
b. Unknown, intergenic transcripts.
c. Transcripts with a complete match of the intron chain to a reference transcript.
d. Potentially novel isoforms with at least one splice junction shared with a reference transcript.
e. Transcripts with an intron overlapping a reference intron on the opposite strand (n = 74,073); transcripts with generic exonic overlap with a reference transcript (n = 12,700); transcripts with an exonic overlap with a reference on the opposite strand (n = 2,780); transcripts falling entirely within a reference intron (n = 39); possible polymerase run-on fragments (n = 2); possible repeat sequences (n = 1).
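The table and its footnotes can be cross-checked with a few lines of arithmetic. The sketch below copies the counts as printed and confirms that the sub-classes listed for "Other" add up to 89,595 and that the four classes add up to the reported totals. The single-letter keys are the cuffcompare class codes that, to the best of my reading of the Cufflinks manual, correspond to the footnote descriptions. The source does not print these codes, so treat the mapping as an assumption.

```python
# Transcript counts per class, copied from the table.
TRANSCRIPTS = {
    "Unknown": 7_531,
    "Complete match to annotated": 59_103,
    "Novel isoforms": 129_309,
    "Other": 89_595,
}
REPORTED_TOTAL_TRANSCRIPTS = 285_538

# Single-exon transcript counts per class, copied from the table.
SINGLE_EXON = {
    "Unknown": 5_281,
    "Complete match to annotated": 7_705,
    "Novel isoforms": 0,
    "Other": 418,
}
REPORTED_TOTAL_SINGLE_EXON = 13_404

# Breakdown of "Other" from the last footnote.
# Keys: assumed cuffcompare class codes (not stated in the source).
OTHER_BREAKDOWN = {
    "s": 74_073,  # intron overlaps a reference intron on the opposite strand
    "o": 12_700,  # generic exonic overlap with a reference transcript
    "x": 2_780,   # exonic overlap with reference on the opposite strand
    "i": 39,      # falls entirely within a reference intron
    "p": 2,       # possible polymerase run-on fragment
    "r": 1,       # possible repeat sequence
}


def table_is_consistent() -> bool:
    """True if every printed total equals the sum of its parts."""
    return (
        sum(OTHER_BREAKDOWN.values()) == TRANSCRIPTS["Other"]
        and sum(TRANSCRIPTS.values()) == REPORTED_TOTAL_TRANSCRIPTS
        and sum(SINGLE_EXON.values()) == REPORTED_TOTAL_SINGLE_EXON
    )


print(table_is_consistent())
```

The gene column is left out on purpose: its per-class counts are not checked here because the table does not say whether a gene may be counted in more than one class.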
Fig 4. Distribution of the number of exons in the putative new equine transcripts.
Fig 5. Example of a new putative horse-specific coding transcript.
The new transcript is indicated in the UCSC Genome Browser view.
Fig 6. Experimental verification of the expression of predicted new equine transcripts.
An agarose gel with transcripts amplified by 36 cycles of RT-PCR is shown.