T A Mansour, E Y Scott, C J Finno, R R Bellone, M J Mienaltowski, M C Penedo, P J Ross, S J Valberg, J D Murray, C T Brown.
Abstract
BACKGROUND: Transcriptome interpretation relies on a good-quality reference transcriptome, both for accurate quantification of gene expression and for functional analysis of genetic variants. The current annotation of the horse genome lacks the specificity and sensitivity needed to assess gene expression, especially at the isoform level, and under-annotates untranslated region (UTR) usage. We built an annotation pipeline for the horse and used it to integrate 1.9 billion reads from multiple RNA-seq data sets into a new, refined transcriptome.
Keywords: Equine transcriptome; RNA-seq; Tissue-specificity
Year: 2017 PMID: 28107812 PMCID: PMC5251313 DOI: 10.1186/s12864-016-3451-2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Sample and library preparations used as input for our equine transcriptome
| Tissue | Library preparation | Library characteristics | Samples | Fragments (millions) | Bases (Gb) | Reference |
|---|---|---|---|---|---|---|
| Brainstem | rRNA-depleted | PE100bp, stranded | 8* | 166.73 | 33.68 | Finno et al., 2016 |
| Cerebellum | rRNA-depleted | PE100bp, stranded | 12 | 411.48 | 82.3 | Scott et al., 2016 |
| Muscle | Poly-A captured | PE125bp, stranded | 12 | 301.94 | 76.08 | |
| Retina | Poly-A captured | PE80bp, unstranded | 2 | 20.3 | 3.28 | Bellone et al., 2013 |
| Spinal cord | rRNA-depleted | PE100bp, stranded | 16* | 403 | 81.4 | Finno et al., 2016 |
| Skin | Poly-A captured | PE80bp, unstranded | 2 | 18.54 | 3 | Holl et al., 2016 |
| Skin | Poly-A captured | SE80bp, unstranded | 2 | 16.57 | 1.34 | Holl et al., 2016 |
| Skin | Poly-A captured | SE95bp, unstranded | 3 | 105.51 | 10.02 | Bellone et al., 2013 |
| Embryo ICM | Ovation RNA-seq | PE100bp, unstranded | 3 | 126.32 | 25.26 | Iqbal et al., 2014 |
| Embryo ICM | Ovation RNA-seq | SE100bp, unstranded | 3 | 115.21 | 11.52 | Iqbal et al., 2014 |
| Embryo TE | Ovation RNA-seq | PE100bp, unstranded | 3 | 129.84 | 25.96 | Iqbal et al., 2014 |
| Embryo TE | Ovation RNA-seq | SE100bp, unstranded | 3 | 102.26 | 10.23 | Iqbal et al., 2014 |
| Total | | | | 1917.7 | 364.07 | |
Notes: *Seven individuals had both brainstem and spinal cord tissue collected. The seven skin samples were taken from five individuals, and one individual had both retina and skin sampled, bringing the total number of individuals to 59.
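As a consistency check on the sample table, the sketch below tallies its columns and reproduces the count of 59 individuals from the overlaps given in the notes. It also tests an assumption not stated in the table: that the Bases column equals fragments × reads per fragment × read length (two reads per fragment for paired-end, PE; one for single-end, SE). The row tuples are transcribed from the table; the names `ROWS` and `expected_gb` are hypothetical.

```python
# Rows transcribed from the sample table:
# (tissue, layout, read length in bp, samples, fragments in millions, bases in Gb)
ROWS = [
    ("Brainstem", "PE", 100, 8, 166.73, 33.68),
    ("Cerebellum", "PE", 100, 12, 411.48, 82.30),
    ("Muscle", "PE", 125, 12, 301.94, 76.08),
    ("Retina", "PE", 80, 2, 20.30, 3.28),
    ("Spinal cord", "PE", 100, 16, 403.00, 81.40),
    ("Skin", "PE", 80, 2, 18.54, 3.00),
    ("Skin", "SE", 80, 2, 16.57, 1.34),
    ("Skin", "SE", 95, 3, 105.51, 10.02),
    ("Embryo ICM", "PE", 100, 3, 126.32, 25.26),
    ("Embryo ICM", "SE", 100, 3, 115.21, 11.52),
    ("Embryo TE", "PE", 100, 3, 129.84, 25.96),
    ("Embryo TE", "SE", 100, 3, 102.26, 10.23),
]


def expected_gb(layout: str, read_len: int, fragments_m: float) -> float:
    """Estimate Gb sequenced, ASSUMING bases = fragments x reads per fragment x read length."""
    reads_per_fragment = 2 if layout == "PE" else 1  # paired-end: two reads per fragment
    return fragments_m * 1e6 * reads_per_fragment * read_len / 1e9


total_samples = sum(row[3] for row in ROWS)
total_fragments_m = round(sum(row[4] for row in ROWS), 2)
total_gb = round(sum(row[5] for row in ROWS), 2)

# Overlaps taken from the table notes.
brainstem_and_spinal = 7   # individuals sampled for both tissues
skin_duplicates = 7 - 5    # seven skin samples came from five individuals
retina_and_skin = 1        # one individual sampled for both tissues
individuals = total_samples - brainstem_and_spinal - skin_duplicates - retina_and_skin

if __name__ == "__main__":
    print(total_samples, total_fragments_m, total_gb, individuals)
    for tissue, layout, read_len, _, frags, gb in ROWS:
        est = expected_gb(layout, read_len, frags)
        print(f"{tissue:12s} {layout} reported={gb:6.2f} estimated={est:6.2f}")
```

With these values the sample column sums to 69 samples from 59 individuals, the fragment and base columns reproduce the totals of 1917.7 M and 364.07 Gb, and each row's reported base count agrees with the fragments × read length estimate to within roughly 1%.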
Fig. 1 An outline of the workflow used to generate each version of the transcriptome. Transcriptome products are shown in ovals; programs used for individual steps are indicated in parentheses. All transcriptome versions and the pipeline scripts are publicly available.
Comparison of current public equine annotations with six versions of our transcriptome (in bold and outlined in red) in terms of gene numbers and composition
Fig. 2 Comparison of our refined transcriptome to current equine annotations. The degree of similarity between the refined transcriptome and current annotations is shown in (a). The annotation of MUTYH in the refined transcriptome adds several isoforms (α, β, and γ), as seen in human (b). The annotation of CYP7A1 in the refined transcriptome also includes an extended alternative first exon not seen in other species (c).
Fig. 3 Tissue-specific gene and isoform composition of the transcriptome. A heatmap of highly expressed genes with substantial expression differences across tissues (a). A bar graph showing isoforms uniquely present in a tissue (red-outlined bars above the x-axis) or absent only from that tissue (blue-outlined bars extending below the x-axis); the green trendline corresponds to the cumulative TPM of the uniquely present transcripts (b). A stacked bar graph showing the percentage of transcription from mitochondrial versus nuclear-encoded genes (c). Emb. is short for embryo.
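Panel (b) rests on two ideas: TPM (transcripts per million), a standard length-normalised expression unit, and set logic over the isoforms detected in each tissue. The sketch below illustrates both under stated assumptions: the presence cutoff of 1 TPM (`min_tpm`) and all function names are hypothetical, and this record does not say which quantifier or threshold the authors used.

```python
from typing import Dict, Set


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million: divide each count by transcript length, then rescale to sum to 1e6."""
    rates = {tx: counts[tx] / lengths_bp[tx] for tx in counts}
    total = sum(rates.values())
    return {tx: rate / total * 1e6 for tx, rate in rates.items()}


def expressed(tpm_by_tissue: Dict[str, Dict[str, float]], min_tpm: float = 1.0) -> Dict[str, Set[str]]:
    """Call an isoform present in a tissue when its TPM reaches min_tpm (hypothetical cutoff)."""
    return {tissue: {tx for tx, v in values.items() if v >= min_tpm}
            for tissue, values in tpm_by_tissue.items()}


def uniquely_present(presence: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Isoforms expressed in this tissue and in no other tissue."""
    result = {}
    for tissue, txs in presence.items():
        elsewhere = set().union(*(s for t, s in presence.items() if t != tissue))
        result[tissue] = txs - elsewhere
    return result


def solely_absent(presence: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Isoforms expressed in every other tissue but missing from this one."""
    result = {}
    for tissue, txs in presence.items():
        others = [s for t, s in presence.items() if t != tissue]
        in_all_others = set.intersection(*others) if others else set()
        result[tissue] = in_all_others - txs
    return result


if __name__ == "__main__":
    print(tpm({"tx1": 100, "tx2": 300, "tx3": 0}, {"tx1": 1000, "tx2": 3000, "tx3": 500}))
    presence = {"muscle": {"a", "b", "m"}, "retina": {"a", "c"}, "skin": {"a", "b", "c"}}
    print(uniquely_present(presence))  # only "m" is unique, and only to muscle
    print(solely_absent(presence))     # muscle lacks "c", retina lacks "b"
```

Summing the TPM values of a tissue's uniquely present isoforms gives a quantity of the kind plotted by the green trendline. The `tpm` helper does not guard against a sample with zero total counts.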
Tissue-specific splicing rate as calculated by Cuffcompare, with the corresponding numbers of multi-exon transcripts and multi-transcript loci per tissue
| | Embryo ICM | Embryo TE | Skin | Brainstem | Cerebellum | Retina | Spinal cord | Muscle |
|---|---|---|---|---|---|---|---|---|
| Genes | 33,998 | 32,050 | 30,003 | 34,792 | 36,139 | 26,733 | 34,980 | 29,549 |
| Transcripts | 57,400 | 54,424 | 51,995 | 62,993 | 66,364 | 47,095 | 66,001 | 52,000 |
| Multi-exon transcripts | 44,069 | 42,433 | 42,432 | 49,346 | 51,640 | 39,420 | 52,175 | 42,483 |
| Multi-transcript loci | 11,938 | 11,461 | 11,797 | 13,066 | 13,334 | 10,866 | 13,352 | 11,560 |
| Splicing rate | 1.7 | 1.7 | 1.7 | 1.8 | 1.8 | 1.8 | 1.9 | 1.8 |
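The splicing-rate row is consistent with a transcripts-per-locus figure of the kind Cuffcompare prints in its summary: dividing Transcripts by Genes and rounding to one decimal reproduces every reported value. A short check, treating the Genes row as Cuffcompare loci (an assumption; the variable names are hypothetical):

```python
# Values transcribed from the splicing-rate table.
TISSUES = ["Embryo ICM", "Embryo TE", "Skin", "Brainstem",
           "Cerebellum", "Retina", "Spinal cord", "Muscle"]
GENES = [33998, 32050, 30003, 34792, 36139, 26733, 34980, 29549]
TRANSCRIPTS = [57400, 54424, 51995, 62993, 66364, 47095, 66001, 52000]
REPORTED_RATE = [1.7, 1.7, 1.7, 1.8, 1.8, 1.8, 1.9, 1.8]

# Transcripts per locus, rounded to one decimal place as in the table.
splicing_rate = {t: round(tx / g, 1) for t, g, tx in zip(TISSUES, GENES, TRANSCRIPTS)}

if __name__ == "__main__":
    for t, reported in zip(TISSUES, REPORTED_RATE):
        print(f"{t:12s} computed={splicing_rate[t]} reported={reported}")
```

Read this way, a splicing rate of 1.9 for spinal cord means its loci carry on average almost two distinct transcripts each, the highest of the eight tissues.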
Fig. 4 Novel gene analysis and classification. A bar graph comparing all novel genes against the current equine annotations (a). Novel genes fall into three categories: supported novel genes (Category I); unsupported but conserved novel genes (Category II); and unsupported, unconserved novel genes with an ORF (Category III). A stacked bar graph of transcript counts for all three categories of novel genes, showing exonic composition (b), and their cumulative TPM by tissue (c).
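The three-way classification in Fig. 4 amounts to a short decision rule applied in priority order. A minimal sketch, assuming each novel gene already carries three boolean flags; the `NovelGene` type and its flag names are hypothetical, and this record does not define what counts as "support" or "conservation":

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class NovelGene:
    gene_id: str
    supported: bool   # hypothetical: backed by independent evidence (criteria not given here)
    conserved: bool   # hypothetical: conserved in other species
    has_orf: bool     # hypothetical: contains an open reading frame


def categorize(gene: NovelGene) -> Optional[str]:
    """Return 'I', 'II' or 'III' following the Fig. 4 definitions, checked in priority order."""
    if gene.supported:
        return "I"    # supported novel gene
    if gene.conserved:
        return "II"   # unsupported but conserved
    if gene.has_orf:
        return "III"  # unsupported, unconserved, with an ORF
    return None       # ASSUMPTION: genes meeting no criterion fall outside the three categories


if __name__ == "__main__":
    genes = [
        NovelGene("g1", supported=True, conserved=True, has_orf=True),
        NovelGene("g2", supported=False, conserved=True, has_orf=False),
        NovelGene("g3", supported=False, conserved=False, has_orf=True),
        NovelGene("g4", supported=False, conserved=False, has_orf=False),
    ]
    print(Counter(categorize(g) for g in genes))
```

Checking the flags in this order keeps the categories mutually exclusive, matching the caption's wording in which Categories II and III both require the gene to be unsupported and Category III additionally requires it to be unconserved.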