E Y Scott, T Mansour, R R Bellone, C T Brown, M J Mienaltowski, M C Penedo, P J Ross, S J Valberg, J D Murray, C J Finno.
Abstract
BACKGROUND: Efforts to resolve the transcribed sequences in the equine genome have focused on protein-coding RNA. The transcription of the intergenic regions, although detected via total RNA sequencing (RNA-seq), has yet to be characterized in the horse. The most recent equine transcriptome based on RNA-seq from several tissues was a prime opportunity to obtain a concurrent long non-coding RNA (lncRNA) database.
Keywords: Equine transcriptome; Intergenic; Long non-coding RNA
Year: 2017 PMID: 28676104 PMCID: PMC5496257 DOI: 10.1186/s12864-017-3884-2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
General lncRNA statistics and the number of candidate lncRNA transcripts that passed through each filter. Filter numbers correspond to Fig. 1
| | Filter | Novel I | Novel II | Novel III | Intergenic | Known lncRNA | Total |
|---|---|---|---|---|---|---|---|
| Initial number of transcripts | | 8459 | 7494 | 6687 | 38,507 | 3956 | 62,216 |
| Number of lncRNA | F1 | 7193 | 3873 | 1128 | 15,686 | 3523 | 30,998 |
| | F2 | 7193 | 3873 | 1128 | 15,281 | 3523 | 28,503 |
| | F3 | 4408 | 3102 | 726 | 15,162 | 2593 | 24,029 |
| | F4 | 3334 | 2475 | 639 | 13,804 | 2011 | 20,800 |
| Average Length (kb) | | 3.2 | 3.2 | 2.3 | 1.2 | 3.8 | - |
| Average TPM | | 18.3 | 28.2 | 3.9 | 1.8 | 4.0 | - |
| GC% | | 45.3 | 45.1 | 48.7 | 43.1 | 44.4 | - |
| Total bp | | 10,604,817 | 7,870,739 | 1,465,708 | 16,880,112 | 5,658,390 | - |
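As a reading aid for the table above, the share of each input that survives all four filters can be worked out from the "Initial number of transcripts" and F4 rows. The following is a minimal sketch using only the counts shown in the table; the variable and function names are illustrative and are not taken from the authors' pipeline.

```python
# Counts copied from the table above (per-input columns only).
initial = {
    "Novel I": 8459,
    "Novel II": 7494,
    "Novel III": 6687,
    "Intergenic": 38507,
    "Known lncRNA": 3956,
}
after_f4 = {
    "Novel I": 3334,
    "Novel II": 2475,
    "Novel III": 639,
    "Intergenic": 13804,
    "Known lncRNA": 2011,
}


def retention_pct(before, after):
    """Percentage of transcripts remaining after filtering, per input."""
    return {name: round(100.0 * after[name] / before[name], 1) for name in before}


retention = retention_pct(initial, after_f4)
for name, pct in retention.items():
    print(f"{name}: {pct}% retained after F4")
```

Under this reading, the known lncRNA input retains the largest share of its transcripts and the novel III input the smallest.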
Fig. 1 Filtering pipeline used for candidate lncRNA. The inputs correspond to products of the protein-coding transcriptome [10]
Fig. 2 Different behavior seen by inputs novel I, novel II, novel III, intergenic and known lncRNA transcripts during and post filtering. a The amount of transcriptional output removed by each filter (F1, F2, F3 and F4, as labeled in Fig. 1), where the whole pie represents all the transcriptional output of that input and each wedge represents the cumulative TPM removed by each filter. b The exon diversity relative to the total cumulative TPM provided by each input post-filtering
Fig. 3 Sequence conservation of equine lncRNA and protein-coding transcripts relative to human transcriptional products. Blast conservation represents the BLASTN identity multiplied by the BLASTN coverage of a given transcript. The cumulative frequency represents the percentage of lncRNA transcripts obtaining a BLASTN conservation measure equal to or less than the indicated x-axis value
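The two quantities defined in the Fig. 3 caption can be illustrated with a short sketch. It assumes identity and coverage are given as percentages and rescales their product to a fraction between 0 and 1; the caption does not state the scale used in the figure, so that choice, like the function names, is an assumption. The example pairs are the % identity and % coverage values listed in the five-example table of this record.

```python
from bisect import bisect_right


def blast_conservation(identity_pct, coverage_pct):
    """BLASTN identity multiplied by BLASTN coverage.

    Both inputs are percentages; the result is a fraction in [0, 1]
    (the rescaling is an assumption made for this sketch).
    """
    return (identity_pct / 100.0) * (coverage_pct / 100.0)


def cumulative_frequency(scores, x):
    """Percentage of transcripts with a conservation score <= x."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    return 100.0 * bisect_right(ordered, x) / len(ordered)


# (% identity, % coverage) pairs from the five-example table.
pairs = [(70, 43), (74, 63), (68, 16), (75, 54), (77, 91)]
scores = [blast_conservation(identity, coverage) for identity, coverage in pairs]

print(cumulative_frequency(scores, 0.45))
```

With only five transcripts the curve is coarse; the same two functions apply unchanged to a full list of per-transcript BLASTN hits.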
Five examples of equine lncRNA compared to human lncRNA in terms of relative position to surrounding genes and BLASTN percent identity and percent coverage of the equine lncRNA relative to the human counterparts
| Proposed lncRNA | Horse coordinates | Distance to nearest gene in horse | Human coordinates | Distance to nearest gene in human | % identity | % coverage |
|---|---|---|---|---|---|---|
| | chr5:9,536,77 | 390 (5′ antisense | chr1:173,863,900–173,867,989 | 93 (5′ antisense | 70 | 43 |
| | Chr12:255851 | 7235 (3′ | Chr11:65,422,798–65,445,538 | 9274 (3′ | 74 | 63 |
| | Chr19:31292750–31300597 | 1030 (5′ antisense to | chr3:194,487,140–194,488,545 | Overlap with | 68 | 16 |
| | chrX:55,214,315–55,243,223 | Complete overlap (antisense) to | chrX:73,792,205–73,829,231 | Overlap with | 75 | 54 |
| | chr3:68,892,305–68,911,651 | 131 (5′ antisense EPHAS) | chr4:65,669,961–65,693,386 | 382 (5′ antisense | 77 | 91 |
Fig. 4 Tissue and RNA-seq library preparation effects on lncRNA detection and expression. a There is a positive relationship between the number of annotated genes and candidate lncRNA detected in each tissue; the pie charts represent the cumulative TPM of that tissue, with turquoise corresponding to the expression of the protein-coding transcripts and red to the candidate lncRNA expression. The pies outlined in yellow were rRNA-depleted RNA-seq libraries, pies outlined in black were Ovation RNA-seq libraries and the pies outlined in blue were the polyA-captured RNA-seq libraries. b The hierarchically clustered heatmap also shows clustering on a tissue and RNA-seq library level. c There is a distinguishable difference in the number of lncRNA that seem to be unique to a given tissue, with the skin having the largest number of unique lncRNA and the highest cumulative expression associated with its unique lncRNA. The green line represents the cumulative TPM of all the uniquely present lncRNA, divided by 5 for scaling