Lambros T Koufariotis1, Yi-Ping Phoebe Chen2, Amanda Chamberlain3, Christy Vander Jagt3, Ben J Hayes1.
Abstract
Long non-coding RNA (lncRNA) have been implicated in diverse biological roles including gene regulation and genomic imprinting. Identifying lncRNA across many different bovine tissues would add to the current repertoire of bovine lncRNA and improve our understanding of the evolutionary importance and constraints of these transcripts. Additionally, it could aid in identifying sites in the genome outside of protein coding genes where mutations could contribute to variation in complex traits. This is particularly important in bovine as genomic predictions are increasingly used in genetic improvement for milk and meat production. Our aim was to identify and annotate novel long non-coding RNA transcripts in the bovine genome captured from RNA Sequencing (RNA-Seq) data across 18 tissues, sampled in triplicate from a single cow. To address the main challenge in identifying lncRNA, namely distinguishing lncRNA transcripts from unannotated genes and protein coding genes, a lncRNA identification pipeline with a number of filtering steps was developed. A total of 9,778 transcripts passed the filtering pipeline. The bovine lncRNA catalogue includes MALAT1 and HOTAIR, both of which have been well described in human and mouse genomes. We attempted to validate the lncRNA in libraries from three additional cows. In total, 726 (87.47%) liver and 1,668 (55.27%) blood class 3 lncRNA were validated with stranded liver and blood libraries, respectively. Additionally, this study identified a large number of novel unknown transcripts in the bovine genome with high protein coding potential, illustrating a clear need for better annotations of protein coding genes.
Year: 2015 PMID: 26496443 PMCID: PMC4619662 DOI: 10.1371/journal.pone.0141225
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Percentage of intergenic assembled unknown transcripts (UT) out of total transcripts found for each tissue.
This graph shows the percentage of unknown transcripts out of the total number of transcripts found after running the Cufflinks pipeline on the RNA-Seq data. The number of unknown transcripts discovered varies between tissues, with kidney, liver and lung having some of the highest numbers. These could represent novel RNA sequences, or artefacts and noise.
Number of unknown transcripts that pass the filtering pipeline, showing no coding potential.
| Unknown transcripts with no coding potential | Filtered for moderate to high expression | Transcripts conserved with human lncRNA | Transcripts conserved with mouse lncRNA | Transcripts conserved with both human and mouse lncRNA |
|---|---|---|---|---|
| 24,381 | 9,828 | 396 | 115 | 53 |
| 20,301 | 10,467 | 645 | 186 | 57 |
| 16,336 | 9,778 | 289 | 119 | 36 |
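The attrition at each filtering step can be read off the table as simple ratios. The following is a minimal sketch that only re-derives percentages from the counts listed above; the function and key names are illustrative and are not part of the authors' pipeline.

```python
# Counts copied from the table above, one tuple per row:
# (no coding potential, moderate-to-high expression,
#  conserved with human, conserved with mouse, conserved with both)
ROWS = [
    (24381, 9828, 396, 115, 53),
    (20301, 10467, 645, 186, 57),
    (16336, 9778, 289, 119, 36),
]


def retention_percentages(row):
    """Express each filtering step as a percentage of the preceding pool.

    The expression filter is taken relative to the non-coding transcripts;
    the conservation columns are taken relative to the expressed transcripts.
    (Choosing these denominators is an assumption made for illustration.)
    """
    non_coding, expressed, human, mouse, both = row
    return {
        "expressed_pct": 100 * expressed / non_coding,
        "human_pct": 100 * human / expressed,
        "mouse_pct": 100 * mouse / expressed,
        "both_pct": 100 * both / expressed,
    }


for row in ROWS:
    summary = retention_percentages(row)
    print(", ".join(f"{name}={value:.2f}" for name, value in summary.items()))
```

For the last row, roughly 60% of the non-coding unknown transcripts survive the expression filter, and only a few percent of those show conservation with human or mouse lncRNA.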
Fig 2. Total number of class 3 putative lncRNA per chromosome vs the size of each chromosome.
This figure compares the number of class 3 putative lncRNA found on each chromosome with the size of that chromosome. The blue bars indicate the number of class 3 transcripts (as a percentage of the total number). The red line indicates the size of the chromosome (as a percentage of the total nucleotide size).
Fig 3. Tissue x tissue heat map and hierarchical clustering of gene co-expression data for putative intergenic long ncRNA.
This heat map shows the number of transcripts that are co-expressed between each pair of tissues, including the replicates (calculated using the DESeq package). The tissues are ordered by their pairwise distances. The colour indicates the level of expression correlation within tissue replicates and between tissue samples: the darker the blue, the higher the correlation, while white indicates no similarity in the expression data.
Fig 4. Differential expression heat map of class 3 lncRNA.
This heat map shows the number of transcripts that are either upregulated or downregulated for each tissue. Upregulated tissues are on the x-axis and downregulated tissues are on the y-axis. The tissues are ordered and grouped based on upregulation. Red indicates the most differential expression, while white indicates the least.
Fig 5. Average number of differentially expressed class 3 transcripts that are either upregulated or downregulated.
This graph shows the average number of class 3 putative lncRNA transcripts that are either upregulated (blue bars) or downregulated (red bars). In kidney, liver and thymus there are, on average, more upregulated transcripts, while in leg muscle, ovaries, spleen and tongue there are, on average, more downregulated transcripts.
Fig 6. Correlation analysis between the expression patterns of the putative lncRNA transcripts and the neighboring protein coding transcripts.
Pearson's correlation is represented by the blue bars and Spearman's rho by the orange bars. A lncRNA/mRNA pair with a correlation below 0.6 was considered uncorrelated.
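The cut-off in this caption can be illustrated with a small pure-Python sketch. The expression vectors below are invented values across five hypothetical tissues, not data from the study. Requiring both coefficients to fall below the cut-off, and comparing the signed rather than the absolute coefficient, are assumptions made here because the caption does not specify either choice.

```python
from math import sqrt

CUTOFF = 0.6  # threshold stated in the Fig 6 caption


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def is_uncorrelated(x, y, cutoff=CUTOFF):
    """Assumption: a pair is uncorrelated when both coefficients are below the cut-off."""
    return pearson(x, y) < cutoff and spearman(x, y) < cutoff


# Hypothetical expression values across five tissues (illustrative only).
lncrna = [1.0, 2.0, 3.0, 4.0, 5.0]
neighbour_mrna = [2.0, 4.1, 6.2, 7.9, 10.0]
unrelated_mrna = [3.0, 1.0, 4.0, 1.5, 2.0]

print(is_uncorrelated(lncrna, neighbour_mrna))
print(is_uncorrelated(lncrna, unrelated_mrna))
```

Reporting both coefficients, as the figure does, helps separate linear relationships (Pearson) from merely monotonic ones (Spearman).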
Validated lncRNA from blood stranded RNA-Seq that overlap with class 3 un-stranded lncRNA.
| Cow ID | Overlap with class 3 lncRNA | Overlap only with blood class 3 lncRNA (3,018) | Overlap with class 3 lncRNA close to protein coding genes (1,547) | Have coding direction on opposite strands to protein coding genes |
|---|---|---|---|---|
| 210004817 | 1,076 | 795 | 158 | 92 |
| Y10ST0027 | 630 | 500 | 90 | 40 |
| Y10ST0106 | 2,057 | 1,327 | 293 | 153 |
| Combined | 2,508 | 1,668 | 390 | 198 |
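The blood validation rate quoted in the abstract (55.27%) follows directly from the Combined row and the totals given in the column headers. The sketch below is a short arithmetic check; the variable names are illustrative, and the choice of denominators for the last two ratios is an assumption.

```python
# Totals taken from the column headers of the table above.
BLOOD_CLASS3_TOTAL = 3018   # blood class 3 lncRNA
NEAR_CODING_TOTAL = 1547    # class 3 lncRNA close to protein coding genes

# "Combined" row of the table.
combined = {
    "blood_overlap": 1668,
    "near_coding": 390,
    "opposite_strand": 198,
}


def percent(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


blood_validated_pct = percent(combined["blood_overlap"], BLOOD_CLASS3_TOTAL)
near_coding_validated_pct = percent(combined["near_coding"], NEAR_CODING_TOTAL)
# Assumption: opposite-strand transcripts are a subset of the validated
# near-coding transcripts, so that count is used as the denominator.
opposite_strand_pct = percent(combined["opposite_strand"], combined["near_coding"])

print(blood_validated_pct, near_coding_validated_pct, opposite_strand_pct)
```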
Fig 7. Venn diagram of the number of class 3 lncRNA validated with stranded blood libraries that are shared between animals.
This figure shows the number of common and unique class 3 lncRNA validated with stranded RNA-Seq from blood in each animal. The green circle represents the validated lncRNA found in the cow 210004817. The orange circle represents the validated lncRNA found in the cow Y10ST0106. The blue circle represents the class 3 lncRNA found in the cow daisy. The red circle represents the validated lncRNA found in the cow Y10ST0027. The central intersection shows that 237 validated class 3 lncRNA are found in all 4 animals.
Pseudogenes with transcripts that are moderately to highly expressed, and show significant sequence similarity with a lncRNA.
| Pseudogene | Ensembl defined loci | Cufflinks predicted loci | Conserved Protein from Blastx | Conserved lncRNA |
|---|---|---|---|---|
| | | | Ferritin heavy chain gene | |
| | | | enhancer-binding protein | |
| | | | 60S ribosomal protein L10, partial | |
| | | | ribosomal protein L9-like | |
| | | | zinc finger protein 22 | |
| | | | heat shock-related 70 kDa protein | |
| | | | recQ-mediated genome instability protein 1 | |
| | | | eukaryotic peptide chain release factor GTP-binding subunit | |
| | | | MARCKS-related protein 1 (MARCKSL1) | |