Colin Kern1, Ying Wang1, James Chitwood1, Ian Korf2, Mary Delany1, Hans Cheng3, Juan F Medrano1, Alison L Van Eenennaam1, Catherine Ernst4, Pablo Ross5, Huaijun Zhou6.
Abstract
BACKGROUND: Numerous long non-coding RNAs (lncRNAs) have been identified and their roles in gene regulation in humans, mice, and other model organisms studied; however, far less research has been focused on lncRNAs in farm animal species. While previous studies in chickens, cattle, and pigs identified lncRNAs in specific developmental stages or differentially expressed under specific conditions in a limited number of tissues, more comprehensive identification of lncRNAs in these species is needed. The goal of the FAANG Consortium (Functional Annotation of Animal Genomes) is to functionally annotate animal genomes, including the annotation of lncRNAs. As one of the FAANG pilot projects, lncRNAs were identified across eight tissues in two adult male biological replicates from chickens, cattle, and pigs.
Keywords: Epigenetics; Gene regulation; Long non-coding RNAs
Year: 2018 PMID: 30227846 PMCID: PMC6145346 DOI: 10.1186/s12864-018-5037-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Total number of aligned and filtered RNA-seq reads per tissue
| Tissue | Chicken | Cattle | Pig |
|---|---|---|---|
| Adipose | 198,929,564 | 156,656,620 | 119,721,691 |
| Cerebellum | 242,807,223 | 246,658,282 | 152,762,359 |
| Cortex | 236,147,593 | 119,721,576 | 126,240,107 |
| Hypothalamus | 244,215,661 | 142,709,163 | 132,786,659 |
| Liver | 244,674,805 | 119,617,850 | 104,210,750 |
| Lung | 205,055,604 | 138,746,254 | 198,053,139 |
| Muscle | 238,435,618 | 140,106,635 | 155,724,909 |
| Spleen | 201,084,991 | 150,804,156 | 125,682,422 |
The number of genes assembled from each RNA-seq library
| Tissue | Chicken A | Chicken B | Cattle A | Cattle B | Pig A | Pig B |
|---|---|---|---|---|---|---|
| Adipose | 25,837 | 27,020 | 50,396 | 51,271 | 49,322 | 47,401 |
| Cerebellum | 33,830 | 33,729 | 70,001 | 81,189 | 60,174 | 66,127 |
| Cortex | 35,110 | 35,984 | 46,410 | 52,946 | 50,951 | 51,532 |
| Hypothalamus | 33,437 | 34,457 | 53,784 | 54,949 | 53,811 | 46,592 |
| Liver | 25,127 | 27,235 | 45,275 | 47,518 | 43,793 | 44,592 |
| Lung | 30,680 | 29,747 | 50,051 | 59,447 | 66,299 | 61,041 |
| Muscle | 23,414 | 23,417 | 39,334 | 38,960 | 43,307 | 42,422 |
| Spleen | 30,927 | 31,752 | 56,125 | 62,107 | 61,337 | 57,744 |
The number of transcripts assembled from each RNA-seq library
| Tissue | Chicken A | Chicken B | Cattle A | Cattle B | Pig A | Pig B |
|---|---|---|---|---|---|---|
| Adipose | 66,252 | 67,811 | 96,844 | 98,317 | 90,838 | 88,337 |
| Cerebellum | 76,797 | 76,515 | 119,305 | 131,204 | 104,161 | 110,994 |
| Cortex | 78,157 | 79,363 | 92,521 | 100,484 | 93,695 | 94,132 |
| Hypothalamus | 76,096 | 77,811 | 101,482 | 103,398 | 97,113 | 88,079 |
| Liver | 64,847 | 68,013 | 90,252 | 93,361 | 80,706 | 80,826 |
| Lung | 72,857 | 71,558 | 97,876 | 108,481 | 111,665 | 105,423 |
| Muscle | 61,921 | 61,825 | 82,076 | 81,887 | 82,664 | 81,214 |
| Spleen | 73,368 | 74,021 | 103,069 | 110,812 | 105,930 | 101,208 |
The number of each Cufflinks “class code” in the transcriptome merged from all tissues
| Species | = | j | u | x | o | s |
|---|---|---|---|---|---|---|
| Chicken | 49,456 | 40,620 | 21,034 | 3205 | 802 | 0 |
| Pig | 54,311 | 41,237 | 35,046 | 4306 | 925 | 7 |
| Cattle | 64,413 | 45,759 | 30,504 | 3736 | 1071 | 0 |
“=” is a complete match of an existing transcript in the NCBI annotation. “j” is a potential novel isoform of an existing transcript. “u” is an unknown intergenic transcript. “x” is an exonic overlap on the opposite strand. “o” is an overlap with annotated exons, but is not classed as “j” because no splice sites match. “s” is an intronic overlap on the opposite strand. See http://cole-trapnell-lab.github.io/cufflinks/cuffcompare/ for more details
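The class-code counts are easier to compare across species as proportions of each merged transcriptome. The following is a minimal sketch that only re-expresses the table above; the names `CLASS_CODE_COUNTS` and `class_code_shares` are illustrative and not part of the study's pipeline.

```python
# Counts copied from the class-code table above (one row per species).
CLASS_CODE_COUNTS = {
    "Chicken": {"=": 49456, "j": 40620, "u": 21034, "x": 3205, "o": 802, "s": 0},
    "Pig": {"=": 54311, "j": 41237, "u": 35046, "x": 4306, "o": 925, "s": 7},
    "Cattle": {"=": 64413, "j": 45759, "u": 30504, "x": 3736, "o": 1071, "s": 0},
}


def class_code_shares(counts):
    """Return each class code's fraction of the species' merged transcripts."""
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


if __name__ == "__main__":
    for species, counts in CLASS_CODE_COUNTS.items():
        shares = class_code_shares(counts)
        summary = ", ".join(f"{code}: {share:.1%}" for code, share in shares.items())
        print(f"{species}: {summary}")
```

Only the six class codes listed in the table are included, so the shares are relative to those codes rather than to every code Cuffcompare can emit.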
Fig. 1 Identification of lncRNAs. a Computational pipeline used to identify lncRNAs. b Total number of lncRNAs identified per species. c The percentage of lncRNAs that match previously annotated lncRNAs in the NCBI annotation, are novel isoforms of previously annotated lncRNAs, or are expressed from unannotated genomic loci. A lncRNA was considered a novel isoform if it shared some exons with an annotated gene, but had additional unannotated exons or novel splicing. A previously annotated lncRNA had the same exons and splicing as an annotated gene. LncRNAs expressed from novel loci were in regions of the genome from which no annotated transcript originated. d Distribution of transcript lengths of both lncRNAs and annotated protein-coding genes. e Distribution of the number of exons of both lncRNAs and protein-coding genes. f Distribution of the number of isoforms of both lncRNAs and protein-coding genes
The number of lncRNA transcripts and loci from NCBI annotations and this study
| Category | Chicken | Cattle | Pig | Human | Mouse |
|---|---|---|---|---|---|
| NCBI Transcripts | 6072 | 6187 | 14,503 | 27,986 | 21,705 |
| Novel Transcripts | 9393 | 7235 | 14,429 | – | – |
| NCBI Loci | 4167 | 4601 | 10,388 | 15,765 | 11,957 |
| Novel Loci | 4654 | 4325 | 8772 | – | – |
LncRNA comparison with the NONCODEv5 database based on sequence similarity
| Species | Novel LncRNA | NONCODE | Overlap |
|---|---|---|---|
| Chicken | 9393 | 12,850 | 730 |
| Pig | 14,429 | 29,585 | 5424 |
| Cattle | 7235 | 23,515 | 403 |
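To put the overlap counts in proportion, the sketch below divides each "Overlap" value by the number of novel lncRNAs for that species. Using the novel lncRNA count as the denominator is an assumption made here for illustration; the table itself does not say which set the overlap was counted against. `NONCODE_COMPARISON` and `overlap_fraction` are hypothetical names.

```python
# Values copied from the NONCODEv5 comparison table above.
NONCODE_COMPARISON = {
    "Chicken": {"novel": 9393, "noncode": 12850, "overlap": 730},
    "Pig": {"novel": 14429, "noncode": 29585, "overlap": 5424},
    "Cattle": {"novel": 7235, "noncode": 23515, "overlap": 403},
}


def overlap_fraction(species):
    """Fraction of novel lncRNAs with a NONCODE match (assumed denominator)."""
    row = NONCODE_COMPARISON[species]
    return row["overlap"] / row["novel"]


if __name__ == "__main__":
    for species in NONCODE_COMPARISON:
        print(f"{species}: {overlap_fraction(species):.1%} of novel lncRNAs overlap NONCODE")
```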
Fig. 2 Potential regulatory targets of lncRNAs. a Percentage of lncRNAs that are intergenic, overlapping with exons of protein-coding genes, or overlapping with gene introns. A lncRNA was considered overlapping with exons if at least 1 base pair of a lncRNA exon overlapped a gene exon. A lncRNA was considered overlapping with gene introns if at least 1 base pair of a lncRNA exon overlapped a gene intron. Intergenic lncRNAs had no exon overlap with any annotated protein-coding gene region. b Percentage of genic (overlapping genes) lncRNAs that overlap on the same strand (sense) or opposite strand (antisense) and with exons or introns. c Percentage of intergenic lncRNAs that are upstream or downstream and on the same strand or opposite strand of the nearest gene. d, e Difference in the Spearman correlation of expression between lncRNA-mRNA pairs from the average correlation, grouped by positional relationship (d) and tissue (e). f Spearman correlation of expression of antisense upstream (divergent) lncRNA-mRNA pairs at different distances between the transcripts
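The 1 bp overlap rules in panel a can be sketched as a small interval check. This is an illustration under stated assumptions, not the study's implementation: coordinates are half-open `(start, end)` pairs on one chromosome, a single gene is considered at a time, strand is ignored, and an exonic overlap takes precedence over an intronic one (the caption does not state a precedence). All function names are hypothetical.

```python
def overlap_bp(a, b):
    """Number of bases shared by two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def introns_of(exons):
    """Gaps between consecutive exons of one gene, as half-open intervals."""
    ordered = sorted(exons)
    return [
        (left[1], right[0])
        for left, right in zip(ordered, ordered[1:])
        if right[0] > left[1]
    ]


def classify_lncrna(lnc_exons, gene_exons):
    """Classify a lncRNA against one gene using the >= 1 bp rules.

    Assumption: exonic overlap is checked first, then intronic overlap.
    """
    if any(overlap_bp(l, g) >= 1 for l in lnc_exons for g in gene_exons):
        return "exonic"
    gene_introns = introns_of(gene_exons)
    if any(overlap_bp(l, i) >= 1 for l in lnc_exons for i in gene_introns):
        return "intronic"
    return "intergenic"


if __name__ == "__main__":
    gene = [(100, 200), (300, 400)]  # toy gene with one intron at (200, 300)
    print(classify_lncrna([(190, 250)], gene))
    print(classify_lncrna([(210, 290)], gene))
    print(classify_lncrna([(500, 600)], gene))
```

A genome-wide version would repeat this check against every annotated protein-coding gene and add the strand comparison needed for the sense and antisense groups in panel b.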
Number of lncRNAs in each genomic location group
| Location group | Chicken | Cattle | Pig |
|---|---|---|---|
| Sense Intergenic Upstream | 1302 | 843 | 1733 |
| Sense Intergenic Downstream | 1679 | 923 | 1868 |
| Antisense Intergenic Upstream | 2063 | 1747 | 3069 |
| Antisense Intergenic Downstream | 1168 | 790 | 1696 |
| Intergenic, No Gene Within 100 kb | 1208 | 1216 | 3109 |
| Sense Containing Exonic | 227 | 182 | 344 |
| Sense Overlapping Exonic | 48 | 46 | 79 |
| Sense Nested Exonic | 49 | 41 | 109 |
| Sense Containing Intronic | 58 | 30 | 72 |
| Sense Overlapping Intronic | 27 | 25 | 21 |
| Sense Nested Intronic | 166 | 128 | 232 |
| Antisense Containing Exonic | 8 | 12 | 14 |
| Antisense Overlapping Exonic | 465 | 372 | 565 |
| Antisense Nested Exonic | 119 | 75 | 198 |
| Antisense Containing Intronic | 110 | 97 | 205 |
| Antisense Overlapping Intronic | 362 | 418 | 622 |
| Antisense Nested Intronic | 334 | 290 | 493 |
| Total | 9393 | 7235 | 14,429 |
Fig. 3 Tissue-specific lncRNAs. a The number of tissue-specific lncRNAs identified per species and tissue. b, c, d The percentage of tissue-specific lncRNAs expressed above various FPKM levels in chicken (b), cattle (c), and pig (d), respectively. e The percentage of protein-coding genes associated with tissue-specific lncRNAs that are also tissue-specific
Fig. 4 Conservation of lncRNAs. a Phylogenetic tree of the five animal species used for conservation analysis. b LncRNAs positionally conserved in other species. The numbers with the same species on the row and column indicate lncRNAs that are within 50 kb of protein-coding genes with orthologs in the other four species. Because the analysis relied on associating lncRNAs with genes that had orthologs in the other species, this number represents the number of lncRNAs that were included in the conservation analysis. c The percentage of lncRNAs positionally conserved in other species. d The top 8 GO terms, ranked by lowest FDR, enriched in lncRNAs conserved across all five species
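The 50 kb association in panel b can be sketched as a distance check between a lncRNA and nearby protein-coding genes. The caption does not spell out the full conservation rule, so the version below assumes that a lncRNA is positionally conserved in a second species when a gene within 50 kb of it has an ortholog that also lies within 50 kb of some lncRNA in that species. The function names, the `(chrom, start, end)` tuples, and the ortholog dictionary are all hypothetical.

```python
WINDOW_BP = 50_000  # distance threshold stated in the Fig. 4 caption


def gap_bp(a_start, a_end, b_start, b_end):
    """Gap between two half-open intervals; 0 when they overlap or touch."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def genes_near(lnc, genes, window=WINDOW_BP):
    """Names of genes on the same chromosome within `window` bp of a lncRNA.

    lnc is (chrom, start, end); genes maps name -> (chrom, start, end).
    """
    chrom, start, end = lnc
    return {
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and gap_bp(start, end, g_start, g_end) <= window
    }


def is_positionally_conserved(lnc, genes_a, lncs_b, genes_b, orthologs):
    """Assumed rule: a gene near `lnc` has an ortholog in species B that is
    itself within the window of at least one species-B lncRNA."""
    for gene in genes_near(lnc, genes_a):
        ortholog = orthologs.get(gene)
        if ortholog is None:
            continue
        if any(ortholog in genes_near(other, genes_b) for other in lncs_b):
            return True
    return False


if __name__ == "__main__":
    genes_a = {"G1": ("chr1", 100_000, 110_000)}
    genes_b = {"H1": ("chrX", 500_000, 520_000)}
    orthologs = {"G1": "H1"}  # toy one-to-one ortholog map
    lnc_a = ("chr1", 120_000, 125_000)
    print(is_positionally_conserved(lnc_a, genes_a, [("chrX", 540_000, 545_000)], genes_b, orthologs))
```

Strand and upstream or downstream orientation are ignored here; a stricter definition could require them to match between species.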