
Genome-wide identification of long intergenic noncoding RNA genes and their potential association with domestication in pigs.

Zhong-Yin Zhou, Ai-Min Li, Adeniyi C Adeola, Yan-Hu Liu, David M Irwin, Hai-Bing Xie, Ya-Ping Zhang.

Abstract

Thousands of long intergenic noncoding RNAs (lincRNAs) have been identified in the human and mouse genomes, some of which play important roles in fundamental biological processes. The pig is an important domesticated animal; however, pig lincRNAs remain poorly characterized, and it is unknown whether they were involved in the domestication of the pig. Here, we used available RNA-seq resources derived from 93 samples and expressed sequence tag data sets, and identified 6,621 lincRNA transcripts from 4,515 gene loci. Among the identified lincRNAs, some lincRNA genes exhibit synteny and sequence conservation, including linc-sscg2561, whose gene neighbor Dnmt3a is associated with emotional behaviors. Both linc-sscg2561 and Dnmt3a show differential expression in the frontal cortex between domesticated pigs and wild boars, suggesting a possible role in pig domestication. This study provides the first comprehensive genome-wide analysis of pig lincRNAs. © Crown copyright 2014.

Keywords:  domestication; lincRNA; pig

Year:  2014        PMID: 24891613      PMCID: PMC4079208          DOI: 10.1093/gbe/evu113

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Introduction

Genome-wide analyses have uncovered more than 9,000 long intergenic noncoding RNA (lincRNA) genes in the human genome (Khalil et al. 2009; Jia et al. 2010; Cabili et al. 2011; Derrien et al. 2012), and more than 10,000 lincRNA transcripts in the mouse genome (Ravasi et al. 2006; Guttman et al. 2009, 2010). Several studies have indicated that some lincRNAs play important roles in fundamental biological processes, such as dosage compensation (Borsani et al. 1991; Brockdorff et al. 1992; Brown et al. 1992; Payer and Lee 2008), maintenance of pluripotency (Guttman et al. 2011), transcriptional regulation (Huarte et al. 2010; Orom et al. 2010; Hung et al. 2011), and epigenetic regulation (Martianov et al. 2007; Rinn et al. 2007). The pig is an important domesticated animal and is a significant large-animal model for medical research. Thousands of years of selection have created considerable diversity in the phenotypes of pigs. Many protein-coding genes with major effects on diversity in pigs have been identified, including IGF2 (Van Laere et al. 2003), NR6A1 (Mikawa et al. 2007), MC1R (Fang et al. 2009), and RYR1 (Fujii et al. 1991). The contribution of changes in lincRNAs to the domestication of pigs is currently unknown. To address this question, a comprehensive genome-wide identification of lincRNAs is required. Here, we identified a total of 6,621 lincRNA transcripts, encoded by 4,515 gene loci, and profiled the expression of these lincRNAs in various tissues. Several lincRNA sequences were found to share homology with sequences in the human and mouse lincRNA data sets. Finally, we profiled changes in the expression of lincRNAs using RNA sequencing (RNA-seq) data sets from the brains of domesticated pigs and wild boars, and found one lincRNA (linc-sscg2561) that, together with its neighboring gene Dnmt3a, might be associated with differences in emotional behavior between the domesticated pig and the wild boar.

Results and Discussion

Identification of lincRNAs Based on Expressed Sequence Tag and RNA Sequencing Data Sets

Only 47 pig lincRNA transcripts are annotated in the Ensembl database (version 73), far fewer than are known for the human or mouse. As about 9,000 lincRNA genes have been identified in the human genome (Derrien et al. 2012), and the pig genome is of comparable size and contains a similar number of protein-coding genes (Groenen et al. 2012), one might expect the pig to have a similar number of lincRNA genes. Hence, a large number of pig lincRNAs likely remain unidentified. To comprehensively identify pig lincRNAs, we used expressed sequence tag (EST) (UniGene) and RNA-seq data sets and performed searches using the following criteria, which provide a strict definition of lincRNAs: 1) the transcript must include ≥2 exons, 2) its length must be ≥200 nt, 3) it must be located at least 500 bp away from any protein-coding gene or housekeeping ncRNA gene annotated in the Ensembl Sus scrofa 10.2 gene set (GTF), and 4) it must have a Coding Potential Calculator (CPC) score of less than −1, as calculated using the CPC tool (Kong et al. 2007) to assess the protein-coding potential of every transcript (fig. 1A). A total of 1,125 lincRNA transcripts from 1,090 intergenic regions were identified in the pig genome from the EST data set.
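The four criteria above can be read as a simple per-transcript filter. The following is a minimal sketch, not the authors' pipeline: the Transcript class, its field names, and the function name are hypothetical, transcript length is assumed to be the summed exon length, and the distance and CPC score are assumed to have been computed beforehand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    """Hypothetical record for one assembled transcript."""
    name: str
    exon_lengths: List[int]      # length of each exon, in nt
    dist_to_nearest_gene: int    # bp to nearest annotated protein-coding or housekeeping ncRNA gene
    cpc_score: float             # precomputed Coding Potential Calculator score


def is_candidate_lincrna(t, min_exons=2, min_length=200,
                         min_distance=500, max_cpc=-1.0):
    """Apply the four criteria: exon count, length, distance, coding potential."""
    return (
        len(t.exon_lengths) >= min_exons          # 1) at least two exons
        and sum(t.exon_lengths) >= min_length     # 2) at least 200 nt (assumed: summed exon length)
        and t.dist_to_nearest_gene >= min_distance  # 3) at least 500 bp from annotated genes
        and t.cpc_score < max_cpc                 # 4) CPC score below -1
    )


# Toy transcripts, invented for illustration only.
candidates = [
    Transcript("t1", [150, 120], 2000, -1.3),   # passes all four criteria
    Transcript("t2", [900], 5000, -1.5),        # single exon
    Transcript("t3", [300, 300], 400, -1.2),    # too close to a gene
    Transcript("t4", [300, 300], 9000, -0.5),   # CPC score too high
]
kept = [t.name for t in candidates if is_candidate_lincrna(t)]
```

With the thresholds as keyword arguments, relaxing one criterion (for example the exon count) only needs a changed argument.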
Fig. 1. Identification and characterization of lincRNA genes in the pig. (A) Pipeline for the identification of lincRNAs. (B) Comparison of the mRNA–lincRNA intervals, mRNA–mRNA intervals, and sizes of mRNA introns. (C) Expression levels of lincRNA and protein-coding genes detected using RNA-seq data from ten tissues (ERA178851).

High-throughput RNA sequencing has been used to identify lincRNAs in diverse species (Cabili et al. 2011; Ulitsky et al. 2011; Liu et al. 2012). To identify novel pig lincRNAs, we used ten RNA-seq data sets derived from various tissues of the pig. RNA-seq reads were aligned to the Sus scrofa 10.2 genome (Groenen et al. 2012) using TopHat (Trapnell et al. 2009). Mapped reads were assembled into transcripts using Cufflinks and Cuffcompare (Trapnell et al. 2010, 2012). The number of transcripts identified in the intergenic regions from these ten studies ranged from 2,999 to 48,272 (supplementary table S1, Supplementary Material online). Using our criteria, the number of lincRNAs for each of the ten studies ranged from 222 to 3,010 (supplementary table S1, Supplementary Material online), and these could be merged into a single data set of 5,594 lincRNAs encoded by 3,753 gene loci. Of these lincRNA genes, 328 were detected in both the RNA-seq and EST data sets, giving a final total of 6,621 unique lincRNA transcripts. To determine the basic features of pig lincRNAs, we compared our identified lincRNAs with mRNAs annotated by Ensembl. LincRNAs are shorter than protein-coding transcripts (supplementary fig. S1, Supplementary Material online), and their genes tend to contain fewer exons (supplementary fig. S1, Supplementary Material online). The length and number of exons of lincRNAs might have been overestimated in our study because single-exon transcripts were excluded from the lincRNA set.
Despite having shorter transcript lengths, exons for pig lincRNAs were on average larger (average 451 nt) than those for protein-coding genes (average 221 nt). The distance between lincRNA genes and their closest protein-coding genes was greater than the median distance between adjacent protein-coding genes (median 80,818 nt for mRNA–lincRNA intervals, compared with 36,072 nt for mRNA–mRNA intervals; Mann–Whitney P < 2.2 × 10^−16; fig. 1B); 1,354 of the lincRNA genes are located within 10 kb of a protein-coding gene. Gene ontology (GO) enrichment analyses were conducted for the set of protein-coding genes proximal (≤10 kb) to these lincRNAs. These closest neighbors of pig lincRNAs are enriched for GO terms associated with transcriptional regulation processes (supplementary table S2, Supplementary Material online), which is consistent with a previous report in other species (Ulitsky et al. 2011). The distances between lincRNA genes and their closest protein-coding genes were larger than the lengths of the introns in the protein-coding genes (Mann–Whitney P < 2.2 × 10^−16; fig. 1B), indicating that these lincRNAs are independent transcripts, rather than being unannotated exons of these protein-coding genes.
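The interval comparison above rests on the distance from each lincRNA gene to its closest protein-coding gene. A minimal sketch of that computation follows, assuming all intervals lie on one chromosome and are given as (start, end) pairs; the function names and the toy coordinates are hypothetical, and the gap is a simplified end-to-start difference.

```python
import statistics


def gap(a_start, a_end, b_start, b_end):
    """Distance in bp between two intervals; 0 if they overlap."""
    return max(0, b_start - a_end, a_start - b_end)


def nearest_gene_distance(linc, genes):
    """Smallest gap between one lincRNA interval and any gene interval."""
    return min(gap(linc[0], linc[1], g[0], g[1]) for g in genes)


def proximal_lincrnas(lincs, genes, max_dist=10_000):
    """Names of lincRNAs lying within max_dist bp of a protein-coding gene."""
    return [name for name, iv in lincs.items()
            if nearest_gene_distance(iv, genes) <= max_dist]


# Toy coordinates on a single chromosome, invented for illustration only.
genes = [(1_000, 5_000), (200_000, 210_000)]
lincs = {"lincA": (8_000, 9_000), "lincB": (100_000, 101_000)}

distances = {name: nearest_gene_distance(iv, genes) for name, iv in lincs.items()}
median_distance = statistics.median(distances.values())
near = proximal_lincrnas(lincs, genes)
```

The resulting distance lists for mRNA–lincRNA and mRNA–mRNA intervals could then be passed to a rank-based test such as Mann–Whitney in a statistics package of choice.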

Tissue Expression Profile of Pig lincRNAs

We used RNA-seq data sets from ten tissues (ERA178851) (Farajzadeh et al. 2013) of wild boars to characterize the expression pattern of the lincRNA genes. The expression level of lincRNA genes is lower than that of protein-coding genes (Kolmogorov–Smirnov test, P < 2.2 × 10^−16; fig. 1C), which has also been observed in other mammalian species (Ravasi et al. 2006; Cabili et al. 2011). This feature implies that lincRNAs and mRNAs differ in their biogenesis, processing, stability, and spatial–temporal expression patterns. Protein-coding genes proximal to lincRNAs are enriched in specific gene functions. Previous studies have indicated that in some mammalian species, lincRNAs may act in cis to regulate the expression of their neighboring protein-coding genes (Mercer et al. 2009; Orom et al. 2010; Wang et al. 2011). To determine whether lincRNAs in the pig had a similar effect on expression, we focused on pig lincRNAs that are located within 10 kb of a protein-coding gene and tested whether the expression patterns of the lincRNAs and their neighboring protein-coding genes were correlated. Across ten tissues, expression of the closely linked lincRNAs tended to correlate with that of their protein-coding neighbors (average Spearman correlation r2 = 0.31). A similar magnitude of correlation was observed for adjacent protein-coding genes (r2 = 0.28). The correlated expression of the lincRNAs and their adjacent protein-coding genes suggests that they may share cis-regulatory modules or chromatin domains. Based on hierarchical clustering of the gene expression profiles, many lincRNA genes exhibit a tissue-preferential expression pattern, which is similar to that of the protein-coding genes (supplementary fig. S2, Supplementary Material online). The differential expression patterns of the lincRNAs were further analyzed using DESeq2 with a cutoff of 2-fold change and padj < 0.1 (Anders and Huber 2010). This analysis identified 581 tissue-preferential lincRNAs based on the RNA-seq data set (supplementary fig. S3, Supplementary Material online). Interestingly, 261 lincRNAs are preferentially expressed in the frontal cortex and occipital cortex (supplementary fig. S3, Supplementary Material online).
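The neighbor correlation described above is a Spearman correlation between two expression vectors with one value per tissue. A minimal standard-library sketch follows; the function names and toy expression values are hypothetical, and it assumes neither vector is constant across tissues (otherwise the denominator is zero).

```python
def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy expression values across four tissues, invented for illustration only.
linc_expr = [0.5, 2.0, 8.0, 1.0]
neighbor_expr = [3.0, 10.0, 40.0, 5.0]
rho = spearman(linc_expr, neighbor_expr)
```

Averaging this coefficient over all lincRNA and neighbor pairs would give a summary statistic of the kind reported in the text.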

Identification of Sequence Homology with Human and Mouse lincRNAs

To identify homologs of the pig lincRNAs in humans and mice, we aligned the pig lincRNAs with human and mouse lincRNAs using BLASTn and identified 2,630 (40%) of the pig lincRNAs that had detectable homology with human lincRNAs, and 2,598 (39%) with mouse lincRNAs, of which 1,660 were shared between human and mouse. In comparison, 3,672 (31%) human lincRNAs had detectable homology with mouse lincRNAs when compared using the same approach. Among the pig lincRNA transcripts that align to the human lincRNAs, 187 have one-side or two-side synteny that extends to at least one neighboring protein-coding gene. Similarly, 244 of the pig lincRNAs have one-side or two-side synteny to a protein-coding neighbor in the mouse. These results imply that the pig may be an excellent model for research on lincRNA function.
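The homology counts above amount to collecting the pig transcripts with at least one BLASTn hit in each species and intersecting the two sets. A minimal sketch follows, assuming BLAST tabular output (outfmt 6, with the query ID in column 1 and the E-value in column 11); the E-value cutoff of 1e-5 is an assumption, as the text does not state the threshold used, and the function names and toy hit lines are hypothetical.

```python
def queries_with_hits(blast_lines, max_evalue=1e-5):
    """Set of query IDs with at least one hit at or below the E-value cutoff."""
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:  # column 11 holds the E-value
            hits.add(fields[0])              # column 1 holds the query ID
    return hits


def percent(count, total):
    """Percentage rounded to the nearest integer."""
    return round(100 * count / total)


# Toy tabular hits, invented for illustration only.
human_hits = [
    "pigA\thumX\t88.0\t350\t42\t0\t1\t350\t10\t359\t1e-60\t380",
    "pigB\thumY\t80.0\t120\t24\t0\t5\t124\t1\t120\t2e-8\t95",
    "pigC\thumZ\t75.0\t40\t10\t0\t1\t40\t1\t40\t0.5\t20",
]
mouse_hits = [
    "pigA\tmusX\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-40\t300",
    "pigD\tmusW\t82.0\t200\t36\t0\t1\t200\t1\t200\t3e-12\t150",
]

with_human = queries_with_hits(human_hits)
with_mouse = queries_with_hits(mouse_hits)
shared = with_human & with_mouse
```

As a check on the reported figures, percent(2630, 6621) gives 40 and percent(2598, 6621) gives 39, matching the percentages in the text.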

Differential Expression of lincRNAs in Domesticated Pigs and Wild Boars

Domesticated pigs differ from wild boars in several behavioral traits, such as lower levels of aggressive behavior and reduced fear of humans (Price 1999). Therefore, we considered whether changes in the level of lincRNA expression in the brain occurred during pig domestication. We analyzed the expression profile of lincRNAs in the published RNA-seq data set derived from the brains of five domesticated pigs and five wild boars (ERA209456) (Albert et al. 2012) and found 30 lincRNAs that show significant differential expression between pigs and wild boars (padj < 0.1) (fig. 2A). Of these 30 differentially expressed lincRNA genes, 18 have higher expression in the domesticated pig, and 12 in the wild boar.
Fig. 2. Expression differences between domesticated pigs and wild boars. (A) Heatmap showing expression abundance of lincRNA genes showing significant differences in expression. Expression levels (FPKM) were measured by RNA-seq. Genes were clustered by hierarchical clustering. (B) Expression differences of linc-sscg2561 and genes in the surrounding 500 kb of genomic DNA. The x axis shows the genomic positions of these genes. A threshold of Padj = 0.1 is indicated by the dashed line.

Interestingly, two lincRNA genes (linc-sscg1409 and linc-sscg2561) that have two-sided synteny extending to protein-coding neighbors in humans were found to show differential expression between the domesticated pigs and boars. Linc-sscg1409 is a 1,200-nt transcript encoded by four exons and neighbors the VWA2 and FAM160B1 protein-coding genes. Linc-sscg2561 is a 3,067-nt transcript encoded by two exons and shows tissue-specific expression in the pig brain (frontal cortex and occipital cortex). Our BLASTn search identified a conserved 367-nt match between linc-sscg2561 and a human lincRNA (ENSG00000272048.1). The PhastCons plot from the UCSC 99 vertebrate whole-genome alignment to human showed a conserved region within the terminal exon, which includes the approximately 300-nt region that is conserved between pigs and humans (fig. 3). Linc-sscg2561 displays 1.4-fold higher expression in domesticated pigs compared with boars. As lincRNAs are known to interact with chromatin proteins to positively and negatively regulate expression of neighboring genes (Wang et al. 2011), we conducted an analysis of the protein-coding and lincRNA genes in the 500-kb window surrounding this lincRNA gene. Dnmt3a is the only gene adjacent to this lincRNA gene that displays differential expression, with 1.4-fold higher expression in the domesticated pig (fig. 2B). This observation implies that linc-sscg2561 may be a regulatory element for the Dnmt3a gene.
Dnmt3a is an important protein with functions in DNA methylation, and a previous study showed that Dnmt3a regulates behavioral plasticity to emotional stimuli (LaPlant et al. 2010), indicating that linc-sscg2561 and Dnmt3a may influence the methylation of genes in the pig nervous system, and thus contribute to changes in emotional behavior during the domestication of the pig. In addition, experimental studies are needed to unravel the functions of lincRNA genes to understand the domestication of the pig.
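The neighborhood analysis described above (genes near a focal lincRNA, flagged by adjusted P value) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the 500-kb window is assumed to mean 250 kb on either side of the focal gene, all genes are assumed to lie on the same chromosome, and the GeneResult class, gene names, coordinates, and padj values are hypothetical placeholders rather than results from the study.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """Hypothetical differential-expression record for one gene."""
    name: str
    start: int
    end: int
    fold_change: float   # domesticated pig relative to wild boar
    padj: float          # adjusted P value


def in_window(gene, focal, flank=250_000):
    """True if the gene overlaps the region extending flank bp either side of focal."""
    return gene.end >= focal.start - flank and gene.start <= focal.end + flank


def differential_neighbors(focal, genes, flank=250_000, padj_cutoff=0.1):
    """Names of genes in the window around focal with padj below the cutoff."""
    return [g.name for g in genes
            if g.name != focal.name
            and in_window(g, focal, flank)
            and g.padj < padj_cutoff]


# Placeholder records, invented for illustration only.
focal = GeneResult("lincX", 1_000_000, 1_003_000, 1.4, 0.05)
genes = [
    focal,
    GeneResult("geneA", 1_100_000, 1_150_000, 1.4, 0.04),  # in window, significant
    GeneResult("geneB", 900_000, 950_000, 1.0, 0.80),      # in window, not significant
    GeneResult("geneC", 2_000_000, 2_050_000, 2.0, 0.01),  # significant, outside window
]
flagged = differential_neighbors(focal, genes)
```

Changing the flank argument switches to a different reading of the window size.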
Fig. 3. Linc-sscg2561 shows synteny and sequence conservation. The gray box shows the region with sequence conservation. The PhastCons plot is relative to loci in the human genome and is derived from 99 vertebrate whole-genome alignments. The consensus logo highlights the 367-nt conserved sequence, which was identified from the 99 vertebrate genome alignments. A score of 4 bits indicates that these bases are perfectly conserved in the 99 vertebrate genomes.


Materials and Methods

We used two types of data sets from the pig to identify lincRNAs. The first comprised 50,136 UniGene transcripts downloaded from the National Center for Biotechnology Information (NCBI) UniGene database build 42. BLAT was used to align the UniGene transcripts against the Sus scrofa 10.2 genome sequence (Groenen et al. 2012), and pslCDnaFilter was used to filter the BLAT results. A total of 43,942 UniGene transcripts that had unique matches to the genome were retained. The second comprised ten RNA-seq data sets downloaded from the NCBI SRA database. RNA-seq reads were mapped to the Sus scrofa 10.2 genome using TopHat version 2.0.8 (Trapnell et al. 2009). Aligned reads for each sample were assembled using Cufflinks version 2.0.2. We then used Cuffcompare to generate intergenic transcripts for each sample assembly. To obtain high-confidence transcripts, we used two criteria to filter the transcripts: RNA-seq reads must cover at least 80% of the predicted exon nucleotides for a transcript, and there must be at least three RNA-seq reads mapping to the predicted splice structure in at least one sample. We then applied the strict criteria shown in figure 1A to identify lincRNAs. TopHat and Cufflinks were used to obtain FPKM (fragments per kilobase of exon per million fragments mapped) values. For each pairwise comparison of the samples, differentially expressed genes were identified based on the integer count data using DESeq2 version 1.2.8 (Anders and Huber 2010). Read counts for each gene were calculated using summarizeOverlaps with the default mode of “Union.” We downloaded human lincRNAs from the Gencode database (v19) (Harrow et al. 2012) and mouse lincRNAs from the NONCODE database (v4) (Xie et al. 2014). NCBI BLASTn was used to identify lincRNA sequence homology.
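The FPKM unit mentioned above follows directly from its definition: fragments mapped to a gene, divided by the exon length in kilobases and by the total mapped fragments in millions. The sketch below shows only that arithmetic; Cufflinks estimates fragment assignments with its own statistical model, so this is not a reimplementation of the tool, and the function name and toy numbers are hypothetical.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped."""
    # fragments / (length / 1e3) / (total / 1e6) simplifies to the expression below.
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


# Toy numbers, invented for illustration only:
# 500 fragments on a 2,000-bp transcript in a library of 25 million mapped fragments.
example = fpkm(500, 2_000, 25_000_000)
```

Here 500 fragments over 2 kb is 250 fragments per kilobase, and dividing by 25 million mapped fragments gives an FPKM of 10.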

Supplementary Material

Supplementary tables S1 and S2 and figures S1–S3 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).

References (10 of 36 shown)

1.  The product of the mouse Xist gene is a 15 kb inactive X-specific transcript containing no conserved ORF and located in the nucleus.

Authors:  N Brockdorff; A Ashworth; G F Kay; V M McCabe; D P Norris; P J Cooper; S Swift; S Rastan
Journal:  Cell       Date:  1992-10-30       Impact factor: 41.582

2.  Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome.

Authors:  Timothy Ravasi; Harukazu Suzuki; Ken C Pang; Shintaro Katayama; Masaaki Furuno; Rie Okunishi; Shiro Fukuda; Kelin Ru; Martin C Frith; M Milena Gongora; Sean M Grimmond; David A Hume; Yoshihide Hayashizaki; John S Mattick
Journal:  Genome Res       Date:  2005-12-12       Impact factor: 9.043

3.  Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs.

Authors:  John L Rinn; Michael Kertesz; Jordon K Wang; Sharon L Squazzo; Xiao Xu; Samantha A Brugmann; L Henry Goodnough; Jill A Helms; Peggy J Farnham; Eran Segal; Howard Y Chang
Journal:  Cell       Date:  2007-06-29       Impact factor: 41.582

4.  A regulatory mutation in IGF2 causes a major QTL effect on muscle growth in the pig.

Authors:  Anne-Sophie Van Laere; Minh Nguyen; Martin Braunschweig; Carine Nezer; Catherine Collette; Laurence Moreau; Alan L Archibald; Chris S Haley; Nadine Buys; Michael Tally; Göran Andersson; Michel Georges; Leif Andersson
Journal:  Nature       Date:  2003-10-23       Impact factor: 49.962

5.  The human XIST gene: analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus.

Authors:  C J Brown; B D Hendrich; J L Rupert; R G Lafrenière; Y Xing; J Lawrence; H F Willard
Journal:  Cell       Date:  1992-10-30       Impact factor: 41.582

6.  Identification of a mutation in porcine ryanodine receptor associated with malignant hyperthermia.

Authors:  J Fujii; K Otsu; F Zorzato; S de Leon; V K Khanna; J E Weiler; P J O'Brien; D H MacLennan
Journal:  Science       Date:  1991-07-26       Impact factor: 47.728

7.  Fine mapping of a swine quantitative trait locus for number of vertebrae and analysis of an orphan nuclear receptor, germ cell nuclear factor (NR6A1).

Authors:  Satoshi Mikawa; Takeya Morozumi; Shin-Ichi Shimanuki; Takeshi Hayashi; Hirohide Uenishi; Michiko Domukai; Naohiko Okumura; Takashi Awata
Journal:  Genome Res       Date:  2007-04-06       Impact factor: 9.043

8.  Repression of the human dihydrofolate reductase gene by a non-coding interfering transcript.

Authors:  Igor Martianov; Aroul Ramadass; Ana Serra Barros; Natalie Chow; Alexandre Akoulitchev
Journal:  Nature       Date:  2007-01-21       Impact factor: 49.962

9.  CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine.

Authors:  Lei Kong; Yong Zhang; Zhi-Qiang Ye; Xiao-Qiao Liu; Shu-Qi Zhao; Liping Wei; Ge Gao
Journal:  Nucleic Acids Res       Date:  2007-07       Impact factor: 16.971

10.  NONCODEv4: exploring the world of long non-coding RNA genes.

Authors:  Chaoyong Xie; Jiao Yuan; Hui Li; Ming Li; Guoguang Zhao; Dechao Bu; Weimin Zhu; Wei Wu; Runsheng Chen; Yi Zhao
Journal:  Nucleic Acids Res       Date:  2013-11-26       Impact factor: 16.971

