Lijuan Hou (1), Jin Xie (1), Yaoyao Wu (1), Jiaojiao Wang (1), Anqi Duan (1), Yaqi Ao (1), Xuejiao Liu (1), Xinmei Yu (1), Hui Yan (1), Jonathan Perreault (2), Sanshu Li (3).
1. Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China.
2. INRS - Institut Armand-Frappier, 531 boul des Prairies, Laval, Québec, H7V1B7, Canada.
3. Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China. sanshuli@hqu.edu.cn.
Abstract
BACKGROUND: Only 1.5% of the human genome encodes proteins, while a large part of the remainder encodes noncoding RNAs (ncRNAs). Many ncRNAs form structures and perform important functions. Accurately identifying structured ncRNAs in the human genome and discovering their biological functions remains a major challenge. RESULTS: Here, we have established a pipeline (CM-line) with the following features for analyzing the large genomes of humans and other animals. First, we selected species with larger genetic distances to facilitate the discovery of covariations and compatible mutations. Second, we used CMfinder, which can generate useful alignments even with low sequence conservation. Third, we removed repetitive sequences and known structured ncRNAs to reduce the workload of CMfinder. Fourth, we used Infernal to find more representatives and refine the structure. We report 11 classes of structured ncRNA candidates with significant covariations in humans. Functional analysis showed that these ncRNAs may have diverse functions. Some may regulate circadian clock genes through poly(A) signals (PAS); some may regulate the elongation factor (EEF1A) and the T-cell receptor signaling pathway by cooperating with RNA-binding proteins. CONCLUSIONS: By searching for important features of RNA structure in large genomes, the CM-line has revealed the existence of a variety of novel structured ncRNAs. Functional analysis suggests that some newly discovered ncRNA motifs may have biological functions. The pipeline we have established for the discovery of structured ncRNAs and the identification of their functions can also be applied to analyze other large genomes.
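The abstract relies on two kinds of evidence for a conserved base pair: covariations (both paired positions change and pairing is kept) and compatible mutations (one position changes and pairing is kept, for example G-C to G-U). The following is a minimal illustrative sketch of that distinction, not the CM-line implementation. The function name `classify_pair_columns`, the toy alignment, and the choice of the first sequence as the reference are all assumptions made for the example.

```python
# Illustrative sketch only: classify base-pair evidence between two alignment columns.
# Assumption: the first sequence is the reference and forms a valid pair at (i, j).

# Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_pair_columns(alignment, i, j):
    """Count how each non-reference sequence relates to the reference pair at (i, j)."""
    ref = (alignment[0][i], alignment[0][j])
    counts = {"identical": 0, "covariation": 0, "compatible": 0, "incompatible": 0}
    for seq in alignment[1:]:
        pair = (seq[i], seq[j])
        if pair not in PAIRS:
            counts["incompatible"] += 1   # pairing is lost
        elif pair == ref:
            counts["identical"] += 1      # no change, no evidence either way
        elif pair[0] != ref[0] and pair[1] != ref[1]:
            counts["covariation"] += 1    # both positions changed, pairing kept
        else:
            counts["compatible"] += 1     # one position changed, pairing kept
    return counts


# Hypothetical toy alignment of a 3-bp hairpin (columns 0-8, 1-7, 2-6 pair).
alignment = [
    "GGGAAACCC",
    "GAGAAACUC",
    "GGGAAAUCC",
    "GCGAAACCC",
]

print(classify_pair_columns(alignment, 1, 7))
print(classify_pair_columns(alignment, 2, 6))
```

Real covariation analysis also weighs phylogeny and statistical significance, which this count-based sketch does not attempt.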
Keywords:
Animal genomes; Comparative genomics; Human genomes; Pipeline; Structured ncRNAs