John Lalith Charles Richard, Pieter Johan Adam Eichhorn.
Abstract
Prior to the sequencing of the human genome, it was presumed that most of the DNA coded for proteins. However, with the advent of next-generation sequencing, it has now been recognized that most complex eukaryotic genomes are in fact transcribed into noncoding RNAs (ncRNAs), including a family of transcripts referred to as long noncoding RNAs (lncRNAs). LncRNAs have been implicated in many biological processes ranging from housekeeping functions such as transcription to more specialized functions such as dosage compensation or genomic imprinting, among others. Interestingly, lncRNAs are not limited to a defined set of functions but can regulate varied activities such as messenger RNA degradation, translation, and protein kinetics or function as RNA decoys or scaffolds. Although still in its infancy, research into the biology of lncRNAs has demonstrated the importance of lncRNAs in development and disease. However, the specific mechanisms through which these lncRNAs act remain poorly defined. Focused research into a small number of these lncRNAs has provided important clues into the heterogeneous nature of this family of ncRNAs. Due to the complex diversity of lncRNA function, in this review, we provide an update on the platforms available for investigators to aid in the identification of lncRNA function.
Keywords: lncRNA interactome; long noncoding RNA; tools for lncRNA functional annotation
Year: 2018 PMID: 29945466 PMCID: PMC6249642 DOI: 10.1177/2472630318780639
Source DB: PubMed Journal: SLAS Technol ISSN: 2472-6303 Impact factor: 3.047
Databases and Tools Used to Study Long Noncoding RNA.
| Database/Tool | Application |
|---|---|
| LncRNAdb v2.0 (lncRNA Database) | Reference database for functional long noncoding RNAs that provides comprehensive annotations of eukaryotic lncRNAs. |
| FANTOM (Functional Annotation of the Mammalian Genome) | Resource for experimentally supported lncRNA-disease association data. Also offers a platform with integrated tools for predicting novel lncRNA-disease associations. |
| ENCODE (Encyclopedia of DNA Elements) | Comprehensive collection of functional elements in the human genome, including elements that act at the protein and RNA levels and regulatory elements that control the cells and circumstances in which a gene is active. |
| The GENCODE Project | Repository of comprehensive gene annotations on reference chromosomes, scaffolds, assembly patches, and alternate loci, including comprehensive annotation of lncRNA genes. |
| lncRNAMap | Repository for investigating the putative regulatory functions of human lncRNAs and the expression profiles of lncRNAs and their homologous protein coding genes. Information on miRNA regulators of lncRNAs is also available. |
| LNCipedia 3.0 | Repository for annotated human lncRNA sequences. |
| The LncRNA and Disease Database | Repository for curated and experimentally supported lncRNA-disease association data. Also hosts integrated tools for predicting novel disease associations. Interactions at the protein, RNA, miRNA, and DNA levels are also available. |
| lnCeDB | Database of human lncRNAs that can act as ceRNAs. Also provides information on lncRNA-mRNA pairs that share targeting miRNAs. lncRNA expression can be compared across 22 human tissues to estimate the likelihood that a pair actually acts as ceRNAs. |
| starBASE v2.0 | Database designed for decoding pan-cancer and interaction networks of lncRNAs, miRNAs, ceRNAs, RNA binding proteins, and mRNAs from large-scale CLIP-Seq (HITS-CLIP, PAR-CLIP, iCLIP, CLASH) data and tumor samples comprising 14 cancer types and more than 6000 samples. starBASE also provides miRNA and ceRNA function web tools to predict the function of ncRNAs and protein coding genes from miRNA-mediated (ceRNA) regulatory networks. |
| DIANA TOOLS LncBase v.2 | Tool for determining experimentally verified and computationally predicted miRNA targets on long noncoding RNAs. The experimental module covers miRNA-lncRNA interactions together with their experimental validation and outcomes. The prediction module contains more than 10 million interactions and provides the interaction sites, a graphical representation of their binding, and the predicted score. |
| GeneCards | Human gene database that provides comprehensive information on all annotated and predicted human genes, including lncRNAs. Integrates linked genomic, transcriptomic, proteomic, genetic, clinical, and functional information. |
| LincSNP 2.0 | Database that stores and annotates disease-associated single-nucleotide polymorphisms in human long noncoding RNAs and their transcription factor binding sites. |
| LncRNA2Target | Repository for genes differentially expressed after lncRNA knockdown or overexpression. |
| ChIPBase v2.0 | Open database for studying transcription factor binding sites and motifs and decoding the transcriptional regulatory networks of lncRNAs, miRNAs, other noncoding RNAs, and protein coding genes. |
| NRED | Database for lncRNA expression from microarray and in situ hybridization data. Also provides information on evolutionary conservation, secondary structure, genomic context links, and antisense relationships. |
| NONCODE | Integrated database dedicated to noncoding RNA, in particular long noncoding RNA, with more accurate annotations. The recent update adds features such as conservation annotation, lncRNA-disease relationships, and an interface for choosing high-quality data sets through predicted scores, literature support, and long-read sequencing method support. |
| HGNC (HUGO Gene Nomenclature Committee) | Database aimed at approving unique names and symbols for human loci, including protein coding genes, noncoding genes, and pseudogenes, to allow unambiguous scientific communication. |
| PhyloCSF (Phylogenetic Codon Substitution Frequency) | Tool used to distinguish between protein coding and noncoding regions based on a formal statistical comparison of phylogenetic codon models. |
ceRNA, competing endogenous RNA; CLASH, cross-linking, ligation, and sequencing of hybrids; CLIP-Seq, crosslinked immunoprecipitation sequencing; HITS-CLIP, high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation; iCLIP, individual nucleotide resolution crosslinked immunoprecipitation; lncRNA, long noncoding RNA; miRNA, microRNA; mRNA, messenger RNA; ncRNA, noncoding RNA; PAR-CLIP, photoactivatable ribonucleoside-enhanced crosslinked immunoprecipitation.
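The idea behind resources that list lncRNA-mRNA pairs with common targeting miRNAs can be illustrated with a minimal sketch. It assumes that miRNA target sets are already available as plain Python sets; the transcript names, miRNA names, and the `shared_mirna_pairs` helper are hypothetical and are not taken from any of the databases above. Sharing miRNAs only nominates a candidate ceRNA pair; expression data would still be needed to support it.

```python
from itertools import product

# Hypothetical miRNA target sets (illustrative values only, not real annotations).
lncrna_mirnas = {
    "lncA": {"miR-1", "miR-2", "miR-3"},
    "lncB": {"miR-9"},
}
mrna_mirnas = {
    "geneX": {"miR-2", "miR-3", "miR-7"},
    "geneY": {"miR-8"},
}


def shared_mirna_pairs(lnc_targets, mrna_targets, min_shared=1):
    """Return (lncRNA, mRNA, shared miRNAs) for pairs sharing at least min_shared miRNAs."""
    pairs = []
    for (lnc, lnc_set), (mrna, mrna_set) in product(
        lnc_targets.items(), mrna_targets.items()
    ):
        shared = lnc_set & mrna_set  # miRNAs predicted to target both transcripts
        if len(shared) >= min_shared:
            pairs.append((lnc, mrna, sorted(shared)))
    return pairs


# Candidate ceRNA pairs that share two or more miRNAs.
candidates = shared_mirna_pairs(lncrna_mirnas, mrna_mirnas, min_shared=2)
print(candidates)
```

With the toy data above, only the pair `lncA`/`geneX` passes the threshold, because it is the only pair with two miRNAs in common.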
Techniques Used to Investigate lncRNAs.
| Technique | Bait | Crosslinking or probing reagent | Interaction | Technical Concept | Scope |
|---|---|---|---|---|---|
| nRIP | Protein | No | Direct/indirect | Captures the RNA and protein components associated with the protein of interest, without crosslinking. | Genome-wide |
| CLIP-Seq | Protein | UV 254 nm | Direct | Captures protein-RNA interactions in vivo. Recovers the RNA components associated with the protein of interest. | Genome-wide |
| CLIP–mass spectrometry | Protein | UV 254 nm | Direct/indirect | Captures protein-RNA interactions in vivo. Recovers the protein complexes associated with the protein of interest and the RNA targets it interacts with. | Genome-wide |
| PAR-CLIP | Protein | UV 365 nm | Direct; T-to-C (4-SU) or G-to-A (6-SG) transitions | Captures protein-RNA covalent binding enabled by efficient crosslinking of 4-SU or 6-SG. | Genome-wide |
| iCLIP | Protein | UV 254 nm | Direct; bound to a barcode sequence | Circularization of reverse-transcribed products after the ligation of cleavable adaptors. | Genome-wide |
| RNA pulldown | lncRNA | Optional | Direct | Tags such as biotin or the MS2 aptamer fused to the lncRNA pull down the lncRNA interactome, including the targets of the lncRNA and the interacting complexes. Proteins can be studied by immunoblotting or mass spectrometry. | lncRNA-specific interactions |
| RAP | Antisense RNA | Disuccinimidyl glutarate, formaldehyde, or aminomethyl-trioxsalen | Direct/indirect | 120-nt probes antisense to the target RNA are tiled across the entire RNA target. The biotinylated probes enrich the lncRNA despite protein-RNA interactions, RNA degradation, and RNA secondary structure. | Genome-wide |
| ChIRP | DNA | Glutaraldehyde | Direct | Antisense DNA probes hybridize to the target RNA and pull down the endogenous RNA and associated genomic DNA. | Genome-wide |
| ChIRP-domain | DNA | Glutaraldehyde-formaldehyde | Direct | Enables the pulldown of endogenous RNA-chromatin interactions in living cells. Similar to ChIRP, but also provides functional information on the architecture and domains of the RNA under investigation. | Genome-wide |
| SHAPE-Seq | RNA | 1-Methyl-7-nitroisatoic anhydride (1M7) or N-methylisatoic anhydride (NMIA) | Architecture/structure | Uses selective 2′-hydroxyl acylation analyzed by primer extension sequencing, which measures flexibility at nucleotide resolution for RNAs in vitro and in vivo. | Structural information |
| SHAPE-MaP | RNA | 1M7, NMIA, or 1-methyl-6-nitroisatoic anhydride (1M6) | Architecture/structure | Similar to SHAPE, but modifications are read out as mutations, which yields accurate, high-resolution secondary-structure models and disentangles sequence polymorphisms. | Structural and mutation information |
| DMS-Seq | RNA | Dimethyl sulfate | Architecture/structure | Can be performed in vivo and in vitro. Dimethyl sulfate modifies unpaired adenine and cytosine residues, and deep sequencing then identifies the modified positions. | Structural modifications |
| FRAG-Seq | RNA | Nuclease P1 | Architecture/structure | High-throughput RNA structure probing method that sequences the fragments generated by digestion with nuclease P1, which specifically cleaves single-stranded nucleic acids. | Genome-wide |
| PARS, PARTE | RNA | RNase V1, S1 nuclease | Architecture/structure | High-throughput deep sequencing of RNA fragments treated with structure-specific enzymes, providing in vitro profiling of secondary structures at single-nucleotide resolution. | Genome-wide |
| icSHAPE | RNA | 2-Methylnicotinic acid imidazolide azide (NAI-N3) | Architecture/structure | Living cells are treated with the icSHAPE chemical NAI-N3, followed by selective chemical enrichment of NAI-N3–modified RNA, which improves the signal-to-noise ratio compared with similar methods leveraging deep sequencing. Purified RNA is then reverse-transcribed to produce cDNA, with SHAPE-modified bases leading to truncated cDNA. | Genome-wide |
4-SU, 4-thiouridine; 6-SG, 6-thioguanosine; cDNA, complementary DNA; ChIRP, chromatin isolation by RNA purification; CLIP, crosslinked immunoprecipitation; CLIP-Seq, crosslinked immunoprecipitation sequencing; DMS-Seq, dimethyl sulfate sequencing; FRAG-Seq, fragmentation sequencing; iCLIP, individual nucleotide resolution crosslinked immunoprecipitation; icSHAPE, in vivo click selective 2′-hydroxyl acylation analyzed by primer extension; lncRNA, long noncoding RNA; nRIP, native RNA immunoprecipitation; PAR-CLIP, photoactivatable ribonucleoside-enhanced crosslinked immunoprecipitation; PARS, parallel analysis of RNA structure; PARTE, parallel analysis of RNA structure with temperature elevation; RAP, RNA antisense purification; SHAPE-MaP, selective 2′-hydroxyl acylation analyzed by primer extension mutational profiling; SHAPE-Seq, selective 2′-hydroxyl acylation analyzed by primer extension sequencing; UV, ultraviolet.
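The T/C signature listed for PAR-CLIP in the table can be made concrete with a small sketch: at reference T positions, the fraction of covering reads that show a C is tallied, and positions with a high fraction are candidate crosslink sites. This is a simplified illustration, not a published PAR-CLIP pipeline. It assumes ungapped reads already aligned to the same strand of a short reference, given as hypothetical `(start, sequence)` tuples with 0-based starts; `count_t_to_c` is an illustrative helper name.

```python
from collections import Counter


def count_t_to_c(reference, reads):
    """Return {position: T-to-C fraction} for every covered reference T.

    reads: iterable of (start, sequence) with 0-based start and no gaps.
    """
    conversions = Counter()  # reads showing C at a reference T
    coverage = Counter()     # all reads covering a reference T
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference):
                break  # ignore bases that run past the reference end
            if reference[pos] == "T":
                coverage[pos] += 1
                if base == "C":
                    conversions[pos] += 1
    return {pos: conversions[pos] / coverage[pos] for pos in coverage}


# Toy example (hypothetical sequences): two of three reads carry a C at position 2.
reference = "GATTACA"
reads = [(0, "GACTA"), (2, "CTACA"), (1, "ATTAC")]
rates = count_t_to_c(reference, reads)
print(rates)
```

In real data, conversion fractions would need to be compared against sequencing error and single-nucleotide polymorphisms before calling a binding site.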
Figure 1. Long noncoding RNA (lncRNA) interactome and strategies. The lncRNA interactome is complex and involves DNA, RNA, and/or proteins. It is important to understand the mechanisms and functions of lncRNAs and the role they play in normal and diseased states. lncRNAs can be studied with any of the following techniques. Protein-lncRNA interactions broadly represent the protein partners of lncRNAs and suggest their functional mechanisms and pathways. RNA immunoprecipitation (RIP) and crosslinked immunoprecipitation (CLIP) techniques provide clues about the associated RNAs when ribonucleoprotein complexes are pulled down with an antibody against the protein of interest. Coupling these techniques with high-throughput RNA sequencing or mass spectrometry could help identify, respectively, the protein-lncRNA interactions genome-wide or the other proteins associated with the RNA binding protein complex or the protein of interest. Techniques that report on structural features such as the secondary and tertiary structure of lncRNAs ultimately aid in understanding lncRNA function. These structural features can be probed with techniques and chemical reagents that cleave RNA at specific nucleotides or attack the regions that are exposed to the solvent, sparing the RNA regions that are buried inside or are covered by proteins. Crosslinking can also reveal intramolecular interactions that extend over a long range. Ribonucleases with different cleavage specificities can be used to obtain an RNase footprint of potential regions covered by proteins. Methods such as selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) and in-line probing aim at providing information on local nucleotide flexibility. Coupling SHAPE with sequencing could provide details of binding regions. Fragment sequencing (FRAG-Seq) and parallel analysis of RNA structure (PARS), on the other hand, employ RNase digestion to provide information on RNA structure. Several techniques have been developed to identify the genomic DNA targets of lncRNAs. Based on the workflow backbone of chromatin immunoprecipitation (ChIP), chromatin isolation by RNA purification (ChIRP) helps identify lncRNAs associated with unique chromatin marks, whereas techniques such as chromatin oligo-affinity purification (ChOP) and capture hybridization of RNA targets (CHART) are used to identify the complementary DNA regions that interact with the RNA of interest. In addition, coupling with RNA sequencing and quantitative PCR or with mass spectrometry could yield important information regarding the RNA and protein interactome, respectively.
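The local nucleotide flexibility that SHAPE-type methods report can be illustrated with a deliberately simplified calculation. It assumes that reverse-transcription stop counts and read coverage are available per nucleotide for a reagent-treated and an untreated control sample. The background-subtracted stop rate below is a common simplification rather than the scoring scheme of any specific published pipeline, and `shape_reactivity` and all numbers are hypothetical.

```python
def shape_reactivity(treated_stops, control_stops, treated_cov, control_cov):
    """Background-subtracted RT-stop rate per nucleotide, floored at zero.

    Positions without coverage in either sample are returned as None.
    """
    reactivities = []
    for t, c, tc, cc in zip(treated_stops, control_stops, treated_cov, control_cov):
        if tc == 0 or cc == 0:
            reactivities.append(None)  # no coverage: reactivity is undefined
            continue
        # Stop rate in the treated sample minus the background stop rate.
        reactivities.append(max(0.0, t / tc - c / cc))
    return reactivities


# Toy per-nucleotide counts (illustrative values only).
profile = shape_reactivity(
    treated_stops=[5, 20, 0, 8],
    control_stops=[5, 2, 0, 1],
    treated_cov=[100, 100, 0, 80],
    control_cov=[100, 100, 50, 100],
)
print(profile)
```

Higher values suggest flexible, likely unpaired nucleotides, whereas values near zero suggest positions that are base-paired or protected, for example by a bound protein.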