Alfred Chun-Shui Luk1, Huayan Gao1, Sizhe Xiao2, Jinyue Liao1, Daxi Wang2, Jiajie Tu1, Owen M Rennert2, Wai-Yee Chan3, Tin-Lap Lee3.
Abstract
Spermatogenic failure is a major cause of male infertility, which affects millions of couples worldwide. The recent discovery of long non-coding RNAs (lncRNAs) as critical regulators in normal and disease development provides new clues for delineating the molecular regulation of male germ cell development. However, few functional lncRNAs have been characterized to date. A major limitation in studying lncRNAs in male germ cell development is the absence of germ cell-specific lncRNA annotation. Current lncRNA annotations are assembled from transcriptome data from heterogeneous tissue sources; transcript information specific to germ cells at various developmental stages is therefore under-represented, which may lead to biased prediction or failure to identify important germ cell-specific lncRNAs. GermlncRNA provides the first comprehensive web-based and open-access lncRNA catalogue for three key male germ cell stages: type A spermatogonia, pachytene spermatocytes and round spermatids. This information has been developed by integrating male germ cell transcriptome resources derived from RNA-Seq, tiling microarray and GermSAGE. Characterizations of lncRNA-associated regulatory features and of potential coding gene and microRNA targets are also provided. Search results from GermlncRNA can be exported to Galaxy for downstream analysis or downloaded locally. Taken together, GermlncRNA offers a new avenue to better understand the role of lncRNAs and associated targets during spermatogenesis.
Database URL: http://germlncrna.cbiit.cuhk.edu.hk/
Year: 2015 PMID: 25982314 PMCID: PMC4433719 DOI: 10.1093/database/bav044
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Data content and distribution of GermlncRNA
| Category of lncRNA | Number (%) |
|---|---|
| Annotated lncRNA | |
| | |
| Intergenic | 36 298 (32.9%) |
| Exonic | 49 679 (45.0%) |
| Intronic | 19 646 (17.8%) |
| Promoter-associated | 4853 (4.4%) |
| | |
| ≥0.3 FPKM at | |
| Any stage | 43 215 (39.1%) |
| Spermatogonia | 25 322 (22.9%) |
| Spermatocytes | 29 789 (27.0%) |
| Spermatids | 28 050 (25.4%) |
| ≥3 FPKM at | |
| Any stage | 18 409 (16.7%) |
| Spermatogonia | 8923 (8.1%) |
| Spermatocytes | 11 199 (10.1%) |
| Spermatids | 11 587 (10.5%) |
| Absence of SAGE tag | 7832 (7.1%) |
| Tag count ≥1 at | |
| Any stage | 34 529 (31.3%) |
| Spermatogonia | 18 958 (17.2%) |
| Spermatocytes | 20 942 (19.0%) |
| Spermatids | 19 520 (17.7%) |
| Tag count ≥5 at | |
| Any stage | 8574 (7.8%) |
| Spermatogonia | 5360 (4.9%) |
| Spermatocytes | 5448 (4.9%) |
| Spermatids | 5180 (4.7%) |
| Novel (intergenic) lncRNA | |
| | |
| Intergenic | 2357 (84.5%) |
| Promoter-associated | 433 (15.5%) |
| | |
| ≥0.3 FPKM at | |
| Any stage | 2786 (99.9%) |
| Spermatogonia | 1009 (36.2%) |
| Spermatocytes | 2206 (79.1%) |
| Spermatids | 2532 (90.8%) |
| ≥3 FPKM at | |
| Any stage | 2059 (73.8%) |
| Spermatogonia | 710 (25.4%) |
| Spermatocytes | 1259 (45.1%) |
| Spermatids | 1560 (55.9%) |
| Absence of SAGE tag | 337 (12.1%) |
| Tag count ≥1 at | |
| Any stage | 812 (29.1%) |
| Spermatogonia | 410 (14.7%) |
| Spermatocytes | 494 (17.7%) |
| Spermatids | 432 (15.5%) |
| Tag count ≥5 at | |
| Any stage | 187 (6.7%) |
| Spermatogonia | 119 (4.3%) |
| Spermatocytes | 115 (4.1%) |
| Spermatids | 108 (3.9%) |
| | |
| Class 1 (1 RNASeq + Tiling arrays) | 1155 (41.4%) |
| Class 2 (2 RNASeq + Tiling arrays) | 934 (33.5%) |
| Class 3 (1 RNASeq + Tiling arrays + SAGE) | 344 (12.3%) |
| Class 4 (2 RNASeq + Tiling arrays + SAGE) | 357 (12.8%) |
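The class breakdown of the novel lncRNAs can be checked against the table with a few lines of arithmetic. The sketch below is illustrative only: it assumes that "1 RNASeq" and "2 RNASeq" mean support from one or two RNA-Seq datasets, and the function and variable names (`evidence_class`, `CLASS_COUNTS`, `percentages`) are hypothetical, not part of GermlncRNA.

```python
# Counts copied from the "Novel (intergenic) lncRNA" part of the table above.
CLASS_COUNTS = {1: 1155, 2: 934, 3: 344, 4: 357}
NOVEL_TOTAL = 2357 + 433  # intergenic + promoter-associated novel lncRNAs


def evidence_class(n_rnaseq: int, tiling: bool, sage: bool) -> int:
    """Map supporting evidence to Class 1-4 as labelled in the table.

    Assumption: n_rnaseq is the number of supporting RNA-Seq datasets (1 or 2).
    Every class in the table includes tiling array support.
    """
    if not tiling or n_rnaseq not in (1, 2):
        raise ValueError("no class in the table matches this evidence")
    return {(1, False): 1, (2, False): 2, (1, True): 3, (2, True): 4}[(n_rnaseq, sage)]


def percentages(counts: dict, total: int) -> dict:
    """Percentage of `total` for each count, rounded to one decimal place."""
    return {key: round(100 * value / total, 1) for key, value in counts.items()}


if __name__ == "__main__":
    print(sum(CLASS_COUNTS.values()) == NOVEL_TOTAL)
    print(percentages(CLASS_COUNTS, NOVEL_TOTAL))
```

Under these assumptions the four classes partition the 2790 novel lncRNAs, and the recomputed percentages agree with the values reported in the table.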
List of genomic features in GermlncRNA
| Symbol | Full name | Definition of positive association | Biological implication |
|---|---|---|---|
| PolyA | Polyadenylation | Any overlap from −50 to +200 bp relative to transcription termination sites (TTS) | 3′-Terminal of a transcript |
| CAGE | Cap analysis of gene expression | Any overlap from −200 to +50 bp relative to transcription start sites (TSS) | 5′-Terminal of a transcript |
| DHS | DNase I hypersensitivity site | Significant overlap (≥10 bp) of strong (top 50%) DHS signals with promoters (upstream 2000 bp) | More accessible to DNA-binding proteins, such as TFs and regulators |
| H3K4me1 | Histone 3 mono-methylated lysine 4 | Significant overlap of strong H3K4me1 signal with promoters or gene bodies (from TSS to TTS) | Enriched in enhancers |
| H3K4me3 | Histone 3 tri-methylated lysine 4 | Significant overlap of strong H3K4me3 signal with promoters | Enriched in active promoters |
| H3K27ac | Histone 3 acetylated lysine 27 | Significant overlap of strong H3K27ac signal with promoters | Enriched in active promoters |
| H3K27me3 | Histone 3 tri-methylated lysine 27 | Significant overlap of strong H3K27me3 signal with promoters | Enriched in repressed promoters |
| H3K36me3 | Histone 3 tri-methylated lysine 36 | Significant overlap of strong H3K36me3 signal with promoters or gene bodies | Enriched in actively transcribed regions |
| Conserved elements | PhastCons placental mammal-conserved elements (30-way Multiz alignment) | Significant overlap of PhastCons placental mammal-conserved elements with gene bodies | Conserved regions among mammals |
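The window-based definitions in the table above (PolyA, CAGE and DHS) can be expressed as simple interval tests. The following is a minimal sketch rather than the database's actual pipeline. It assumes closed, 1-based genomic intervals, treats "upstream" in a strand-aware way, takes the promoter as the 2000 bp upstream of the TSS including the TSS position itself, and assumes DHS peaks have already been filtered to the top 50% by signal. All function names are hypothetical.

```python
from typing import Tuple

Interval = Tuple[int, int]  # closed interval (start, end); 1-based coordinates assumed


def window(anchor: int, strand: str, upstream: int, downstream: int) -> Interval:
    """Strand-aware window from `upstream` bp before to `downstream` bp after `anchor`."""
    if strand == "+":
        return (anchor - upstream, anchor + downstream)
    if strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        return (anchor - downstream, anchor + upstream)
    raise ValueError("strand must be '+' or '-'")


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two closed intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def cage_positive(tss: int, strand: str, peak: Interval) -> bool:
    """CAGE: any overlap with -200 to +50 bp relative to the TSS."""
    return overlap_bp(window(tss, strand, 200, 50), peak) > 0


def polya_positive(tts: int, strand: str, site: Interval) -> bool:
    """PolyA: any overlap with -50 to +200 bp relative to the TTS."""
    return overlap_bp(window(tts, strand, 50, 200), site) > 0


def dhs_positive(tss: int, strand: str, peak: Interval, min_bp: int = 10) -> bool:
    """DHS: at least `min_bp` bases of overlap with the 2000 bp upstream promoter.

    Assumes `peak` already passed the top-50% signal filter.
    """
    promoter = window(tss, strand, 2000, 0)
    return overlap_bp(promoter, peak) >= min_bp
```

For example, with a plus-strand TSS at position 1000, a CAGE peak spanning 790 to 805 falls partly inside the window 800 to 1050 and counts as a positive association, whereas a peak starting at 1051 does not.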
Figure 1. GermlncRNA overview. To study lncRNA biology in mouse germ cell development, we used high-throughput transcriptomic data on three germ cell stages from three platforms, namely RNA sequencing, tiling microarray and SAGE, and identified germ cell-specific novel lncRNAs. Annotations from five public databases (Ensembl, RefSeq, UCSC Genes, NONCODE and fRNAdb) were combined to obtain a catalogue of annotated lncRNAs. Both annotated and novel lncRNAs were analysed for expression, association with regulatory features and functional implications. The search results in GermlncRNA can be exported as a text file, visualized and further analysed in Galaxy.
Figure 2. GermlncRNA structure. The database consists of five main sections: Home, Data Search, Statistics, Help and Contact. The Data Search section provides the core lncRNA information through three tabs: Search data, Select column and Export to Galaxy (48). A Glossary panel explains key terms used in the Data Search section. Furthermore, the lncRNAs in search results can be visualized in the UCSC Genome Browser or downloaded as a text (CSV) file.
Figure 3. Example of two annotated lncRNAs with similar loci, GlncRNA0062198 and GlncRNA0062199, as viewed in the UCSC Genome Browser. As shown, the annotated 3′-terminals of the two lncRNAs differ by 2 bp and their 5′-terminals by 104 bp. The former is annotated by NONCODE and fRNAdb, while the latter is annotated by fRNAdb only. Both lncRNAs are supported by SAGE, RNASeq and tiling array evidence specific to type A spermatogonia (Spga), indicating strong expression specifically in Spga. They are also covered by probes in the lncRNA microarray and showed stronger expression in neonatal testis than in adult testis. Furthermore, their expression is supported by signals in PolyA sequencing data. They are located in the promoter region of a bidirectional protein-coding gene, prolyl endopeptidase (Prep), whose expression is also Spga-specific, suggesting possible regulation in cis.
Figure 4. Example of a novel lncRNA, GlncRNA20990d, as viewed in the UCSC Genome Browser. This lncRNA is intergenic and was previously unannotated in all five public genomic databases. Its expression increases gradually along spermatogenesis and is highest in round spermatids (Sptd), as supported by both SAGE and RNASeq data. A nearby upstream CAGE signal suggests the location of its 5′-terminal.
Figure 5. Percentages of annotated and novel lncRNAs associated with various regulatory features.