Raphael Müller, Tyler Weirick, David John, Giuseppe Militello, Wei Chen, Stefanie Dimmeler, Shizuka Uchida.
Abstract
Increasing evidence indicates that the presence of long noncoding RNAs (lncRNAs) is specific to various cell types. Although lncRNAs are speculated to be more numerous than protein-coding genes, the annotations of lncRNAs remain primitive due to the lack of well-structured schemes for their identification and description. Here, we introduce a new knowledge database "ANGIOGENES" (http://angiogenes.uni-frankfurt.de) to allow for in silico screening of protein-coding genes and lncRNAs expressed in various types of endothelial cells, which are present in all tissues. Using the latest annotations of protein-coding genes and lncRNAs, publicly available RNA-seq data were analyzed to identify transcripts that are expressed in endothelial cells of human, mouse and zebrafish. The analyzed data were incorporated into ANGIOGENES to provide a one-stop shop for transcriptomics data to facilitate further biological validation. ANGIOGENES is an intuitive and easy-to-use database that allows in silico screening of expressed, enriched and/or specific endothelial transcripts under various conditions. We anticipate that ANGIOGENES will serve as a starting point for functional studies to elucidate the roles of protein-coding genes and lncRNAs in angiogenesis.
Year: 2016 PMID: 27582018 PMCID: PMC5007478 DOI: 10.1038/srep32475
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. ANGIOGENES web interface.
(A) Top page of ANGIOGENES. (B) Result of a quick search. Each field of the Venn diagram is clickable and provides the list of the corresponding transcripts. (C) The result table of transcripts from the selected field of the Venn diagram shown in (B). Information about each transcript is shown, and a more detailed information page for each transcript can be reached by clicking its entry in the “Accession” column. (D) The heat map and the corresponding expression values of the transcript.
Match between ANGIOGENES and PubMed.
| Organism | Query Term | # of Matched Genes | # of Genes in PubMed | Coverage (%) |
|---|---|---|---|---|
| Human | EXPRESSED:“HAoEC–Heart–Normal” | 30,771 | 3,754 | 83.83 |
| Human | EXPRESSED:“BECs–Skin–Normal” | 20,300 | 3,142 | 70.17 |
| Human | EXPRESSED:“HUVEC–Umbilical vein–Normal” | 35,630 | 4,022 | 89.82 |
| Human | EXPRESSED:“HHSEC–Liver–Normal” | 20,618 | 3,020 | 67.44 |
| Human | EXPRESSED:“HMVEC-D–Skin–Normal” | 18,326 | 3,166 | 70.70 |
| Mouse | EXPRESSED:“Tie2plus EC–Cerebral cortex–Normal” | 18,952 | 2,993 | 82.43 |
| Mouse | EXPRESSED:“C166–Yolk sac–Normal” | 19,523 | 2,861 | 78.79 |
| Mouse | EXPRESSED:“Tie2plus EC–kidney–Normal” | 21,923 | 2,875 | 79.18 |
| Mouse | EXPRESSED:“mES–Embryoid body–Normal” | 18,850 | 2,674 | 73.64 |
| Zebrafish | EXPRESSED:“LECs–Lymphatic–FACS isolation of Kaede photconverted red ECs” | 13,647 | 799 | 63.56 |
The percent coverage was calculated by dividing the “# of Genes in PubMed” (the number of genes shared between the ANGIOGENES query result and the Gene IDs listed under “endothelial” in PubMed) by the total number of ENSEMBL Gene IDs associated with the term “endothelial” in PubMed.
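The coverage calculation described here can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `percent_coverage` and the toy gene IDs are hypothetical, and the total number of reference Gene IDs (the denominator) is not stated in the table, so it is left as an input.

```python
def percent_coverage(query_gene_ids, reference_gene_ids):
    """Return (matched_count, coverage_percent) of a reference set by a query set.

    query_gene_ids: gene IDs returned by a database query (assumed input).
    reference_gene_ids: gene IDs associated with a term in an external resource.
    """
    reference = set(reference_gene_ids)
    if not reference:
        raise ValueError("reference gene set is empty")
    # Genes present in both the query result and the reference list.
    matched = set(query_gene_ids) & reference
    return len(matched), round(100.0 * len(matched) / len(reference), 2)


# Hypothetical toy IDs, not real ENSEMBL accessions or ANGIOGENES output.
query = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
reference = ["GENE_B", "GENE_C", "GENE_E"]
matched, coverage = percent_coverage(query, reference)
print(matched, coverage)
```

The same function applies to any reference resource, since only the reference list changes between comparisons.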
Match between ANGIOGENES and Human Protein Atlas.
| Query Term | # of Matched Genes | # of Genes in HPA | Coverage (%) |
|---|---|---|---|
| EXPRESSED:“HAoEC–Heart–Normal” | 30,771 | 9,159 | 83.84 |
| EXPRESSED:“BECs–Skin–Normal” | 20,300 | 7,552 | 69.13 |
| EXPRESSED:“HUVEC–Umbilical vein–Normal” | 35,630 | 9,675 | 88.56 |
| EXPRESSED:“HHSEC–Liver–Normal” | 20,618 | 7,089 | 64.89 |
| EXPRESSED:“HMVEC-D–Skin–Normal” | 18,326 | 7,861 | 71.95 |
The percent coverage was calculated by dividing the “# of Genes in HPA” (the number of genes shared between the ANGIOGENES query result and the ENSEMBL Gene IDs listed under “endothelial” in the Human Protein Atlas, HPA) by the total number of ENSEMBL Gene IDs associated with the term “endothelial” in HPA.
Figure 2. Comparison of HUVEC Transcriptomes.
A 6-way Venn diagram represents the numbers of transcripts detected in each preparation of total RNAs with and without poly A tails. “Cell” stands for “whole cell”; “Nuc” for “nucleus”; and “Cyto” for “cytosol”.
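For readers who want to reproduce the counts behind such a diagram, the sketch below tallies how many transcripts fall into each exclusive region of an n-way Venn diagram. It is an illustration under assumptions: the preparation labels (for example `Cell_polyA_plus`) and the transcript IDs are hypothetical placeholders, not data from the figure.

```python
from collections import Counter


def venn_region_counts(sets_by_name):
    """Count items per exclusive Venn region.

    Each region is keyed by the sorted tuple of set names that contain the item.
    """
    names = sorted(sets_by_name)
    all_items = set().union(*sets_by_name.values())
    counts = Counter()
    for item in all_items:
        region = tuple(name for name in names if item in sets_by_name[name])
        counts[region] += 1
    return counts


# Hypothetical transcript IDs for six preparations (3 fractions x 2 RNA types).
preparations = {
    "Cell_polyA_plus": {"t1", "t2", "t3"},
    "Cell_polyA_minus": {"t1", "t4"},
    "Nuc_polyA_plus": {"t1", "t2"},
    "Nuc_polyA_minus": {"t1", "t4"},
    "Cyto_polyA_plus": {"t1", "t3"},
    "Cyto_polyA_minus": {"t1"},
}
regions = venn_region_counts(preparations)
for region, n in sorted(regions.items()):
    print(n, " & ".join(region))
```

With six sets there are up to 63 non-empty regions; regions that contain no transcripts simply do not appear in the result.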
Figure 3. A DAG describing the assembly pipeline used in ANGIOGENES.
In this directed acyclic graph (DAG), nodes represent rules, while edges represent the transfer of files. Headings (A–L) contain descriptions of the operations performed by each node. (A) Download the appropriate genome in the FASTA format from ENSEMBL. (B) Download the current ENSEMBL annotation. (C) Unzip the genome FASTA file. (D) Unzip the genome annotations. (E) For single-end reads, convert SRA files to FASTQ files. (F) Build a Bowtie2 index from the given genome FASTA file. (G) For paired-end reads, convert SRA files to FASTQ files. (H) Align single-end reads with Tophat2. (I) Align paired-end reads with Tophat2. (J) Assemble single-end aligned reads with Cufflinks. (K) Assemble paired-end aligned reads with Cufflinks. (L) The “snakemake_target_rule” is a special case representing the desired output from the pipeline. In this case, the “transcripts.gtf” files produced by Cufflinks satisfy the rule. A more detailed description is provided in the “Snakefile” file at https://bitbucket.org/raistlin91/angiogenes_pipeline/src.
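The dependency structure of steps (A) to (L) can be modeled as a small graph and ordered topologically, which is essentially what a workflow engine does before running rules. The sketch below uses Python's standard `graphlib` module rather than Snakemake itself. Apart from “snakemake_target_rule”, the rule names are hypothetical, and the edges are assumptions inferred from the step descriptions (for example, that alignment needs the Bowtie2 index and the unzipped annotation); the authoritative definition is the Snakefile in the repository.

```python
from graphlib import TopologicalSorter

# Each rule maps to the set of rules whose output files it needs.
# Names and edges are assumptions for illustration only.
PIPELINE = {
    "download_genome": set(),                                     # (A)
    "download_annotation": set(),                                 # (B)
    "unzip_genome": {"download_genome"},                          # (C)
    "unzip_annotation": {"download_annotation"},                  # (D)
    "sra_to_fastq_single": set(),                                 # (E)
    "build_bowtie2_index": {"unzip_genome"},                      # (F)
    "sra_to_fastq_paired": set(),                                 # (G)
    "tophat2_single": {"sra_to_fastq_single",
                       "build_bowtie2_index", "unzip_annotation"},  # (H)
    "tophat2_paired": {"sra_to_fastq_paired",
                       "build_bowtie2_index", "unzip_annotation"},  # (I)
    "cufflinks_single": {"tophat2_single"},                       # (J)
    "cufflinks_paired": {"tophat2_paired"},                       # (K)
    "snakemake_target_rule": {"cufflinks_single",
                              "cufflinks_paired"},                # (L)
}

# static_order() yields every rule after all of its prerequisites.
order = list(TopologicalSorter(PIPELINE).static_order())
for step, rule in enumerate(order, start=1):
    print(step, rule)
```

Because every rule ultimately feeds the target rule, the target is always the last entry in any valid ordering, mirroring how the “transcripts.gtf” outputs complete the pipeline.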