Masami Ikeda, Minoru Sugihara, Makiko Suwa.
Abstract
We report the development of the SEVENS database, which contains information on G-protein coupled receptor (GPCR) genes that are identified with high confidence levels (A, B, C, and D) from various eukaryotic genomes by using a pipeline of bioinformatics software tools, including a gene finder, a sequence alignment tool, a motif and domain assignment tool, and a transmembrane helix predictor. SEVENS compiles detailed information on GPCR genes, such as chromosomal mapping position, phylogenetic tree, sequence similarity to known genes, and protein function described by motifs/domains and transmembrane helices. This information is presented in a user-friendly interface. Because the genes are found comprehensively from whole genomes, SEVENS contains a larger data set than previous databases and enables a genome-scale overview of all the GPCR genes. We surveyed the complete genomes of 68 eukaryotes and found between 6 and 3,470 GPCR genes per genome (Level A data). Among these genes, the numbers of receptors for various molecules, including biological amines, peptides, and lipids, were conserved in mammals, birds, and fishes, whereas the numbers of odorant receptors and pheromone receptors were highly diverse in mammals. SEVENS is freely available at http://sevens.cbrc.jp or http://sevens.chem.aoyama.ac.jp.
Keywords: G-protein coupled receptor; bioinformatics; comparative genome analysis; functional annotation; gene finding
Year: 2018 PMID: 29892516 PMCID: PMC5992857 DOI: 10.2142/biophysico.15.0_104
Source DB: PubMed Journal: Biophys Physicobiol ISSN: 2189-4779
Thresholds used for GPCR discovery
| Criterion | Level A | Level B | Level C | Level D |
|---|---|---|---|---|
| Sequence search with BLASTP | E < 10⁻⁸⁰ | E < 10⁻³⁰ | E < 10⁻³⁰ | E < 10⁻³⁰ |
| Pfam domain assignment with HMMER | E < 10⁻¹⁰ | E < 1.0 | E < 1.0 | E < 1.0 |
| PROSITE motif assignment | Not used | Match | Match | Match |
| TMH prediction | Not used | TMwindows (7) AND SOSUI (7) | TMwindows (7) AND SOSUI (6–8) | TMwindows (7) OR SOSUI (7) |
| Sensitivity | 99.40% | 99.80% | 99.90% | 99.90% |
| Specificity | 96.60% | 70.00% | 48.40% | 20.00% |
Best specificity threshold of BLAST and HMMER against the reference data set.
Best sensitivity threshold of BLAST and HMMER against the reference data set.
The number in the parentheses represents the predicted number of TMH.
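The thresholds in the table can be read as a tiered filter. The following is a minimal Python sketch of that reading, not the SEVENS implementation. The `Candidate` fields, the function names, and above all the way the criteria within one level are combined are assumptions: the table lists the per-tool thresholds but does not state whether a candidate must pass all of them or any one of them, so the combinator is left as a parameter (defaulting to the stricter `all`).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Candidate:
    """Hypothetical summary of one predicted gene's tool outputs."""
    blastp_evalue: float   # best BLASTP E-value against known GPCRs
    pfam_evalue: float     # best HMMER E-value against GPCR Pfam domains
    prosite_match: bool    # True if a GPCR-specific PROSITE motif matched
    tmwindows_tmh: int     # number of TM helices predicted by TMwindows
    sosui_tmh: int         # number of TM helices predicted by SOSUI


def tmh_criterion(level: str, c: Candidate) -> bool:
    """TMH row of the table for levels B, C and D."""
    if level == "B":
        return c.tmwindows_tmh == 7 and c.sosui_tmh == 7
    if level == "C":
        return c.tmwindows_tmh == 7 and 6 <= c.sosui_tmh <= 8
    if level == "D":
        return c.tmwindows_tmh == 7 or c.sosui_tmh == 7
    raise ValueError(f"TMH prediction is not used at level {level!r}")


def criteria(level: str, c: Candidate) -> Dict[str, bool]:
    """Evaluate every criterion the table lists for one level."""
    if level == "A":
        # PROSITE and TMH prediction are "Not used" at level A.
        return {
            "blastp": c.blastp_evalue < 1e-80,
            "pfam": c.pfam_evalue < 1e-10,
        }
    return {
        "blastp": c.blastp_evalue < 1e-30,
        "pfam": c.pfam_evalue < 1.0,
        "prosite": c.prosite_match,
        "tmh": tmh_criterion(level, c),
    }


def assign_level(
    c: Candidate,
    combine: Callable[[Iterable[bool]], bool] = all,  # ASSUMPTION, see lead-in
) -> Optional[str]:
    """Return the first level (A to D) whose combined criteria pass, else None."""
    for level in "ABCD":
        if combine(criteria(level, c).values()):
            return level
    return None
```

For example, `assign_level(Candidate(1e-40, 0.5, True, 7, 8))` returns `"C"` under the default `all` reading, because SOSUI predicts eight helices rather than seven; passing `combine=any` switches to the looser reading.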
Figure 1. Content search page, which shows the chromosome map, phylogenetic icon, and search condition entry form. The chromosomal map in the upper region shows the positions of GPCR genes, colored according to their status as actual genes (purple) or pseudogenes (orange); selecting a position leads to the result page. Selecting the phylogenetic icon opens a GPCR tree viewer in which each leaf line is colored according to the GPCR family or chromosome number. Selecting a gene in the tree navigates to the result page. The search condition entry form (in the middle of the figure) retrieves candidate GPCR genes through the “AND” combination of the following conditions: keywords in the nr.aa (non-redundant amino acid) database search results, chromosome number, data level, predicted exon number, DNA and protein sequence length, E-value of the sequence search against the Swiss-Prot or nr.aa database, and whether the query has GPCR-specific PROSITE motifs and GPCR-specific Pfam domains. The search results appear in the chromosomal viewer and in the lower table, both of which are linked to the result page.
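The “AND” combination of search conditions described in this caption can be illustrated with a small filter. This is only a sketch under stated assumptions: the `GeneRecord` fields, the `search` function, and its parameter names are hypothetical and do not reflect the actual SEVENS schema or web interface. It covers a subset of the listed conditions, and an omitted condition is treated as "no restriction".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneRecord:
    """Hypothetical record for one candidate GPCR gene."""
    gene_id: str
    description: str        # text of the best database hit
    chromosome: str
    level: str              # data level: "A", "B", "C" or "D"
    exon_count: int
    protein_length: int
    evalue: float           # E-value of the sequence search
    has_prosite_motif: bool
    has_pfam_domain: bool


def search(
    records: List[GeneRecord],
    keyword: Optional[str] = None,
    chromosome: Optional[str] = None,
    level: Optional[str] = None,
    max_exons: Optional[int] = None,
    min_protein_length: Optional[int] = None,
    max_evalue: Optional[float] = None,
    require_prosite: bool = False,
    require_pfam: bool = False,
) -> List[GeneRecord]:
    """Return the records that satisfy every supplied condition (logical AND)."""
    hits = []
    for r in records:
        conditions = [
            keyword is None or keyword.lower() in r.description.lower(),
            chromosome is None or r.chromosome == chromosome,
            level is None or r.level == level,
            max_exons is None or r.exon_count <= max_exons,
            min_protein_length is None or r.protein_length >= min_protein_length,
            max_evalue is None or r.evalue <= max_evalue,
            not require_prosite or r.has_prosite_motif,
            not require_pfam or r.has_pfam_domain,
        ]
        if all(conditions):
            hits.append(r)
    return hits
```

Calling `search(records, keyword="olfactory", chromosome="1", level="A")` would keep only Level A records on chromosome 1 whose hit description mentions "olfactory"; each added argument can only narrow the result set, which is the defining property of an AND query.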
Figure 2. (a) Chromosomal coordinate information, together with known information on regulatory regions (green bars). The board color represents the GC content of the genome sequence. The selected gene is colored red. The table in the lower region of this panel shows information on the selected protein sequence (sequence search results against the Swiss-Prot/TrEMBL, nr.aa, and UniGene databases using BLAST). Further information, such as the gene expression pattern, binding ligand, type of binding G-protein, and amino acid composition of the sequence, is given in the bottom region. (b) Structural information, including the TM helix regions predicted by SOSUI (red bar), PROSITE motif pattern regions (green bar), domain regions (blue bar), disordered regions predicted by DISOPRED (white bar), exon-intron boundaries, pseudogenes, novel genes, and regions of known structure (purple bar). Each exon sequence appears when “EXON SEQUENCES” is clicked. The structures of class A GPCRs, determined by comparative modeling, are presented in the Jmol 3D viewer. Based on these structures, the actual TM helix regions are displayed (yellow bar) on the structure board.