Csaba Ortutay, Mauno Vihinen.
Abstract
BACKGROUND: Pseudogenes, nonfunctional copies of genes, evolve fast due to the lack of evolutionary pressure and thus appear in several different forms. PseudoGeneQuest is an online tool that searches the human genome for a given query sequence and identifies different types of pseudogenes as well as novel genes and gene fragments.
DESCRIPTION: The service can detect pseudogenes that have arisen either by retrotransposition or by segmental genome duplication, many of which are not listed in the public pseudogene databases. The service has a user-friendly web interface and uses a computer cluster to run searches in parallel, which keeps runtimes relatively short despite exhaustive database searches and analyses.
Year: 2008 PMID: 18597685 PMCID: PMC2453144 DOI: 10.1186/1471-2105-9-299
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Layout of the PseudoGeneQuest service. The user initiates the search by providing a protein sequence on the web page. The analysis is performed on a cluster using databases, search programs and scripts. The results are mailed to the user and deposited on the web server, where they can be accessed with the provided search ID.
Figure 2. Overview of the search algorithm of the PseudoGeneQuest service. The user provides the protein query sequence and the number of best hits wanted. The algorithm provides seven types of results in addition to the known genes and pseudogenes (see Table 1). The chart is a modified version of Figure 1 from [9].
Table 1. Genome segments identified by PseudoGeneQuest
| # | Tag in the result file | Explanation |
|---|---|---|
| 1 | ALREADY KNOWN GENE | Hits overlapping genes annotated in the genome files. |
| 2 | KNOWN PSEUDOGENE | Results overlapping with records in pseudogene.org. |
| 3 | REAL GENE OR EXON | The hit matches the query almost exactly. Frequently parts of as yet un-annotated or predicted genes. |
| 4 | PSEUDOGENE FRAGMENT | Covers <70% of the length of the query. Small parts of pseudogenes. |
| 5 | PSEUDOGENE | Covers >70% of the length of the query and the reading frame is broken. Processed pseudogenes with high homology to the query sequence. |
| 6 | MISCELLANEOUS | Covers >70% of the length of the query and the reading frame is intact. |
| 7 | PUTATIVE NEW GENE | The hit has an uninterrupted reading frame. Un-annotated gene. |
| 8 | DUPLICATED PSEUDOGENE | Multiexon hit with <50% repeat content. A recent duplication of a gene. |
| 9 | INTERRUPTED PROCESSED PSEUDOGENE | Multiexon hit with >50% repeat content. Old processed pseudogenes which accumulated repeats. |
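The thresholds in the table can be read as a small decision procedure. The following Python sketch is one possible reading, not the service's actual implementation: the `Hit` fields, the `classify` function and the order of the tests are all hypothetical, and the handling of values exactly at 70% or 50% is an assumption because the table does not specify it. PUTATIVE NEW GENE is left out because the table gives no criterion that separates it from MISCELLANEOUS.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical summary of one genomic hit (field names are assumptions)."""
    coverage: float                      # fraction of the query length covered, 0.0-1.0
    frame_intact: bool                   # True if the reading frame is uninterrupted
    multiexon: bool = False              # True if the hit spans several exon-like segments
    repeat_fraction: float = 0.0         # fraction of the hit made up of repeats, 0.0-1.0
    overlaps_known_gene: bool = False    # overlaps a gene annotated in the genome files
    overlaps_known_pseudogene: bool = False  # overlaps a pseudogene.org record
    near_exact_match: bool = False       # matches the query almost exactly


def classify(hit: Hit) -> str:
    """Assign a Table 1 tag; the test order is an assumption, not taken from the source."""
    if hit.overlaps_known_gene:
        return "ALREADY KNOWN GENE"
    if hit.overlaps_known_pseudogene:
        return "KNOWN PSEUDOGENE"
    if hit.near_exact_match:
        return "REAL GENE OR EXON"
    if hit.multiexon:
        # Table 1: <50% repeat content vs >50% repeat content.
        if hit.repeat_fraction < 0.5:
            return "DUPLICATED PSEUDOGENE"
        return "INTERRUPTED PROCESSED PSEUDOGENE"
    if hit.coverage < 0.7:
        return "PSEUDOGENE FRAGMENT"
    # Coverage of at least 70%: the reading frame decides.
    return "MISCELLANEOUS" if hit.frame_intact else "PSEUDOGENE"


if __name__ == "__main__":
    example = Hit(coverage=0.85, frame_intact=False)
    print(classify(example))
```

Checking annotation overlaps first mirrors the layout of the table, where known genes and known pseudogenes are listed before the seven novel result types.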
Results of test analysis.
| Sequence ID | NP_000865.2 | NP_004235.3 | NP_003397.1 | NP_000052.1 | NP_061820.1 |
|---|---|---|---|---|---|
| Already known gene | 15 | 5 | 11 | 32 | 6 |
| Known pseudogene | 35 | 29 | 42 | 27 | 54 |
| Real gene or exon | 18 | 24 | 16 | 36 | 15 |
| Pseudogene fragment | 36 | 0 | 9 | 8 | 5 |
| Pseudogene | 0 | 0 | 0 | 0 | 0 |
| Miscellaneous | 0 | 2 | 6 | 0 | 5 |
| Putative new gene | 0 | 0 | 0 | 0 | 0 |
| Duplicated pseudogene | 2 | 0 | 0 | 0 | 0 |
| Interrupted processed pseudogene | 0 | 3 | 0 | 2 | 0 |
Selected genes were tested with the PseudoGeneQuest (PGQ) service. For each query, the 100 hits with the highest BLAST scores were analysed.