Marissa A Ehringer, James M Sikela.
Abstract
When studying complex diseases such as alcoholism, which develop as a result of numerous genetic and environmental factors, researchers can use the sequence data that have become available for both the human and animal genomes. These analyses are aided by efforts to identify and characterize functionally relevant DNA sequences across the entire genomic DNA sequence, a process called annotation. Various bioinformatics and annotation tools can help in this enterprise. They fall into four primary approaches: (1) precomputed, annotated public Web sites that provide a plethora of information; (2) in-house analyses from which users can choose the appropriate analyses for their purposes; (3) Web-based annotation systems that analyze a user's DNA sequence; and (4) private resources that provide access to annotated genomic sequences at cost. In addition to careful study of the DNA sequence for clues about function, expression studies of mRNA levels using gene chips provide information about the activity levels of thousands of genes, which may vary in different tissues, different animals and people, or under different environmental conditions.
Year: 2002 PMID: 12875046 PMCID: PMC6683845
Source DB: PubMed Journal: Alcohol Res Health ISSN: 1535-7414
Resources for Genomic Analyses and Their Web Sites
| Resource | Web Site |
|---|---|
| UCSC Golden Path | |
| NCBI Map Viewer | |
| Ensembl | |
| NIX | |
| RUMMAGE | |
| GESTALT | |
| Celera Genomics | |
| DoubleTwist™ | |
| Affymetrix | |
| LifeArray™ chips | |
| Rosetta Inpharmatics | |
Originally produced by IncyteGenomics, now available through Agilent.
Genome Analysis Programs for Identifying Characteristic Features of Unknown DNA Sequences
| Program Name | Type of Annotation | Program Location | Program Information |
|---|---|---|---|
| Repeats | Repeat Finder | Local | NCBI Repeats Database |
| DbEST | Homology Search against EST database | Remote (Network-Client) | NCBI EST Database using Blastcl3 |
| Spliced ESTs | - | Local | Groups ESTs that originate from a single clone |
| GenPept | Homology Search against NCBI protein database | Remote (Network-Client) | NCBI Protein Database using Blastcl3 |
| Xpound | Gene Finder (rule-based) | Local | Software for exon trapping based on maximum likelihood methods |
| MZEF | Gene Finder (rule-based) | Local | Predicts putative internal protein-coding exons in genomic DNA sequences; starts with a potential exon and calculates the posterior exon probability |
| GrailEXP | Gene Finder (neural network) | Web Server | Incorporates EST homology searches and biological rules to model more complicated gene structures |
| GeneID | Gene Finder (neural network) | Local | Uses a hierarchical structure to first identify splice sites and start and stop codons, then build exons, followed by scoring of exons and assembly of the gene structure |
| Eukaryotic GeneMark.hmm | Gene Finder [Hidden Markov Model (HMM)] | Web Server | Relies on an inhomogeneous Markov model approach combined with training data sets to predict genes |
| GENSCAN | Gene Finder (HMM) | Local | Uses a general probabilistic model of gene structure that assembles knowledge of basic transcriptional, translational, and splicing signals to predict exons |
| Fgenesh | Gene Finder (HMM) | Local | Combines pattern recognition features with similarity searches of predicted exons against known protein databases |
| ORF Finder | Open Reading Frame Finder | Web Server | |
| NNPP | Promoter Prediction | Local | Neural Network Promoter Prediction (NNPP) uses a time-delay neural network to predict promoters |
| Promoter-Inspector | Promoter Prediction | Web Server | Accurately predicts 43 percent of true promoters |
| GrailEXP CpG islands | CpG Island Finder | Web Server | Identifies short (200–2,000 bp) segments with a characteristic DNA composition (i.e., a GC content greater than 50 percent) that are commonly located near the starting regions of genes |
NOTE: The results of these analyses can be entered into the Genotator program for a comprehensive analysis of unknown DNA sequences.
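The GC-content criterion given for the CpG island finder in the table above can be illustrated with a minimal sliding-window sketch. This is not the GrailEXP algorithm. The function names, the 200-bp window, and the step size are assumptions chosen for illustration, and published island definitions typically also consider how often the CpG dinucleotide itself occurs.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(seq: str, window: int = 200, step: int = 1, min_gc: float = 0.5):
    """Return merged half-open (start, end) intervals covered by windows
    whose GC fraction is strictly greater than min_gc.

    window, step, and min_gc are illustrative defaults, not values taken
    from any particular annotation program.
    """
    regions = []
    for start in range(0, len(seq) - window + 1, step):
        if gc_fraction(seq[start:start + window]) > min_gc:
            end = start + window
            # Merge with the previous region when the windows touch or overlap.
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Synthetic sequence: AT-rich flanks around a 300-bp GC-rich core.
    demo = "AT" * 150 + "GC" * 150 + "AT" * 150
    print(gc_rich_windows(demo, window=200, step=50))
```

Candidate regions found this way would still need to be checked against the 200–2,000 bp length range and compared with gene predictions before being treated as likely CpG islands.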
Figure 1. Example of a search result obtained with the University of California Santa Cruz (UCSC) human genome browser, Golden Path, showing known genes, predicted genes, and various other details about the genomic DNA region examined. For known genes (see bracket) the name of the gene serves as a hypertext link to additional information about the gene. From there, one can access other databases to learn about the function of the gene, mRNA expression data, homologous genes in other organisms, and other relevant information.
Figure 2. Example of a search result obtained with the National Center for Biotechnology Information (NCBI) Map Viewer. The map displays information about known genes, marker locations, and different types of maps of the human genome.
Figure 3. Example of a search result obtained with the Ensembl human genome browser. The map shows known genes, predicted genes, marker locations, and other genomic DNA details.
Figure 4. Example of search results obtained with a series of analyses that were entered into the Genotator browser. (A) A color-coded window depicts the results from the multiple software analyses completed. The display shows the locations of expressed sequence tags (ESTs), results from gene prediction programs, promoter predictions, and protein peptide similarities. (B) A close-up of an area of the display shown in Figure 4A, revealed by using the scroll bar in the left corner of the screen. By clicking on these regions with the mouse, one can view the DNA sequence and gain additional information.
Figure 5. Example of a search result using the human genome browser developed by Celera Genomics. The map includes many annotations similar to those in the public domain, as well as additional results generated by Celera from analyses of gene predictions and information about novel genes that may be related to known genes.