David A Nix, Michael B Eisen.
Abstract
BACKGROUND: Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution, yet the enhancer retains its activity. Dot plot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.
Year: 2005 PMID: 15655071 PMCID: PMC546196 DOI: 10.1186/1471-2105-6-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Screen capture of the GATAligner program.
GATAligner parameters and features
| Score added to the total for each match. | |
| Score subtracted from total for each mismatch. | |
| Score subtracted from total for each new gap. | |
| Score subtracted from total for each additional base in a gap after its creation. | |
| Use of DUST to mask and thus not align regions of low complexity. | |
| Size of window used to score sub-alignments in each local alignment. Sub-alignments smaller than the window size will be saved provided they are at or above the score cut off. When aligning longer sequences, increase the size of the window as well as the cut off score to minimize non-related alignments. | |
| Score (raw or bits) at which windowed sub-alignments are saved or discarded. The higher this score is set the faster GATAligner will run. Set between 20–25 bits for a window size of 24 when aligning sequences less than 10 KB. For larger sequences, increase the cut off and window size (e.g. 30 bits and 30 bp). | |
| Use this to maintain register with gene annotation. | |
| GATAligner is multithreaded. Queue up multiple alignments. |
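The scoring and windowing parameters described above can be illustrated with a minimal sketch. It is not GATAligner's implementation: the function names `column_scores` and `windowed_hits`, the numeric penalties, and the simple sliding-window strategy are all assumptions chosen for illustration, and scores here are raw scores rather than bits.

```python
def column_scores(ref, other, match=5, mismatch=4, gap_open=10, gap_extend=4):
    """Score each column of two gapped, equal-length aligned strings.

    The default values are illustrative only, not GATAligner defaults.
    Penalties are given as positive numbers and subtracted.
    """
    if len(ref) != len(other):
        raise ValueError("aligned strings must have equal length")
    scores = []
    gap_side = None  # which sequence currently holds an open gap, if any
    for x, y in zip(ref, other):
        if x == "-" or y == "-":
            side = "ref" if x == "-" else "other"
            # First base of a gap pays the opening penalty, later bases the extension.
            scores.append(-gap_extend if side == gap_side else -gap_open)
            gap_side = side
        else:
            scores.append(match if x == y else -mismatch)
            gap_side = None
    return scores


def windowed_hits(scores, window, cutoff):
    """Return (start, end, score) for every window scoring at or above cutoff.

    An alignment shorter than the window is scored as a whole and kept
    if it reaches the cutoff.
    """
    if len(scores) <= window:
        total = sum(scores)
        return [(0, len(scores), total)] if total >= cutoff else []
    hits = []
    total = sum(scores[:window])
    for start in range(len(scores) - window + 1):
        if start > 0:
            # Slide the window one column: add the new column, drop the old one.
            total += scores[start + window - 1] - scores[start - 1]
        if total >= cutoff:
            hits.append((start, start + window, total))
    return hits


scores = column_scores("ACGTACGT", "ACGTTCGT")
print(windowed_hits(scores, window=4, cutoff=20))  # [(0, 4, 20)]
```

Raising the cutoff discards more windows early, which is consistent with the note above that a higher cutoff makes the run faster.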
Figure 2. Screen capture of the GATAPlotter program. An alignment between D. melanogaster and D. pseudoobscura surrounding gene CG1877.
Figure 3. Rendered gene annotation in GATA. A typical protein coding gene is visualized as a GeneGroup comprised of multiple TransGroups containing Exons, Introns, and a Protein transcript. Arrows designate orientation. The DNA glyph is rendered as both Non-Coding and Coding elements.
GATAPlotter menus
| Open a new GATA plot or close the present GATA plot. | |
| Quit the entire application. | |
| Use to save a high resolution PNG file of the GATAPlot. | |
| Select this menu option to save the current settings. These will be used upon opening new GATA plots. Generic Track settings are not saved. To restore the defaults, select the Redraw Using Defaults from the Windows menu and then the Save GATAPlotter settings. Alternatively, delete the GATAPlotterPreferences file in the GATA folder. | |
| A variety of parameters to change the height, width, thickness and relative location of the Alignment panel shapes. | |
| Allows for specifying the number of nucleotides that are rendered per pixel enabling size synchronization between different GATAPlots. | |
| Use this option to reformat both the reference and conserved sequence using the visible alignment boxes in the GATA plot. Upon selection, a dialog box will appear asking how you would like to reformat the non-boxed sequences. These non-conserved sequences can be replaced with any single character (e.g. N or X) or converted to lower or upper case. Use the sliders in the Tools Panel to adjust what is visible. | |
| Select to retrieve all the GATAligner settings used in making the GATA alignments and GATA plot (e.g. score cut off, window size, match, mismatch, etc.). | |
| (These menu items are only available if gene annotation has been added to the GATA plot.) | |
| Use to hide or show all of the gene groups (Protein, RNA, DNA) or labels. | |
| Select whether to hide, show or change their colour. | |
| Select to hide or show, change the colour, or move the scale bar. | |
| This option contains global effectors for generic tracks. | |
| If generic features are found within the GFF file, each is assigned its own track. Their thickness, colour, visibility, and label visibility can be modified using the appropriate options. | |
| A variety of adjustments can be made to the number of pixels that are placed between features. Negative numbers are valid if you want to overlap features. | |
| Line thickness can be set to control the size of Protein, RNA, and DNA features. | |
| The background panel and label colour can be set using these options. | |
| If generic tracks have been generated and are associated with a score, they can be shaded using the scaling feature. Select the method GATA should use to convert the reported scores to linear numbers (e.g. hits to a position weight matrix are often scored in log units; select the appropriate base: log 10, log 2, or natural log). After converting the scores for a particular track, a range is estimated and used to adjust the opacity of each feature from 30% for the lowest scoring feature to 100% for the highest scoring feature. This allows visual comparison of features within a track. Comparisons between tracks are only valid if they have the same range. | |
| Retrieves and displays all hidden windows. | |
| Hides all windows except the main GATAPlotter alignment window. | |
| Redraws all panels using GATAPlotter default values. | |
| Extensive documentation for GATA including examples. |
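The score-based shading of generic tracks described above can be sketched as follows. This is an illustration under stated assumptions, not GATAPlotter's code: the function names `to_linear` and `opacities` are hypothetical, and the linear interpolation between the lowest and highest score is assumed, since the text only states the 30% and 100% end points.

```python
import math


def to_linear(score, log_base=None):
    """Convert a log-scale score to a linear number.

    log_base is 10, 2, or math.e; None means the score is already linear.
    """
    if log_base is None:
        return score
    return log_base ** score


def opacities(scores, log_base=None, lowest=0.30, highest=1.00):
    """Map each score in one track to an opacity between lowest and highest.

    Assumes linear interpolation across the track's own score range.
    """
    linear = [to_linear(s, log_base) for s in scores]
    smallest, largest = min(linear), max(linear)
    if largest == smallest:
        # A track with a single score value has no range to spread over.
        return [highest for _ in linear]
    span = largest - smallest
    return [lowest + (highest - lowest) * (v - smallest) / span for v in linear]


# Log2 scores 0, 1, 2 become linear 1, 2, 4 before shading.
print(opacities([0, 1, 2], log_base=2))
```

Because each track is scaled to its own minimum and maximum, two tracks with different ranges will assign the same opacity to different scores, which is why comparisons between tracks require identical ranges.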
GATAPlotter windows and features
| Sliders can be used to control the minimum score and maximum score used in deciding which sub-alignment box-line-boxes are displayed. The units are a normalized range where zero is set to the value assigned in GATAligner to the Lower Score Cut Off. 100 is set to the value obtained by multiplying the Window Size by the Nucleotide Match. The actual window bit and Expect values set by the sliders are shown in the adjacent boxes. Since the shading is relative to these minimum and maximum values, be careful in making comparisons in shading between two GATA plots. Such comparisons are only valid if the same window size, cut off score, and scoring scheme were used. Check the actual score by clicking on the shaded box or connecting line to see the real bit score. (Bit scores are scoring system independent and can be used to directly compare alignments. Raw scores are relative to the settings for match, mismatch, gap, etc.) | |
| The zoom buttons allow for zooming in and out. | |
| These numbers report the position of the mouse, in base pairs, when the mouse passes over one of the sequence bars. | |
| Single clicking a gene annotation feature retrieves and displays all information associated with that feature in the Text Console Window. Likewise single clicking an alignment box or line displays the sub-alignment information. Double clicking fetches the sub-alignment and its parental local alignment. The sub-alignment is indicated by the asterisks in the larger local alignment. All visible alignments beneath a mouse click are retrieved. Use the Score Sliders to determine which boxes are visible. | |
| If you are interested in a subsection of the alignment, drag the mouse over the region and a reformat box will appear. If you drag the mouse over one sequence and it contains box-line-boxes, these will be used to fetch the corresponding sequence from the other sequence. If you drag the mouse over both sequences, subsequence sections will be retrieved regardless of the location of box-line-boxes. | |
| A resizable scrolling container for text messages generated by mouse clicking. |
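The normalized slider range described above (0 at the Lower Score Cut Off, 100 at Window Size multiplied by Nucleotide Match) can be sketched as a pair of conversions. The function names and the example values (cutoff 60, window 24, match 5) are hypothetical, the mapping is assumed to be linear, and all scores are assumed to be raw scores in the same units.

```python
def slider_to_raw(slider, cutoff, window, match):
    """Convert a 0-100 slider position to a raw window score."""
    max_score = window * match  # best possible score for a full window
    if max_score <= cutoff:
        raise ValueError("cutoff must be below window * match")
    return cutoff + (slider / 100.0) * (max_score - cutoff)


def raw_to_slider(raw, cutoff, window, match):
    """Convert a raw window score back to a 0-100 slider position."""
    max_score = window * match
    if max_score <= cutoff:
        raise ValueError("cutoff must be below window * match")
    return 100.0 * (raw - cutoff) / (max_score - cutoff)


# Hypothetical settings: cutoff 60, window 24, match 5, so the maximum is 120.
print(slider_to_raw(50, cutoff=60, window=24, match=5))   # 90.0
print(raw_to_slider(90, cutoff=60, window=24, match=5))   # 50.0
```

The same slider position therefore corresponds to different raw scores whenever the cutoff, window size, or match score differs, which is why shading comparisons between two plots require identical settings.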
Figure 4. Example: gene triplication. An example of a gene triplication in D. melanogaster and D. pseudoobscura surrounding gene CG14745.
Figure 5. Example: sequence inversion. An example of a sequence inversion event between D. melanogaster and D. pseudoobscura surrounding gene CG8930.