Ronald L Frank, Ajay Mane, Fikret Ercal.
Abstract
BACKGROUND: Gene duplication events have played a significant role in genome evolution, particularly in plants. Exhaustive searches for all members of a known gene family, as well as the identification of new gene families, have become increasingly important. Subfunctionalization via changes in regulatory sequences following duplication (adaptive selection) appears to be a common mechanism of evolution in plants and can be accompanied by purifying selection on the coding region. Such negative selection can be detected by a bias toward synonymous over nonsynonymous substitutions. However, identifying this bias requires many steps, usually employing several different software programs. We have simplified the process and significantly shortened the time required by condensing many steps into a few scripts or programs that rapidly identify putative gene family members beginning with a single query sequence.
Year: 2006 PMID: 17118140 PMCID: PMC1683565 DOI: 10.1186/1471-2105-7-S2-S19
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Arabidopsis PAL contig comparisons. Tallies of first, second, and third position differences between contigs. Parentheses count gaps as differences. No gaps exist if numbers inside and out are the same. NS means bl2seq returned "no significant similarity." NO means bl2seq returned an alignment but the ORFs did not overlap.
| | AtContig3 | AtContig4 | AtContig5 | AtContig6 | AtContig7 | AtContig8 | AtContig9 |
| AtContig1 | 20(20) 6(6) 33(33) | 14(14) 4(4) 43(43) | NS | NS | NS | NS | 18(18) 5(5) 32(32) |
| AtContig3 | *** | 45(45) 19(19) 146(146) | NS | 19(19) 7(7) 70(70) | NS | 15(15) 7(7) 55(55) | 5(5) 5(5) 6(6) |
| AtContig4 | *** | *** | NS | 3(5) 4(4) 4(4) | NS | 1(2) 2(2) 1(1) | 10(10) 4(4) 27(27) |
| AtContig5 | *** | *** | *** | NO | NS | NS | NS |
| AtContig6 | *** | *** | *** | *** | 6(6) 5(5) 39(39) | 4(5) 3(3) 7(7) | NS |
| AtContig7 | *** | *** | *** | *** | *** | NS | NS |
| AtContig8 | *** | *** | *** | *** | *** | *** | NS |
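The tallies in this table can be illustrated with a minimal sketch. It is not the authors' SCAT code; it assumes the two ORFs are already aligned to equal length, that the alignment starts in frame, and that gaps are written as `-`. The function names `tally_position_differences` and `format_tally` are hypothetical and chosen only for this illustration.

```python
def tally_position_differences(seq_a, seq_b, gap="-"):
    """Count differences at codon positions 1, 2 and 3 of two aligned ORFs.

    Assumption: the alignment column index modulo 3 gives the codon
    position, i.e. column 0 is the first base of a codon.
    Returns (without_gaps, with_gaps), each a list of three counts.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    without_gaps = [0, 0, 0]  # mismatches between two real bases
    with_gaps = [0, 0, 0]     # mismatches plus base-versus-gap columns

    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a == b:
            continue  # identical column (including gap versus gap)
        pos = i % 3
        with_gaps[pos] += 1
        if a != gap and b != gap:
            without_gaps[pos] += 1
    return without_gaps, with_gaps


def format_tally(without_gaps, with_gaps):
    """Render counts in the table's 'n(g) n(g) n(g)' layout."""
    return " ".join(f"{n}({g})" for n, g in zip(without_gaps, with_gaps))


if __name__ == "__main__":
    plain, gapped = tally_position_differences("ATGGCTAAA", "ATGGCCA-A")
    print(format_tally(plain, gapped))  # 0(0) 0(1) 1(1)
```

In this toy pair, the single substitution falls on a third position and the single gap on a second position, so the numbers inside and outside the parentheses differ only for position two. An excess of third-position differences, as in most cells above, is the pattern expected when substitutions are predominantly synonymous.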
Arabidopsis PAL contig and gene comparisons. Percent similarity of representative contigs from each grouping to the four actual Arabidopsis PAL gene sequences.
| Accession | Gene | Contig1 | Contig3 | Contig4 | Contig6 | Contig8 |
| AY045919 | AtPAL1 | 96% | 76% | 86% | 83% | 86% |
| AY133595 | AtPAL2 | 79% | 76% | 100% | 97% | 98% |
| NM_120505 | AtPAL3 | 81% | 83% | 76% | 77% | 77% |
| AC009400 | AtPAL4 | 73% | 99% | 76% | 85% | 86% |
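How the percentages in this table were derived is not stated here; one common convention is percent identity over the aligned columns of a pairwise alignment. The sketch below shows that convention only, assuming pre-aligned sequences of equal length with `-` as the gap character. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns in which both sequences carry the same base.

    Assumption: columns where either sequence has a gap count as
    mismatches; columns that are gaps in both sequences are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # uninformative column
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # 10 columns, one substitution and one gap: 8 of 10 columns match.
    print(f"{percent_identity('ATGGCTAAAG', 'ATGGCCA-AG'):.0f}%")  # 80%
```

Under this convention, a contig assembled from transcripts of a given gene would be expected to score near 100% against that gene and noticeably lower against its paralogs, which is how each contig group above can be assigned to one of the four AtPAL genes.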
Glycine max CAD contig comparisons. Tallies of first, second, and third position differences between contigs. Parentheses count gaps as differences. No gaps exist if numbers inside and out are the same. NS means bl2seq returned "no significant similarity." NO means bl2seq returned an alignment but the ORFs did not overlap.
| | Contig5 | Contig7 | Contig8 | Contig11 | Contig12 | Contig15 | Contig16 | Contig17 |
| Contig2 | NS | NS | 64(64) 44(44) 117(117) | NS | NS | NS | NS | NS |
| Contig5 | *** | NS | NS | NS | NS | 7(7) 11(11) 38(38) | NS | NS |
| Contig7 | *** | *** | NS | NS | 8(8) 5(5) 28(28) | NS | NS | NS |
| Contig8 | *** | *** | *** | NS | NS | NS | NS | NS |
| Contig11 | *** | *** | *** | *** | NS | NS | NS | 23(23) 8(8) 27(27) |
| Contig12 | *** | *** | *** | *** | *** | NS | NS | NS |
| Contig15 | *** | *** | *** | *** | *** | *** | NS | NS |
| Contig16 | *** | *** | *** | *** | *** | *** | *** | NS |
Glycine max UniGene cluster Gma.9010 contig comparisons. Tallies of first, second, and third position differences between contigs. Parentheses count gaps as differences. No gaps exist if numbers inside and out are the same. NS means bl2seq returned "no significant similarity." NO means bl2seq returned an alignment but the ORFs did not overlap.
| | Contig1b | Contig2a | Contig2b |
| Contig1a | 0(0) 0(0) 0(0) | 15(17) 6(8) 6(8) | 15(17) 6(8) 6(8) |
| Contig1b | ****** | 6(7) 6(7) 15(16) | 6(7) 6(7) 15(16) |
| Contig2a | ****** | ****** | 0(0) 0(0) 0(0) |
Figure 1. Overview of SimESTs. Flow chart of SimESTs input, remote calls, and output.
Figure 2. Overview of SCAT. Flow chart of SCAT input, remote calls, and output.
Figure 3. Flow chart of automation. Flow chart of all steps in identification of a gene family.