Huang-Wen Chen, Sunayan Bandyopadhyay, Dennis E Shasha, Kenneth D Birnbaum.
Abstract
BACKGROUND: Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data are available, such as Arabidopsis thaliana, the test case used here.
Year: 2010 PMID: 21087504 PMCID: PMC2998534 DOI: 10.1186/1471-2148-10-357
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. Performance analysis of machine learning and single-attribute classifiers. Receiver Operating Characteristic (ROC) curves comparing (A) five different machine learning algorithms and one meta-algorithm (StackingC). The hashed diagonal line is the performance of a simple betting classifier, which represents probabilistic classification based on the frequency of positive and negative cases in the training set. (B) Single-attribute classifiers using correlation of gene pairs across all microarray experiments (All Experiments) and BLAST E-values.
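The ROC comparison in this caption can be illustrated with a minimal sketch. It assumes each gene pair has a classifier score and a 0/1 redundancy label; the function names `roc_points`, `auc`, and `betting_expected_point` and the toy data are hypothetical, not taken from the paper. The betting classifier is modeled here as labeling every pair positive with probability equal to the training-set positive frequency, which is one reading of the caption and puts its expected operating point on the diagonal.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from sweeping a threshold over the scores.

    scores: higher means more likely redundant (assumed convention).
    labels: 1 for a redundant pair, 0 for a non-redundant pair.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative case")
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Pairs with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def betting_expected_point(train_labels):
    """Expected (FPR, TPR) of a classifier that ignores the attributes and
    calls a pair positive with probability p = fraction of positives seen
    in training. Both rates equal p, so the point lies on the diagonal."""
    p = sum(train_labels) / len(train_labels)
    return (p, p)


# Toy data, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
print(curve)
print(auc(curve))
print(betting_expected_point(labels))
```

A curve that bows above the diagonal, with an area greater than 0.5, indicates a classifier that ranks redundant pairs ahead of non-redundant ones more often than chance.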
Figure 2. The predicted depth of redundancy genome-wide. Genes are grouped into bins based on the number of paralogs with which they are predicted to be redundant. The first bin represents the number of genes that were predicted to have exactly one redundant paralog, using the cutoff of 0.4. The frequency distribution shows that most genes have relatively few predicted redundant duplicates.
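The binning described in this caption can be sketched as follows. The input format (a dictionary mapping gene pairs to predicted redundancy scores), the function name `redundancy_depth_histogram`, and the use of `>=` at the 0.4 cutoff are assumptions made for illustration; the caption does not state whether the cutoff is inclusive.

```python
from collections import Counter, defaultdict


def redundancy_depth_histogram(pair_scores, cutoff=0.4):
    """Count, for each gene, how many paralogs it is predicted redundant
    with, then bin genes by that count.

    pair_scores: {(gene_a, gene_b): predicted score}, one entry per pair.
    Returns a Counter mapping depth -> number of genes with that depth.
    Genes with no pair at or above the cutoff are not counted.
    """
    depth = defaultdict(int)
    for (gene_a, gene_b), score in pair_scores.items():
        if score >= cutoff:  # assumed inclusive cutoff
            depth[gene_a] += 1
            depth[gene_b] += 1
    return Counter(depth.values())


# Hypothetical gene identifiers and scores.
pair_scores = {
    ("g1", "g2"): 0.9,
    ("g1", "g3"): 0.5,
    ("g2", "g3"): 0.1,
    ("g4", "g5"): 0.45,
}
print(redundancy_depth_histogram(pair_scores))
```

In this toy input, `g1` falls in the second bin (two predicted redundant paralogs) and the other four genes fall in the first bin.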
List of attributes used for the predictions
| # | Attribute | Type | Description |
|---|---|---|---|
| 1 | CLUSTALW Score | Sequence | ClustalW alignment score |
| 2 | E-value | Sequence | BLAST alignment E-value |
| 3 | Isoe Pt Diff | Sequence | percent difference in isoelectric points |
| 4 | Mol W Diff | Sequence | percent difference in molecular weight |
| 5 | Nonsyn Subst Rate | Sequence | non-synonymous substitution rate |
| 6 | Protein Domain Sharing Index | Sequence | intersection/union of predicted protein domains |
| 7 | Score | Sequence | BLAST alignment bit score |
| 8 | All Experiments | Expression | 2799 ATH1 microarray experiments |
| 9 | Atlas of Arabidopsis Development | Expression | 264 ATH1 microarray experiments |
| 10 | Atmospheric Conditions | Expression | 172 ATH1 microarray experiments |
| 11 | Change Light | Expression | 127 ATH1 microarray experiments |
| 12 | Change Temperature | Expression | 112 ATH1 microarray experiments |
| 13 | Compound Based Treatment | Expression | 248 ATH1 microarray experiments |
| 14 | Genetic Modification | Expression | 952 ATH1 microarray experiments |
| 15 | Genetic Variation | Expression | 22 ATH1 microarray experiments |
| 16 | Growth Condition Treatments | Expression | 74 ATH1 microarray experiments |
| 17 | Growth Conditions | Expression | 503 ATH1 microarray experiments |
| 18 | Hormone Treatments | Expression | 256 ATH1 microarray experiments |
| 19 | Induced Mutation | Expression | 18 ATH1 microarray experiments |
| 20 | Infect | Expression | 61 ATH1 microarray experiments |
| 21 | Injury Design | Expression | 28 ATH1 microarray experiments |
| 22 | Irradiate | Expression | 28 ATH1 microarray experiments |
| 23 | Light | Expression | 12 ATH1 microarray experiments |
| 24 | Media | Expression | 54 ATH1 microarray experiments |
| 25 | Organism Part | Expression | 806 ATH1 microarray experiments |
| 26 | Organism Status | Expression | 16 ATH1 microarray experiments |
| 27 | Pathogen Infection | Expression | 200 ATH1 microarray experiments |
| 28 | Root Cells | Expression | 59 ATH1 microarray experiments |
| 29 | Root Cells Iron Salt Treatments | Expression | 17 ATH1 microarray experiments |
| 30 | Root Cells Nitrate Treatments | Expression | 20 ATH1 microarray experiments |
| 31 | Root Developmental Zones | Expression | 11 ATH1 microarray experiments |
| 32 | Root Developmental Zones (Fine Scale) | Expression | 24 ATH1 microarray experiments |
| 33 | Root Regeneration | Expression | 11 ATH1 microarray experiments |
| 34 | Seed Development | Expression | 6 ATH1 microarray experiments |
| 35 | Set Temperature | Expression | 4 ATH1 microarray experiments |
| 36 | Starvation | Expression | 22 ATH1 microarray experiments |
| 37 | Stimulus or Stress | Expression | 320 ATH1 microarray experiments |
| 38 | Strain or Line | Expression | 32 ATH1 microarray experiments |
| 39 | Temperature | Expression | 15 ATH1 microarray experiments |
| 40 | Time Series Design | Expression | 427 ATH1 microarray experiments |
| 41 | Unknown Experimental Design | Expression | 8 ATH1 microarray experiments |
| 42 | Wait | Expression | 17 ATH1 microarray experiments |
| 43 | Water Availability | Expression | 40 ATH1 microarray experiments |
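Two kinds of attributes in this table can be sketched briefly. The domain sharing index is described as intersection/union, which corresponds to the Jaccard index of the two genes' predicted domain sets. The expression attributes are correlations of a gene pair across a set of experiments; the sketch below assumes Pearson correlation, which the table does not specify. The function names, domain identifiers, and expression values are hypothetical.

```python
import math


def domain_sharing_index(domains_a, domains_b):
    """Intersection over union of two sets of predicted protein domains."""
    a, b = set(domains_a), set(domains_b)
    union = a | b
    if not union:
        return 0.0  # assumption: no predicted domains means no sharing
    return len(a & b) / len(union)


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (nonzero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (sd_x * sd_y)


# One gene has two predicted domains, its paralog shares one of them.
print(domain_sharing_index({"domA", "domB"}, {"domA"}))

# Expression of two paralogs across the same experiments (toy values).
print(pearson([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 8.0]))
```

Computing the correlation separately over each experiment category (rows 8 to 43) would yield one expression attribute per category for every gene pair.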
Figure 3. Trends in redundancy predictions and attributes in different functional categories. Box and whisker plots show landmarks in the distribution of values, where the horizontal line represents the median value, the bottom and top of the box represent the 25th and 75th percentile values, respectively, and the whisker line represents the most extreme value that is within 1.5 times the interquartile range of the box. Points outside the whiskers represent more extreme outliers. The category "all" represents all genes in the large size class (see text) and is used as a background distribution. The two other categories represent genes in the GO functional category named.
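The box-and-whisker landmarks described in this caption can be computed with a short sketch. Quartile conventions differ between tools; this version uses the inclusive method of Python's `statistics.quantiles`, which is an assumption and may not match the software used for the figure. The function name `box_landmarks` and the sample values are hypothetical.

```python
import statistics


def box_landmarks(values):
    """Return the median, box edges, whisker ends, and outliers."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data values inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Toy values: nine evenly spaced points and one extreme point.
print(box_landmarks([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```

With these values the upper whisker stops at 9, and 30 is drawn as a separate outlier point.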