Robert Yu, Kevin Dehoff, Christopher I Amos, Sanjay Shete.
Abstract
Several genetic determinants responsible for individual variation in gene expression have been located using linkage and association analyses. These analyses have revealed regulatory relationships between genes. The heritability of expression variation as a quantitative phenotype reflects its underlying genetic architecture. Using support vector machine regression (SVMR) and gene ontological information, we proposed an approach to identify gene relationships in expression data provided by Genetic Analysis Workshop 15 that would facilitate subsequent genetic analyses. A group of related genes was selected for a shared biological theme, and SVMR was trained to form a regression model using the training gene expressions. The model was subsequently used to search for and capture similarly related genes. SVMR shows promising capability in modeling and seeking gene relationships through expression data.
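For orientation, the regression model behind SVMR can be written in the standard ε-insensitive textbook form. This is a general sketch, not the authors' stated configuration: the abstract does not say which kernel, penalty, or tolerance was used, nor how inputs and targets were built from the expression data. The symbols x_i (input), y_i (target), φ (feature map), C (penalty) and ε (tolerance) are therefore generic placeholders.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad & \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right) \\
\text{subject to} \quad & y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i, \\
& \langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^{*}, \\
& \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
```

Residuals smaller than ε incur no penalty, so the fitted function is determined only by the training points on or outside the ε-tube (the support vectors).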
Year: 2007; PMID: 18466551; PMCID: PMC2367560; DOI: 10.1186/1753-6561-1-s1-s51
Source DB: PubMed; Journal: BMC Proc; ISSN: 1753-6561
SVMR 4-level search strategy and results
| | Search level 1 | Search level 2 | Search level 3 | Search level 4 |
| Theme | Gene set containing highly correlated genes | From the same biological family | Across biological families | Random walk (all genes) |
| Sample size | 1000 | 55 RP^a, 49 ZFP^b | 49 | 3554 |
| Sample selection criteria | A total of 1000 genes, including 100 highly correlated genes | All in the RP family, or all in the ZFP family | RP, ZFP, and DEAD^c | The full data set of all 3554 genes |
| Training size | 2 genes per training, 3 trainings | 2 to 10 genes | 3 genes per training | 3 to 20 genes |
| Training selection criteria | Corr > 0.85, | Randomly from 55 RP genes or from 49 ZFP genes | Only from the RP family | Randomly from the entire sample |
| Best training size | 2 genes | 4–5 genes | 3 genes | 3–7 genes |
| Example of training genes | 1. 200088_x_at and 200809_x_at (both are different probes for RPL12) (Pearson corr > 0.92 and Spearman corr > 0.90, | RPS11 | RPS4X | C1D |
| Example of captured genes | 1. 200088_x_at and 200809_x_at | 1. RPL27, RPS3A (2000099_s_at), RPS3A (201257_x_at), RPS29, RPS28 | DDX39 | SCAP1 |
^a RP, ribosomal protein family.
^b ZFP, zinc finger protein family.
^c DEAD, DEAD box proteins, which are characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD).
^d Three pairs of highly correlated gene expressions were used as three separate training sets. Each was searched separately against the sample, and each search found the training pair itself and the others.
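The correlation-based training selection listed for search level 1 (Corr > 0.85, with the example pair exceeding both a Pearson and a Spearman threshold) can be illustrated with a small screen over probe pairs. This is a minimal sketch, not the authors' code: the helper names, the choice to apply the same 0.85 cutoff to both coefficients, and the toy expression values are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlated_pairs(expr, pearson_min=0.85, spearman_min=0.85):
    """Return probe pairs whose Pearson and Spearman correlations both
    exceed the cutoffs. `expr` maps a probe ID to its expression values,
    with every list in the same sample order. Cutoffs are assumptions."""
    ids = sorted(expr)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if (pearson(expr[a], expr[b]) > pearson_min
                    and spearman(expr[a], expr[b]) > spearman_min):
                pairs.append((a, b))
    return pairs


# Toy values (invented, not from the workshop data): two probes that
# track each other closely and one unrelated probe.
toy_expr = {
    "probe_A1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "probe_A2": [1.1, 2.0, 3.2, 3.9, 5.1],
    "probe_X": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(correlated_pairs(toy_expr))
```

Requiring both coefficients to pass guards against pairs whose Pearson correlation is driven by a few extreme samples. As the Figure 2 caption notes, a pair that passes such a screen need not be biologically related.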
Figure 1. Results of genome-wide linkage analysis of four selected genes. Linkage results for the expression of four genes, compared with those presented in Morley et al. [2]. The Affymetrix probeset IDs are listed in parentheses.
Figure 2. Highly correlated expressions show similar linkage analysis results but may not be biologically related. Expression data from groups A and B are highly correlated, and they all belong to the same biological group, ribosomal proteins. A1 and A2 are from the same gene, RPL12. Correlation coefficients for expression of genes C1 and C2 are > 0.987, but they appear to share no direct biological relationship, even though their NPL LOD score distributions show high similarity as well.
Figure 3. SVMR training and searching results in one biological family – ribosomal proteins. Expression pattern (left) and genome-wide NPL LOD score distributions (right) of ribosomal protein genes. Group A comprises the genes selected for the SVMR training set. Groups B and C are the genes that were targeted and captured in two separate SVMR searches.
Figure 4. SVMR searching results across biological families. Expression pattern (top panel) and genome-wide NPL LOD score distribution of genes in the training set (three ribosomal protein genes: RPS4X, RPS4Y1, RPS5) and the four captured DEAD box genes (DDX39, DDX3Y, DDX58, DDX26).
Figure 5. Biological pathways of two groups of genes from the "random walk" search. Genes in set B were randomly picked for the SVMR training set, and then a random search of the gene pool hit a group of genes that formed set A. The pathway reconstruction was done using PathwayStudio 4.0 (Ariadne Genomics, Inc.).