Tahir Mehmood, Harald Martens, Solve Saebø, Jonas Warringer, Lars Snipen.
Abstract
BACKGROUND: Multivariate approaches are versatile and applied in many fields, and they offer decisive advantages over univariate analysis. Genome-wide association studies are rapidly emerging, but current approaches pay little attention to the multivariate relation between genotype and phenotype. We introduce a methodology that uses a BLAST-based approach to extract information from genomic sequences and Soft-Thresholding Partial Least Squares (ST-PLS) to map genotype-phenotype relations.
Year: 2011 PMID: 21812956 PMCID: PMC3175482 DOI: 10.1186/1471-2105-12-318
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Overall distribution of model parameters and performance. Results obtained from the 20 ST-PLS model fits. The upper left panel shows the distribution of the d-index, a measure of a model's explanatory power (range 0-1), obtained from cross-validation using the finally selected shrinkage level and number of components. The blue curve indicates its distribution for our 20 phenotypes, and the red curve is the distribution of this measure when the matrix is replaced by a random shuffling of its rows, i.e. a 'null' distribution. The upper right panel shows the distribution of the number of components selected (range 1-10), the lower left panel the distribution of the shrinkage level (range 0.7-0.96), and the lower right panel the distribution of the number of associated genes.
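The row-shuffling null distribution described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the d-index and the ST-PLS fit are not reproduced here, and `stand_in_score` (a squared correlation on the first column) is a hypothetical placeholder for whatever performance measure is cross-validated. The names `permutation_null`, `n_perm` and `seed` are likewise assumptions introduced for illustration.

```python
import random


def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def stand_in_score(matrix, y):
    """Hypothetical placeholder for a model performance measure.

    Scores only the first column of the matrix against the phenotype y.
    """
    return squared_correlation([row[0] for row in matrix], y)


def permutation_null(matrix, y, score, n_perm=200, seed=0):
    """Recompute `score` after randomly shuffling the rows of `matrix`.

    Shuffling breaks the pairing between rows (strains) and phenotype
    values, so the returned scores approximate a 'null' distribution.
    """
    rng = random.Random(seed)
    null_scores = []
    for _ in range(n_perm):
        shuffled = matrix[:]  # shallow copy; the original row order is kept
        rng.shuffle(shuffled)
        null_scores.append(score(shuffled, y))
    return null_scores


# Toy data (hypothetical): 10 strains, one predictor column.
matrix = [[float(i)] for i in range(10)]
y = [2.0 * i + 1.0 for i in range(10)]
observed = stand_in_score(matrix, y)
null_scores = permutation_null(matrix, y, stand_in_score)
```

Comparing `observed` with the spread of `null_scores` mirrors the blue-versus-red comparison in the upper left panel, under the stated assumptions.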
Enriched variations
| Label | Phenotype | N | Ess. genes | Paralog | Frame shifts | Stop codon | Copy no. |
|---|---|---|---|---|---|---|---|
| Mel_R | Melibiose 2% Rate | 33 | 0 | 2.06* | 0.25 | 1.23 | 4.41* |
| Mel_E | Melibiose 2% Efficiency | 40 | 0 | 0.78 | 0.20 | 1.37 | 3.59 |
| Cup_R | Copper chloride 0.375 mM Rate | 60 | 0.16 | 2.56***••• | 0.15 | 1.63 | 6.42***••• |
| Cup_E | Copper chloride 0.375 mM Efficiency | 14 | 0 | 2.19 | 0.11 | 3.36* | 11.42**• |
| NaC1_R | NaCl 0.85 M Rate | 58 | 0.16 | 2.91***••• | 0.05 | 1.16 | 8.25***••• |
| NaC2_R | NaCl 1.25 M Rate | 47 | 0.31 | 1.12 | 0.13 | 1.46 | 0 |
| NaC1_E | NaCl 0.85 M Efficiency | 47 | 0.01 | 2.34***•• | 0.13 | 1.46 | 12.70***••• |
| NaC2_E | NaCl 1.25 M Efficiency | 43 | 0.11 | 1.25 | 0.14 | 0.92 | 3.33 |
| Mal_R | Maltose 2% Rate | 59 | 0.51 | 2.05**•• | 0.19 | 1.39 | 11.50***••• |
| Mal_E | Maltose 2% Efficiency | 45 | 0.32 | 1.37 | 0.21 | 0.87 | 13.37***••• |
| Gal_R | Galactose 2% Rate | 30 | 0 | 1.67 | 0 | 0.88 | 22.11***••• |
| Gal_E | Galactose 2% Efficiency | 49 | 0.19 | 2.67***••• | 0.27 | 1.40 | 22.11***••• |
| Hea1_R | Heat 37°C Rate | 33 | 0 | 2.06*• | 0.09 | 2.20* | 9.65***••• |
| Hea2_R | Heat 40°C Rate | 44 | 0 | 2.06**• | 0.07 | 1.23 | 13.73***••• |
| Hea1_E | Heat 37°C Efficiency | 44 | 0.11 | 1.21 | 0.26 | 1.58 | 1.568 |
| Hea2_E | Heat 40°C Efficiency | 49 | 0.40 | 1.78* | 0.12 | 1.72 | 9.99***••• |
| Sod1_R | Sodium arsenite oxide 3.5 mM Rate | 48 | 0 | 5.58***••• | 0.13 | 2.11*• | 12.38***••• |
| Sod2_R | Sodium arsenite oxide 5 mM Rate | 33 | 0.29 | 3.15***•• | 0.09 | 2.20* | 2.11 |
| Sod1_E | Sodium arsenite oxide 3.5 mM Efficiency | 44 | 0.22 | 1.83* | 0.18 | 1.95 | 13.73***••• |
| Sod2_E | Sodium arsenite oxide 5 mM Efficiency | 43 | 0 | 3.62***••• | 0.14 | 2.40**• | 3.33 |
Types of variation that are over-represented among the N influential genes, for all phenotypes. The statistics are odds ratios indicating potential enrichment of certain gene categories among the influential genes. The categories are: essential genes, genes with known paralogs, genes with known frame-shift variation, genes with known stop-codon variation, and genes with known copy-number variation in yeast. Significance at 10% is marked with *, at 5% with **, and at 1% with ***. The corresponding significance based on adjusted p-values controlling the false discovery rate (q-values) is marked with •, •• and •••, respectively.
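To make the statistics in the table concrete, the sketch below computes an odds ratio from a 2x2 table, a one-sided p-value, and adjusted p-values. The source states only that odds ratios, p-values and FDR-adjusted q-values were used; Fisher's exact test and the Benjamini-Hochberg procedure are assumptions made here for illustration, and all counts are hypothetical.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]].

    a: influential genes in the category, b: influential genes outside it,
    c: other genes in the category,       d: other genes outside it.
    """
    if b == 0 or c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_upper_p(a, b, c, d):
    """One-sided Fisher exact p-value, P(X >= a), under the hypergeometric null."""
    n_influential = a + b
    n_category = a + c
    total = a + b + c + d
    upper = min(n_influential, n_category)
    tail = sum(
        comb(n_category, k) * comb(total - n_category, n_influential - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, n_influential)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 3 influential genes, 2 of them in the category;
# 3 other genes, 1 of them in the category.
example_or = odds_ratio(2, 1, 1, 2)
example_p = fisher_upper_p(2, 1, 1, 2)
example_q = benjamini_hochberg([0.01, 0.04, 0.03])
```

An odds ratio above 1 points toward over-representation of the category among the influential genes, and a value below 1 toward under-representation, which is how the table entries can be read.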
Figure 2. Overall enrichments. Types of variation that are over-represented (positive bars) or under-represented (negative bars) among the overall influential genes for all phenotypes. The upper panel covers essential genes, genes with known paralogs, genes with known frame-shift variation, genes with known stop-codon variation, and genes with known copy-number variation in yeast. The lower panel shows enriched Gene Ontology process terms. On the y-axis, significance at 10% is marked with *, at 5% with **, and at 1% with ***. Variations are also marked with significance based on adjusted p-values (false discovery rate adjusted).
Enriched Gene Ontology
| Label | GO terms |
|---|---|
| Mel_R | transposition** |
| Mel_E | generation of precursor metabolites and energy***•; cellular respiration***• |
| Cup_R | cellular respiration*; transposition***••• |
| Cup_E | generation of precursor metabolites and energy***••; heterocycle metabolic process*; cellular respiration**; transposition** |
| NaC1_R | cellular respiration*; transposition***••• |
| NaC2_R | generation of precursor metabolites and energy**; cellular respiration**; transposition***••• |
| NaC1_E | generation of precursor metabolites and energy**; transposition***•• |
| NaC2_E | generation of precursor metabolites and energy**; transposition***••• |
| Mal_R | generation of precursor metabolites and energy**; cellular respiration*; transposition***• |
| Mal_E | generation of precursor metabolites and energy*; transposition***• |
| Gal_R | generation of precursor metabolites and energy*; cellular respiration* |
| Gal_E | generation of precursor metabolites and energy***•; cellular respiration***• |
| Hea1_R | generation of precursor metabolites and energy***•; heterocycle metabolic process*; vesicle organization** |
| Hea2_R | generation of precursor metabolites and energy**; transposition***• |
| Hea1_E | generation of precursor metabolites and energy**; transposition***•••; vesicle organization**• |
| Hea2_E | generation of precursor metabolites and energy***•; heterocycle metabolic process*; cellular respiration**; transposition**••• |
| Sod1_R | transposition***••• |
| Sod2_R | generation of precursor metabolites and energy*; transposition***••• |
| Sod1_E | DNA metabolic process*; generation of precursor metabolites and energy**; cellular respiration**; transposition***•• |
| Sod2_E | transposition***••• |
Enriched Gene Ontology process terms are listed. Significance at 10% is marked with *, at 5% with **, and at 1% with ***. The corresponding significance based on adjusted p-values controlling the false discovery rate (q-values) is marked with •, •• and •••, respectively.
Figure 3. Distribution of genes on chromosomal positions. The distribution of all genes related to at least one phenotype over the 16 chromosomes of S. cerevisiae strain S288C. Blue tags indicate a gene on the positive strand and red tags a gene on the negative strand.
Figure 4. Biplots for NaC1_E and Hea1_R. The biplot for NaC1_E (NaCl 0.85 M Efficiency) is shown in the upper panel and the biplot for Hea1_R (Heat 37°C Rate) in the lower panel. Genes are labeled by name in gray and strains are shown in red. For the NaC1_E model, the most variant strain, NCYC110, is identified and marked by the blue cloud. For Hea1_R, the two most variant strains, NCYC110 and DBVPG6044, are identified together with their related genes in a green cloud.