Nils Rostoks, Justin O Borevitz, Peter E Hedley, Joanne Russell, Sharon Mudie, Jenny Morris, Linda Cardle, David F Marshall, Robbie Waugh.
Abstract
A probe-level model for analysis of GeneChip gene-expression data is presented which identified more than 10,000 single-feature polymorphisms (SFP) between two barley genotypes. The method has good sensitivity, as 67% of known single-nucleotide polymorphisms (SNP) were called as SFPs. This method is applicable to all oligonucleotide microarray data, accounts for SNP effects in gene-expression data and represents an efficient and versatile approach for highly parallel marker identification in large genomes.
Year: 2005 PMID: 15960806 PMCID: PMC1175974 DOI: 10.1186/gb-2005-6-6-r54
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Normalization of hybridization intensity profile of 25mer probes in a probe set. The y-axis is background-corrected normalized log intensity and the x-axis shows the positions of the 11 features along the unigene. Black lines trace the Golden Promise arrays, while red lines trace the Morex arrays. Line types distinguish tissues. Each panel illustrates normalization for one of the major sources of variation: probe effect; probe and tissue; probe and genotype; probe, genotype and tissue; probe, genotype, tissue and genotype by tissue. 100 such plots are available from [35].
Table 1. SFP false discovery rate (FDR) estimates in RNA and genomic DNA hybridization data
RNA hybridization: 17 Golden Promise, 19 Morex, 6 tissues; SAM analysis for the two-class unpaired case assuming unequal variances; s0 = 0.0342 (the 5% quantile of the s values); number of permutations: 500; mean number of falsely called genes is computed.
| Delta | p0 | Called | False | FDR |
| 0.5 | 0.95 | 27,159 | 5,884 | 0.206 |
| 1.0 | 0.95 | 17,744 | 594 | 0.032 |
| 1.5 | 0.95 | 13,285 | 65 | 0.005 |
| 2.0 | 0.95 | 10,504 | 7 | 0.001 |
| 2.5 | 0.95 | 8,583 | 0 | 0.000 |
Genomic DNA hybridization: three replicates, three genotypes; SAM analysis for the multi-class case with three classes; s0 = 0.0123 (the 25% quantile of the s values); number of permutations: 100; mean number of falsely called genes is computed.
| Delta | p0 | Called | False | FDR |
| 1 | 0.95 | 4,017 | 2,073 | 0.47 |
| 2 | 0.95 | 1,728 | 583 | 0.31 |
| 3 | 0.95 | 1,090 | 258 | 0.22 |
| 4 | 0.95 | 789 | 139 | 0.16 |
| 5 | 0.95 | 631 | 86 | 0.13 |
The Bioconductor package siggenes [37,36] was used to derive SFP calls at various thresholds in the original data and in randomly permuted data according to SAM [39]. Delta, the threshold; p0, the prior probability that a feature is not an SFP; Called, the number of SFPs called at each threshold; False, the mean number of SFPs called in the permuted datasets.
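The FDR values in the RNA hybridization rows appear consistent with the usual SAM-style estimate, FDR ≈ p0 × False / Called. The following is a minimal Python sketch under that assumption; it is not the siggenes implementation, and `sam_fdr` is an illustrative name. Only the RNA rows are checked here.

```python
# Sketch only: FDR estimated as p0 * (mean falsely called) / (number called).
# `sam_fdr` is an illustrative name, not a siggenes function.

def sam_fdr(p0, false_calls, called):
    """Return the estimated false discovery rate for one Delta threshold."""
    if called == 0:
        return 0.0  # nothing called, so nothing can be falsely discovered
    return p0 * false_calls / called


# (Delta, Called, False) for the RNA hybridization rows of the table.
RNA_ROWS = [
    (0.5, 27159, 5884),
    (1.0, 17744, 594),
    (1.5, 13285, 65),
    (2.0, 10504, 7),
    (2.5, 8583, 0),
]

fdr_by_delta = {
    delta: round(sam_fdr(0.95, false_calls, called), 3)
    for delta, called, false_calls in RNA_ROWS
}

for delta, fdr in fdr_by_delta.items():
    print(f"Delta={delta}: FDR={fdr:.3f}")
```

At Delta = 2.0 this gives 0.001, the 0.1% FDR threshold at which 10,504 SFPs are called.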
Figure 2. Distribution of single-feature polymorphisms. (a) The observed d-statistics (y-axis) are plotted against the expected d-statistics (x-axis) as determined by permutations. The 10,504 significant SFPs exceeding the threshold of 0.1% FDR are shown in green. (b) Histogram of d-statistics truncated at ± 10. Positive scores above the threshold 3.38 are Golden Promise SFPs, and negative scores below -3.37 are Morex SFPs.
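The two thresholds in panel (b) amount to a three-way call on each d-statistic. A minimal sketch of that rule follows; the function name `call_sfp` is hypothetical, and the labels reuse the mxSFP, nonSFP and gpSFP categories from the comparison with sequence-characterized SNPs.

```python
# Thresholds taken from the Figure 2 caption (0.1% FDR).
GP_THRESHOLD = 3.38   # scores above this are Golden Promise SFPs
MX_THRESHOLD = -3.37  # scores below this are Morex SFPs


def call_sfp(d_statistic):
    """Classify one probe by its d-statistic (hypothetical helper)."""
    if d_statistic > GP_THRESHOLD:
        return "gpSFP"
    if d_statistic < MX_THRESHOLD:
        return "mxSFP"
    return "nonSFP"


# Example d-statistics, invented purely for illustration.
example_scores = [5.2, -4.0, 0.3, 3.38, -3.37]
calls = [call_sfp(d) for d in example_scores]
print(calls)
```

Scores exactly at a threshold are left as nonSFP here, following the caption's wording of "above" and "below".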
Table 2. Single feature polymorphism (SFP) comparison with sequence-characterized SNPs
| | | GeneChip | | |
| | | mxSFP | nonSFP | gpSFP |
| RNA sequence | | 5,301 | 240,307 | 5,203 |
| MX | 178 | 115 | 45 | 18 |
| Non-polymorphic | 2,200 | 27 | 2,045 | 128 |
| GP | 223 | 7 | 61 | 155 |
| Chi-square = 2,049.2, df = 4, | | | | |
The categories for SFP calls from RNA data are shown in columns: mxSFP, SFP in Morex; nonSFP, no SFP at the 0.1% FDR; gpSFP, SFP in Golden Promise. The categories of sequence-characterized probes are in rows: MX, polymorphism in Morex; non-polymorphic, no polymorphism between the probe and either of the two genotypes; GP, polymorphism in Golden Promise. Intersections of the columns and rows indicate the different combinations of sequence-verified polymorphisms and SFPs.
Table 3. SFP discovery in individual tissue types
| Tissue | ALL | COL | CRO | GEM | LEA | RAD | ROO |
| Replicates (GP, MX) | 18, 18 | 3, 3 | 3, 3 | 3, 3 | 3, 3 | 3, 3 | 2, 4 |
| Sensitivity | 67% | 52% | 58% | 63% | 51% | 62% | 60% |
| False sequence polymorphism rate | 40% | 35% | 34% | 34% | 34% | 34% | 35% |
| % variance explained | 38% | 30% | 33% | 37% | 32% | 31% | 34% |
Replicates indicate the number of arrays from each genotype analyzed for a given tissue type. Sensitivity is the percentage of known sequence polymorphisms (401; Table 2) that were correctly predicted as SFPs (270; Table 2). False sequence polymorphism rate is the percentage of predicted SFPs that were found not to contain a DNA base-pair change. The % variance explained is that from a linear model fit of genotype (-1: MX; 0: no polymorphism; 1: GP) versus SFP d-statistic.
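The sensitivity in the ALL column can be checked directly from the counts in Table 2. The sketch below does this in Python; the dictionary layout and variable names are illustrative. It also computes a second ratio that reproduces the 40% false sequence polymorphism rate, but only under the assumption that SFP calls assigned to the wrong genotype are counted together with calls on non-polymorphic probes. That reading is an assumption and is not stated in the footnote.

```python
# Sequence-characterized probes from Table 2: rows are sequence categories,
# columns are SFP calls from the RNA data.
TABLE2 = {
    "MX": {"mxSFP": 115, "nonSFP": 45, "gpSFP": 18},
    "non-polymorphic": {"mxSFP": 27, "nonSFP": 2045, "gpSFP": 128},
    "GP": {"mxSFP": 7, "nonSFP": 61, "gpSFP": 155},
}

# Correct calls: polymorphism and SFP assigned to the same genotype.
correct = TABLE2["MX"]["mxSFP"] + TABLE2["GP"]["gpSFP"]

# Known sequence polymorphisms: all probes in the MX and GP rows.
known = sum(TABLE2["MX"].values()) + sum(TABLE2["GP"].values())

sensitivity = correct / known

# ASSUMPTION: every predicted SFP that is not a correct call counts as false,
# including wrong-genotype calls (MX/gpSFP and GP/mxSFP).
predicted = sum(row["mxSFP"] + row["gpSFP"] for row in TABLE2.values())
false_rate = (predicted - correct) / predicted

print(f"sensitivity = {correct}/{known} = {sensitivity:.0%}")
print(f"false rate  = {predicted - correct}/{predicted} = {false_rate:.0%}")
```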
Figure 3. Effect of SNP position on SFP identification. The positions of the SNP in 25mers are shown on the x-axis as distance from the edge in nucleotides (1 - 13 nucleotides). Probes containing multiple SNPs are shown separately in a single column. The y-axis indicates the total number of probes identified for each SNP position. Each bar is divided into the SFP categories (mxSFP, nonSFP and gpSFP; see Table 2) and shows that SFP identification is more accurate for SNPs that reside at internal sites. The number of 25mers in each category is shown within the bars.
Table 4. Comparison of SFP prediction in RNA and genomic DNA hybridizations
| | | GeneChip RNA | |
| | | SFPs | nonSFPs |
| GeneChip gDNA | | 10,504 | 240,307 |
| SFPs | 1,090 | 114 | 976 |
| nonSFPs | 249,721 | 10,390 | 239,331 |
| Chi-square = 107.28, df = 1, | | | |
SFP and non-SFP probes in the gene-expression data are in columns, while the genomic data are in rows.
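The Chi-square values reported under the two contingency tables appear to be reproduced by a plain Pearson chi-square test of independence without continuity correction. The following Python sketch works from the tabulated counts; `pearson_chi_square` is an illustrative helper, not code from the paper, and no p-value is computed.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    No continuity correction is applied (an assumption about the original analysis).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: gDNA SFPs, gDNA nonSFPs; columns: RNA SFPs, RNA nonSFPs.
rna_vs_gdna = [
    [114, 976],
    [10390, 239331],
]

# Rows: MX, non-polymorphic, GP; columns: mxSFP, nonSFP, gpSFP.
sequence_vs_sfp = [
    [115, 45, 18],
    [27, 2045, 128],
    [7, 61, 155],
]

chi2_gdna, df_gdna = pearson_chi_square(rna_vs_gdna)
chi2_seq, df_seq = pearson_chi_square(sequence_vs_sfp)
print(f"RNA vs gDNA:     chi-square = {chi2_gdna:.2f}, df = {df_gdna}")
print(f"sequence vs SFP: chi-square = {chi2_seq:.1f}, df = {df_seq}")
```

The row and column totals of the first table also confirm the marginal counts: 114 + 976 = 1,090 and 10,390 + 239,331 = 249,721.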