Wayne Wenzhong Xu, Seungho Cho, S Samuel Yang, Yung-Tsi Bolon, Hatice Bilgic, Haiyan Jia, Yanwen Xiong, Gary J Muehlbauer.
Abstract
BACKGROUND: Single-feature polymorphism (SFP) discovery is a rapid and cost-effective approach to identifying DNA polymorphisms. However, previously described SFP detection methods commonly suffer from high false positive rates, low sensitivity, or both. This work presents a new computational method for SFP discovery.
Year: 2009 PMID: 19709416 PMCID: PMC2746803 DOI: 10.1186/1471-2156-10-48
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Single-feature polymorphisms (SFPs) discovered in Golden Promise and Morex¹
| | All probes | Sequence-verified polymorphic probes | Sequence-verified non-polymorphic probes |
| GeneChip² | 250,811 | 401 | 2,200 |
| SAM FDR 0.1³ | 15,026 | 305 | 84 |
| Weight score 2.5⁴ | 9,603 | 284 | 48 |
| Rate (%)⁵ | 3.82 | 70.82 | 2.18 |
1 Barley1 GeneChip transcript data from two genotypes (Morex and Golden Promise) across six tissue types [47] were used in this analysis.
2 GeneChip refers to the Barley1 GeneChip, which contains 250,811 probes. In this dataset, 401 probes are sequence-verified as polymorphic and 2,200 as non-polymorphic between Morex and Golden Promise [47].
3 SAM FDR 0.1 is the number of SFPs detected at a false discovery rate (FDR) of 0.1 in the SAM (significance analysis of microarrays) analysis implemented in the Bioconductor siggenes package.
4 Weight score 2.5 is the number of SFPs remaining after a weight score cutoff of 2.5 was applied.
5 Rate (%) is the number of SFPs passing the weight score cutoff of 2.5, expressed as a percentage of all probes, polymorphic probes, or non-polymorphic probes, respectively.
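The Rate row is a plain ratio. Each count in the weight score 2.5 row is divided by the matching category size in the GeneChip row. The minimal Python sketch below recomputes the row from the table values. The constant and function names are illustrative only.

```python
# Category sizes from the GeneChip row: (all probes, polymorphic, non-polymorphic).
CATEGORY_SIZES = (250_811, 401, 2_200)
# Probes passing the weight score cutoff of 2.5, in the same column order.
PASSING_WEIGHT_2_5 = (9_603, 284, 48)


def rate_percent(passing: int, category_size: int) -> float:
    """Percentage of a probe category that passes the weight score cutoff."""
    return 100.0 * passing / category_size


rates = [rate_percent(p, n) for p, n in zip(PASSING_WEIGHT_2_5, CATEGORY_SIZES)]
print(", ".join(f"{r:.2f}" for r in rates))
```

The polymorphic and non-polymorphic columns reproduce 70.82 and 2.18. These are the true and false positive rates at this cutoff. Conventional rounding gives 3.83 for the all-probes column, whereas the table shows 3.82. The table value therefore appears to be truncated rather than rounded.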
Figure 1 Weight score distribution of known polymorphic and non-polymorphic probes. Weight score distributions of the 305 known polymorphic probes (A) and the 84 known non-polymorphic probes (B) in the candidate SFP list that passed the false discovery rate (FDR) cutoff of 0.1 in the SAM (significance analysis of microarrays) analysis. (C) Curves of the averaged counts of sequence-confirmed polymorphic probes (true SFPs) and sequence-confirmed non-polymorphic probes. Probe counts are plotted in weight score bins. Tissues: col, coleoptile; cro, seedling crown; gem, embryo from germinating seed; lea, seedling leaf; rad, radicle; roo, seedling root.
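Panel C plots binned probe counts averaged over tissues. The sketch below shows one way to produce such curves. It assumes "averaged" means the mean count per bin across tissues. The bin width, the scores and the function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def bin_counts(scores, bin_width):
    """Count weight scores per half-open bin [k*w, (k+1)*w), keyed by the bin's lower edge."""
    counts = Counter(math.floor(s / bin_width) for s in scores)
    return {k * bin_width: n for k, n in counts.items()}


def averaged_bin_counts(scores_by_tissue, bin_width):
    """Mean count per bin across tissues; a tissue with no scores in a bin contributes 0."""
    per_tissue = [bin_counts(s, bin_width) for s in scores_by_tissue.values()]
    edges = sorted({edge for counts in per_tissue for edge in counts})
    n_tissues = len(per_tissue)
    return {edge: sum(c.get(edge, 0) for c in per_tissue) / n_tissues for edge in edges}


# Hypothetical weight scores of known polymorphic probes in two tissues.
example = {"col": [0.5, 3.0, 3.1], "lea": [0.2, 6.0]}
print(averaged_bin_counts(example, bin_width=2.5))  # {0.0: 1.0, 2.5: 1.0, 5.0: 0.5}
```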
Figure 2 Detection accuracy. The precision-recall (PR) curve shows precision (y-axis) at different recall values (true positive rate, TPR; x-axis). The PASP (blue), PILM (red), and PAOP (black) curves were plotted from 8, 10, and 9 selected cutoff values, respectively (Additional file 4). Red stars mark the points with the best performance, and black stars mark the cutoff values reported in the original methods. The bar chart shows the area under the curve (AUC) for each method.
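Each point on a PR curve comes from one cutoff. Recall is the fraction of the 401 verified polymorphic probes that are called. Precision is the fraction of called verified probes that are truly polymorphic, which equals 1 - FDR. This section does not describe how the AUC was computed. The sketch below assumes trapezoidal integration over the tested cutoffs, and the function names are hypothetical.

```python
from __future__ import annotations


def precision_recall(tp: int, fp: int, n_positive: int) -> tuple[float, float]:
    """Recall (TPR) and precision (1 - FDR) for a single cutoff."""
    recall = tp / n_positive
    precision = tp / (tp + fp) if tp + fp > 0 else 1.0
    return recall, precision


def trapezoid_auc(points: list[tuple[float, float]]) -> float:
    """Trapezoidal area under (recall, precision) points over the recall range they span."""
    pts = sorted(points)
    return sum((r2 - r1) * (p1 + p2) / 2 for (r1, p1), (r2, p2) in zip(pts, pts[1:]))


# PASP at weight score 2.5: 284 of 401 polymorphic and 48 non-polymorphic probes called.
print(precision_recall(284, 48, 401))
```

With this approach, curves that span different recall ranges give areas that are not directly comparable. Any method-to-method comparison should therefore check that the cutoffs cover similar recall ranges.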
Figure 3 Single-feature polymorphism (SFP) plots. A: Probe set Contig11534_at in three Golden Promise samples (dashed red) compared with three Morex samples (solid black). B: Contig11534_at in 19 Golden Promise samples (dashed red) compared with 17 Morex samples (solid black) from six tissue types. C: Contig16609_at in three Bowman samples (dashed red) compared with three Bowman-cul2 samples (solid black). The x-axis represents the 11 perfect match (PM) probes. The y-axis represents the log2 intensity of the PM probes before normalization (raw data panel), the log2 normalized intensity minus the expression index of the probe set (affinity panel), the probe affinity difference (affinity difference panel), and the weight score of SFPs passing the 2.5 cutoff (weight score panel). The three SFPs detected in Contig11534_at (probes 4, 5, and 7) contain verified polymorphisms between Golden Promise and Morex (A, B). The SFP at probe 10 of Contig16609_at in Bowman-cul2 was verified by sequencing (C).
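The affinity and affinity difference panels can be illustrated with a short sketch. Following the caption, a probe's affinity is its log2 normalized intensity minus the probe set's expression index. Here the median log2 intensity of the probe set in each sample stands in for that index. This is an assumption for illustration and not necessarily the index used in the paper. The function names are hypothetical.

```python
from statistics import mean, median


def probe_affinities(log2_intensities):
    """Affinity of each PM probe: log2 intensity minus the probe set's expression index.

    ASSUMPTION: the median log2 intensity of the probe set in this sample
    stands in for the expression index.
    """
    index = median(log2_intensities)
    return [x - index for x in log2_intensities]


def affinity_difference(samples_a, samples_b):
    """Per-probe difference of mean affinities between genotype A and genotype B samples."""
    aff_a = [probe_affinities(s) for s in samples_a]
    aff_b = [probe_affinities(s) for s in samples_b]
    n_probes = len(samples_a[0])
    return [mean(a[j] for a in aff_a) - mean(b[j] for b in aff_b) for j in range(n_probes)]


# Hypothetical 3-probe set: genotype B has a polymorphism under probe 3, and its
# second sample is also expressed at a higher overall level.
genotype_a = [[8.0, 8.0, 8.0], [9.0, 9.0, 9.0]]
genotype_b = [[8.0, 8.0, 6.0], [10.0, 10.0, 8.0]]
print(affinity_difference(genotype_a, genotype_b))  # [0.0, 0.0, 2.0]
```

Each sample is centered on its own expression index, so differences in overall expression level cancel out. Only the probe affected by the sequence difference stands out.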
Single-feature polymorphism (SFP) discovery methods comparison
| Method | PASP | PILM | PILM | PAOP | PAOP |
| Basis | probe affinity, shape power | probe intensity, linear model | probe intensity, linear model | probe affinity, outlier score | probe affinity, outlier score |
| cutoff | weight 2.5 | D 3.0, FDR 0%² | D 2.0, FDR 0.1%¹ | os pct 0.15² | os pct 0.05¹ |
| polymorphisms | 284/401 | 254/401 | 295/401 | 233/401 | 106/401 |
| non-polymorphisms | 48/2200 | 87/2200 | 155/2200 | 37/2200 | 16/2200 |
| detected SFPs | 9603 | 7150 | 10504 | 6820 | 2193 |
| TPR (%) | 70.82 | 63.34 | 73.56 | 58.10 | 26.43 |
| FPR (%) | 2.18 | 3.95 | 7.05 | 1.68 | 0.73 |
| FDR (%) | 14.46 | 25.51 | 34.44 | 13.70 | 13.11 |
| compScore | 1.00 | 0.21 | 0.15 | 0.80 | 0.28 |
The PASP (probe affinity shape power) method developed in this study was compared with the PILM (probe intensity linear model) and PAOP (probe affinity outlier pursuit) methods, using the same dataset as in previous work [23,47]. For each method, both the cutoff values used in the original report (¹) and the cutoffs that produced the best performance (²) were tested. The 401 polymorphic and 2,200 non-polymorphic sequences of Golden Promise and Morex were derived from previously published Barley1 GeneChip data sets [47]. The calculations of the true positive rate (TPR), false positive rate (FPR), false discovery rate (FDR), detected SFPs, and compScore are described in the Materials and Methods section. compScore compares each of the other methods with PASP, which is set to 1.00.
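TPR and FPR follow directly from the row headers. They are the called verified polymorphic probes divided by 401, and the called verified non-polymorphic probes divided by 2,200. The FDR definition used below is inferred rather than quoted from Materials and Methods: FDR = FP / (TP + FP), counted over the verified probes only. It reproduces the table's FDR row to within 0.01. compScore is not reproduced, because its formula is not given here. The class and label names are illustrative.

```python
from dataclasses import dataclass

N_POLYMORPHIC = 401       # sequence-verified polymorphic probes
N_NON_POLYMORPHIC = 2200  # sequence-verified non-polymorphic probes


@dataclass
class CutoffResult:
    label: str
    true_pos: int   # verified polymorphic probes called as SFPs
    false_pos: int  # verified non-polymorphic probes called as SFPs

    @property
    def tpr(self) -> float:
        return 100 * self.true_pos / N_POLYMORPHIC

    @property
    def fpr(self) -> float:
        return 100 * self.false_pos / N_NON_POLYMORPHIC

    @property
    def fdr(self) -> float:
        # ASSUMPTION: FDR over verified probes only, FP / (TP + FP).
        return 100 * self.false_pos / (self.true_pos + self.false_pos)


RESULTS = [
    CutoffResult("PASP weight 2.5", 284, 48),
    CutoffResult("PILM D3.0 FDR0%", 254, 87),
    CutoffResult("PILM D2.0 FDR0.1%", 295, 155),
    CutoffResult("PAOP os pct 0.15", 233, 37),
    CutoffResult("PAOP os pct 0.05", 106, 16),
]

for r in RESULTS:
    print(f"{r.label}: TPR {r.tpr:.2f}  FPR {r.fpr:.2f}  FDR {r.fdr:.2f}")
```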
Figure 4 Comparison of different single-feature polymorphism (SFP) detection methods. A: The 9,603 SFPs detected in Golden Promise and Morex by the probe affinity shape power (PASP) method compared with the 7,150 SFPs detected by the probe intensity linear model (PILM) method and the 6,820 SFPs detected by the probe affinity outlier pursuit (PAOP) method. The best-performance cutoff value was used for each method. B: Comparison of the 364 SFPs discovered by PASP in Bowman and Bowman-cul2 with the 9,603 SFPs discovered in Golden Promise and Morex and the 14,212 SFPs discovered by PASP in Barke, Morex, Steptoe, and Oregon Wolf Barley Dominant and Recessive.
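The overlap comparison in Figure 4 is a three-set Venn diagram. The sketch below assumes each SFP is identified by its probe set and probe index on the Barley1 GeneChip. Under that assumption, calls from different methods or genotype pairs can be compared directly as sets. The probe identifiers used here are hypothetical.

```python
def venn3_counts(a: set, b: set, c: set) -> dict:
    """Sizes of the seven regions of a three-set Venn diagram."""
    return {
        "A only": len(a - b - c),
        "B only": len(b - a - c),
        "C only": len(c - a - b),
        "A and B only": len((a & b) - c),
        "A and C only": len((a & c) - b),
        "B and C only": len((b & c) - a),
        "A, B and C": len(a & b & c),
    }


# Hypothetical SFP calls keyed by (probe set, probe index).
pasp = {("Contig1_at", 4), ("Contig1_at", 5), ("Contig2_at", 7)}
pilm = {("Contig1_at", 5), ("Contig2_at", 7), ("Contig3_at", 1)}
paop = {("Contig2_at", 7), ("Contig4_at", 2)}
print(venn3_counts(pasp, pilm, paop))
```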
Number of single-feature polymorphisms (SFPs) discovered in individual tissues of Bowman and Bowman-cul2
| Genotype | crow | embr | infl | seed | Total |
| Bowman-cul2 | 76 | 112 | 104 | 79 | 183 |
| Bowman | 64 | 98 | 97 | 63 | 181 |
| Total | 140 | 210 | 201 | 142 | 364 |
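SFPs with a weight score of 2.5 or greater detected in four tissues (crow, crown; embr, embryo; infl, inflorescence; seed, seedling). Because an SFP can be detected in more than one tissue, each genotype's total is smaller than the sum of its per-tissue counts.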
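For each genotype, the per-tissue counts in this table add up to more than the Total column. This is consistent with the total counting each distinct SFP once, however many tissues it was detected in. The sketch below assumes that rule. It also counts how many tissues support each SFP, which corresponds to the Tissue column in the PCR verification table. The probe IDs and the function name are hypothetical.

```python
from collections import Counter


def summarize_genotype(calls_by_tissue):
    """Per-tissue SFP counts, the number of distinct SFPs, and tissue support per SFP."""
    per_tissue = {tissue: len(sfps) for tissue, sfps in calls_by_tissue.items()}
    support = Counter()
    for sfps in calls_by_tissue.values():
        support.update(sfps)  # each tissue adds 1 to every SFP it detected
    return per_tissue, len(support), dict(support)


# Hypothetical calls for one genotype, keyed by tissue abbreviation.
calls = {
    "crow": {"p1", "p2"},
    "embr": {"p2", "p3"},
    "infl": {"p2"},
    "seed": {"p4"},
}
per_tissue, distinct, support = summarize_genotype(calls)
print(per_tissue, distinct, support["p2"])
```

In this toy example, the per-tissue counts sum to 6, but only 4 distinct SFPs exist, because p2 is detected in three tissues.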
PCR sequence verification for 16 single-feature polymorphisms (SFPs)
| SFP (probe set, probe)¹ | Tissues² | Genotype³ | Shape power⁴ | Weight score⁵ | PCR⁶ | Map info⁷ |
| Contig16609_at, probe 10 | 3 | U | 4 | 85.45 | + | NA |
| Contig4329_at, probe 3 | 4 | B | 4 | 70.77 | + | 6H,L, 67.7 cM |
| Contig4329_at, probe 6 | 2 | B | 4 | 71.33 | + | 6H,L, 67.7 cM |
| Contig5339_s_at, probe 8 | 4 | B | 4 | 65.08 | + | 6H,L, 70 cM |
| Contig7178_s_at, probe 8 | 4 | U | 3.67 | 50.82 | + | 6H,L, 70 cM |
| Contig4107_x_at, probe 11 | 4 | U | 2 | 23.01 | + | NA |
| Contig9366_at, probe 10 | 3 | U | 2.67 | 39.79 | + | NA |
| Contig159_at, probe 1 | 4 | U | 2 | 15.03 | + | NA |
| Contig159_at, probe 6 | 3 | U | 1 | 3.78 | + | NA |
| Contig9298_at, probe 7 | 2 | U | 1 | 3.64 | + | NA |
| Contig14687_at, probe 7 | 3 | B | 3 | 37.18 | + | 6H,L, 71.1 cM |
| HVSMEg0016A12r2_s_at, probe 5 | 4 | B | 4 | 75.24 | + | 6H,S, 49.4 cM |
| HVSMEg0016A12r2_s_at, probe 7 | 3 | B | 2 | 16.13 | + | 6H,S, 49.4 cM |
| HVSMEn0016F09r2_s_at, probe 7 | 1 | U | 1 | 4.48 | + | 6H,L, 71.1 cM |
| Contig2856_x_at, probe 6 | 3 | U | 1.67 | 8.16 | + | 6H,L, 60.2 cM |
| Contig8825_at, probe 7 | 3 | B | 1 | 5.48 | - | 3H,L, 55.6 cM |
1 Probe set (contig) and probe in which the SFP was detected.
2 Number of tissues, out of four, in which the SFP was detected.
3 Genotype in which the SFP was detected: Bowman (B) or Bowman-cul2 (U).
4, 5 Shape power and weight score were calculated as described in the Materials and Methods section.
6 The SFP was verified (+) or not verified (-) by sequencing PCR products.
7 SFP map information was derived from barley EST Assembly #35 at the HarvEST website [50]. Map locations are given as chromosome (6H or other), long (L) or short (S) arm, and position in cM; NA indicates that no mapping information was available.
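The verification table can be summarized programmatically. The sketch below transcribes the weight score, PCR result and map information of the 16 rows, then parses the map strings using the format described in footnote 7. The variable and function names are illustrative only.

```python
from collections import Counter

# (weight score, verified by PCR sequencing, map info), transcribed from the table.
ROWS = [
    (85.45, True, "NA"), (70.77, True, "6H,L, 67.7 cM"), (71.33, True, "6H,L, 67.7 cM"),
    (65.08, True, "6H,L, 70 cM"), (50.82, True, "6H,L, 70 cM"), (23.01, True, "NA"),
    (39.79, True, "NA"), (15.03, True, "NA"), (3.78, True, "NA"), (3.64, True, "NA"),
    (37.18, True, "6H,L, 71.1 cM"), (75.24, True, "6H,S, 49.4 cM"),
    (16.13, True, "6H,S, 49.4 cM"), (4.48, True, "6H,L, 71.1 cM"),
    (8.16, True, "6H,L, 60.2 cM"), (5.48, False, "3H,L, 55.6 cM"),
]


def parse_map(info):
    """Split e.g. '6H,L, 67.7 cM' into ('6H', 'L', 67.7); return None when info is 'NA'."""
    if info == "NA":
        return None
    chromosome, arm, position = (part.strip() for part in info.split(","))
    return chromosome, arm, float(position.split()[0])


verified = sum(ok for _, ok, _ in ROWS)
locations = [loc for loc in (parse_map(info) for _, _, info in ROWS) if loc is not None]
chromosomes = Counter(loc[0] for loc in locations)
high_weight = [w for w, _, _ in ROWS if w >= 20.0]
print(verified, len(ROWS), dict(chromosomes), len(high_weight))
```

For these rows, the script reports 15 of 16 SFPs verified. It places 9 of the 10 mapped SFPs on chromosome 6H and finds 9 SFPs with weight scores of 20.0 or greater. These counts are per SFP row, not per gene; for example, Contig4329_at appears twice.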
Figure 5 Map locations of SFPs detected between Bowman and Bowman-cul2. A total of 91 of the 263 single-feature polymorphism (SFP) probe sets discovered between Bowman and the Bowman-cul2 mutant were mapped on barley chromosomes 1H to 7H using map information from HarvEST: barley version 1.68 [50]. Black: the 91 SFP-containing genes with SFP weight scores of 2.5 or greater. Red: the 65 SFP-containing genes with SFP weight scores of 20.0 or greater.
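The two marker colors in Figure 5 correspond to two cutoffs applied at the gene (probe set) level. The sketch below assumes a probe set counts as SFP-containing at a given cutoff if any of its SFPs meets it. It also assumes that only probe sets with map information are plotted. Both rules are assumptions for illustration, and the gene names and positions are hypothetical. Under these rules, the set for the higher cutoff is always a subset of the set for the lower one.

```python
def mapped_genes_passing(sfp_weights, map_positions, cutoff):
    """Probe sets with at least one SFP at or above `cutoff` that also have a map position."""
    passing = {probe_set for probe_set, weight in sfp_weights if weight >= cutoff}
    return {ps: map_positions[ps] for ps in sorted(passing) if ps in map_positions}


# Hypothetical SFP weight scores and map positions (chromosome, cM).
sfp_weights = [("gene_A", 3.0), ("gene_A", 25.0), ("gene_B", 4.0), ("gene_C", 30.0)]
map_positions = {"gene_A": ("6H", 67.7), "gene_B": ("1H", 10.0)}

print(mapped_genes_passing(sfp_weights, map_positions, 2.5))   # gene_A and gene_B
print(mapped_genes_passing(sfp_weights, map_positions, 20.0))  # gene_A only
```

gene_C passes both cutoffs, but it is left out of both results because it has no map position.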