Wei Li, Jason S Carroll, Myles Brown, X Shirley Liu.
Abstract
BACKGROUND: The ability to rapidly map millions of oligonucleotide fragments to a reference genome is crucial to many high-throughput genomic technologies.
Year: 2008 PMID: 18366610 PMCID: PMC2386063 DOI: 10.1186/1471-2164-9-S1-S20
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
xMAN probe mapping of various Affymetrix tiling arrays to the most recent genome assembly
| Affymetrix tiling arrays | #UniqSeq | #Seq.MEntries | #QueryEntries | #Seq.MGenomeMatches | #Seq.NoGenomeMatch | #TotalEntries |
| Human ENCODE 1.0a | 721,043 | 14,322 | 756,555 | 16,444 | 506 | 884,634 |
| Human Chr21/22a | 979,553 | 21,930 | 1,054,324 | 49,316 | 75 | 1,627,746 |
| Human Promoterb | 4,220,999 | 40,099 | 4,275,079 | 251,460 | 2,537 | 5,706,819 |
| Human Tiling 1.0 & 2.0b | 41,370,900 | 301,947 | 41,782,720 | 1,215,226 | 13,120 | 48,332,137 |
| Mouse Tiling 1.0 & 2.0c | 38,788,060 | 431,551 | 39,576,383 | 993,890 | 437,877 | 51,036,801 |
| Mouse Promoterc | 4,096,798 | 30,835 | 4,154,546 | 192,119 | 37,483 | 5,716,068 |
| Arabidopsis thaliana 1.0d | 3,046,178 | 7,275 | 3,053,686 | 164,728 | 0 | 3,772,912 |
Table column headings:
#UniqSeq: Number of unique 25-mers in the original BPMAP file
#Seq.MEntries: Number of 25-mers with multiple spots on the array
#QueryEntries: Number of array spots in the original BPMAP file
#Seq.MGenomeMatches: Number of 25-mers with multiple genomic copies
#Seq.NoGenomeMatch: Number of 25-mers with no match in the genome
#TotalEntries: Total number of entries in the xMAN mapping
aThe original Affymetrix probe mapping is from the NCBIv33 human genome. The new xMAN probe mapping is based on the NCBIv35 human genome.
bThe original Affymetrix probe mapping is from the NCBIv34 human genome. The new xMAN probe mapping is based on the NCBIv35 human genome.
cThe original Affymetrix probe mapping is from the NCBIv33 mouse genome. The new xMAN probe mapping is based on the NCBIv35 mouse genome.
dBoth the original Affymetrix probe mapping and xMAN probe mapping are based on the TIGRv5 Arabidopsis genome.
Figure 1. Copy number histogram of ~42 million probes on the Affymetrix human genome 1.0 tiling arrays. Only probes with more than one match in the genome are shown.
Whole genome ER ChIP-chip results based on either Affymetrix or xMAN probe mapping under different FDR thresholds
| FDR thresholds (%) | 0 | 1 | 2 | 5 |
| Affymetrixa | 2,646 (2,312) | 5,221 (4,572) | 5,714 (4,993) | 7,293 (6,413) |
| xMANa | 3,281 (2,929) | 6,544 (5,820) | 7,563 (6,760) | 8,890 (7,925) |
| Shared Regionsb | 2,475 (2,217) | 5,006 (4,481) | 5,436 (4,876) | 6,871 (6,184) |
| Percentage of Shared Regionsc | 93.5 (95.9) | 95.9 (98.0) | 95.1 (97.6) | 94.2 (96.5) |
aThe numbers of ChIP-regions identified by MAT are shown in the table. A ChIP-region is annotated as repeat if more than 70% of the region is within RepeatMasker repeats, simple repeats, or segmental duplications. The numbers of non-repeat regions are shown in the parentheses.
bChIP-regions identified from the Affymetrix NCBIv34 probe mapping were converted to NCBIv35 coordinates using the LiftOver program. Two regions are considered the same if they overlap by more than 50%.
cThe percentage of shared regions was defined as the number of shared regions divided by the number of regions identified using Affymetrix probe mapping.
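The shared-region criterion in footnotes b and c can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the footnote does not say which region the 50% refers to, so the sketch assumes the overlap is measured against the shorter of the two regions. The `Region` type and all function names are hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlap_fraction(a: Region, b: Region) -> float:
    """Overlap length divided by the length of the shorter region (assumption)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    shorter = min(a.end - a.start, b.end - b.start)
    return overlap / shorter


def is_same_region(a: Region, b: Region, threshold: float = 0.5) -> bool:
    """Two regions count as the same if they overlap by more than the threshold."""
    return overlap_fraction(a, b) > threshold


def percent_shared(n_shared: int, n_affymetrix: int) -> float:
    """Shared regions as a percentage of regions found with the Affymetrix mapping."""
    return 100.0 * n_shared / n_affymetrix


# Values from the FDR 0% column of the table above.
print(round(percent_shared(2475, 2646), 1))
```

Under a different reading of the 50% rule (for example, overlap relative to the Affymetrix region only), only `overlap_fraction` would need to change.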
Figure 2. Characterization of ER binding sites identified only through xMAN probe mapping. Standard ChIP assays of ER were performed with anti-ER antibody. Immunoprecipitated DNA was quantified by qPCR using primers spanning 10 randomly selected regions identified by MAT only with xMAN probe mapping. The results are shown as vehicle (control, white bars) or estrogen (black bars) fold enrichment over input and are the average of three replicates ± SE. The 10 regions are provided as NCBIv35 chromosomal coordinates: Site 1 (chr10:94549192-94550690), Site 2 (chr11:46253821-46255330), Site 3 (chr11:100812061-100813479), Site 4 (chr4:188058695-188060091), Site 5 (chr5:52253648-52254807), Site 6 (chr5:133383322-133384523), Site 7 (chr7:150991115-150992368), Site 8 (chr8:88996063-88997399), Site 9 (chr8:99413611-99415262), Site 10 (chr8:102555348-102556385).
Figure 3. ROC-like curve for ENCODE spike-in data using either xMAN or Affymetrix probe mapping. We applied MAT with either xMAN or Affymetrix probe mapping to the spike-in data. xMAN and Affymetrix mapping achieved 100% and 96% True Positive Rate (TPR), respectively, at a 0% False Discovery Rate (FDR) cutoff. A MAT prediction is considered correct if the center of the predicted region lies within the actual spike-in fragment. Note that the False Discovery Rate, rather than the False Positive Rate, is used in this ROC-like curve.
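The correctness rule in the Figure 3 caption (the center of a predicted region must lie in a spike-in fragment) is easy to state in code. The sketch below is a hypothetical illustration: the `Interval` type and function names are not from the paper, and the TPR is assumed to be the fraction of spike-in fragments hit by at least one correct prediction.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


def is_correct(pred: Interval, frag: Interval) -> bool:
    """A prediction is correct if its center lies inside the spike-in fragment."""
    center = (pred.start + pred.end) / 2
    return pred.chrom == frag.chrom and frag.start <= center <= frag.end


def true_positive_rate(predictions: List[Interval], fragments: List[Interval]) -> float:
    """Fraction of fragments recovered by at least one correct prediction (assumption)."""
    if not fragments:
        return 0.0
    hits = sum(1 for f in fragments if any(is_correct(p, f) for p in predictions))
    return hits / len(fragments)


# Made-up coordinates, for illustration only.
fragments = [Interval("chr1", 1000, 1500), Interval("chr2", 5000, 5400)]
predictions = [Interval("chr1", 1100, 1300), Interval("chr2", 5350, 5600)]
print(true_positive_rate(predictions, fragments))
```

In this toy input the second prediction overlaps its fragment but its center falls outside it, so it does not count as correct.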
Figure 4. Estrogen receptor whole-genome ChIP-chip experiment using xMAN or Affymetrix probe mapping. We applied MAT with xMAN or Affymetrix probe mapping to the estrogen receptor ChIP-chip experiment on the Affymetrix human genome 1.0 tiling array set, which consists of 14 arrays covering the non-repetitive human genome at 35 bp resolution.
A) MATscore histogram: The standard deviations estimated from the background NULL distribution are 1.07 and 1.09 using xMAN and Affymetrix probe mapping, respectively. Only the bottom part of the histogram is shown.
B) Scatter plot of false discovery rate (FDR) versus number of true positives. At each cutoff, the number of true positives is estimated as the number of positive peaks minus the number of negative peaks; the FDR is estimated as the number of negative peaks divided by the number of positive peaks. At the same FDR cutoff, MAT predicts more true positive peaks using xMAN probe mapping than using Affymetrix probe mapping.
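The estimates described in panel B reduce to two arithmetic steps per cutoff. The following minimal sketch assumes that a positive peak is one whose score is at or above the cutoff and a negative peak is one whose score is at or below the negated cutoff. That symmetric-cutoff convention, the function name, and the sample scores are assumptions for illustration, not taken from MAT.

```python
from typing import Iterable, Tuple


def estimate_tp_and_fdr(peak_scores: Iterable[float], cutoff: float) -> Tuple[int, float]:
    """Return (estimated true positives, estimated FDR) at a given score cutoff."""
    scores = list(peak_scores)
    n_positive = sum(1 for s in scores if s >= cutoff)   # peaks on the enriched side
    n_negative = sum(1 for s in scores if s <= -cutoff)  # peaks on the depleted side
    if n_positive == 0:
        return 0, 0.0  # nothing called, so nothing to estimate
    true_positives = n_positive - n_negative
    fdr = n_negative / n_positive
    return true_positives, fdr


# Made-up peak scores, for illustration only.
tp, fdr = estimate_tp_and_fdr([5.0, 4.2, 3.9, -4.0, 1.0], cutoff=3.5)
print(tp, fdr)
```

Sweeping `cutoff` over a range of values and plotting `fdr` against `tp` would give a curve of the kind shown in panel B.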
Figure 5. xMAN mapping example. In this example, we provide 4 entries in the query (A1, B, A2 and C). A1 and A2 have exactly the same sequence, so they will be mapped to exactly the same genomic position(s). B has 3 copies and C has no match in the genome. The xMAN statistics for this example are: NumUniqSeq: 3; NumSeq.MEntries: 1; NumQueryEntries: 4; NumSeq.MGenomeMatches: 1; NumSeq.NoGenomeMatch: 1; NumTotalEntries: 5.
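The statistics in the Figure 5 example can be reproduced with a short Python sketch. It is not xMAN's implementation: the function name, the input layout (a list of query entries plus a dictionary of genome hits per sequence), and the placeholder sequences are hypothetical. It also assumes that the sequence shared by A1 and A2 has a single genomic match, which is consistent with the stated total of 5 entries.

```python
from collections import Counter
from typing import Dict, List, Tuple


def xman_statistics(
    queries: List[Tuple[str, str]],       # (entry id, sequence)
    genome_hits: Dict[str, List[int]],    # sequence -> genomic positions
) -> Dict[str, int]:
    """Compute the summary counts defined in the table legend."""
    seq_counts = Counter(seq for _, seq in queries)

    def n_hits(seq: str) -> int:
        return len(genome_hits.get(seq, []))

    return {
        "NumUniqSeq": len(seq_counts),
        "NumSeq.MEntries": sum(1 for c in seq_counts.values() if c > 1),
        "NumQueryEntries": len(queries),
        "NumSeq.MGenomeMatches": sum(1 for s in seq_counts if n_hits(s) > 1),
        "NumSeq.NoGenomeMatch": sum(1 for s in seq_counts if n_hits(s) == 0),
        # One output entry per (query entry, genomic match) pair.
        "NumTotalEntries": sum(n_hits(seq) for _, seq in queries),
    }


# Placeholder sequences and positions standing in for the 25-mers of Figure 5.
queries = [("A1", "SEQ_A"), ("B", "SEQ_B"), ("A2", "SEQ_A"), ("C", "SEQ_C")]
genome_hits = {"SEQ_A": [1200], "SEQ_B": [300, 4500, 9800]}
stats = xman_statistics(queries, genome_hits)
print(stats)
```

Here A1 and A2 each contribute one entry, B contributes three, and C contributes none, giving the 5 total entries stated in the caption.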