Daniel E Newburger, Georges Natsoulis, Sue Grimes, John M Bell, Ronald W Davis, Serafim Batzoglou, Hanlee P Ji.
Abstract
Recent exponential growth in the throughput of next-generation DNA sequencing platforms has dramatically spurred the use of accessible and scalable targeted resequencing approaches. These include diagnostic resequencing of candidate regions and validation of novel variants from whole genome or exome sequencing analysis. We have previously demonstrated that selective genomic circularization is a robust in-solution approach for capturing and resequencing thousands of target human genome loci such as exons and regulatory sequences. To facilitate the design and production of customized capture assays for any given region in the human genome, we developed the Human OligoGenome Resource (http://oligogenome.stanford.edu/). This online database contains over 21 million capture oligonucleotide sequences. It enables one to create customized and highly multiplexed resequencing assays of target regions across the human genome and is not restricted to coding regions. In total, this resource provides 92.1% in silico coverage of the human genome. The online server allows researchers to download a complete repository of oligonucleotide probes and design customized capture assays to target multiple regions throughout the human genome. The website has query tools for selecting and evaluating capture oligonucleotides from specified genomic regions.
Year: 2011 PMID: 22102592 PMCID: PMC3245143 DOI: 10.1093/nar/gkr973
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schema for target-specific capture and amplification by selective genomic circularization. This schema for the Natsoulis et al. (3) capture protocol describes the major steps for conducting capture and amplification of a target region. The light blue squiggles at the top of the figure indicate restriction enzyme recognition sites that are cut by the addition of a single restriction enzyme. ROI stands for region of interest (i.e. target region), green bars indicate capture arms, green circles indicate capture arm hybridization sites and red bars indicate universal primer sequence. The protocol described by this figure is performed separately for each restriction enzyme.
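The first step of the schema, cutting genomic DNA with a single restriction enzyme, can be mimicked in silico. The sketch below is an illustration only and not the authors' design pipeline. The recognition sequences (BfaI CTAG, CviQI GTAC, MseI TTAA, Sau3AI GATC) are the commonly documented sites for the four enzymes named in the tables and are not stated in this text; the cut offset inside each site is ignored, and the function names and toy sequence are hypothetical.

```python
# Commonly documented recognition sites (assumption: not given in the source text).
RECOGNITION_SITES = {
    "BfaI": "CTAG",
    "CviQI": "GTAC",
    "MseI": "TTAA",
    "Sau3AI": "GATC",
}


def find_sites(sequence, site):
    """Return the 0-based start position of every occurrence of `site`."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Advance by one base so overlapping occurrences are also found.
        start = sequence.find(site, start + 1)
    return positions


def fragments(sequence, site):
    """Split `sequence` into half-open (start, end) fragments.

    Simplification: real enzymes cut at a fixed offset inside the site;
    here every cut is placed at the first base of the site.
    """
    cuts = [0] + find_sites(sequence, site) + [len(sequence)]
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b > a]


if __name__ == "__main__":
    toy = "AAACTAGGGGTTAACCCGATCTTTGTACAAA"  # hypothetical sequence
    for enzyme, site in RECOGNITION_SITES.items():
        print(enzyme, fragments(toy, site))
```

Running the digest once per enzyme mirrors the caption's note that the protocol is performed separately for each restriction enzyme.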
Summary statistics for all capture oligonucleotides designed to target human genome Build 37/hg19
| Statistics for whole genome capture | BfaI | CviQI | MseI | Sau3AI | Total |
|---|---|---|---|---|---|
| Tier 1 only | | | | | |
| Total number of oligos | 4 049 706 | 2 999 049 | 4 825 988 | 3 246 400 | 15 121 143 |
| Average capture length (bases) | 401 | 483 | 269 | 430 | 381 |
| Total bases covered (megabases) | 1614 | 1441 | 1295 | 1388 | 2311 |
| Percent of genome covered | 52.14 | 46.54 | 41.83 | 44.83 | 74.64 |
| Percent of oligos with U0 > 1 | 4.71 | 5.13 | 5.08 | 5.06 | 4.99 |
| Percent of oligos with paralogs > 0 | 0.07 | 0.07 | 0.06 | 0.07 | 0.07 |
| Percent of genome covered with paralogs removed | 52.10 | 46.50 | 41.80 | 44.80 | 74.60 |
| Tiers 1, 2 and 3 combined | | | | | |
| Total number of oligos | 5 787 809 | 4 362 946 | 6 757 372 | 4 938 767 | 21 846 894 |
| Average capture length (bases) | 410 | 496 | 280 | 426 | 391 |
| Total bases covered (megabases) | 2160 | 1978 | 1760 | 1938 | 2852 |
| Percent of genome covered | 69.79 | 63.89 | 56.85 | 62.61 | 92.14 |
| Percent of oligos with U0 > 1 | 23.99 | 24.60 | 23.23 | 28.60 | 24.92 |
| Percent of oligos with paralogs > 0 | 6.96 | 6.48 | 7.25 | 8.90 | 7.39 |
| Percent of genome covered with paralogs removed | 64.91 | 59.41 | 52.32 | 57.67 | 88.43 |
Tier 1 oligonucleotides are the subset of targeting molecules generated with the strictest repeat masking parameters based upon k-mer mapability. Tiers 1, 2 and 3 represent all oligonucleotides in the database. This table illustrates that the looser mapability masking parameters used in Tiers 2 and 3 allowed for increased coverage but with a higher probability of having off-target binding and amplification.
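The "Total bases covered" and "Percent of genome covered" rows should be linked by a single genome-size denominator. As a quick consistency check, the sketch below divides one by the other for the combined-tier rows; the function name is hypothetical, and the exact reference length the authors used is not stated here, so the result is only an implied value of roughly 3.1 gigabases.

```python
# Values copied from the whole-genome table (Tiers 1, 2 and 3 combined).
MEGABASES_COVERED = {
    "BfaI": 2160, "CviQI": 1978, "MseI": 1760, "Sau3AI": 1938, "Total": 2852,
}
PERCENT_COVERED = {
    "BfaI": 69.79, "CviQI": 63.89, "MseI": 56.85, "Sau3AI": 62.61, "Total": 92.14,
}


def implied_genome_size_mb(megabases, percent):
    """Genome size in megabases implied by a covered-bases / percent pair."""
    return megabases / (percent / 100.0)


IMPLIED_SIZES = {
    column: implied_genome_size_mb(MEGABASES_COVERED[column], PERCENT_COVERED[column])
    for column in MEGABASES_COVERED
}

if __name__ == "__main__":
    for column, size in IMPLIED_SIZES.items():
        print(f"{column}: {size:.1f} Mb")
```

Each column yields nearly the same implied size, which suggests the two rows were computed against one common reference length.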
Summary statistics describing the in silico percent capture of CCDS regions by the combined set of oligonucleotide probes
| Statistics for CCDS capture for all tiers | BfaI | CviQI | MseI | Sau3AI | Total |
|---|---|---|---|---|---|
| Total number of oligos covering CCDS target area | 182 483 | 178 338 | 158 445 | 200 019 | 719 285 |
| Average capture length (bases) | 521 | 550 | 419 | 489 | 497 |
| Total bases covered (megabases) | 25.286 | 23.270 | 22.162 | 24.04 | 31.70 |
| Percent of CCDS covered | 79.49 | 73.15 | 69.67 | 75.58 | 99.65 |
| Percent of oligos with paralogs > 0 | 2.89 | 2.85 | 3.00 | 3.03 | 2.94 |
| Percent of CCDS covered with paralogs removed | 76.96 | 70.85 | 67.36 | 73.02 | 97.12 |
Exonic regions can be captured with high sensitivity and specificity because of their high k-mer complexity.
Figure 2. In silico coverage by the set of capture oligonucleotides from the Human OligoGenome Resource. Coverage is across (a) the whole genome and (b) the regions defined by CCDS in each successive tier of 24-mer repeat masking. Tier 1 oligonucleotides are the subset of targeting molecules generated with the strictest repeat masking parameters based upon k-mer mapability. Tiers 1, 2 and 3 represent all oligonucleotides in the database. The restriction enzyme count on the x-axis is the number of restriction enzymes for which the OligoGenome database contains an oligonucleotide that can capture a given base. Zero depth indicates the set of positions for which no capture oligonucleotides exist. As expected, fewer repeat mask restrictions lead to a greater number of positions covered by multiple restriction enzymes' oligonucleotides.
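The restriction enzyme count plotted on the x-axis can be illustrated with a small per-base tally. This is a minimal sketch on toy data, assuming half-open, 0-based capture intervals grouped by enzyme; the function name and the intervals are hypothetical and are not taken from the database.

```python
from collections import Counter


def enzyme_depth_histogram(region_length, captures):
    """Count positions by the number of distinct enzymes that cover them.

    `captures` maps an enzyme name to a list of half-open (start, end)
    intervals, 0-based, within a region of `region_length` bases.
    """
    depth = [0] * region_length
    for intervals in captures.values():
        # Mark each base at most once per enzyme, however many oligos hit it.
        covered = [False] * region_length
        for start, end in intervals:
            for pos in range(max(start, 0), min(end, region_length)):
                covered[pos] = True
        for pos, is_covered in enumerate(covered):
            if is_covered:
                depth[pos] += 1
    return Counter(depth)


if __name__ == "__main__":
    toy_captures = {  # hypothetical capture intervals
        "BfaI": [(0, 6)],
        "MseI": [(4, 8)],
        "Sau3AI": [(4, 5), (4, 6)],
    }
    print(sorted(enzyme_depth_histogram(10, toy_captures).items()))
```

Positions with a count of zero correspond to the zero-depth bin in the figure, that is, bases that no capture oligonucleotide can reach.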
Figure 3. A brief overview of the OligoGenome website and its query tools. You may (a) download all capture oligonucleotides directly or (b) search for capture oligos that target a specific interval entered on the page or a set of intervals uploaded in BED format. (c) After the submission of queried regions, you may view the returned capture oligonucleotides on the website, download the table in BED format, or export the results to the UCSC Genome Browser to view as a track. (d) Additionally, clicking an oligo name will bring you to a page with additional information, including the full 80-bp capture oligonucleotide.
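For the interval upload in (b), a set of target regions has to be written as BED lines: tab-separated chromosome, 0-based start and exclusive end. The sketch below shows one way to prepare such lines. The helper names and coordinates are hypothetical, and whether the site expects an optional name column is an assumption not confirmed by this text.

```python
def one_based_to_bed(start, end):
    """Convert a 1-based inclusive interval to BED's 0-based half-open form."""
    return start - 1, end


def to_bed_lines(intervals):
    """Format (chrom, start, end) or (chrom, start, end, name) tuples as BED lines."""
    lines = []
    for interval in intervals:
        chrom, start, end = interval[:3]
        if start < 0 or end <= start:
            raise ValueError(f"invalid BED interval: {interval!r}")
        fields = [chrom, str(start), str(end)]
        if len(interval) > 3:
            fields.append(str(interval[3]))  # optional name column (assumption)
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    # Hypothetical regions given as 1-based inclusive coordinates.
    start, end = one_based_to_bed(1001, 1500)
    regions = [("chr1", start, end, "region_A"), ("chr2", 200, 350)]
    print("\n".join(to_bed_lines(regions)))
```

Validating the intervals before upload catches the common off-by-one slip between 1-based coordinates and BED's 0-based starts.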