Young Seok Ju, Dongwan Hong, Sheehyun Kim, Sung-Soo Park, Sujung Kim, Seungbok Lee, Hansoo Park, Jong-Il Kim, Jeong-Sun Seo.
Abstract
Comparative genomic hybridization (CGH) microarrays have been used to determine copy number variations (CNVs) and their effects on complex diseases. Detecting absolute CNVs independently of the genomic variants of an arbitrary reference sample has been a critical issue in CGH array experiments. Whole genome analysis using massively parallel sequencing together with multiple ultra-high resolution CGH arrays provides an opportunity to catalog highly accurate genomic variants of the reference DNA (NA10851). Using information on these variants, we developed a new method, the CGH array reference-free algorithm (CARA), which can determine reference-unbiased absolute CNVs from any CGH array platform. The algorithm removes false positive CNVs and rescues false negative CNVs, both of which arise in raw CGH array experiments from the genomic variants of the reference sample. We found that CARA markedly enhanced the accuracy of CGH arrays in determining absolute CNVs. Our method thus provides a new approach to interpreting CGH array data for personalized medicine.
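The reference effect described in the abstract can be illustrated with a minimal sketch. This is not the authors' CARA implementation. It assumes an idealized array in which the observed signal is exactly log2(test copies / reference copies) and in which the reference copy number of each segment is already known. The names `Segment`, `absolute_copies`, `classify`, and the call threshold of 0.3 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    log2_ratio: float  # observed log2(test / reference) signal (idealized)
    ref_copies: int    # known copy number of the reference here; must be > 0


def absolute_copies(seg: Segment) -> int:
    """Undo the reference's own copy number: test = ref * 2**log2_ratio."""
    return round(seg.ref_copies * 2 ** seg.log2_ratio)


def classify(seg: Segment, call_threshold: float = 0.3, diploid: int = 2) -> str:
    """Compare the raw array call with the reference-corrected call."""
    raw_call = abs(seg.log2_ratio) >= call_threshold  # naive call from the ratio
    true_cnv = absolute_copies(seg) != diploid        # call after correction
    if raw_call and not true_cnv:
        return "false positive removed"
    if true_cnv and not raw_call:
        return "false negative rescued"
    return "CNV retained" if true_cnv else "no CNV"


segments = [
    Segment("ref loss, test normal", 1.0, 1),     # ratio looks like a gain
    Segment("ref loss, test loss", 0.0, 1),       # ratio looks normal
    Segment("ref normal, test loss", -1.0, 2),
    Segment("ref normal, test normal", 0.0, 2),
]
for seg in segments:
    print(f"{seg.name}: {absolute_copies(seg)} copies, {classify(seg)}")
```

Real array signals are noisy and compressed relative to this ideal, and a region where the reference has zero copies yields no usable ratio, so the sketch only shows why knowing the reference's variants makes both corrections possible.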
Year: 2010 PMID: 20802225 PMCID: PMC2978381 DOI: 10.1093/nar/gkq730
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Detection of NA10851 CNVs using conjugative methods, massively parallel sequencing and ultra-high resolution CGH arrays. (a) Distribution of RD of coverage of NA10851 sequencing for CNV segments identified from CGH arrays of 73 individuals. (b) Identifying CNVs of NA10851 using RD of sequence coverage on putative regions determined by CGH arrays. (c) Examples of four categories of candidate CNVs by visual inspection. Top row, Apparent CNV; second row, Indistinct region; third row, Indeterminable region due to extremely unstable RD; fourth and bottom rows, Nested CNV; the CNV candidate in the bottom row is removed because it is included in the CNV illustrated in the fourth row.
Summary of massively parallel sequencing of NA10851
| Library | Read length | Insert size | Total reads | Aligned reads | Aligned bases | Genome covered | RD | Total SNPs | Total Indels |
|---|---|---|---|---|---|---|---|---|---|
| Library #1 | 2 × 36 bp | 500 bp | 557 060 528 | 499 554 933 | 35 966 395 532 | | | | |
| Library #2 | 2 × 76 bp | 500 bp | 159 462 248 | 125 776 263 | 19 116 529 680 | | | | |
| Library #3 | 2 × 101 bp | 500 bp | 101 921 217 | 81 211 013 | 16 403 136 593 | | | | |
| Overall | – | – | 818 443 993 | 706 542 209 | 71 486 061 805 | 99.71% | 25.01× | 3 683 016 | 319 174 |
RD: read-depth of sequence coverage.
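The table's totals and its RD value can be sanity-checked with a few lines of arithmetic. The per-library counts below are copied from the table. The genome length used for the depth estimate (2.86 Gb) is an assumption that does not come from the source, and `totals` and `mean_depth` are hypothetical helper names.

```python
# Per-library counts from the table: (total reads, aligned reads, aligned bases).
libraries = {
    "Library #1": (557_060_528, 499_554_933, 35_966_395_532),
    "Library #2": (159_462_248, 125_776_263, 19_116_529_680),
    "Library #3": (101_921_217, 81_211_013, 16_403_136_593),
}

# ASSUMPTION: approximate length of the sequenced (non-gap) human reference
# genome in base pairs. The source does not state the value it used.
GENOME_BP_ASSUMED = 2.86e9


def totals(libs):
    """Column-wise sums across libraries."""
    return tuple(sum(column) for column in zip(*libs.values()))


def mean_depth(aligned_bases, genome_bp):
    """Mean read depth: aligned bases divided by genome length."""
    return aligned_bases / genome_bp


total_reads, aligned_reads, aligned_bases = totals(libraries)
print(f"total reads:   {total_reads:,}")
print(f"aligned reads: {aligned_reads:,}")
print(f"aligned bases: {aligned_bases:,}")
print(f"mean depth:    {mean_depth(aligned_bases, GENOME_BP_ASSUMED):.2f}x")
```

Under that assumed length, the combined aligned bases give a depth of roughly 25×, which suggests the reported RD describes all three libraries together rather than any single one.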
Figure 2. Personal CNVs of NA10851. (a) Personal CNV distribution throughout the entire genome. (b) Numbers and lengths of CN losses and CN gains of NA10851. (c) Size distribution of 1309 NA10851 CNV regions. (d) Repetitive context of CN gains and losses.
Figure 3. Flowchart showing the CARA for detection of reference-independent CNVs using CGH arrays.
Figure 4. Evaluation of the utility of CARA. (a) Comparisons between CNV sets of AK1 before and after application of CARA. (b) Alterations in CNV segments upon application of CARA. (c) Concordance between CGH arrays and RD of sequence coverage after application of CARA.