Jiantao Wu, Krzysztof R Grzeda, Chip Stewart, Fabian Grubert, Alexander E Urban, Michael P Snyder, Gabor T Marth.
Abstract
BACKGROUND: DNA capture technologies combined with high-throughput sequencing now enable cost-effective, deep-coverage, targeted sequencing of complete exomes. This is well suited for SNP discovery and genotyping. However, little attention has been devoted to Copy Number Variation (CNV) detection from exome capture datasets, despite the potentially high impact of CNVs in exonic regions on protein function.
Year: 2012 PMID: 23157288 PMCID: PMC3563612 DOI: 10.1186/1471-2105-13-305
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Workflow of the CNV detection method. A. Median read depth (MRD) is calculated for each sample as a measure of sample coverage (NA18523 shown). B. The gene affinity is estimated for each gene as the slope of the least-squares linear fit between MRD and RD for that gene (TRIM33 shown). C. Example of observed (magenta) and expected (green) read depth for three samples and four genes. In sample NA18523, the observed read depths of TRIM33 and NRAS were roughly half of the expected values, and both genes were detected as deletions.
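The three steps of Figure 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: MRD is taken as the median of a sample's read depths across all genes in the matrix, the gene affinity is fitted as a least-squares line through the origin (RD = affinity × MRD), and the expected diploid read depth is affinity × MRD. The function names and toy numbers are hypothetical.

```python
from statistics import median


def median_read_depth(rd):
    """MRD per sample: median read depth over all genes of that sample.
    rd is a list of rows: one row per sample, one column per gene."""
    return [median(row) for row in rd]


def gene_affinities(rd, mrd):
    """Per-gene slope of a least-squares line RD = affinity * MRD
    fitted through the origin: sum(x*y) / sum(x*x)."""
    denom = sum(m * m for m in mrd)
    n_genes = len(rd[0])
    return [sum(m * row[j] for m, row in zip(mrd, rd)) / denom
            for j in range(n_genes)]


def depth_ratios(rd):
    """Observed / expected RD for every gene in every sample, where the
    expected (diploid) RD is affinity * MRD."""
    mrd = median_read_depth(rd)
    aff = gene_affinities(rd, mrd)
    return [[row[j] / (m * aff[j]) for j in range(len(aff))]
            for m, row in zip(mrd, rd)]


# Toy data: 4 samples x 5 genes, RD = coverage * affinity, then the first
# gene of the second sample is halved to mimic a heterozygous deletion.
rd = [[cov * aff for aff in (1, 2, 3, 4, 5)] for cov in (10, 20, 30, 40)]
rd[1][0] /= 2
ratios = depth_ratios(rd)
for i, row in enumerate(ratios):
    print(f"sample {i}:", [round(r, 2) for r in row])
```

In the toy matrix the halved site has the lowest observed/expected ratio (about 0.54), close to the factor of one half expected for a heterozygous deletion. Because the deleted site also pulls on its own gene's fitted affinity, a production caller would likely need a more robust fit than plain least squares.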
Properties of datasets from different sequencing centers
| SC | BCM | BI | |
|---|---|---|---|
| 117 | 352 | 161 | 93 |
| 106 | 349 | 110 | 82 |
| Illumina | 454 | Illumina | Illumina |
| 0.21 | 0.30 | 0.50 | 0.72 |
| 50 | 33 | 45 | 51 |
| 56 ± 34 | 23 ± 12 | 70 ± 61 | 29 ± 9 |
| 2309 ± 3166 | 106 ± 171 | 1329 ± 2053 | 977 ± 1382 |
| 1710 ± 1073 | 97 ± 52 | 1070 ± 803 | 599 ± 164 |
| 8174 | 8174 | 8174 | 8174 |
| 458 (5.6%) | 458 (5.6%) | 458 (5.6%) | 458 (5.6%) |
| 862 | 439 | 739 | 1 |
| 29 (3.3%) | 11 (2.5%) | 23 (3.1%) | 0 (0.0%) |
| 7.9 ± 8.2 | 2.1 ± 1.1 | 6.4 ± 5.5 | N/A |
| 9.4 ± 8.8 | 5.5 ± 2.3 | 7.6 ± 5.6 | N/A |
| 0.46 | 0.20 | 0.41 | N/A |
| 36 | 4 | 56 | N/A |
| 17 | 0 | 11 | N/A |
Data characterized by sequencing center and population
| Center | CEU | CHB | CHD | JPT | LWK | TSI | YRI |
|---|---|---|---|---|---|---|---|
| SC | 18 | 14 | | 9 | | 51 | 14 |
| | 9/9 | 5/9 | | 5/4 | | 24/27 | 2/12 |
| | 1679 | 1701 | | 1597 | | 1617 | 1865 |
| | 36 | 36 | | 36 | | 36 | 36 |
| BCM | 40 | 62 | 78 | 16 | 108 | | 45 |
| | 20/20 | 15/47 | 38/40 | 5/11 | 51/57 | | 22/23 |
| | 178 | 131 | 171 | 243 | 128 | | 165 |
| | 258 | 323 | 339 | 300 | 336 | | 322 |
| BI | 16 | 13 | 28 | 34 | | | 19 |
| | 9/7 | 11/2 | 12/16 | 16/18 | | | 12/7 |
| | 1623 | 1631 | 1675 | 1104 | | | 1612 |
| | 73 | 75 | 74 | 75 | | | 76 |
Population abbreviations:
CEU – Utah residents with Northern and Western European ancestry.
CHB – Han Chinese in Beijing.
CHD – Chinese in Denver, Colorado.
JPT – Japanese in Tokyo, Japan.
LWK – Luhya in Webuye, Kenya.
TSI – Toscans in Italy.
YRI – Yoruba in Ibadan, Nigeria.
Figure 2. Data characteristics of the 1000 Genomes exon sequencing pilot datasets. A. Principal component analysis of a “mixed” read depth matrix built from the data of three sequencing centers: SC (Wellcome Trust Sanger Institute), BI (Broad Institute) and BCM (Baylor College of Medicine). Each sample is represented as a point, with the first principal component plotted against the second. Samples from different sequencing centers cluster separately in this space, suggesting substantial differences in gene affinities among the three datasets. B. Distributions of MRD for the BCM, BI and SC samples. C. Histogram of RD across all GSSs in the three datasets. D. Histogram of gene affinities across genes within each of the three datasets. E. Distributions of the RD over-dispersion factor (ODF) in our data.
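Figure 2E refers to an RD over-dispersion factor (ODF), but this section does not say how it is estimated. One common choice, used in this hypothetical sketch, is the Pearson dispersion statistic for one gene across samples: the sum of (observed - expected)² / expected, divided by the degrees of freedom. Under a Poisson model it is close to 1, and larger values indicate over-dispersion. The function name and the `n_fitted` parameter (one fitted gene affinity by default) are assumptions.

```python
def overdispersion_factor(observed, expected, n_fitted=1):
    """Pearson dispersion statistic for one gene across samples:
    sum((O - E)^2 / E) / (n - n_fitted). Close to 1 for Poisson counts,
    larger when read depth is over-dispersed."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2 / (len(observed) - n_fitted)


# Hypothetical read depths of one gene in four samples, with expected
# depths from affinity * MRD (one fitted parameter: the gene affinity).
observed = [130, 70, 100, 100]
expected = [100, 100, 100, 100]
print(overdispersion_factor(observed, expected))  # -> 6.0
```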
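Gene duplication calls in the SC dataset
| Population | Sample | Gene | Description | Chr | Start | End | Posterior probability | Observed RD | Expected RD |
|---|---|---|---|---|---|---|---|---|---|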
| CEU | NA12348 | CD300LB | CD300 molecule-like family member b | 17 | 70030472 | 70039195 | 1.000 | 638 | 420 |
| TSI | NA20533 | CLDN10 | claudin 10 | 13 | 95003009 | 95028269 | 1.000 | 2108 | 1582 |
| CHB | NA18526 | SNRNP27 | small nuclear ribonucleoprotein 27 kDa (U4/U6.U5) | 2 | 69974621 | 69977184 | 1.000 | 530 | 383 |
| CHB | NA18532 | CES1 | carboxylesterase 1 (monocyte/macrophage serine esterase 1) | 16 | 54401930 | 54424468 | 1.000 | 501 | 337 |
| TSI | NA20752 | NOM1 | nucleolar protein with MIF4G domain 1 | 7 | 156435193 | 156455158 | 1.000 | 1335 | 966 |
| TSI | NA20796 | AHNAK | AHNAK nucleoprotein | 11 | 62040792 | 62059238 | 1.000 | 7330 | 5169 |
| TSI | NA20796 | ZNF264 | zinc finger protein 264 | 19 | 62408577 | 62416161 | 0.999 | 1276 | 888 |
| TSI | NA20801 | GPR128 | G protein-coupled receptor 128 | 3 | 101811391 | 101896535 | 0.998 | 14747 | 8265 |
| TSI | NA20772 | STX16 | syntaxin 16 | 20 | 56660469 | 56684753 | 0.998 | 2101 | 1605 |
| TSI | NA20769 | MRPS6 | mitochondrial ribosomal protein S6 | 21 | 34419511 | 34436770 | 0.998 | 1585 | 1203 |
| TSI | NA20774 | ELAVL4 | ELAV (embryonic lethal, abnormal vision, Drosophila)-like 4 (Hu antigen D) | 1 | 50383216 | 50439437 | 0.998 | 782 | 567 |
| TSI | NA20804 | CYP2A13 | cytochrome P450, family 2, subfamily A, polypeptide 13 | 19 | 46291375 | 46293686 | 0.997 | 1289 | 984 |
| TSI | NA20774 | CREB5 | cAMP responsive element binding protein 5 | 7 | 28494318 | 28825421 | 0.996 | 1435 | 954 |
| TSI | NA20796 | ZNF32 | zinc finger protein 32 | 10 | 43459504 | 43461587 | 0.996 | 911 | 646 |
| TSI | NA20520 | C6orf145 | chromosome 6 open reading frame 145 | 6 | 3668852 | 3683381 | 0.995 | 2015 | 1601 |
| CEU | NA12348 | GDNF | glial cell derived neurotrophic factor | 5 | 37851510 | 37870647 | 0.994 | 306 | 217 |
| CHB | NA18561 | PSMB4 | proteasome (prosome, macropain) subunit, beta type, 4 | 1 | 149638688 | 149640730 | 0.986 | 3461 | 2216 |
| CEU | NA12546 | DAZAP2 | DAZ associated protein 2 | 12 | 49920394 | 49922509 | 0.985 | 2265 | 1651 |
| TSI | NA20752 | AATF | apoptosis antagonizing transcription factor | 17 | 32380539 | 32488077 | 0.976 | 1157 | 843 |
| CEU | NA12749 | PAQR5 | progestin and adipoQ receptor family member V | 15 | 67439474 | 67483215 | 0.976 | 1684 | 1239 |
| TSI | NA20769 | BCL2L11 | BCL2-like 11 (apoptosis facilitator) | 2 | 111597794 | 111638279 | 0.965 | 1813 | 1435 |
| TSI | NA20804 | PILRA | paired immunoglobin-like type 2 receptor alpha | 7 | 99809603 | 99835466 | 0.909 | 962 | 752 |
| TSI | NA20589 | C8orf85 | chromosome 8 open reading frame 85 | 8 | 118019664 | 118024121 | 0.903 | 147 | 91 |
| TSI | NA20752 | CCKAR | cholecystokinin A receptor | 4 | 26092358 | 26100987 | 0.902 | 712 | 532 |
| JPT | NA18973 | HBG2 | hemoglobin, gamma G | 11 | 5278820 | 5523329 | 0.901 | 4151 | 3094 |
| TSI | NA20774 | HIPK1 | homeodomain interacting protein kinase 1 | 1 | 114298778 | 114317657 | 0.900 | 2374 | 1626 |
| TSI | NA20774 | ODC1 | ornithine decarboxylase 1 | 2 | 10498301 | 10502609 | 0.897 | 1489 | 935 |
| TSI | NA20796 | STBD1 | starch binding domain 1 | 4 | 77446947 | 77450177 | 0.885 | 978 | 664 |
| TSI | NA20589 | CRIPAK | cysteine-rich PAK1 inhibitor | 4 | 1378300 | 1379640 | 0.877 | 76 | 38 |
| YRI | NA19189 | PSMB4 | proteasome (prosome, macropain) subunit, beta type, 4 | 1 | 149638688 | 149640730 | 0.853 | 2622 | 2090 |
| TSI | NA20774 | STX16 | syntaxin 16 | 20 | 56660469 | 56684753 | 0.811 | 949 | 704 |
| JPT | NA18980 | CES1 | carboxylesterase 1 (monocyte/macrophage serine esterase 1) | 16 | 54401930 | 54424468 | 0.788 | 1679 | 1036 |
| TSI | NA20774 | PAQR5 | progestin and adipoQ receptor family member V | 15 | 67439474 | 67483215 | 0.788 | 1048 | 676 |
| CHB | NA18561 | CRNN | cornulin | 1 | 150648694 | 150651333 | 0.778 | 4845 | 3172 |
| TSI | NA20774 | DKK4 | dickkopf homolog 4 (Xenopus laevis) | 8 | 42350775 | 42353720 | 0.760 | 493 | 362 |
| TSI | NA20589 | NOM1 | nucleolar protein with MIF4G domain 1 | 7 | 156435193 | 156455158 | 0.740 | 1052 | 801 |
| TSI | NA20769 | RNF122 | ring finger protein 122 | 8 | 33525813 | 33535831 | 0.734 | 2574 | 2004 |
| TSI | NA20796 | ZNF521 | zinc finger protein 521 | 18 | 20896674 | 21184908 | 0.721 | 3536 | 2738 |
| TSI | NA20769 | VLDLR | very low density lipoprotein receptor | 9 | 2625453 | 2631499 | 0.676 | 2092 | 1624 |
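Gene deletion calls in the SC dataset
| Population | Sample | Gene | Description | Chr | Start | End | Posterior probability | Observed RD | Expected RD |
|---|---|---|---|---|---|---|---|---|---|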
| YRI | NA18523 | BCL2L15 | BCL2-like 15 | 1 | 114225268 | 114231520 | 1.000 | 533 | 1158 |
| YRI | NA18523 | HIPK1 | homeodomain interacting protein kinase 1 | 1 | 114298778 | 114317657 | 1.000 | 2539 | 5272 |
| TSI | NA20533 | GLOD4 | glyoxalase domain containing 4 | 17 | 610163 | 632245 | 1.000 | 1322 | 2295 |
| TSI | NA20533 | C1QBP | complement component 1, q subcomponent binding protein | 17 | 5277059 | 5282317 | 1.000 | 793 | 1416 |
| TSI | NA20533 | C17orf91 | chromosome 17 open reading frame 91 | 17 | 1562414 | 1563890 | 1.000 | 369 | 574 |
| YRI | NA18523 | NRAS | neuroblastoma RAS viral (v-ras) oncogene homolog | 1 | 115052679 | 115060304 | 1.000 | 702 | 1462 |
| YRI | NA18523 | TRIM33 | tripartite motif-containing 33 | 1 | 114741793 | 114808533 | 1.000 | 2610 | 5225 |
| TSI | NA20533 | TRPV3 | transient receptor potential cation channel, subfamily V, member 3 | 17 | 3363961 | 3404894 | 1.000 | 3365 | 5275 |
| TSI | NA20774 | PTMAP1 | prothymosin, alpha pseudogene 1 (gene sequence 26) | 6 | 30725671 | 30728671 | 1.000 | 132 | 260 |
| TSI | NA20796 | SNRNP27 | small nuclear ribonucleoprotein 27 kDa (U4/U6.U5) | 2 | 69974621 | 69977184 | 0.998 | 105 | 194 |
| TSI | NA20807 | HIST1H2BC | histone cluster 1, H2bc | 6 | 26231731 | 26232111 | 0.998 | 42 | 90 |
| TSI | NA20772 | ULBP1 | UL16 binding protein 1 | 6 | 150331436 | 150332954 | 0.997 | 104 | 205 |
| TSI | NA20807 | CYP2A13 | cytochrome P450, family 2, subfamily A, polypeptide 13 | 19 | 46291375 | 46293686 | 0.996 | 126 | 204 |
| YRI | NA18508 | PTMAP1 | prothymosin, alpha pseudogene 1 (gene sequence 26) | 6 | 30725671 | 30728671 | 0.992 | 145 | 230 |
| CEU | NA07000 | PSG8 | pregnancy specific beta-1-glycoprotein 8 | 19 | 47950287 | 47960273 | 0.990 | 29 | 70 |
| CEU | NA11893 | PSG8 | pregnancy specific beta-1-glycoprotein 8 | 19 | 47950287 | 47960273 | 0.985 | 43 | 86 |
| TSI | NA20771 | PTMAP1 | prothymosin, alpha pseudogene 1 (gene sequence 26) | 6 | 30725671 | 30728671 | 0.980 | 533 | 862 |
| TSI | NA20773 | CCK | cholecystokinin | 3 | 42274594 | 42280126 | 0.971 | 282 | 474 |
| CEU | NA07000 | HMGN4 | high mobility group nucleosomal binding domain 4 | 6 | 26653414 | 26653686 | 0.966 | 68 | 132 |
| CEU | NA12749 | HMGN4 | high mobility group nucleosomal binding domain 4 | 6 | 26653414 | 26653686 | 0.966 | 156 | 286 |
| TSI | NA20772 | AIF1 | allograft inflammatory factor 1 | 6 | 31692086 | 31692262 | 0.964 | 51 | 124 |
| CEU | NA12348 | DUSP10 | dual specificity phosphatase 10 | 1 | 219942377 | 219946216 | 0.962 | 155 | 242 |
| YRI | NA18508 | ULBP1 | UL16 binding protein 1 | 6 | 150331436 | 150332954 | 0.941 | 40 | 79 |
| YRI | NA18523 | PPM1J | protein phosphatase, Mg2+/Mn2+ dependent, 1J | 1 | 113056116 | 113057756 | 0.891 | 560 | 924 |
| TSI | NA20807 | POU5F1 | POU class 5 homeobox 1 | 6 | 31240884 | 31241803 | 0.891 | 124 | 193 |
| TSI | NA20772 | SERPINA11 | serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 11 | 14 | 93978696 | 93984864 | 0.889 | 786 | 1243 |
| CEU | NA07000 | KRT18P19 | keratin 18 pseudogene 19 | 12 | 51630379 | 51632393 | 0.887 | 85 | 174 |
| CEU | NA12348 | ULBP1 | UL16 binding protein 1 | 6 | 150331436 | 150332954 | 0.879 | 49 | 88 |
| YRI | NA18523 | RHOC | ras homolog gene family, member C | 1 | 113054308 | 113055529 | 0.867 | 557 | 955 |
| CEU | NA12348 | STBD1 | starch binding domain 1 | 4 | 77446947 | 77450177 | 0.839 | 246 | 395 |
| CEU | NA07000 | POU5F1 | POU class 5 homeobox 1 | 6 | 31240884 | 31241803 | 0.823 | 106 | 169 |
| CEU | NA12749 | SNRNP27 | small nuclear ribonucleoprotein 27 kDa (U4/U6.U5) | 2 | 69974621 | 69977184 | 0.775 | 142 | 216 |
| TSI | NA20752 | POU5F1 | POU class 5 homeobox 1 | 6 | 31240884 | 31241803 | 0.723 | 76 | 142 |
| TSI | NA20807 | HIST1H2BO | histone cluster 1, H2bo | 6 | 27969220 | 27969600 | 0.723 | 48 | 88 |
| TSI | NA20589 | POU5F1 | POU class 5 homeobox 1 | 6 | 31240884 | 31241803 | 0.697 | 61 | 117 |
| TSI | NA20786 | NPSR1 | neuropeptide S receptor 1 | 7 | 34884213 | 34884321 | 0.678 | 51 | 88 |
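
Assuming the last two columns of the call tables are the observed and expected read depths (as their values suggest: deletions sit near half the expected depth, duplications above it), their ratio gives a quick sanity check on each call. The sketch below, with a hypothetical helper name, turns that ratio into a rounded copy-number estimate for a diploid reference; it is a back-of-the-envelope check, not the posterior-based caller described in the paper.

```python
def approx_copy_number(observed_rd, expected_rd, reference_cn=2):
    """Scale the diploid reference copy number by observed/expected read depth
    and round to the nearest integer (a rough point estimate only)."""
    return round(reference_cn * observed_rd / expected_rd)


# (gene, sample, observed RD, expected RD) taken from the SC call tables
calls = [
    ("CD300LB", "NA12348", 638, 420),   # listed as a duplication
    ("HIPK1", "NA18523", 2539, 5272),   # listed as a deletion
    ("NRAS", "NA18523", 702, 1462),     # listed as a deletion
]
for gene, sample, obs, exp in calls:
    print(gene, sample, f"ratio={obs / exp:.2f}", "CN~", approx_copy_number(obs, exp))
```

This prints ratios of 1.52, 0.48 and 0.48, which round to copy numbers 3, 1 and 1.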
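Gene deletion calls in the BI dataset
| Population | Sample | Gene | Description | Chr | Start | End | Posterior probability | Observed RD | Expected RD |
|---|---|---|---|---|---|---|---|---|---|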
| CHD | NA18695 | TPM3 | tropomyosin 3 | 1 | 152396739 | 152422219 | 1.000 | 166 | 337 |
| JPT | NA19066 | TPM3 | tropomyosin 3 | 1 | 152396739 | 152422219 | 1.000 | 169 | 288 |
| CHD | NA18687 | RPL27A | ribosomal protein L27a | 11 | 8661325 | 8663929 | 1.000 | 93 | 182 |
| JPT | NA18983 | POU5F1 | POU class 5 homeobox 1 | 6 | 31240357 | 31241803 | 1.000 | 122 | 256 |
| JPT | NA19066 | POU5F1 | POU class 5 homeobox 1 | 6 | 31240357 | 31241803 | 1.000 | 166 | 318 |
| JPT | NA19066 | RPL27A | ribosomal protein L27a | 11 | 8661325 | 8663929 | 1.000 | 106 | 203 |
| CHD | NA18687 | TPM3 | tropomyosin 3 | 1 | 152396739 | 152422219 | 1.000 | 155 | 258 |
| CHD | NA18687 | POU5F1 | POU class 5 homeobox 1 | 6 | 31240357 | 31241803 | 1.000 | 156 | 285 |
| JPT | NA19054 | TPM3 | tropomyosin 3 | 1 | 152396739 | 152422219 | 1.000 | 135 | 230 |
| CHD | NA18695 | POU5F1 | POU class 5 homeobox 1 | 6 | 31240357 | 31241803 | 1.000 | 194 | 371 |
| JPT | NA18960 | SETD8 | SET domain containing (lysine methyltransferase) 8 | 12 | 122441130 | 122455574 | 1.000 | 221 | 347 |
| CHD | NA18164 | RPL27A | ribosomal protein L27a | 11 | 8661325 | 8663929 | 1.000 | 129 | 223 |
| JPT | NA19054 | POU5F1 | POU class 5 homeobox 1 | 6 | 31240357 | 31241803 | 1.000 | 130 | 254 |
| CHD | NA18695 | SETD8 | SET domain containing (lysine methyltransferase) 8 | 12 | 122441130 | 122455574 | 1.000 | 142 | 309 |
| CHD | NA18695 | RPL27A | ribosomal protein L27a | 11 | 8661325 | 8663929 | 1.000 | 128 | 238 |
| CHD | NA18695 | AKR1B1 | aldo-keto reductase family 1, member B1 (aldose reductase) | 7 | 133778020 | 133787045 | 1.000 | 310 | 554 |
| CHD | NA18164 | HAX1 | HCLS1 associated protein X-1 | 1 | 152512874 | 152514801 | 1.000 | 214 | 339 |
| CHD | NA18687 | SETD8 | SET domain containing (lysine methyltransferase) 8 | 12 | 122441130 | 122455574 | 1.000 | 125 | 237 |
| JPT | NA19054 | HFE | hemochromatosis | 6 | 26201326 | 26202433 | 1.000 | 56 | 122 |
| JPT | NA18983 | RPL27A | ribosomal protein L27a | 11 | 8661325 | 8663929 | 0.990 | 95 | 164 |
| JPT | NA18983 | TPM3 | tropomyosin 3 | 1 | 152396739 | 152422219 | 0.990 | 147 | 232 |
| JPT | NA19561 | TRIM55 | tripartite motif-containing 55 | 8 | 67202058 | 67209944 | 0.990 | 119 | 193 |
| CHD | NA18687 | RBMS1 | RNA binding motif, single stranded interacting protein 1 | 2 | 160840394 | 160932124 | 0.990 | 334 | 575 |
| CHB | NA18757 | CRIPAK | cysteine-rich PAK1 inhibitor | 4 | 1378300 | 1379640 | 0.990 | 327 | 669 |
| JPT | NA19054 | PSAT1 | phosphoserine aminotransferase 1 | 9 | 80109471 | 80113319 | 0.980 | 140 | 253 |
| JPT | NA19066 | PSAT1 | phosphoserine aminotransferase 1 | 9 | 80109471 | 80113319 | 0.980 | 190 | 317 |
| CHD | NA18164 | TPM3 | tropomyosin 3 | 1 | 152396739 | 152422219 | 0.980 | 209 | 317 |
| JPT | NA19568 | OR8A1 | olfactory receptor, family 8, subfamily A, member 1 | 11 | 123945175 | 123946141 | 0.980 | 471 | 764 |
| JPT | NA19066 | RAN | RAN, member RAS oncogene family | 12 | 129923334 | 129926424 | 0.980 | 229 | 462 |
| CHD | NA18695 | KLHL12 | kelch-like 12 | 1 | 201128284 | 201160913 | 0.970 | 767 | 1358 |
| JPT | NA19066 | SETD8 | SET domain containing (lysine methyltransferase) 8 | 12 | 122441130 | 122455574 | 0.970 | 154 | 265 |
| JPT | NA19066 | RPS15A | ribosomal protein S15a | 16 | 18706886 | 18707936 | 0.960 | 83 | 161 |
| CHD | NA18695 | RPS15A | ribosomal protein S15a | 16 | 18706886 | 18707936 | 0.960 | 88 | 188 |
| CHD | NA18687 | KLHL12 | kelch-like 12 | 1 | 201128284 | 201160913 | 0.960 | 621 | 1041 |
| JPT | NA18983 | SETD8 | SET domain containing (lysine methyltransferase) 8 | 12 | 122441130 | 122455574 | 0.960 | 120 | 213 |
| JPT | NA18983 | DCTN5 | dynactin 5 (p25) | 16 | 23560365 | 23585966 | 0.960 | 177 | 298 |
| JPT | NA18983 | EIF2B5 | eukaryotic translation initiation factor 2B, subunit 5 epsilon, 82 kDa | 3 | 185500333 | 185509372 | 0.940 | 856 | 1482 |
| CHD | NA18687 | ARG2 | arginase, type II | 14 | 67187855 | 67187951 | 0.940 | 28 | 62 |
| CHD | NA18695 | PSAT1 | phosphoserine aminotransferase 1 | 9 | 80109471 | 80113319 | 0.930 | 221 | 371 |
| CHD | NA18695 | RBMS1 | RNA binding motif, single stranded interacting protein 1 | 2 | 160840394 | 160932124 | 0.900 | 442 | 750 |
| JPT | NA19561 | OR8A1 | olfactory receptor, family 8, subfamily A, member 1 | 11 | 123945175 | 123946141 | 0.890 | 254 | 466 |
| YRI | NA19247 | TIMM8B | translocase of inner mitochondrial membrane 8 homolog B (yeast) | 11 | 111461229 | 111462657 | 0.880 | 40 | 89 |
| CHD | NA18164 | POU5F1 | POU class 5 homeobox 1 | 6 | 31240357 | 31241803 | 0.850 | 226 | 349 |
| CHD | NA18164 | KLHL12 | kelch-like 12 (Drosophila) | 1 | 201128284 | 201160913 | 0.800 | 803 | 1276 |
| CHD | NA18164 | SETD8 | SET domain containing (lysine methyltransferase) 8 | 12 | 122441130 | 122455574 | 0.790 | 181 | 291 |
| CHD | NA18687 | RPS15A | ribosomal protein S15a | 16 | 18706886 | 18707936 | 0.790 | 81 | 144 |
| JPT | NA19066 | EIF2B5 | eukaryotic translation initiation factor 2B, subunit 5 epsilon, 82 kDa | 3 | 185500333 | 185509372 | 0.780 | 1137 | 1840 |
| JPT | NA19568 | GABARAPL2 | GABA(A) receptor-associated protein-like 2 | 1 | 157676173 | 157676631 | 0.760 | 254 | 476 |
| JPT | NA19560 | OR8A1 | olfactory receptor, family 8, subfamily A, member 1 | 11 | 123945175 | 123946141 | 0.750 | 614 | 1119 |
| JPT | NA19058 | RPL27 | ribosomal protein L27 | 17 | 38404294 | 38408463 | 0.730 | 356 | 518 |
| CHD | NA18699 | SDPR | serum deprivation response | 2 | 192408894 | 192419896 | 0.720 | 524 | 1033 |
| JPT | NA18983 | SPRR2G | small proline-rich protein 2G | 1 | 151388989 | 151389210 | 0.670 | 81 | 147 |
| JPT | NA19066 | SPRR2G | small proline-rich protein 2G | 1 | 151388989 | 151389210 | 0.670 | 105 | 182 |
| JPT | NA19066 | RBMS1 | RNA binding motif, single stranded interacting protein 1 | 2 | 160840394 | 160932124 | 0.670 | 404 | 642 |
| JPT | NA19054 | EIF2B5 | eukaryotic translation initiation factor 2B, subunit 5 epsilon, 82 kDa | 3 | 185500333 | 185509372 | 0.670 | 869 | 1470 |
| CHD | NA18695 | RAN | RAN, member RAS oncogene family | 12 | 129923334 | 129926424 | 0.660 | 290 | 539 |
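Gene deletion calls in the BCM dataset
| Population | Sample | Gene | Description | Chr | Start | End | Posterior probability | Observed RD | Expected RD |
|---|---|---|---|---|---|---|---|---|---|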
| LWK | NA19355 | MBD5 | methyl-CpG binding domain protein 5 | 2 | 148932798 | 148986980 | 0.999 | 618 | 973 |
| CHD | NA17970 | MTERFD2 | MTERF domain containing 2 | 2 | 241684086 | 241687982 | 0.996 | 255 | 393 |
| CHB | NA18618 | GABARAPL2 | GABA(A) receptor-associated protein-like 2 | 16 | 74159436 | 74168768 | 0.800 | 58 | 99 |
| CHD | NA18135 | PSMB4 | proteasome (prosome, macropain) subunit, beta type, 4 | 1 | 149638688 | 149640929 | 0.729 | 390 | 605 |
Figure 3. Detected CNV events. A. Top-ranked (by posterior probability) deletion events in the SC dataset. B. Validation results for different callsets (left: without neighboring information; right: with neighboring information). Green denotes events validated positively, either in our experiments or as known events [17]; red denotes calls validated negatively in our experiments; yellow denotes calls without a validation status (not submitted for validation, or with inconclusive validation experiments). C. Detection sensitivity as a function of the number of samples. D. Sensitivity of detecting common CNVs as a function of the deleted allele frequency.
Validation results
| 4 | 2 | 7 | |
| 11 | 1 | 7 | |
| 4 | 1 | 0 | |
| 3 | 0 | 0 | |
| 0 | 10 | 3 | |
| 22 | 14 | 17 |
Figure 4. Exon length distribution. A. Exon length distribution in the gene list used for our analysis (median: 125 bp, standard deviation: 236 bp). B. Exon length distribution of the whole exome (median: 127 bp, standard deviation: 264 bp). The two distributions are very similar, suggesting that our estimate of the number of events per sample is unbiased and representative of a whole-exome study.
Figure 5. Detection efficiency. A. Distributions of the detection efficiency estimated from the quality index for each gene-sample site. B. Theoretical detection efficiency (at posterior probability cutoff h = 0.65) as a function of expected read depth, plotted for various values of the over-dispersion factor. C. Histograms of the quality index (QI) in the three datasets. Overall, QI was highest in SC (9.4 ± 8.8; median 6.6), intermediate in BI (7.6 ± 5.6; median 6.2) and lowest in BCM (5.5 ± 2.3; median 5.0).
Nominal prior probabilities corresponding to the range of gene region copy numbers, derived from Conrad et al. [17]
| Copy number | Prior probability |
|---|---|
| 0 | 6.34·10⁻⁴ |
| 1 | 2.11·10⁻³ |
| 2 | 9.96·10⁻¹ |
| 3 | 5.38·10⁻⁴ |
| 4 | 6.68·10⁻⁴ |
| 5 | 3.57·10⁻⁵ |
| 6 | 7.52·10⁻⁶ |
| 7 | 1.39·10⁻⁶ |
| 8 | 3.61·10⁻⁷ |
| 9 | 4.37·10⁻⁸ |
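
The nominal priors above can be combined with a read-depth likelihood to give a posterior over copy numbers for one gene in one sample and, by summing over possible read depths, a theoretical detection efficiency at the posterior cutoff h = 0.65 mentioned for Figure 5B. The sketch below assumes a negative binomial read-depth model whose mean scales with copy number (expected RD × c/2) and whose variance is ODF × mean, and it gives copy number 0 a small residual depth (1% of the diploid expectation). These modeling choices and all function names are assumptions for illustration, not the paper's exact likelihood.

```python
import math

# Nominal copy-number priors from the table above (Conrad et al. [17]).
PRIORS = {0: 6.34e-4, 1: 2.11e-3, 2: 9.96e-1, 3: 5.38e-4, 4: 6.68e-4,
          5: 3.57e-5, 6: 7.52e-6, 7: 1.39e-6, 8: 3.61e-7, 9: 4.37e-8}

# Assumption: a homozygous deletion still collects a small fraction of the
# diploid depth (e.g. from mis-mapped reads); 1% is an arbitrary choice.
ZERO_COPY_FRACTION = 0.01


def nb_logpmf(k, mean, odf):
    """Negative binomial log-probability of k reads, parameterized by its
    mean and variance = odf * mean (requires odf > 1)."""
    r = mean / (odf - 1.0)   # size parameter
    q = 1.0 / odf            # success probability
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(q) + k * math.log1p(-q))


def copy_number_posterior(rd, expected_rd, odf):
    """Posterior P(copy number = c | observed rd) for c = 0..9, where
    expected_rd is the read depth expected at copy number 2."""
    logs = {}
    for c, prior in PRIORS.items():
        mean = expected_rd * (c / 2 if c > 0 else ZERO_COPY_FRACTION)
        logs[c] = math.log(prior) + nb_logpmf(rd, mean, odf)
    top = max(logs.values())                      # log-sum-exp for stability
    norm = sum(math.exp(v - top) for v in logs.values())
    return {c: math.exp(v - top) / norm for c, v in logs.items()}


def deletion_detection_efficiency(expected_rd, odf, h=0.65):
    """Probability that a true single-copy deletion (copy number 1) reaches
    P(copy number < 2) >= h, summed over its read-depth distribution."""
    mean1 = expected_rd / 2
    upper = int(mean1 + 20 * math.sqrt(odf * mean1)) + 10
    eff = 0.0
    for k in range(upper + 1):
        post = copy_number_posterior(k, expected_rd, odf)
        if post[0] + post[1] >= h:
            eff += math.exp(nb_logpmf(k, mean1, odf))
    return eff


post = copy_number_posterior(100, expected_rd=200, odf=2.0)
print("P(deletion) =", round(post[0] + post[1], 4))
for depth in (20, 50, 100, 200):
    print(depth, round(deletion_detection_efficiency(depth, odf=2.0), 3))
```

Under these assumptions efficiency increases with expected read depth; calling `deletion_detection_efficiency` for several `odf` values gives a family of curves analogous to the one described for Figure 5B.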
Figure 6. Analysis of genes that failed the simple linear fit. Each row describes a different gene. Left panels: distribution of the ratio of RD at each GSS to the sample MRD. Right panels: distribution of the quality index for that gene. The lack of multimodality in these distributions and the low quality-index values suggest that there are no common CNV events at these loci.
Figure 7. Simulated common CNVs. A. If the simple linear fit fails, the gene affinity is estimated for each gene as the slope of the least-squares tri-linear fit between MRD and RD for that gene. B and C. r² values of the simple linear fit (B) and the tri-linear fit (C) as a function of the deleted allele frequency.
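Figure 7 compares a simple linear fit with a tri-linear fit for genes carrying a common deletion. The sketch below simulates one such gene under assumptions made for illustration: copy numbers 2, 1 and 0 follow Hardy-Weinberg proportions for a deleted-allele frequency p, read depth is affinity × MRD × c/2 with 5% multiplicative noise, and the tri-linear model is three lines through the origin with slopes a, a/2 and 0, fitted by alternating line assignment and least-squares refitting. The paper's exact tri-linear procedure is not given here; function names and parameters are hypothetical.

```python
import random


def r_squared(y, fitted):
    """Coefficient of determination of fitted values against y."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, fitted))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def simple_fit(mrd, rd):
    """Single line through the origin: RD = a * MRD."""
    a = sum(m * r for m, r in zip(mrd, rd)) / sum(m * m for m in mrd)
    return [a * m for m in mrd]


def trilinear_fit(mrd, rd, n_iter=20):
    """Three lines through the origin with slopes a, a/2 and 0 (copy numbers
    2, 1 and 0). Alternates between assigning each sample to its closest
    line and refitting a by least squares on the assigned samples."""
    a = max(r / m for m, r in zip(mrd, rd))   # start from the steepest sample
    for _ in range(n_iter):
        scale = [min((1.0, 0.5, 0.0), key=lambda s: abs(r - s * a * m))
                 for m, r in zip(mrd, rd)]
        num = sum(s * m * r for s, m, r in zip(scale, mrd, rd))
        den = sum((s * m) ** 2 for s, m in zip(scale, mrd))
        a = num / den
    return [s * a * m for s, m in zip(scale, mrd)]


def simulate(p, n=200, affinity=1.0, noise=0.05, seed=1):
    """MRD and RD for one gene whose deleted allele has frequency p;
    copy numbers follow Hardy-Weinberg proportions."""
    rng = random.Random(seed)
    n0 = round(n * p * p)
    n1 = round(n * 2 * p * (1 - p))
    cn = [0] * n0 + [1] * n1 + [2] * (n - n0 - n1)
    mrd = [rng.uniform(20, 100) for _ in range(n)]
    rd = [affinity * m * c / 2 * (1 + rng.gauss(0, noise))
          for m, c in zip(mrd, cn)]
    return mrd, rd


for p in (0.0, 0.1, 0.3, 0.5):
    mrd, rd = simulate(p)
    print(f"p={p}: simple r2={r_squared(rd, simple_fit(mrd, rd)):.3f}, "
          f"tri-linear r2={r_squared(rd, trilinear_fit(mrd, rd)):.3f}")
```

In this simulation the single line loses explanatory power as p grows, while the tri-linear fit keeps r² close to 1, which is the kind of comparison plotted in Figure 7B and 7C.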