Irina Ostrovnaya, Gouri Nanjangud, Adam B Olshen.
Abstract
BACKGROUND: Both somatic copy number alterations (CNAs) and germline copy number variants (CNVs) that are prevalent in healthy individuals can appear as recurrent changes in comparative genomic hybridization (CGH) analyses of tumors. In order to identify important cancer genes, CNAs and CNVs must be distinguished. Although the Database of Genomic Variants (DGV) contains a list of all known CNVs, there is no standard methodology to use the database effectively.
Year: 2010 PMID: 20525196 PMCID: PMC2897829 DOI: 10.1186/1471-2105-11-297
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Example of CNVs and CNAs. One chromosome is shown. Segments in blue are found only in unsmoothed data. The upper panel shows the tumor sample, while the lower panel shows the matching normal sample. CNVs have either a matching segment in the normal sample identified by a segmentation algorithm or matching significantly extreme log-ratios identified by a permutation test. Regions of the normal sample corresponding to CNAs are normal.
Definition of predictors.
| Variable | Definition |
|---|---|
| Height | Absolute value of the candidate segment mean |
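The Height predictor can be illustrated with a minimal sketch, assuming a candidate segment is represented as a plain list of probe log-ratios. The helper name `segment_height` and the sample values are hypothetical and are not taken from the study.

```python
from statistics import mean


def segment_height(log_ratios):
    """Return the absolute value of the segment mean (hypothetical helper)."""
    if not log_ratios:
        raise ValueError("a segment needs at least one log-ratio")
    return abs(mean(log_ratios))


# Illustrative log-ratios for a single candidate loss segment (made-up values).
loss_segment = [-0.5, -0.25, -0.75]
print(segment_height(loss_segment))  # prints 0.5
```

Because the absolute value is taken, a gain and a loss of the same magnitude receive the same Height; the direction of the change is carried separately by the Sign predictor listed in the tables below.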
Univariate results by logistic regression, training and test sets combined.
| Variable | Smoothed CBS: estimate | Smoothed CBS: p-value | GLAD: estimate | GLAD: p-value | Unsmoothed CBS: estimate | Unsmoothed CBS: p-value |
|---|---|---|---|---|---|---|
| Height | 3.95E-01 | 1.16E-20 | 4.88E-01 | 7.93E-27 | 8.88E-01 | 6.47E-117 |
| Relative height | 7.14E-02 | 3.36E-17 | 9.34E-02 | 9.75E-26 | 1.64E-01 | 7.79E-102 |
| Break | -3.43E-01 | 1.06E-24 | -3.39E-01 | 2.47E-26 | -3.42E-01 | 3.33E-36 |
| Close to other candidates | -1.19E+00 | 2.17E-32 | -8.06E-01 | 6.78E-17 | -1.17E+00 | 1.19E-37 |
| Overlap with CNAs | -7.10E-02 | 1.24E-26 | -6.67E-02 | 6.88E-25 | -5.07E-02 | 9.44E-22 |
| Database score | 3.06E-01 | 4.08E-306 | 3.06E-01 | 1.98E-323 | 2.27E-01 | 8.60E-186 |
| Database score II | 9.79E-03 | 3.19E-159 | 9.35E-03 | 2.16E-167 | 5.90E-03 | 5.25E-74 |
| Overlap w. other pts: % | 8.89E+00 | 4.98E-254 | 8.02E+00 | 8.64E-276 | 5.45E+00 | 1.71E-158 |
| Matching bkpt in other: % | 17.31 | 2.58E-276 | 13.13 | 1.11E-246 | 9.14 | 3.62E-178 |
| Overlap w. other pts - GG | 3.42E-01 | 3.63E-199 | 3.37E-01 | 1.04E-208 | -1.70E-01 | 6.08E-258 |
| Overlap w. other pts - LG | 2.85E+00 | | 2.85E+00 | | 2.29E+00 | |
| Overlap w. other pts - LL | 1.68E+00 | | 1.73E+00 | | 2.13E+00 | |
| Closeness to centromere | 8.41E-01 | 2.55E-09 | 7.98E-01 | 2.72E-09 | 7.15E-02 | 5.95E-01 |
| Closeness to telomere | 4.24E-01 | 2.15E-04 | 5.78E-01 | 2.46E-07 | -1.55E-01 | 1.27E-01 |
| Length | -2.46E-06 | 2.14E-130 | -1.78E-06 | 9.65E-94 | -2.65E-06 | 1.27E-183 |
| Dat. score of other cand. | 5.98E-02 | 6.69E-17 | 5.45E-02 | 9.41E-14 | 5.17E-02 | 9.02E-11 |
| Percent of Normal | 3.40E+00 | 5.29E-59 | 2.70E+00 | 3.35E-50 | 2.78E+00 | 1.72E-60 |
| Segmental duplication | 6.81E-01 | 5.71E-11 | 7.16E-01 | 1.51E-13 | 1.79E-01 | 5.85E-02 |
| Sign | -2.74E-01 | 5.10E-14 | -2.62E-01 | 2.73E-13 | -7.05E-01 | 2.06E-107 |
| Surrounded by Normals | 1.30E+00 | 2.28E-23 | 1.07E+00 | 5.02E-23 | 1.31E+00 | 7.06E-32 |
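As a reading aid for the table above, a logistic regression coefficient can be converted to an odds ratio by exponentiating it. The sketch below assumes that the first value reported for each segmentation method is the coefficient estimate on the log-odds scale and the second is its p-value; the record does not say so explicitly, nor does it state which class (CNV or CNA) is coded as the outcome, so the direction of the effect is left open. The function name `odds_ratio` is hypothetical.

```python
import math

# "Height" values copied from the table above, assumed to be
# logistic regression coefficients on the log-odds scale.
height_estimates = {
    "Smoothed CBS": 0.395,
    "GLAD": 0.488,
    "Unsmoothed CBS": 0.888,
}


def odds_ratio(coefficient):
    """Multiplicative change in the odds per one-unit increase in the predictor."""
    return math.exp(coefficient)


for method, beta in height_estimates.items():
    print(f"{method}: odds ratio per unit of Height = {odds_ratio(beta):.2f}")
```

Under these assumptions, a positive estimate corresponds to an odds ratio above 1 and a negative estimate (for example, Length) to an odds ratio below 1.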
Figure 2. Fitted CART models.
Prediction rates: A - test set, B - CGH against self-reference (all CNAs), C - normal tissue (all CNVs).
| Model | CBS smoothed: A | CBS smoothed: B | CBS smoothed: C | GLAD: A | GLAD: B | GLAD: C | CBS unsmoothed: A | CBS unsmoothed: B | CBS unsmoothed: C |
|---|---|---|---|---|---|---|---|---|---|
| CART - full model | 0.86 | 0.79 | 0.90 | 0.83 | 0.91 | 0.78 | 0.80 | 0.66 | 0.92 |
| RF - full model | 0.87 | 0.82 | 0.95 | 0.86 | 0.91 | 0.87 | 0.84 | 0.77 | 0.92 |
| CART - database only | 0.85 | 0.88 | 0.81 | 0.83 | 0.89 | 0.80 | 0.72 | 0.34 | 0.99 |
| RF - no database | 0.85 | 0.79 | 0.94 | 0.86 | 0.89 | 0.85 | 0.82 | 0.74 | 0.95 |
| RF - one array | 0.85 | 0.81 | 0.97 | 0.84 | 0.91 | 0.89 | 0.83 | 0.75 | 0.96 |
Counts from the accuracy table of the test set.
| Model | CBS smoothed: TN | CBS smoothed: FN | CBS smoothed: FP | CBS smoothed: TP | GLAD: TN | GLAD: FN | GLAD: FP | GLAD: TP | CBS unsmoothed: TN | CBS unsmoothed: FN | CBS unsmoothed: FP | CBS unsmoothed: TP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CART - full model | 654 | 54 | 182 | 793 | 822 | 145 | 157 | 614 | 613 | 77 | 455 | 1541 |
| RF - full model | 659 | 42 | 177 | 805 | 828 | 93 | 151 | 666 | 752 | 124 | 316 | 1494 |
| CART - database only | 699 | 120 | 137 | 727 | 804 | 122 | 175 | 637 | 364 | 37 | 704 | 1581 |
| RF - no database | 644 | 57 | 192 | 790 | 824 | 91 | 155 | 668 | 686 | 112 | 382 | 1506 |
| RF - one array | 647 | 59 | 189 | 788 | 832 | 125 | 147 | 634 | 729 | 120 | 339 | 1498 |
(TN, True Negatives, are true CNAs predicted to be CNAs; FN, False Negatives, are true CNVs predicted to be CNAs; FP, False Positives, are true CNAs predicted to be CNVs; TP, True Positives, are true CNVs predicted to be CNVs)
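The counts above can be tied back to the test-set prediction rates (column A) with a short calculation. The sketch below treats CNV as the positive class, following the definitions in the note above; the function name `rates` and the per-class labels are hypothetical.

```python
def rates(tn, fn, fp, tp):
    """Summary rates from confusion-matrix counts (CNV = positive class)."""
    total = tn + fn + fp + tp
    return {
        "accuracy": (tn + tp) / total,    # all correct calls
        "cnv_recall": tp / (tp + fn),     # true CNVs predicted to be CNVs
        "cna_recall": tn / (tn + fp),     # true CNAs predicted to be CNAs
    }


# CART full model, CBS smoothed, counts taken from the table above.
cart_full_cbs_smoothed = rates(tn=654, fn=54, fp=182, tp=793)
print(round(cart_full_cbs_smoothed["accuracy"], 2))  # prints 0.86
```

The resulting 0.86 agrees with the test-set rate reported for the CART full model with smoothed CBS, which suggests that column A of the prediction-rate table is overall accuracy.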
Relative importance of variables in random forest models as measured by Gini index (higher is more important).
| Variable | CBS smoothed: w. DS | CBS smoothed: w/o DS | GLAD: w. DS | GLAD: w/o DS | CBS unsmoothed: w. DS | CBS unsmoothed: w/o DS |
|---|---|---|---|---|---|---|
| Relative height | 60.33 | 79.86 | 81.26 | 100.96 | 101.99 | 119.94 |
| Break | 39.98 | 51.92 | 48.57 | 65.75 | 60.33 | 72.19 |
| Close to other candidates | 8.27 | 10.83 | 4.18 | 6.90 | 6.19 | 8.20 |
| Overlap with CNAs | 17.21 | 24.15 | 17.87 | 27.10 | 26.16 | 34.09 |
| Database score | 165.44 | | 206.34 | | 108.89 | |
| Overlap w. other pts: % | 70.11 | 107.05 | 95.23 | 129.03 | 74.62 | 93.67 |
| Matching bkpts in other: % | 86.67 | 108.45 | 94.02 | 135.80 | 116.34 | 116.83 |
| Overlap with other pts | 39.91 | 59.70 | 42.18 | 68.84 | 86.42 | 98.28 |
| Closeness to centromere | 4.37 | 5.82 | 3.74 | 6.07 | 4.64 | 7.17 |
| Closeness to telomere | 4.65 | 6.53 | 3.31 | 5.52 | 6.04 | 7.11 |
| Length | 112.20 | 133.69 | 64.26 | 92.25 | 170.78 | 182.46 |
| Dat. score of other cand. | 30.23 | | 32.57 | | 44.51 | |
| Percent of Normal | 64.10 | 84.05 | 55.97 | 76.26 | 77.18 | 93.82 |
| Segmental duplication | 3.19 | 7.77 | 3.48 | 9.53 | 5.16 | 9.34 |
| Sign | 7.84 | 11.01 | 8.19 | 12.42 | 22.59 | 27.77 |
| Surrounded by Normals | 1.87 | 4.67 | 5.03 | 6.66 | 3.34 | 5.21 |
"w. DS" stands for the model that includes Database score (DS); "w/o DS" stands for the model where it was excluded.
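Gini-based importance in a random forest is generally built from the decrease in Gini impurity achieved by the splits that use a given variable. The sketch below shows that quantity for a single two-class split on made-up labels. It illustrates the general idea only: how the authors' software aggregates or scales these decreases across trees is not stated here, and the function names `gini` and `gini_decrease` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a two-class node: 2 * p * (1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(1 for y in labels if y == "CNV") / n
    return 2 * p * (1 - p)


def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Made-up node with 4 CNVs and 4 CNAs, split into two children of 4 segments each.
parent = ["CNV"] * 4 + ["CNA"] * 4
left = ["CNV"] * 3 + ["CNA"]
right = ["CNV"] + ["CNA"] * 3
print(gini_decrease(parent, left, right))  # prints 0.125
```

A variable whose splits separate CNVs from CNAs more cleanly produces larger decreases, which is why higher values in the table indicate more important variables.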