Xi Zhao, Einar Andreas Rødland, Robert Tibshirani, Sylvia Plevritis.
Abstract
INTRODUCTION: Breast cancer is commonly classified into intrinsic molecular subtypes. Standard gene centering is routinely done prior to molecular subtyping, but it can produce inaccurate classifications when the distribution of clinicopathological characteristics in the study cohort differs from that of the training cohort used to derive the classifier.
Year: 2015 PMID: 25849221 PMCID: PMC4365540 DOI: 10.1186/s13058-015-0520-4
Source DB: PubMed Journal: Breast Cancer Res ISSN: 1465-5411 Impact factor: 6.466
Figure 1. Effect of estrogen receptor distribution on molecular subtype assignments. The University of North Carolina (UNC) cohort is the PAM50 training cohort. Only samples with available prototypic tumor subtypes and available estrogen receptor (ER) status are shown (n = 118). In each horizontal strip, the vertical bands represent individual patients and are arranged in the same sequence in every strip. First, we considered the UNC cohort, which has a balanced ER distribution of 46% ER-positive (54/118) and 54% ER-negative (64/118), represented by the shaded pie chart labeled “UNC cohort.” In the first strip at the top, labeled “ER status,” the ER status of the UNC cohort is depicted in dark vs. light gray, representing ER-positive vs. ER-negative cases, respectively. In the second strip, labeled “Original subtype assignment,” the original subtype assignments of the UNC cohort are shown. Next, we considered a subset of the UNC cohort (n = 75), which we created by sampling ER-positive and ER-negative cases disproportionately, with 15% ER-positive (11/75) and 85% ER-negative (64/75), as represented by the pie chart labeled “UNC subset.” In the third strip, labeled “Standard gene centering,” the subtypes assigned by standard gene centering on the UNC subset, where ER status is disproportionately distributed, are shown. The misclassification rate is 33.3% (25/75) compared with the first 75 bands in the second strip. In the bottom strip, labeled “Subgroup-specific gene centering,” the subtypes assigned by the proposed subgroup-specific gene centering on the same UNC subset are shown. The misclassification rate is 5.3% (4/75). Here the classification is similar to the original classification, shown in the first 75 bands of the second strip, labeled “Original subtype assignment.” Her2, Human epidermal growth factor receptor 2; LumA, Luminal A; LumB, Luminal B.
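The subset construction and the misclassification rates quoted in the Figure 1 caption can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the function names `skewed_subset` and `misclassification_rate` are hypothetical, the ER labels are placeholders built from the counts in the caption, and no actual UNC expression data or subtype calls are used.

```python
import random

def skewed_subset(er_status, n_pos, n_neg, seed=0):
    """Return sorted sample indices with n_pos ER-positive and n_neg ER-negative cases."""
    rng = random.Random(seed)
    pos = [i for i, s in enumerate(er_status) if s == "pos"]
    neg = [i for i, s in enumerate(er_status) if s == "neg"]
    return sorted(rng.sample(pos, n_pos) + rng.sample(neg, n_neg))

def misclassification_rate(reference, assigned):
    """Fraction of samples whose assigned subtype differs from the reference subtype."""
    if len(reference) != len(assigned):
        raise ValueError("reference and assigned must have the same length")
    wrong = sum(r != a for r, a in zip(reference, assigned))
    return wrong / len(reference)

# Placeholder ER labels matching the counts in the caption: 54 ER-positive, 64 ER-negative.
er_status = ["pos"] * 54 + ["neg"] * 64

# Skewed subset as described in the caption: 11 ER-positive plus all 64 ER-negative cases.
subset = skewed_subset(er_status, n_pos=11, n_neg=64)
frac_pos = sum(er_status[i] == "pos" for i in subset) / len(subset)
print(f"subset size: {len(subset)}, ER-positive fraction: {frac_pos:.1%}")

# Rates quoted in the caption, recomputed from the stated counts.
print(f"standard gene centering: {25 / 75:.1%}")
print(f"subgroup-specific gene centering: {4 / 75:.1%}")
```

With real data, `misclassification_rate` would be called with the original subtype assignments and the subtypes assigned after each centering strategy for the same 75 samples.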
Figure 2. Overview of subgroup-specific gene-centering algorithm. (a) Distribution of gene expression for a representative gene from the entire University of North Carolina (UNC) training cohort, with the global mean represented by the gray vertical dotted line. (b) The gene expression baseline is approximated by the global mean (gray dotted line) shown on the global distribution, represented as a mixture of estrogen receptor (ER)-positive cases (shown in pink) and ER-negative cases (shown in green). (c) and (d) The global median is located at different percentiles for the ER-positive and ER-negative cases, and it differs from the mean of each subgroup. (e) The distribution of gene expression for the same gene in a study cohort composed of only ER-positive cases. The baseline value for subgroup-specific gene centering is estimated at the corresponding percentile of the ER-positive subgroup in the study cohort and compared with the median value, represented by the red vertical dotted line. The difference between these values is the error introduced by standard gene centering. (f) Similar to (e), but for the ER-negative subgroup.
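The percentile-matching idea in panels (c) through (f) can be sketched for a single gene as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the training baseline is taken to be the global median (the caption also refers to a global mean), percentiles are computed by linear interpolation, and the helper names `percentile_of`, `quantile`, and `subgroup_baseline` as well as the toy expression values are hypothetical.

```python
from bisect import bisect_right
from statistics import median

def percentile_of(value, values):
    """Position of value within values as a fraction in [0, 1], by linear interpolation."""
    s = sorted(values)
    if value <= s[0]:
        return 0.0
    if value >= s[-1]:
        return 1.0
    hi = bisect_right(s, value)  # first index whose element exceeds value
    lo = hi - 1                  # s[lo] <= value < s[hi]
    frac = (value - s[lo]) / (s[hi] - s[lo])
    return (lo + frac) / (len(s) - 1)

def quantile(values, q):
    """Linearly interpolated quantile of values for q in [0, 1] (inverse of percentile_of)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def subgroup_baseline(train_all, train_sub, study_sub):
    """Baseline for a study subgroup at the percentile where the training baseline falls."""
    global_baseline = median(train_all)  # assumption: baseline is the global median
    q = percentile_of(global_baseline, train_sub)
    return quantile(study_sub, q)

# Toy values for one gene (hypothetical): ER-positive cases express it more highly.
train_pos = [2, 4, 6, 8, 10]
train_neg = [0, 1, 2, 3, 4]
# Study cohort with ER-positive cases only, shifted by +1 to mimic a platform offset.
study_pos = [3, 5, 7, 9, 11]

baseline = subgroup_baseline(train_pos + train_neg, train_pos, study_pos)
standard = median(study_pos)  # what standard gene centering would subtract
centered = [x - baseline for x in study_pos]
print("subgroup-specific baseline:", baseline)
print("standard baseline:", standard)
print("error introduced by standard centering:", standard - baseline)
```

In this toy case the subgroup-specific baseline recovers the training baseline plus the offset, whereas the cohort median sits well above it because the study cohort contains no ER-negative cases.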
Comparison of gene centering with a subgroup-specific strategy on the UNC prototypic tumor set
| Dataset | Prototypic subtype | Gene centering | Basal-like, n (%) | Her2-enriched, n (%) | LumA, n (%) | LumB, n (%) | Normal-like, n (%) |
|---|---|---|---|---|---|---|---|
| Prototypic basal (n = 57) | Basal-like | Standard | 12 (21.1) | 8 (14) | 17 (29.8) | 16 (28.1) | 4 (7) |
| | | Subgroup-specific | 56 (98.2) | 0 (0) | 0 (0) | 0 (0) | 1 (1.8) |
| Prototypic HER2 (n = 35) | Her2-enriched | Standard | 8 (22.9) | 7 (20) | 10 (28.6) | 6 (17.1) | 4 (11.4) |
| | | Subgroup-specific | 2 (5.7) | 27 (77.1) | 1 (2.9) | 5 (14.3) | 0 (0) |
| Prototypic LumA (n = 23) | Luminal A | Standard | 7 (30.4) | 1 (4.3) | 6 (26.1) | 3 (13) | 6 (26.1) |
| | | Subgroup-specific | 0 (0) | 0 (0) | 21 (91.3) | 1 (4.3) | 1 (4.3) |
| Prototypic LumB (n = 12) | Luminal B | Standard | 2 (16.7) | 3 (25) | 3 (25) | 2 (16.7) | 2 (16.7) |
| | | Subgroup-specific | 0 (0) | 0 (0) | 0 (0) | 12 (100) | 0 (0) |
| Prototypic normal (n = 12) | Normal-like | Standard | 2 (16.7) | 2 (16.7) | 2 (16.7) | 2 (16.7) | 4 (33.3) |
| | | Subgroup-specific | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 12 (100) |
Abbreviations: HER2, Human epidermal growth factor receptor 2; LumA, Luminal A; LumB, Luminal B; UNC, University of North Carolina.
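The table above can be summarized as an overall concordance between predicted and prototypic subtype for each strategy. The sketch below assumes that the five count columns are the predicted subtypes in the order Basal-like, Her2-enriched, LumA, LumB, Normal-like; that ordering is inferred from where the subgroup-specific counts concentrate and is not stated explicitly in the extracted table. The variable and function names are hypothetical.

```python
# Assumed order of the predicted-subtype columns (inferred, see the note above).
SUBTYPES = ["Basal", "Her2", "LumA", "LumB", "Normal"]

# Counts copied from the table: one row per prototypic dataset.
standard = {
    "Basal": [12, 8, 17, 16, 4],
    "Her2": [8, 7, 10, 6, 4],
    "LumA": [7, 1, 6, 3, 6],
    "LumB": [2, 3, 3, 2, 2],
    "Normal": [2, 2, 2, 2, 4],
}
subgroup_specific = {
    "Basal": [56, 0, 0, 0, 1],
    "Her2": [2, 27, 1, 5, 0],
    "LumA": [0, 0, 21, 1, 1],
    "LumB": [0, 0, 0, 12, 0],
    "Normal": [0, 0, 0, 0, 12],
}

def concordance(table):
    """Return (correct, total): samples predicted as their own prototypic subtype, and all samples."""
    correct = sum(row[SUBTYPES.index(name)] for name, row in table.items())
    total = sum(sum(row) for row in table.values())
    return correct, total

for label, table in [("standard", standard), ("subgroup-specific", subgroup_specific)]:
    correct, total = concordance(table)
    print(f"{label}: {correct}/{total} = {correct / total:.1%}")
```

These pooled figures are derived here from the tabulated counts only; they are not quoted from the paper.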
Figure 3. Comparison of standard with subgroup-specific gene centering for predicting the individual molecular subtypes on the prototypic datasets. The bar plot shows the counts of the predicted subtype classes in each prototypic tumor dataset. Her2, Human epidermal growth factor receptor 2; LumA, Luminal A; LumB, Luminal B.
Figure 4. Comparison of various data transformation strategies for predicting molecular subtypes on study cohorts with varying estrogen receptor proportions. Datasets were constructed with percentages of estrogen receptor (ER)-positive cases ranging from 0% to 100%. The ER-positive and ER-negative samples were randomly drawn from the University of North Carolina set. Error rate is plotted against the ER composition for no gene centering, standard gene centering, and subgroup-specific gene centering.
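Why the ER composition matters for standard gene centering can be illustrated on synthetic data: the cohort median of an ER-associated gene drifts as the ER-positive fraction changes, so the value subtracted during centering changes with it. The sketch below is a toy simulation only. The function `cohort_median`, the Gaussian expression model, and its parameters are assumptions made for illustration; it does not use the UNC data and does not compute subtype error rates.

```python
import random
from statistics import median

def cohort_median(frac_pos, n=200, mu_pos=2.0, mu_neg=-2.0, sd=1.0, seed=0):
    """Median expression of one synthetic gene in a cohort with the given ER-positive fraction."""
    rng = random.Random(seed)
    n_pos = round(frac_pos * n)
    # ER-positive cases are drawn around mu_pos, ER-negative cases around mu_neg.
    values = [rng.gauss(mu_pos, sd) for _ in range(n_pos)]
    values += [rng.gauss(mu_neg, sd) for _ in range(n - n_pos)]
    return median(values)

# Sweep the ER-positive percentage from 0% to 100%, as in the figure's x-axis.
for pct in range(0, 101, 25):
    print(f"{pct:3d}% ER-positive -> cohort median {cohort_median(pct / 100):+.2f}")
```

A centering baseline that tracks the cohort median in this way matches the training baseline only when the study cohort has an ER composition similar to that of the training cohort.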
Comparison of standard gene centering with the subgroup-specific strategy on external breast cancer study cohorts with skewed distributions
| Cohort | Subgroup | Gene centering | Basal-like, n (%) | Her2-enriched, n (%) | LumA, n (%) | LumB, n (%) | Normal-like, n (%) |
|---|---|---|---|---|---|---|---|
| Trondheim (n = 48) | ER-positive | Standard | 10 (20.8) | 8 (16.7) | 9 (18.8) | 12 (25.0) | 9 (18.8) |
| | | Subgroup-specific | 2 (4.2) | 4 (8.3) | 23 (47.9) | 16 (33.3) | 3 (6.2) |
| TNBC (n = 77) | Triple-negative | Standard | 28 (36.4) | 9 (11.7) | 19 (24.7) | 12 (15.6) | 9 (11.7) |
| | | Subgroup-specific | 63 (81.8) | 6 (7.8) | 3 (3.9) | 3 (3.9) | 2 (2.6) |
Abbreviations: ER, Estrogen receptor; LumA, Luminal A; LumB, Luminal B; TNBC, Triple-negative breast cancer.