Marina D Miller, Eric J Devor, Erin A Salinas, Andreea M Newtson, Michael J Goodheart, Kimberly K Leslie, Jesus Gonzalez-Bosquet.
Abstract
In the era of large genetic and genomic datasets, it has become crucially important to validate results of individual studies using data from publicly available sources, such as The Cancer Genome Atlas (TCGA). However, how generalizable are results from either an independent or a large public dataset to the remainder of the population? The study presented here aims to answer that question. Utilizing next-generation sequencing data from endometrial and ovarian cancer patients from both the University of Iowa and TCGA, genomic admixture of each population was analyzed using STRUCTURE and ADMIXTURE software. In our independent data set, one subpopulation was identified, whereas in TCGA 4 to 6 subpopulations were identified. Data presented here demonstrate how different the genetic substructures of the TCGA and University of Iowa populations are. Validation of genomic studies between two different population samples must therefore account for, and correct for, background genetic substructure.
Keywords: The Cancer Genome Atlas; endometrial cancer; genetic admixture; ovarian cancer; population substructure
Year: 2019 PMID: 30857229 PMCID: PMC6429328 DOI: 10.3390/ijms20051192
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. STRUCTURE and ADMIXTURE subpopulation structure analysis of UIHC patients. (a) ADMIXTURE analysis: cross-validation error is minimal when K = 1; (b) STRUCTURE analysis: (b.1) the ΔK method does not show any structure; (b.2) Ln probability is highest for K = 1 and decreases for higher K values. Both results support the idea that the UI sample of the population has no structure.
Figure 2. STRUCTURE and ADMIXTURE subpopulation structure analysis of TCGA patients. (a) ADMIXTURE analysis: cross-validation error is minimal when K = 4; (b) bar plot of ADMIXTURE results for admixture proportions, organized by origin of tumor; (c) STRUCTURE analysis: the ΔK method shows a best K = 6; (d) bar plot of STRUCTURE results for admixture proportions, organized by origin of tumor.
Figure 3. STRUCTURE and ADMIXTURE subpopulation structure analysis of TCGA patients based on the origin of cancer. (a) ADMIXTURE analysis and bar plot with an optimal K = 3 subpopulation substructure for endometrial cancer patients of endometrioid type; (b) STRUCTURE analysis and bar plot with an optimal K = 2 subpopulation substructure for endometrial cancer patients of endometrioid type; (c) ADMIXTURE analysis and bar plot with an optimal K = 2 subpopulation substructure for serous ovarian cancer patients; (d) STRUCTURE analysis and bar plot with an optimal K = 4 subpopulation substructure for serous ovarian cancer patients.
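The two selection criteria named in the figure captions (lowest ADMIXTURE cross-validation error, and the ΔK statistic computed from STRUCTURE log-likelihoods) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes ADMIXTURE was run with cross-validation enabled so that its logs contain lines of the form `CV error (K=3): 0.52345`, and it uses a common mean-based formulation of ΔK. The function names `best_k_by_cv` and `delta_k` and all numeric values are hypothetical, chosen only for illustration.

```python
import re
from statistics import mean, stdev

# Assumed log line format from ADMIXTURE run with cross-validation,
# e.g. "CV error (K=3): 0.52345".
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def best_k_by_cv(log_text):
    """Return (K, cv_error) for the K with the lowest cross-validation error."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no CV error lines found")
    k = min(errors, key=errors.get)
    return k, errors[k]


def delta_k(ln_prob_runs):
    """Compute a mean-based Delta K for each interior K.

    ln_prob_runs maps K -> list of Ln P(D) values from replicate runs.
    Delta K(K) = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd L(K).
    It is undefined for the smallest and largest K tested, so K = 1
    can never be selected by this statistic.
    """
    means = {k: mean(v) for k, v in ln_prob_runs.items()}
    result = {}
    for k in sorted(ln_prob_runs):
        if k - 1 not in means or k + 1 not in means:
            continue  # needs both neighbours
        sd = stdev(ln_prob_runs[k])
        if sd == 0:
            continue  # Delta K is undefined without run-to-run variance
        result[k] = abs(means[k + 1] - 2 * means[k] + means[k - 1]) / sd
    return result


# Toy values for illustration only; these are not results from the study.
toy_log = """
CV error (K=1): 0.60
CV error (K=2): 0.55
CV error (K=3): 0.52
CV error (K=4): 0.50
CV error (K=5): 0.53
"""
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -892.0],
    4: [-885.0, -887.0],
}

if __name__ == "__main__":
    print(best_k_by_cv(toy_log))
    print(delta_k(toy_runs))
```

Because ΔK cannot be evaluated at K = 1, a sample without substructure has to be recognised from the raw Ln probability or the cross-validation error instead, which is consistent with the Figure 1 caption reporting both.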
Patient clinical characteristics. Data are divided by tumor type and by origin of samples (University of Iowa, UIHC, or TCGA). * Self-reported race and ethnicity.
| UIHC | UIHC | TCGA | TCGA |
|---|---|---|---|
| Ovarian | Endometrial | Ovarian | Endometrial |
| High grade serous | Endometrioid | High grade serous | Endometrioid |
| 50 | 62 | 351 | 395 |
| 59 | 61 | 59 | 65 |
| | | | |
| 48 | 57 | 302 | 288 |
| 1 | 0 | 25 | 61 |
| 0 | 0 | 10 | 17 |
| 0 | 1 | 1 | 7 |
| 0 | 0 | 2 | 3 |
| 1 | 4 | 12 | 20 |
| | | | |
| 0 | 0 | 8 | 9 |
| 49 | 58 | 201 | 275 |
| 1 | 4 | 142 | 111 |
| | | | |
| 0 | 44 | 1 | 281 |
| 0 | 4 | 20 | 34 |
| 34 | 11 | 274 | 66 |
| 13 | 3 | 53 | 14 |
| 3 | 0 | 1 | 1 |