Zura Kakushadze, Willie Yu.
Abstract
We apply our statistically deterministic machine learning/clustering algorithm *K-means (recently developed in https://ssrn.com/abstract=2908286) to 10,656 published exome samples for 32 cancer types. A majority of cancer types exhibit a mutation clustering structure. Our results are in-sample stable. They are also out-of-sample stable when applied to 1,389 published genome samples across 14 cancer types. In contrast, we find in- and out-of-sample instabilities in cancer signatures extracted from exome samples via nonnegative matrix factorization (NMF), a computationally costly and non-deterministic method. Extracting stable mutation structures from exome data could have important implications for speed and cost, which are critical for early-stage cancer diagnostics, such as novel blood-test methods currently in development.
Keywords: DNA; K-means; cancer signatures; clustering; correlation; covariance; eRank; exome; genome; industry classification; machine learning; matrix; nonnegative matrix factorization; quantitative finance; sample; somatic mutation; source code; statistical risk model
Year: 2017 PMID: 28809811 PMCID: PMC5575665 DOI: 10.3390/genes8080201
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096