Aideen C Roddy1, Anna Jurek-Loughrey2, Jose Souza1, Alan Gilmore1, Paul G O'Reilly1, Alexey Stupnikov1,3, David Gonzalez de Castro1, Kevin M Prise1, Manuel Salto-Tellez1, Darragh G McArt1.
Abstract
Longitudinal next-generation sequencing of cancer patient samples has enhanced our understanding of the evolution and progression of various cancers. As a result, and due to our increasing knowledge of heterogeneity, such sampling is becoming increasingly common in research and clinical trial sample collections. Traditionally, the evolutionary analysis of these cohorts involves the use of an aligner followed by subsequent stringent downstream analyses. However, this can lead to large levels of information loss due to the vast mutational landscape that characterizes tumor samples. Here, we propose an alignment-free approach for sequence comparison, a well-established approach in a range of biological applications including typical phylogenetic classification. Such methods could be used to compare information collated in raw sequence files to allow an unsupervised assessment of the evolutionary trajectory of patient genomic profiles. In order to highlight this utility in cancer research we have applied our alignment-free approach using a previously established metric, Jensen-Shannon divergence, and a metric novel to this area, Hellinger distance, to two longitudinal cancer patient cohorts in glioma and clear cell renal cell carcinoma using our software, NUQA. We hypothesize that this approach has the potential to reveal novel information about the heterogeneity and evolutionary trajectory of spatiotemporal tumor samples, potentially revealing early events in tumorigenesis and the origins of metastases and recurrences.
Key words: alignment-free, Hellinger distance, exome-seq, evolution, phylogenetics, longitudinal.
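The two metrics named in the abstract can be illustrated with a minimal Python sketch, assuming each sample is summarized as a normalized k-mer frequency profile built from its raw reads. This is not NUQA's implementation: the function names, the base-2 logarithm, and the choice to skip k-mers containing N are assumptions made for illustration only.

```python
import math
from collections import Counter


def kmer_profile(reads, k):
    """Return relative k-mer frequencies over an iterable of read strings.

    Assumption: k-mers containing the ambiguity code N are skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no k-mers of length %d found" % k)
    return {kmer: n / total for kmer, n in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence of two frequency dicts (base-2 logs, range 0 to 1)."""
    jsd = 0.0
    for kmer in set(p) | set(q):
        pi, qi = p.get(kmer, 0.0), q.get(kmer, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution
        if pi > 0:
            jsd += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            jsd += 0.5 * qi * math.log2(qi / mi)
    return jsd


def hellinger_distance(p, q):
    """Hellinger distance of two frequency dicts (range 0 to 1)."""
    total = sum(
        (math.sqrt(p.get(kmer, 0.0)) - math.sqrt(q.get(kmer, 0.0))) ** 2
        for kmer in set(p) | set(q)
    )
    return math.sqrt(total / 2.0)
```

Computing either function for every pair of samples from one patient gives a symmetric distance matrix, which is the kind of input a tree-building or multidimensional scaling step would then consume.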
Year: 2019 PMID: 31424551 PMCID: PMC6878956 DOI: 10.1093/molbev/msz182
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Identifying optimal parameters for use with alignment-free methods. Application of Jensen–Shannon divergence (JSD) and Hellinger distance (HD) to (A) clear cell renal cell carcinoma (ccRCC) patient RMH004 with a germline sample (GL), multiple samples from the ccRCC tumor (R2-4, R8, R10) and a tumor thrombus from the renal vein (VT) and (B) glioma patient P90 with a germline sample (Normal), multiple samples from the initial grade II glioma (Initials A–F) and two samples from a recurrent grade II glioma (Recur 1A and 1B). (C) A table summarizing branch-score distance (BSD) and symmetric distance (SD) values returned when comparing trees for six patients for which both JSD and HD have been applied. (D) A bar chart summarizing BSD and SD values returned when comparing trees for six patients for which both JSD and HD have been applied. (E) Tree topologies produced using k-mer lengths 13, 15, 17, 19, 21, and 23 in combination with JSD when applying alignment-free methods to patient RMH004. (F) A heatmap representing the BSD between trees produced using varying k-mer lengths and HD applied to patient RMH004. (G) A heatmap representing the BSD between trees produced using varying k-mer lengths and JSD applied to patient RMH004. (H) A line graph representing the BSD between trees produced using increasing k-mer lengths when applying JSD.
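The symmetric distance (SD) used in this caption to compare trees counts the splits (bipartitions of the sample set) found in one tree but not the other. A minimal sketch follows, assuming trees are written as nested tuples of leaf names; this representation and the function names are hypothetical, and the placeholder leaves in the usage note are not patient samples. Branch-score distance additionally weights each split by its branch length and is not shown.

```python
def leaf_set(tree):
    """Return the frozenset of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial splits of a tree, treated as unrooted.

    Each split is stored as a frozenset holding its two sides, so the
    result does not depend on where the nested tuples are rooted.
    """
    all_leaves = leaf_set(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        # Splits that isolate a single leaf occur in every tree; skip them.
        if len(below) > 1 and len(other) > 1:
            found.add(frozenset([below, other]))
        return below

    visit(tree)
    return found


def symmetric_distance(tree_a, tree_b):
    """Count splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(splits(tree_a) ^ splits(tree_b))
```

For example, `symmetric_distance((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")))` compares two four-leaf trees that each contain one non-trivial split and share none.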
Fig. 2. Applying alignment-free sequence comparison methods to glioma patient P90 and ccRCC patient RMH004. (A) Simulated data set “A” created using software XS and fastx-mutate-tools to represent SNVs and indels in small scale data such as WES. (B) Simulated data set “B” created using software pIRS to represent SNVs, indels, and structural variants in WGS. (C) Least-squares minimum-evolution tree produced based on a binary matrix of SNVs present in the samples for P90, reused with permission from Mazor et al. (2015). (D) An unrooted Neighbor-Joining tree produced applying our alignment-free software (NUQA), incorporating JSD, to patient P90. (E) Multidimensional scaling plot representing the distances between samples produced applying NUQA, incorporating JSD, to patient P90. (F) A maximum parsimony tree produced based on a binary matrix of SNVs present in the samples for RMH004, adapted with permission from Gerlinger et al. (2014). (G) An unrooted Neighbor-Joining tree produced applying NUQA, incorporating JSD, to patient RMH004. (H) Multidimensional scaling plot representing the distances between samples produced applying NUQA, incorporating JSD, to patient RMH004.
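The unrooted Neighbor-Joining trees in panels D and G are built from a pairwise distance matrix. The sketch below shows the standard Neighbor-Joining procedure in plain Python, under the assumption that distances are supplied as a dict keyed by frozenset pairs. It is not taken from NUQA; the function name, the input format, and the internal node labels ("n1", "n2", ...) are hypothetical.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree by Neighbor-Joining.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns a list of (node, node, branch_length) edges.
    """
    nodes = list(names)
    d = dict(dist)  # work on a copy so the caller's matrix is untouched
    edges = []
    next_id = 1

    def get(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all others.
        r = {a: sum(get(a, b) for b in nodes if b != a) for a in nodes}
        # Choose the pair minimizing Q(a, b) = (n - 2) d(a, b) - r_a - r_b.
        best, best_q = None, None
        for a, b in combinations(nodes, 2):
            q = (n - 2) * get(a, b) - r[a] - r[b]
            if best_q is None or q < best_q:
                best, best_q = (a, b), q
        a, b = best
        new = "n%d" % next_id  # hypothetical label for the new internal node
        next_id += 1
        len_a = 0.5 * get(a, b) + (r[a] - r[b]) / (2 * (n - 2))
        len_b = get(a, b) - len_a
        edges.append((a, new, len_a))
        edges.append((b, new, len_b))
        # Distances from the new internal node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = 0.5 * (get(a, c) + get(b, c) - get(a, b))
        nodes = [c for c in nodes if c not in (a, b)] + [new]

    if len(nodes) == 2:
        a, b = nodes
        edges.append((a, b, get(a, b)))
    return edges
```

When the input distances are exactly additive on some tree, Neighbor-Joining recovers that tree and its branch lengths; distances estimated from real reads are only approximately additive, so the result is an estimate.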
Fig. 3. Benchmarking of NUQA against other alignment-free software. Unrooted Neighbor-Joining trees produced when applying NUQA (A), AAF (B), and kWIP (C) to a simulated data set using a k-mer length of 17 and allowing 64 GB RAM, and trees produced when applying NUQA (D), AAF (E), and kWIP (F) to patient P90 using a k-mer length of 21 and allowing 64 GB RAM.