G Mazzocchetti1,2, A Poletti1,2, V Solli1,2, E Borsi1, M Martello1, I Vigliotta1, S Armuzzi1,2, B Taurisano1,2, E Zamagni1,2, M Cavo1,2, C Terragna1.
Abstract
Human cancer arises from a population of cells that have acquired a wide range of genetic alterations, most of which are targets of therapeutic treatments or are used as prognostic factors for patient risk stratification. Among these, copy number alterations (CNAs) are quite frequent. Currently, several molecular biology technologies, such as microarrays, NGS and single-cell approaches, are used to define the genomic profile of tumor samples. The output data need to be analyzed with bioinformatic approaches, in particular with computational algorithms. Molecular biology tools estimate the baseline region by comparing either the mean probe signals or the number of reads to the reference genome. However, when tumors display complex karyotypes, this approach can fail to estimate the baseline region correctly and consequently cause errors in CNA calling. To overcome this issue, we designed an R package, BoBafit, that checks and, if necessary, adjusts the baseline region according to both the tumor-specific alteration context and the sample-specific clustered genomic lesions. Several databases were chosen to set up and validate the package, demonstrating the potential of BoBafit to adjust copy number (CN) data from different tumors and analysis techniques. Notably, the analysis highlighted that up to 25% of samples need a baseline region adjustment and a redefinition of CNA calls, thus causing a change in the prognostic risk classification of the patients. We support the implementation of BoBafit within CN analysis bioinformatics pipelines to ensure correct stratification of patients into risk categories, regardless of the tumor type.
Keywords: BAF, B-allele frequency; Baseline region; Bioinformatic pipeline; Breast cancer; CN, Copy number; CNAs, Copy number alterations; CNVs, Copy Number Variations; CR, Correction Factor; Clustering methods; Copy number alteration; Data correction; F-CL, Final Chromosome List; FISH, Fluorescence In Situ Hybridization; HD, Hyperdiploidy; HR, High Risk; LOH, Loss of Heterozygosity; MM, Multiple Myeloma; Multiple myeloma; NGS, Next Generation Sequencing; R-ISS, Revised International Staging System; S-CL, Starting Chromosome List; SNP, Single-Nucleotide Polymorphism; SR, Standard Risk; WES, Whole Exome Sequencing; WGD, Whole-genome doubling
Year: 2022 PMID: 35891790 PMCID: PMC9294200 DOI: 10.1016/j.csbj.2022.06.062
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Fig. 1. The diagram shows how to organize a BoBafit analysis and the steps of the DRrefit algorithm. 0) First, the tumor-specific chromosome list is obtained from the segmentation file with the ComputeNormalChromosome function. 1) Next, the mean CN, weighted by segment length, is calculated for each chromosomal arm, giving the global arm CN. 2) The NbClust package performs the clustering procedure based on the CN of the chromosomal arms. 3a) The clusters obtained in the previous step are compared to the S-CL to determine the “winner cluster” and the resulting F-CL. The JABBA plot, output by the PlotChrCluster function, illustrates how the comparison works. 3b) If NbClust fails to cluster, no cluster is available for the comparison and the S-CL remains the reference list: the S-CL directly becomes the F-CL. 4) At this point, the CR is estimated as the difference between the old baseline region (CN = 2) and the median CN value of the F-CL (the new baseline region). Again, a JABBA plot shows the difference between the two baseline regions. 5) The segment CN values are corrected by applying the correction factor (CR). The function returns three outputs: the “Report of clustering”, which contains all information about the clustering procedure; “Segments corrected”, a data frame with the corrected CN values of the segments; and a sample plot that shows the shift of the baseline region after the correction. 6) The CR value defines three classes of profiles: No changes, Recalibrated and Refitted. This information is reported in the Report of clustering data frame.
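Steps 1, 4 and 5 of the caption above can be illustrated with a minimal Python sketch. This is not the BoBafit implementation (the package is written in R). The function names, the tuple layout of the segments and the toy values are hypothetical, and the sign convention (CR = 2 − median, corrected CN = CN + CR, so that the F-CL median lands on 2) is an assumption made for the example.

```python
from statistics import median

def arm_weighted_cn(segments):
    """Length-weighted mean CN per chromosomal arm (step 1).

    segments: iterable of (arm, start, end, cn) tuples (hypothetical layout).
    """
    totals = {}
    for arm, start, end, cn in segments:
        length = end - start
        weighted_sum, total_len = totals.get(arm, (0.0, 0))
        totals[arm] = (weighted_sum + cn * length, total_len + length)
    return {arm: w / l for arm, (w, l) in totals.items()}

def correction_factor(arm_cn, final_list, baseline=2.0):
    """CR as old baseline (CN = 2) minus the median arm CN of the F-CL (step 4).

    The sign convention is an assumption of this sketch.
    """
    return baseline - median(arm_cn[a] for a in final_list if a in arm_cn)

def apply_correction(segments, cr):
    """Shift every segment CN by the correction factor (step 5)."""
    return [(arm, start, end, cn + cr) for arm, start, end, cn in segments]

# Toy sample whose diploid arms sit around CN 2.4 instead of 2.
segments = [
    ("1p", 0, 100, 2.2),
    ("1p", 100, 300, 2.5),
    ("2q", 0, 200, 2.4),
    ("4p", 0, 100, 2.6),
    ("9q", 0, 100, 3.4),  # a gained arm, not part of the F-CL
]
final_list = ["1p", "2q", "4p"]  # hypothetical F-CL

arm_cn = arm_weighted_cn(segments)
cr = correction_factor(arm_cn, final_list)
corrected = apply_correction(segments, cr)
print(round(cr, 3), [round(s[3], 3) for s in corrected])
```

In this toy sample the gained arm 9q moves from about 3.4 to about 3.0 once the baseline is shifted, which is the kind of change that can alter a CNA call.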
Fig. 2. Three samples, each labeled with the class identified by the DRrefit function. The panel shows the three DRrefit classes and how they are plotted. The x-axis reports the chromosomes with their genomic positions and the y-axis the copy number value. In plots with CR ≤ 0.1, the new segments are orange and the old segments are red; in plots with CR > 0.1, the new segments are green and the old segments are red. a) No changes class with CR 0.0077; b) Recalibrated class with CR 0.2; c) Refitted class with CR −0.688. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 3. Starting Chromosome List (S-CL) plot. The x-axis reports the chromosomal arms and the y-axis the alteration rate. Each bar shows the alteration rate of one arm. The dotted line indicates the tolerance value for the chromosomal alteration rate; arms below it are selected for the chromosome list because they are rarely altered. In this plot, the chromosomal arms with an alteration rate less than or equal to 15% were selected (15% is also the default tolerance value of the function).
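The selection rule described in this caption can be sketched in a few lines of Python. This is only an illustration of thresholding arms by alteration rate, not the code of ComputeNormalChromosome. The function names, the input layout (one set of altered arms per sample) and the toy cohort are hypothetical, and how an arm is called "altered" in a sample is left outside the sketch.

```python
def alteration_rates(arms, altered_by_sample):
    """Fraction of samples in which each arm is altered.

    altered_by_sample: list with one set of altered arms per sample
    (hypothetical input layout).
    """
    n_samples = len(altered_by_sample)
    return {
        arm: sum(arm in altered for altered in altered_by_sample) / n_samples
        for arm in arms
    }

def starting_chromosome_list(rates, tolerance=0.15):
    """Keep the arms whose alteration rate is at or below the tolerance."""
    return [arm for arm, rate in rates.items() if rate <= tolerance]

# Toy cohort of 20 samples.
arms = ["1p", "1q", "2p", "13q"]
samples = [{"1q", "13q"}] * 8 + [{"13q", "1p"}] * 2 + [set()] * 10

rates = alteration_rates(arms, samples)
s_cl = starting_chromosome_list(rates, tolerance=0.15)
print(rates, s_cl)
```

Raising the tolerance admits more arms into the list at the cost of including arms that are altered more often, which is the trade-off behind the dataset-specific tolerance rates.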
Fig. 4. JABBA plot of a multiple myeloma sample made by PlotChrCluster. The sample’s clusters are drawn as ellipses whose area corresponds to the cluster confidence interval. The chromosomal arm labels are colored according to the cluster to which they belong. The x-axis reports the CN value, showing how the clusters correspond to different clonal and sub-clonal CN states.
Starting Chromosome Lists (S-CLs) obtained by ComputeNormalChromosome. The function was applied to all databases with different tolerance values. The tolerance rates were chosen to include a sufficient number of chromosomal arms with the lowest probability of alteration. The TCGA-OV chromosome list was further revised because of the high percentage of alteration per chromosome in the dataset (minimum 47%; see Fig. 5), so it differs from the function output.
| Database | Chromosome list | Tolerance rate |
|---|---|---|
| MM-BO | 1p, 2p, 2q, 4p, 4q, 8q, 10p, 10q, 12p, 12q, 16p, 17p, 17q, 20p, 20q | 15% |
| CoMMpass | 1p, 2p, 2q, 4p, 4q, 8q, 10p, 10q, 12p, 12q, 16p, 17p, 17q, 18q, 20p, 20q, 22q | 15% |
| TCGA-BRCA | 1p, 2p, 2q, 3p, 3q, 4p, 4q, 9q, 10p, 10q, 11p, 11q, 12q, 14q, 15q, 19p, 19q, 21q | 15% |
| TCGA-OV | 1p, 3p, 7p, 7q, 11p, 11q, 14q, 21q | 50% |
| TCGA-LUAD | 1p, 32q, 4q, 10q, 11p | 20% |
| TCGA-COAD | 1p, 2p, 2q, 3p, 3q, 6q, 10p, 10q, 11p, 11q, 19p | 20% |
Fig. 5. Starting Chromosome List (S-CL) plots. The panel shows the S-CL histograms produced by ComputeNormalChromosome for each database. The y-axis reports the alteration rate of each chromosomal arm, and the dotted line marks the tolerance rate below which chromosomes are considered “normal”. A) MM Bologna S-CL, tolerance rate = 0.15; B) the CoMMpass S-CL, tolerance rate = 0.15; C) the TCGA-BRCA S-CL, tolerance rate = 0.15; D) the TCGA-OV S-CL, tolerance rate = 0.50; E) the TCGA-LUAD S-CL, tolerance rate = 0.20; F) the TCGA-COAD S-CL, tolerance rate = 0.20.
Fig. 6. Summary of the DRrefit classes. The histograms report, for each database analyzed in the present study, the percentage and the number of samples (x-axis) belonging to the three DRrefit classes (Refitted, Recalibrated and No changes). A) MM-BO samples; B) CoMMpass samples; C) TCGA-BRCA samples; D) TCGA-OV samples; E) TCGA-LUAD samples; F) TCGA-COAD samples.
Fig. 7. Clinically relevant alterations pre and post correction in MM-BO samples. a) Five-way Venn diagram of pre-correction alterations and b) five-way Venn diagram of post-correction alterations, which show the number of samples belonging to each alteration group. The number of samples without alterations is indicated in the top left. c) Sankey network diagram: the starting alteration groups (pre-correction, purple) are on the left and the final alteration groups (post-correction, yellow) are on the right. The gray bands indicate the flow of the samples: some remain in their starting group, while others acquire or lose alterations and change alteration group. The thickness of each band reflects the number of samples in the trajectory. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
MM-BO samples stratified according to R-ISS and mSMART cytogenetic guidelines pre and post correction.
| Pre-correction | Post-correction | Samples |
|---|---|---|
| HR | HR | 86 |
| HR | SR | 27 |
| SR | HR | 2 |
| SR | SR | 480 |

High risk (HR) = presence of 17p deletion; standard risk (SR) = no deletion.

High risk (HR) = presence of 17p deletion and/or 1p amplification; standard risk (SR) = neither of the two.
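As a worked check of the counts in the table above, the following short Python sketch tallies how many samples change risk class after correction. The dictionary layout and variable names are illustrative only; the four counts are taken directly from the table.

```python
# (pre-correction class, post-correction class) -> number of MM-BO samples
transitions = {
    ("HR", "HR"): 86,
    ("HR", "SR"): 27,
    ("SR", "HR"): 2,
    ("SR", "SR"): 480,
}

total = sum(transitions.values())
changed = sum(n for (pre, post), n in transitions.items() if pre != post)
pre_hr = sum(n for (pre, _), n in transitions.items() if pre == "HR")
hr_to_sr = transitions[("HR", "SR")]

print(f"{changed}/{total} samples changed risk class ({changed / total:.1%})")
print(f"{hr_to_sr}/{pre_hr} pre-correction HR samples became SR ({hr_to_sr / pre_hr:.1%})")
```

With these counts, 29 of 595 samples change class, and most of the changes (27) are samples moving from high risk to standard risk.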
Fig. 8. Sankey network diagram of MM-BO samples. The diagram shows how samples change risk class from the start profile (pre) to the end profile (post) according to R-ISS cytogenetic guidelines.