Daniel Vodák, Susanne Lorenz, Sigve Nakken, Lars Birger Aasheim, Harald Holte, Baoyan Bai, Ola Myklebost, Leonardo A Meza-Zepeda, Eivind Hovig.
Abstract
Sample pooling enabled by dedicated indexes is a common strategy for cost-effective and robust high-throughput sequencing. Index misassignment leading to mutual contamination between pooled samples has, however, been described as a general problem of the latest Illumina sequencing instruments utilizing exclusion amplification. Using real-life data from multiple tumour sequencing projects, we demonstrate that index misassignment can induce artefactual variant calls closely resembling true, high-quality somatic variants. These artefactual calls potentially impact cancer applications utilizing low allelic frequencies, such as clonal analysis of tumours. We discuss the available countermeasures with an emphasis on improved library indexing methods, and provide software that can assist in the identification of variants that may be consequences of index misassignment.
Year: 2018 PMID: 29593270 PMCID: PMC5871786 DOI: 10.1038/s41598-018-23563-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. A schematic overview of the three main experiments conducted. The same library material was used for any given sample included in multiple experiments.
Figure 2. Amplification chemistry and its relationships to (a) sample contamination estimates and (b) variant counts and PC-AF values. All values are plotted separately for each combination of tumour type and amplification chemistry represented in the analysis. The colours distinguish between variants called in “high-contamination” samples (Conpair contamination estimate ≥ 0.5%) and variants coming from samples with “low contamination” (Conpair contamination estimate < 0.5%). BrAmp: bridge amplification; FL: follicular lymphoma; SARC: sarcoma; DLBCL: diffuse large B-cell lymphoma; SCV: suspected contaminant variant.
Figure 3. Per-sample counts of apparently true somatic variants (a) and suspected contaminant variants (b) plotted against contamination estimates. All values, including Spearman’s rank correlation coefficients and their associated p-values, are plotted separately for each combination of tumour type and amplification chemistry represented in the analysis. The colours distinguish between “high-contamination” samples (Conpair contamination estimate ≥ 0.5%) and samples with “low contamination” (Conpair contamination estimate < 0.5%). BrAmp: bridge amplification; FL: follicular lymphoma; SARC: sarcoma; DLBCL: diffuse large B-cell lymphoma.
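The grouping and correlation described in the captions above can be illustrated with a minimal sketch. It assumes per-sample contamination estimates expressed in percent and per-sample variant counts. The function names and the example numbers are hypothetical and are not taken from the study or its software. Only the correlation coefficient is computed; the associated p-value is omitted.

```python
from statistics import mean

# Threshold (in percent) separating "high-contamination" from
# "low-contamination" samples, as stated in the figure captions.
HIGH_CONTAMINATION_THRESHOLD = 0.5


def contamination_class(estimate_percent):
    """Label a sample by its contamination estimate (in percent)."""
    return "high" if estimate_percent >= HIGH_CONTAMINATION_THRESHOLD else "low"


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical example values, for illustration only.
estimates = [0.1, 0.3, 0.6, 1.2]   # contamination estimates in percent
scv_counts = [2, 5, 9, 30]         # suspected contaminant variants per sample
labels = [contamination_class(e) for e in estimates]
rho = spearman_rho(estimates, scv_counts)
```

In practice a statistics library would normally be used for the coefficient and its p-value; the hand-written version is shown only to make the ranking step explicit.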
Overview of possible contamination types, their consequences and suitable filtering options. PC-AF: pool-complement allelic fraction.
| Contamination type | Cause (the type of co-multiplexed samples) | Possible somatic variant calling artefacts | Prevalence of given contamination type in affected datasets | Suitable post-sequencing filtering options |
|---|---|---|---|---|
| a) Contaminant germline variants in a tumour sample | Any samples from other individuals | False positive somatic variants in the form of germline variation from other individuals | The most likely contamination type to occur | A variant filter based on an appropriate germline variant database or a relevant panel of normal samples |
| b) Contaminant somatic variants in a tumour sample | Other tumour samples | False positive “recurrent” somatic variants in the form of somatic variation from other tumour samples – whether from other individual(s) or the same individual | Expected to be relevant in tumour sample pools enriched** for specific somatic variants | A filter based on PC-AF values (non-discriminative filtering might lead to false negatives of high importance) |
| c) Contaminant germline variants in a control sample | Any samples from other individuals | False negatives/missed somatic variant calls – only concerning somatic variants that also occur as germline variants | Dependent on the occurrence of important variants as both germline and somatic in a given project’s setting | Review of calls not classified as somatic, adjustment of the variant caller parameters |
| d) Contaminant somatic variants in a control sample | Any tumour samples | False negatives/missed somatic variant calls – concerning all somatic variants | Elevated relevancy when matched samples are co-multiplexed | Review of calls not classified as somatic, adjustment of the variant caller parameters |
*Copy number loss regions of high-purity tumour samples will be especially affected.
**The enrichment will increase with a given variant’s recurrence, as well as with the purity of the tumour samples that carry the variant.
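A PC-AF-based filter such as the one listed for contamination type b) might be sketched as follows. This is not the authors' software. It assumes that PC-AF is the fraction of variant-supporting reads at a position, pooled over all other samples in the same multiplexing pool, and that samples from the same individual are excluded from the complement. The data structure, function names and the 1% flagging threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadCounts:
    """Read support for one variant position in one pooled sample."""
    sample: str
    individual: str
    alt: int     # reads supporting the variant allele
    total: int   # all reads covering the position


def pool_complement_af(pool_counts, sample, individual):
    """Assumed PC-AF: variant allele fraction over the rest of the pool.

    Skips the sample itself and (as an assumption) every other sample
    from the same individual, such as a matched control.
    """
    alt = total = 0
    for rc in pool_counts:
        if rc.sample == sample or rc.individual == individual:
            continue
        alt += rc.alt
        total += rc.total
    return alt / total if total else 0.0


def is_suspected_contaminant(pc_af, threshold=0.01):
    """Flag a call for review; the threshold is a hypothetical value."""
    return pc_af >= threshold


# Hypothetical pool: a low-frequency call in T1 that is abundant in T2.
pool = [
    ReadCounts("T1", "P1", alt=4, total=200),
    ReadCounts("N1", "P1", alt=0, total=180),
    ReadCounts("T2", "P2", alt=90, total=200),
    ReadCounts("N2", "P2", alt=0, total=200),
]
pc_af_t1 = pool_complement_af(pool, sample="T1", individual="P1")
flag_t1 = is_suspected_contaminant(pc_af_t1)
```

Because the table warns that non-discriminative filtering might lead to false negatives of high importance, a flag of this kind is better treated as a prompt for manual review than as an automatic exclusion.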