Joanna Hui Juan Tan1,2, Say Li Kong1, Joyce A Tai1, Huay Mei Poh1, Fei Yao3, Yee Yen Sia1, Edwin Kok Hao Lim1, Angela Maria Takano4, Daniel Shao-Weng Tan4, Asif Javed1,5, Axel M Hillmer1,6,7.
Abstract
Single cell genomics offers unprecedented resolution for interrogating genetic heterogeneity in a patient's tumour at the intercellular level. However, the DNA yield per cell is insufficient for today's sequencing library preparation protocols. This necessitates DNA amplification, which is a key source of experimental noise. We provide an evaluation of two protocols using microfluidics-based amplification for whole exome sequencing, an experimental scenario commonly used in single cell genomics. The results highlight their respective biases and relative strengths in the identification of single nucleotide variations. To this end, we introduce a workflow, SoVaTSiC, which allows for quality evaluation and somatic variant identification of single cell data. As proof of concept, the framework was applied to study a lung adenocarcinoma tumour. The analysis provides insights into tumour phylogeny by identifying key mutational events in lung adenocarcinoma evolution. This inference is supported by the histology of the tumour and demonstrates the usefulness of the approach.
Keywords: Protocol aware bioinformatics; Single cell genomics; Single cell somatic variant caller; Whole genome amplification
Abbreviations: ADO, Allelic dropout; CNV, Copy number variation; FP, False positives; GMM, Gaussian Mixture Model; SNV, Single nucleotide variation; TP, True positives; WGA, Whole genome amplification
Year: 2020 PMID: 33489004 PMCID: PMC7788095 DOI: 10.1016/j.csbj.2020.12.021
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Comparison metrics for GM12878 single cells amplified by microfluidics-based WGA methods for exome sequencing. (A) Summary of the experimental setup for comparison of WGA methods for whole exome sequencing. Two microfluidics methods were used for this experiment, and a total of five single cells were evaluated for each method. (B-F) Barplots compare the mapping rate, duplication rate, target region coverage, allelic dropout (ADO) and per base false positive (FP) rate, respectively. C1-GE outperforms C1-REPLI to varying degrees across the majority of the core statistics related to SNV calling (panels B, C, D, and F).
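The ADO and per base FP rates compared in panels E and F can be illustrated with a minimal sketch. The definitions below are commonly used ones and are an assumption here, since the caption does not spell out how the rates were computed: ADO is taken as the fraction of reference-heterozygous sites called homozygous in a single cell, and the FP rate as the fraction of reference-homozygous sites called variant. The function names, the genotype strings and the toy sites are hypothetical.

```python
import math


def allelic_dropout_rate(reference_genotypes, cell_genotypes):
    """Fraction of reference-heterozygous sites called homozygous in the cell.

    Both arguments map a site label to a genotype string ("0/0", "0/1", "1/1").
    Sites absent from the cell (no coverage) are excluded from the denominator.
    This definition is an assumption, not taken from the source.
    """
    covered_het = [
        site for site, gt in reference_genotypes.items()
        if gt == "0/1" and site in cell_genotypes
    ]
    if not covered_het:
        return math.nan
    dropped = sum(1 for site in covered_het if cell_genotypes[site] in ("0/0", "1/1"))
    return dropped / len(covered_het)


def per_base_fp_rate(reference_genotypes, cell_genotypes):
    """Fraction of covered reference-homozygous sites called variant in the cell."""
    covered_hom_ref = [
        site for site, gt in reference_genotypes.items()
        if gt == "0/0" and site in cell_genotypes
    ]
    if not covered_hom_ref:
        return math.nan
    false_calls = sum(1 for site in covered_hom_ref if cell_genotypes[site] != "0/0")
    return false_calls / len(covered_hom_ref)


# Toy data (hypothetical): a trusted reference call set and one single cell.
reference = {"chr1:100": "0/1", "chr1:200": "0/1", "chr1:300": "0/0",
             "chr1:400": "0/0", "chr1:500": "0/1"}
cell = {"chr1:100": "0/1", "chr1:200": "1/1", "chr1:300": "0/0", "chr1:400": "0/1"}

print(allelic_dropout_rate(reference, cell))
print(per_base_fp_rate(reference, cell))
```

In this toy example, site chr1:500 has no call in the cell and therefore does not count towards either rate.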
Fig. 2. Overview of lung cancer single cell experiment. (A) Location and histology of the tumour sectors are shown. Two different sectors and far normal tissue were evaluated in bulk and single cell sequencing. The number of single cells selected from each sector is indicated. Bar in microscopic image, 50 µm. (B) Description of quality control steps for single cells. Two iterations of a Gaussian Mixture Model (GMM) were used to cluster the single cells based on exonic coverage. The low coverage clusters were removed from further analysis (cells with target coverage < 10%, and cells with coverage from 10% to < 42%). In addition, cells were removed based on the allelic dropout (ADO) rate and false negative (FN) rate. (C) Contour plot used to determine the threshold for filtering of low quality genotypes. Red indicates the region enriched for false positive variants, whereas blue indicates the region enriched for true positive variants. (D) The density plot shows the variant allele frequency distribution for true positive and false positive variants. (E) The flow chart shows the sequence of serial filters applied to remove germline variants. Numbers of variants that remained after each step are indicated on the right. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
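The coverage-based clustering in panel B can be sketched as follows. This is a minimal illustration and not the SoVaTSiC implementation: it assumes a two-component, one-dimensional mixture fitted with plain expectation-maximisation, and it shows a single pass that discards the lower-coverage cluster. The caption describes two such iterations, which would correspond to calling the function again on the surviving cells. The function names, the number of components, the 0.5 responsibility cut-off and the toy coverage values are all assumptions.

```python
import math


def fit_gmm_1d(values, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM (assumed setup).

    Returns the component means and, for each value, its responsibilities.
    """
    lo, hi = min(values), max(values)
    means = [lo, hi]  # deterministic start at the extremes
    start_var = max(((hi - lo) / 4.0) ** 2, 1e-6)
    variances = [start_var, start_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in values:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            if total == 0.0:
                # Numerical underflow: fall back to the nearest mean.
                nearest = 0 if abs(x - means[0]) <= abs(x - means[1]) else 1
                resp.append([1.0 if k == nearest else 0.0 for k in range(2)])
            else:
                resp.append([d / total for d in dens])
        # M-step: update weights, means and variances (variance floored).
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)
    return means, resp


def drop_low_coverage_cluster(coverage):
    """Keep cells assigned to the higher-mean component (one filtering pass)."""
    cells = list(coverage)
    means, resp = fit_gmm_1d([coverage[c] for c in cells])
    high = 0 if means[0] > means[1] else 1
    return {c: coverage[c] for c, r in zip(cells, resp) if r[high] >= 0.5}


# Toy data (hypothetical): fraction of the exome target covered per cell.
coverage = {"cell_a": 0.03, "cell_b": 0.05, "cell_c": 0.04,
            "cell_d": 0.72, "cell_e": 0.81, "cell_f": 0.88, "cell_g": 0.79}
kept = drop_low_coverage_cluster(coverage)
print(sorted(kept))
```

A second pass is only sensible when the survivors still show a clearly separated low-coverage mode; on a single homogeneous cluster, a two-component fit would split and discard good cells.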
Fig. 3. Lung cancer single cell analysis. (A) Venn diagram shows shared somatic point mutations between bulk tumour sectors and single cells. (B) Phylogenetic tree depicts the relationship between single cells derived from the two tumour sectors. The root of the tree (at the top, indicated by N) consists of putative normal contaminant cells without the EGFR exon 19 delE746_A750 mutation. Progressing down the tree, EGFR and LMNA mutations are acquired as early truncal events present in the remaining single cells below. A split in evolutionary trajectory is observed, with a second clone (on the right) acquiring a high number of mutations. The size of each node is proportional to the number of cells it represents, with the colour representing their source (blue from T1 and green from T2). These numbers are indicated next to each node as well. (C) Dot plot showing the number of mutations observed in cells from different sectors. The Y axis indicates the number of mutations. Blue indicates cells that were derived from T1, while green indicates cells derived from T2. Grey indicates cells that were likely to be normal contaminants. ** indicates p-value ≤ 0.01, *** p-value ≤ 0.001, and **** p-value ≤ 0.0001. (D) Clustering of single cells using copy number profiles. The node colours indicate the source of the cell: red from far normal, blue from T1, and green from T2. Colour codes at the bottom represent the clustering categories: black indicates outliers, green indicates the T2 cluster, blue indicates the T1 cluster, and red indicates the normal cluster, which also includes the normal contaminant cells. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)