Jonathan A Griffiths, Antonio Scialdone, John C Marioni.
Abstract
BACKGROUND: Aneuploidies are copy number variants that affect entire chromosomes. They are commonly seen in cancer, embryonic stem cells, human embryos, and various trisomic diseases. Aneuploidies frequently affect only a subset of cells in a sample; this is known as "mosaic" aneuploidy. A cell that harbours an aneuploidy exhibits disrupted gene expression patterns, which can alter its behaviour. However, detecting aneuploidies with conventional single-cell DNA-sequencing protocols is slow and expensive.
Keywords: Aneuploidy detection; Copy-number; RNAseq; Single-cell
Year: 2017 PMID: 29178830 PMCID: PMC5702132 DOI: 10.1186/s12864-017-4253-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Successful detection of aneuploidies from scRNA-seq data. (a) Overview of the method. Cells with aneuploid chromosomes (purple and green) show altered transcription of genes on the affected chromosome (decreased and increased, respectively). For a given chromosome and cell, we compute a score for how much the overall expression of genes on that chromosome deviates from that in other cells. (b) We applied our method to 8-cell-stage mouse embryos that were sequenced with a parallel genome and transcriptome method (G&T-seq). Our method performs well compared with the ground truth provided by genomic sequencing (sensitivity 78.0%, specificity 99.5%, FDR 11.4%). The chromosome with a high Z-score in embryo F is not called as aneuploid because it does not pass an effect-size threshold ("Methods" section).
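The scoring idea in panel (a), combined with the effect-size filter mentioned in panel (b), can be illustrated with a small sketch. This is a simplified illustration, not the authors' scoring procedure (which is specified in the paper's Methods): it assumes expression is already library-size normalised, summarises each cell/chromosome pair as the median log2 ratio of its genes to the across-cell mean, and converts that to a robust Z-score using the median and median absolute deviation across cells. The function names and the thresholds `z_min=3.0` and `effect_min=0.3` are hypothetical choices made for illustration.

```python
import math
import statistics


def chromosome_deviations(expr, gene_chrom):
    """Median log2 ratio of each cell's expression to the across-cell mean,
    computed separately for every chromosome (illustrative sketch).

    expr: dict mapping cell name -> list of normalised expression values
          (e.g. counts per million), one per gene.
    gene_chrom: list giving the chromosome of each gene (same order).
    """
    cells = list(expr)
    n_genes = len(gene_chrom)
    # Reference level of each gene: mean over all cells, plus a pseudocount.
    ref = [statistics.fmean(expr[c][g] for c in cells) + 1.0
           for g in range(n_genes)]
    chroms = sorted(set(gene_chrom))
    dev = {}
    for c in cells:
        for chrom in chroms:
            ratios = [math.log2((expr[c][g] + 1.0) / ref[g])
                      for g in range(n_genes) if gene_chrom[g] == chrom]
            dev[(c, chrom)] = statistics.median(ratios)
    return cells, chroms, dev


def call_aneuploidies(expr, gene_chrom, z_min=3.0, effect_min=0.3):
    """Flag (cell, chromosome) pairs whose deviation is both extreme relative
    to other cells (robust Z-score) and large in absolute terms (effect size).
    Thresholds are hypothetical, not the published values."""
    cells, chroms, dev = chromosome_deviations(expr, gene_chrom)
    calls = []
    for chrom in chroms:
        values = [dev[(c, chrom)] for c in cells]
        centre = statistics.median(values)
        # Median absolute deviation, scaled to approximate a standard deviation.
        spread = 1.4826 * statistics.median(abs(v - centre) for v in values)
        if spread == 0:
            continue  # no variation across cells, so Z-scores are undefined
        for c in cells:
            effect = dev[(c, chrom)] - centre  # log2 shift vs. typical cell
            z = effect / spread
            if abs(z) >= z_min and abs(effect) >= effect_min:
                calls.append((c, chrom, "gain" if effect > 0 else "loss"))
    return calls
```

Requiring both a large Z-score and a large effect mirrors the situation described for embryo F: when cells vary very little, a small absolute shift can still produce a high Z-score, and the effect-size filter prevents such a chromosome from being called.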
Fig. 2 High variability in gene expression levels compromises performance. (a) Our method performs less well on cell-line G&T-seq data than on the mouse embryos. All cells were considered for the 8-cell embryo and HCC38-BL data. Trisomy 21 cells were downsampled to a ratio of 1 T21 cell to 4 control cells (cells with normal-ploidy chromosome 21), so that the aneuploid cells were in the minority and could therefore be detected. (b) The datasets with poor performance show more variable gene expression profiles. For 500 genes selected at random from each dataset (navy: HCC38-BL cell line; yellow: Reversine-treated 8-cell embryos; cyan: trisomy 21 iPS-derived neurons), we plot the (log) standard deviation of expression (y-axis) against the (log) mean expression (x-axis). A linear model was fitted separately for each dataset using genes with a median count of at least 50 per million reads, and the fitted lines are overlaid. (c) Simulated datasets with different dispersion parameters. We simulated four datasets to assess the impact of gene expression variability on the performance of our method. Genes from each simulation are shown, and the dispersion parameter used in each simulation is noted. The regression lines from the fits in (b) are overlaid. (d) As the data become more variable, the performance of our method degrades. For simulations with variability comparable to the HCC38-BL and trisomy 21 neuron datasets, sensitivity and precision are considerably reduced. Reported values are the mean of 10 simulations.
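Panels (b) and (c) relate a dispersion parameter to the slope and height of the log SD versus log mean relationship. The sketch below shows that link under stated assumptions: counts are assumed to follow a negative binomial distribution, parameterised so that Var = mean + dispersion × mean², and generated as a gamma-Poisson mixture; the gene means, cell and gene counts, and dispersion values are illustrative, not the settings used for the published simulations. The fit is an ordinary least-squares regression of log10 SD on log10 mean, analogous in spirit to the per-dataset linear model in panel (b).

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Poisson draw via Knuth's multiplication method. Large means are split
    into chunks (a sum of independent Poissons is Poisson) so exp(-lam)
    never underflows."""
    total = 0
    while lam > 200:
        total += poisson(200, rng)
        lam -= 200
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return total + k
        k += 1


def simulate_gene(mean, dispersion, n_cells, rng):
    """Negative binomial counts from a gamma-Poisson mixture:
    rate ~ Gamma(shape=1/dispersion, scale=mean*dispersion), so that
    Var(count) = mean + dispersion * mean**2."""
    shape = 1.0 / dispersion
    return [poisson(rng.gammavariate(shape, mean / shape), rng)
            for _ in range(n_cells)]


def simulate_dataset(dispersion, n_genes=200, n_cells=40, seed=0):
    rng = random.Random(seed)
    # Gene means drawn log-uniformly between 1 and ~316 (illustrative choice).
    return [simulate_gene(10 ** rng.uniform(0, 2.5), dispersion, n_cells, rng)
            for _ in range(n_genes)]


def fit_log_sd_vs_log_mean(genes):
    """Ordinary least-squares fit of log10(SD) on log10(mean) across genes."""
    xs, ys = [], []
    for counts in genes:
        m, s = statistics.fmean(counts), statistics.stdev(counts)
        if m > 0 and s > 0:  # skip genes whose logarithm is undefined
            xs.append(math.log10(m))
            ys.append(math.log10(s))
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx  # (slope, intercept)
```

For a small dispersion the variance is close to Poisson (SD near the square root of the mean, slope near 0.5), while a dispersion near 1 pushes the SD towards the mean itself (slope near 1), which is the kind of shift between datasets that panel (c) illustrates.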
Fig. 3 Allele-specific expression datasets support the method's aneuploidy calls. (a) Chromosomes with higher-confidence aneuploidy calls show greater allele-specific expression (ASE) deviation than chromosomes that are not called (p < 10⁻⁹, Mann-Whitney U test). The ASE deviation was corrected for systematic ASE differences for each embryo and for each autosome (Methods). Z-scores to the right of the dashed line are significant after FDR correction. Numbers above the Z-score bins give the number of chromosomes in each bin. (b) Trisomic and monosomic chromosomes show greater ASE deviation than chromosomes with normal ploidy (p < 10⁻⁸ and p < 10⁻², respectively; Mann-Whitney U test). Black circles indicate the mean deviation. Violin plots are area-normalised. Numbers above the plots give the number of chromosomes considered.
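The comparison in Fig. 3 can be sketched as follows. The deviation measure here is an assumption for illustration only: it presumes allele counts can be assigned to the two parental haplotypes and defines the deviation as the distance of the pooled maternal read fraction from 0.5. Under that assumption, and if each chromosome copy is expressed equally, a trisomy with two maternal copies would give a fraction near 2/3 (deviation near 1/6), and a monosomy would give a fraction near 0 or 1. The per-embryo and per-autosome correction described in the caption is not reproduced. The Mann-Whitney U test below uses average ranks for ties and a plain normal approximation without tie or continuity correction; `scipy.stats.mannwhitneyu` is the standard implementation for real analyses. All function names are hypothetical.

```python
import math


def ase_deviation(allele_counts):
    """allele_counts: list of (maternal_reads, paternal_reads) for the SNPs on
    one chromosome in one cell. Returns |pooled maternal fraction - 0.5|
    (an assumed, simplified deviation measure)."""
    mat = sum(m for m, _ in allele_counts)
    total = sum(m + p for m, p in allele_counts)
    return abs(mat / total - 0.5) if total else float("nan")


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation.
    Returns (U statistic for x, p-value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

In this setting, `x` would hold the deviations of chromosomes called as aneuploid and `y` those of uncalled chromosomes; a small p-value indicates that one group tends to have larger deviations than the other. The normal approximation is only reasonable for moderately large groups.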