Roelof J C Kluin, Kristel Kemper, Thomas Kuilman, Julian R de Ruiter, Vivek Iyer, Josep V Forment, Paulien Cornelissen-Steijger, Iris de Rink, Petra Ter Brugge, Ji-Ying Song, Sjoerd Klarenbeek, Ultan McDermott, Jos Jonkers, Arno Velds, David J Adams, Daniel S Peeper, Oscar Krijgsman.
Abstract
BACKGROUND: Mouse xenografts from (patient-derived) tumors (PDX) or tumor cell lines are widely used as models to study various biological and preclinical aspects of cancer. However, analyses of their RNA and DNA profiles are challenging, because they comprise reads not only from the grafted human cancer but also from the murine host. The reads of murine origin result in false positives in mutation analysis of DNA samples and obscure gene expression levels when sequencing RNA. However, currently available algorithms are limited and improvements in accuracy and ease of use are necessary.
Keywords: Breast cancer; Cancer; Melanoma; Next-generation sequencing (NGS); Patient-derived xenografts (PDX); Sequencing; Xenograft
Year: 2018 PMID: 30286710 PMCID: PMC6172735 DOI: 10.1186/s12859-018-2353-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Overview of XenofilteR workflow. Sequence reads (fastq) from PDX are mapped with the appropriate aligner (e.g. BWA, Tophat, STAR) to both a human and a mouse reference genome. Sequence reads that map to only one reference genome are assigned to that organism. For sequence reads that map to both the human and mouse reference genomes, the edit distance is calculated, defined as the number of base pairs that differ between the sequence read and the reference genome. Next, XenofilteR classifies these sequence reads as ‘human’ or ‘mouse’ based on the edit distance
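The classification rule described in the Fig. 1 caption can be illustrated with a minimal Python sketch. This is not XenofilteR's implementation. The function names `mismatch_count` and `classify_read` are hypothetical, the mismatch count assumes an ungapped alignment of equal length, and the handling of ties (labelled `ambiguous` here) is an assumption, since the caption does not say how equal edit distances are resolved.

```python
from typing import Optional


def mismatch_count(read: str, reference: str) -> int:
    """Count positions where an aligned read differs from the reference.

    Simplifying assumption: the alignment is ungapped and both strings
    have the same length (no indels, no clipping).
    """
    if len(read) != len(reference):
        raise ValueError("read and reference segment must have equal length")
    return sum(1 for r, g in zip(read.upper(), reference.upper()) if r != g)


def classify_read(human_dist: Optional[int], mouse_dist: Optional[int]) -> str:
    """Assign a read to a species from its two edit distances.

    A distance of None means the read did not map to that reference.
    The tie rule ("ambiguous") is an assumption of this sketch.
    """
    if human_dist is None and mouse_dist is None:
        return "unmapped"
    if mouse_dist is None:
        return "human"  # maps to the human reference only
    if human_dist is None:
        return "mouse"  # maps to the mouse reference only
    if human_dist < mouse_dist:
        return "human"  # fewer differences to the human reference
    if mouse_dist < human_dist:
        return "mouse"  # fewer differences to the mouse reference
    return "ambiguous"  # equal distances: not resolved by the caption


# Usage: a read with one mismatch to human and four to mouse.
label = classify_read(mismatch_count("ACGT", "ACGA"), 4)
```

In practice an aligner usually reports the per-read edit distance itself, so a helper like `mismatch_count` stands in only to make the definition concrete.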
Fig. 2 Mapping of mouse DNA and RNA to the human reference genome. a: Pair-wise comparison of the number of sequence reads per exon from mouse WGS (BALB_Cj versus C57BL_6NJ) mapped to a human reference. b: Number of reads (log10) of mouse origin that mapped to the human reference, sorted by read count; per exon (left panel) and per gene (right panel). c: Number of mouse reads from WGS that mapped to the human gene PTEN. d: Number of mouse reads from WGS that mapped to the human gene BCL9. e: Comparison of read counts from BALB_Cj RNAseq and WGS, both mapped to a human reference. Read count is corrected for exon length. f: Comparison of exon read counts from WGS and WES of mouse DNA, both mapped to a human reference. WES on mouse DNA was performed with a human-specific enrichment kit
Fig. 3 The effect of mouse reads in PDX samples. a: Integrative Genomics Viewer (IGV) image of exon 5 of PTEN. Top panel shows mouse DNA mapped to the human reference genome, middle panel melanoma PDX sample M005.X1 with 25% mouse stroma and bottom panel melanoma PDX sample M029.X1 with 1% mouse stroma. Each grey horizontal line represents a single sequence read. Base pair differences between the human reference genome and sequence reads (SNVs) are indicated with a color (depending on the base pair change). b: Overlap between somatic SNVs detected in a PDX with a high percentage of mouse stroma (M005.X1) and a PDX with a low percentage of mouse stroma (M029.X1). c: The edit distance of sequence reads from mouse DNA aligned to a human reference genome (top panel) and from human DNA mapped to a human reference genome (bottom panel)
Fig. 4 Performance of strict filtering, bamcmp, Xenome and XenofilteR on in silico mixed samples. a: Schematic overview of samples, dilutions and sequence read type for generation of the samples mixed in silico. b: Percentages of sequence reads remaining per species after filtering with strict filtering, bamcmp, Xenome and XenofilteR options for the 50:50 mouse (BALB/Cj):human (NA12878) WGS mixes. c: Variant Allele Frequency (VAF) of the SNVs in the original sample compared to unfiltered and filtered samples after in silico mixing with mouse sequence reads. d: Venn diagrams of non-synonymous mutations in the original sample with filtered and unfiltered samples
Fig. 5 Performance of XenofilteR and Xenome on PDX samples. a: Mutation calling on exome sequence data of a breast cancer PDX sample. The variant allele frequency (VAF) was plotted after filtering with XenofilteR (x-axis) and Xenome (y-axis). Plotted in black are mutations also detected in the patient sample, in green known SNPs and in red SNVs detected in the PDX only. b: Read count of each SNV used to calculate the VAF from a for Xenome and XenofilteR. c: Mutation calling on targeted sequencing of melanoma samples. In green all known SNPs are indicated, in black the remaining SNVs. d: Validation of the SNP rs7121 (GNAS) by Sanger sequencing with human-specific primers
Fig. 6 XenofilteR on RNAseq data. a: Scatter plots showing the number of RNA sequence reads before and after filtering for mouse sequence reads with XenofilteR, both for a sample with a high percentage of mouse stromal cells (M005.X1) and a sample with a low percentage of mouse stromal cells (M029.X1). b: Cluster analysis of PDX samples before and after XenofilteR with matched patient samples. c: Heat map of the top 250 most variable mouse genes retrieved from a dataset of 95 PDX samples. Bar graph below the heat map shows the number of mouse sequence reads. d: Gene Ontology (GO) analysis of the top 2 clusters from c. e: H&E staining of a PDX sample with adjacent mouse fat cells (left) and a PDX sample with mouse muscle cells (right)