| Literature DB >> 29304755 |
Maurizio Callari, Ankita Sati Batra, Rajbir Nath Batra, Stephen-John Sammut, Wendy Greenwood, Harry Clifford, Colin Hercus, Suet-Feung Chin, Alejandra Bruna, Oscar M Rueda, Carlos Caldas.
Abstract
BACKGROUND: Patient-Derived Tumour Xenografts (PDTXs) have emerged as the pre-clinical models that best represent clinical tumour diversity and intra-tumour heterogeneity. The molecular characterization of PDTXs using High-Throughput Sequencing (HTS) is essential; however, the presence of mouse stroma is challenging for HTS data analysis. Indeed, the high homology between the two genomes results in a proportion of mouse reads being mapped as human.
Keywords: Alignment; High throughput sequencing; ICRG; In silico combined human-mouse reference genome; Mouse stroma; Patient-derived tumour xenografts; Short-reads
Year: 2018 PMID: 29304755 PMCID: PMC5755132 DOI: 10.1186/s12864-017-4414-y
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Alignment of WES data from the human-mouse DNA dilution series
| % of human DNA | % of mouse DNA | Replicate | Alignment efficiency, human genome (%) | Alignment efficiency, combined genome (%) | % of reads mapped on the human genome | % of reads mapped on the mouse genome | Estimated human DNA content (%) |
|---|---|---|---|---|---|---|---|
| 100 | 0 | a | 90.81 | 90.64 | 99.98 | 0.02 | 99.89 |
| 100 | 0 | b | 90.38 | 90.16 | 99.98 | 0.02 | 99.88 |
| 100 | 0 | c | 90.26 | 90.13 | 99.98 | 0.02 | 99.89 |
| 90 | 10 | a | 88.57 | 90.43 | 97.59 | 2.41 | 90.22 |
| 90 | 10 | b | 88.53 | 90.39 | 97.58 | 2.42 | 90.16 |
| 50 | 50 | a | 75.65 | 89.37 | 83.14 | 16.86 | 50.73 |
| 50 | 50 | b | 74.90 | 89.71 | 82.18 | 17.82 | 49.01 |
| 50 | 50 | c | 75.41 | 89.55 | 82.78 | 17.22 | 50.07 |
| 25 | 75 | a | 58.07 | 89.25 | 62.44 | 37.56 | 23.97 |
| 25 | 75 | b | 60.25 | 89.48 | 64.58 | 35.42 | 26.17 |
| 0 | 100 | a | 7.01 | 88.77 | 0.10 | 99.90 | 0.00 |
| 0 | 100 | b | 7.17 | 88.56 | 0.10 | 99.90 | 0.00 |
| 0 | 100 | c | 6.65 | 88.95 | 0.08 | 99.92 | 0.00 |
Fig. 1. Use of the ICRG in WES data. (a) IGV plot of PTEN exon 5 (WES data from the 25% human/75% mouse DNA sample). Top panel: BAM files after alignment to the HRG. Bottom panel: BAM files after alignment to the ICRG. Mismatching bases are highlighted using the corresponding colour code (A = green, C = blue, G = orange, T = red). (b) Correlation plot between the percentage of reads mapped to the human genome and the percentage of human DNA content in the sample. The solid line shows the calibration curve fitted to the data using penalized regression splines, and the grey dashed lines show the standard error. (c) Prediction of mouse DNA content in primary human and PDTX samples using the calibration curve in (b).
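The calibration described for panel b can be approximated from the WES dilution series alone. The sketch below is a simplification and not the authors' method: it replaces the penalized regression splines with piecewise-linear interpolation between replicate means, and the names `build_calibration`, `estimate_human_dna` and `KNOTS` are hypothetical.

```python
from collections import defaultdict

# (known % human DNA, % of reads mapped to the human genome with the ICRG),
# copied from the WES dilution series table.
DILUTION_SERIES = [
    (100, 99.98), (100, 99.98), (100, 99.98),
    (90, 97.59), (90, 97.58),
    (50, 83.14), (50, 82.18), (50, 82.78),
    (25, 62.44), (25, 64.58),
    (0, 0.10), (0, 0.10), (0, 0.08),
]


def build_calibration(points):
    """Average replicates; return (reads %, DNA %) knots sorted by reads %."""
    grouped = defaultdict(list)
    for human_dna, human_reads in points:
        grouped[human_dna].append(human_reads)
    return sorted((sum(v) / len(v), dna) for dna, v in grouped.items())


def estimate_human_dna(human_reads_pct, knots):
    """Piecewise-linear estimate of % human DNA, clamped to the calibrated range.

    Assumption: linear interpolation stands in for the spline fit of the paper.
    """
    if human_reads_pct <= knots[0][0]:
        return float(knots[0][1])
    if human_reads_pct >= knots[-1][0]:
        return float(knots[-1][1])
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x0 <= human_reads_pct <= x1:
            return y0 + (y1 - y0) * (human_reads_pct - x0) / (x1 - x0)


KNOTS = build_calibration(DILUTION_SERIES)
```

With these knots, `estimate_human_dna(83.14, KNOTS)` gives roughly 51% human DNA. The curve is strongly non-linear: a sample with only 25% human DNA still has about 63% of its reads assigned to the human genome, which is why a calibration step is needed rather than reading the human fraction directly from the read percentages.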
Fig. 2. Impact of mouse reads on somatic mutation calling. (a) Bar plots showing the numbers of somatic mutations identified in clinical tumours and matched PDTXs after alignment against either the HRG or the ICRG. Within each pair of clinical tumour and PDTX (n = 10), mutations were classified as 'tumour specific' (present in the tumour but not in the matched PDTX), 'PDTX specific' (present in the PDTX but not in the originating clinical tumour) or 'common' (present in both tumour and PDTX). (b) Bar plots showing VAFs in the 100% mouse sample for all 'PDTX specific' mutations identified in the 10 pairs in (a). Left panel: data aligned to the HRG. Right panel: data aligned to the ICRG.
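The three-way classification used in this figure amounts to set operations on the two call sets of a tumour/PDTX pair. The following is a minimal sketch under assumptions: mutations are keyed by a hypothetical `(chromosome, position, ref, alt)` tuple, the function name `classify_mutations` is illustrative, and the example calls are made up rather than taken from the study.

```python
def classify_mutations(tumour_calls, pdtx_calls):
    """Split two call sets into tumour-specific, PDTX-specific and common."""
    tumour, pdtx = set(tumour_calls), set(pdtx_calls)
    return {
        "tumour_specific": tumour - pdtx,  # in the tumour, absent from the PDTX
        "pdtx_specific": pdtx - tumour,    # in the PDTX, absent from the tumour
        "common": tumour & pdtx,           # present in both
    }


# Made-up calls for illustration only (not data from the paper).
tumour_calls = [("chr1", 1000, "A", "G"), ("chr2", 2000, "C", "T")]
pdtx_calls = [("chr2", 2000, "C", "T"), ("chr3", 3000, "G", "A")]

result = classify_mutations(tumour_calls, pdtx_calls)
```

Here `result["pdtx_specific"]` contains only the `chr3` call. Calls in that category are the ones panel b inspects in the 100% mouse sample, since a 'PDTX specific' variant that also appears in pure mouse DNA is likely an artefact of mouse reads mapped as human.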
Alignment of RRBS data and CpG quantification
| Sample | HRG: alignment efficiency (%) | HRG: n. of human CpGs^a | ICRG: alignment efficiency (%) | ICRG: reads mapped to the human genome (%) | ICRG: reads mapped to the mouse genome (%) | ICRG: n. of human CpGs^a | ICRG: n. of mouse CpGs^a | Common human CpGs (%) |
|---|---|---|---|---|---|---|---|---|
| Case_1_Normal | 72.2 | 2,317,726 | 72.2 | 99.9 | 0.1 | 2,315,989 | 102 | 99.9 |
| Case_1_Tumour | 74.1 | 2,676,050 | 74.1 | 99.9 | 0.1 | 2,674,961 | 119 | 99.9 |
| Case_1_PDTX3 | 71.5 | 3,560,299 | 73.8 | 96.8 | 3.2 | 3,556,996 | 223,997 | 99.9 |
| Case_2_Normal | 70.6 | 1,893,234 | 70.6 | 99.9 | 0.1 | 1,891,949 | 60 | 99.9 |
| Case_2_Tumour | 71.2 | 2,071,412 | 72.1 | 98.8 | 1.2 | 2,070,098 | 452 | 99.9 |
| Case_2_PDTX1 | 68.5 | 2,132,270 | 73.4 | 93.2 | 6.8 | 2,130,231 | 90,072 | 99.8 |
| Case_3_PDTX2 | 67.4 | 2,620,064 | 70.8 | 95.2 | 4.8 | 2,617,504 | 99,418 | 99.8 |
| Case_3_PDTX3 | 54.7 | 1,623,532 | 71.0 | 76.9 | 23.1 | 1,620,101 | 339,859 | 99.7 |
| Case_4_PDTX1 | 73.1 | 2,812,733 | 75.0 | 97.4 | 2.6 | 2,811,314 | 47,455 | 99.9 |
| Case_4_PDTX2 | 69.1 | 1,503,423 | 72.3 | 95.6 | 4.4 | 1,501,890 | 13,819 | 99.8 |
| Case_4_PDTX3 | 71.1 | 1,468,861 | 74.0 | 96.1 | 3.9 | 1,467,659 | 24,082 | 99.9 |
| Case_4_PDTX5 | 69.2 | 3,185,468 | 73.5 | 94.0 | 6.0 | 3,182,452 | 264,898 | 99.9 |
| Case_5_Tumour | 72.8 | 3,004,752 | 72.8 | 99.9 | 0.1 | 3,003,290 | 87 | 99.9 |
| Case_5_PDTX3 | 65.8 | 2,066,808 | 70.6 | 93.0 | 7.0 | 2,064,469 | 179,031 | 99.8 |
^a Coverage > 5
Alignment of RNA-seq data from the human and mouse universal reference RNA
| Sample | Replicate | HRG: alignment efficiency (%) | ICRG: alignment efficiency (%) | ICRG: reads mapped to the human genome (%) | ICRG: reads mapped to the mouse genome (%) |
|---|---|---|---|---|---|
| Human universal reference RNA | a | 79.22 | 79.29 | 99.82 | 0.18 |
| Human universal reference RNA | b | 78.92 | 78.92 | 99.83 | 0.17 |
| Human universal reference RNA | c | 78.99 | 78.99 | 99.84 | 0.16 |
| Mouse universal reference RNA | a | 5.16 | 5.16 | 1.08 | 98.92 |
| Mouse universal reference RNA | b | 5.10 | 5.10 | 1.08 | 98.92 |
| Mouse universal reference RNA | c | 5.18 | 5.18 | 1.23 | 98.77 |
Fig. 3. Use of the ICRG in RNA-seq data. (a-d) Bar plots of log10-transformed read counts for 23,059 human genes (having a read count higher than 5 in at least one sample). HRR sample (a-b); MRR sample (c-d). Alignment: HRG (a, c); ICRG (b, d). (e) Percentage of reads mapped to mouse in the ICRG across the samples analysed. (f) Principal Component Analysis scatter plot using FPKM values of 4275 mouse genes with median FPKM > 1 (15 PDTX samples; 5 models). Different colours represent the 5 distinct PDTX models.
Comparison of the number of reads assigned to the human and mouse genome using the ICRG, Disambiguate or Xenome
| Data type | % human | % mouse | Replicate | ICRG: reads mapped as human | ICRG: reads mapped as mouse | Disambiguate: reads mapped as human | Disambiguate: reads mapped as mouse | Disambiguate: ambiguous reads | Xenome: reads mapped as human | Xenome: reads mapped as mouse |
|---|---|---|---|---|---|---|---|---|---|---|
| WES | 100 | 0 | a | | 14,895 | | 11,301 | 110,603 | | 10,471 |
| WES | 100 | 0 | b | | 12,132 | | 9,281 | 97,294 | | 8,847 |
| WES | 100 | 0 | c | | 7,216 | | 5,414 | 67,498 | | 5,359 |
| WES | 0 | 100 | a | 41,454 | | 73,117 | | 364,417 | 41,821 | |
| WES | 0 | 100 | b | 39,132 | | 70,693 | | 366,702 | 39,238 | |
| WES | 0 | 100 | c | 49,731 | | 90,060 | | 455,206 | 49,964 | |
| RNA-seq | 100 | 0 | a | | 212,696 | | 242,026 | 181,712 | | 86,370 |
| RNA-seq | 100 | 0 | b | | 232,876 | | 260,810 | 212,462 | | 84,236 |
| RNA-seq | 100 | 0 | c | | 206,112 | | 220,970 | 199,938 | | 64,950 |
| RNA-seq | 0 | 100 | a | 1,510,196 | | 1,353,926 | | 428,688 | 875,920 | |
| RNA-seq | 0 | 100 | b | 1,240,462 | | 1,127,872 | | 342,720 | 737,946 | |
| RNA-seq | 0 | 100 | c | 1,611,790 | | 1,457,070 | | 446,048 | 956,948 | |
Values in italic indicate the number of reads mapped to the correct genome
Comparison of the CPU time required by a WES alignment pipeline including either the ICRG, Disambiguate or Xenome
| % human | % mouse | Replicate | ICRG CPU time (s) | Disambiguate CPU time (s) | Xenome CPU time (s) |
|---|---|---|---|---|---|
| 90 | 10 | a | 20,154 | 28,743 | 20,905 |
| 90 | 10 | b | 20,614 | 28,034 | 22,279 |
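To make the CPU times easier to compare, they can be expressed relative to the ICRG pipeline. This is a small arithmetic sketch using only the two replicates in the table; the names `CPU_SECONDS` and `relative_to_icrg` are hypothetical.

```python
# CPU time in seconds per replicate, copied from the table above.
CPU_SECONDS = {
    "a": {"ICRG": 20154, "Disambiguate": 28743, "Xenome": 20905},
    "b": {"ICRG": 20614, "Disambiguate": 28034, "Xenome": 22279},
}


def relative_to_icrg(cpu_seconds):
    """Return each tool's CPU time as a multiple of the ICRG time, per replicate."""
    return {
        replicate: {tool: t / times["ICRG"] for tool, t in times.items()}
        for replicate, times in cpu_seconds.items()
    }


ratios = relative_to_icrg(CPU_SECONDS)
```

On these two replicates, the Disambiguate pipeline used about 36% to 43% more CPU time than the ICRG pipeline, and the Xenome pipeline about 4% to 8% more.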