Kunal Bhutani, Kristopher L Nazor, Roy Williams, Ha Tran, Heng Dai, Željko Džakula, Edward H Cho, Andy W C Pang, Mahendra Rao, Han Cao, Nicholas J Schork, Jeanne F Loring.
Abstract
There is concern that the stresses of inducing pluripotency may lead to deleterious DNA mutations in induced pluripotent stem cell (iPSC) lines, which would compromise their use for cell therapies. Here we report comparative genomic analysis of nine isogenic iPSC lines generated using three reprogramming methods: integrating retroviral vectors, non-integrating Sendai virus and synthetic mRNAs. We used whole-genome sequencing and de novo genome mapping to identify single-nucleotide variants, insertions and deletions, and structural variants. Our results show a moderate number of variants in the iPSCs that were not evident in the parental fibroblasts, which may result from reprogramming. There were only small differences in the total numbers and types of variants among different reprogramming methods. Most importantly, a thorough genomic analysis showed that the variants were generally benign. We conclude that the process of reprogramming is unlikely to introduce variants that would make the cells inappropriate for therapy.
Year: 2016 PMID: 26892726 PMCID: PMC4762882 DOI: 10.1038/ncomms10536
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Figure 1. Experimental and computational design for identifying variants caused by reprogramming.
(a) Diagram describing the derivation of three biological replicates for each of the three reprogramming methods: retrovirus, Sendai virus and non-integrating mRNA. (b) Kernel density estimation for variant allele frequency (VAF) and coverage for a constituent sample from each reprogramming method: M1 (mRNA), R1 (retrovirus) and S1 (Sendai virus). R1 and S1 show denser clusters near 40× coverage and 40–60% VAF than M1, which indicates that they had a higher mutational load during the initial doublings. However, all of these samples also contained several subclonal variants that are not considered in further analyses. The histograms are intended to aid readers in interpreting the results of the kernel density estimations. (c) Flow diagram detailing the filtering strategy employed to arrive at a high-confidence set of SNVs unique to each reprogrammed cell line using MuTect and HaplotypeCaller.
Figure 2. Characterization of variants caused by reprogramming method.
(a) Overall counts for the number of high-confidence SNVs and indels per sample. (b) The relative percentage of mutational subtypes for the SNVs in each sample. (c) A violin plot and box plot of the indel size distributions in each sample. A positive length indicates an insertion, whereas a negative length indicates a deletion. (d) Variant classifications based on their relative locations in the genome. The error bars indicate the low, median and high replicate for each reprogramming method. Intron and intergenic region (IGR) variants are plotted on a different scale.
P values from the permutation-based ANOVA test for variant-type differences across the three reprogramming methods.
| | SNV raw counts | SNV relative rates | Insertions raw counts | Deletions raw counts | Indel raw counts | Combined | Samples for 80% power |
|---|---|---|---|---|---|---|---|
| CADD Phred>15 | 0.12 | 0.36 | ND | ND | ND | 0.12 | 4 |
| Coding | 0.03 | 0.18 | 0.34 | 0.23 | 0.24 | 0.16 | 4 |
| Damaging | 0.09 | 0.30 | ND | ND | ND | 0.09 | 4 |
| Near cancer gene | 0.10 | 0.94 | 0.11 | 0.20 | 0.14 | 0.39 | 10 |
| Total | 0.01 | ND | 0.18 | 0.16 | 0.16 | 0.20 | 6 |
ANOVA, analysis of variance; CADD, Combined Annotation-Dependent Depletion; ND, not determined; indels, insertions and deletions; SNV, single-nucleotide variant.
Relative rates were determined by dividing the number of variants in each category by the total number of variants. ND indicates that the value either is not consistent with the calculations or is based on too few variants to analyse. Combined is the sum of SNVs and indels. The last column lists the estimated sample size needed for 80% power in an ANOVA, given the current mean and variance of the combined counts grouped by reprogramming method.
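A permutation-based ANOVA of the kind named in the table caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `f_statistic` and `permutation_anova`, the number of permutations and the random seed are all hypothetical choices. The input counts are the per-sample coding-region counts listed in the functional-impact table of this record, used only as example data, so the resulting P value is not expected to match any published value.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups (lists of numbers)."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        return float("inf")
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_anova(groups, n_permutations=10000, seed=0):
    """Return (observed F, permutation P value) by shuffling group labels."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        # Small tolerance so regroupings equal to the observed one count as ties.
        if f_statistic(shuffled) >= observed - 1e-9:
            hits += 1
    # Add-one correction keeps the estimated P value strictly above zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Example input (assumption): coding-region counts for M1-3, R1-3 and S1-3.
coding_counts = [[27, 16, 9], [16, 20, 24], [12, 9, 5]]
f_obs, p_value = permutation_anova(coding_counts)
print(f"F = {f_obs:.3f}, permutation P = {p_value:.3f}")
```

With only three replicates per method there are few distinct relabelings, so the permutation P value is coarse, which is consistent with the sample-size estimates in the last column of the table.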
High-confidence variants in coding regions.
| Sample | Synonymous variants (n) | Nonsense and nonsynonymous variants (n) | Coding regions affected by high-confidence nonsense and nonsynonymous variants |
|---|---|---|---|
| M1 | 0 | 2 | Nonsynonymous: |
| M2 | 2 | 2 | Nonsynonymous: |
| M3 | 2 | 1 | Nonsynonymous: |
| R1 | 2 | 7 | Nonsense: |
| R2 | 0 | 7 | Nonsense: |
| R3 | 3 | 7 | Nonsynonymous: |
| S1 | 0 | 5 | Nonsynonymous: |
| S2 | 3 | 4 | Nonsynonymous: |
| S3 | 0 | 2 | Nonsense: |
The number of synonymous, nonsense and nonsynonymous coding mutations identified among the high-confidence SNVs in each sample. Nonsense and nonsynonymous variants in protein-coding regions are listed. M1–3: mRNA vector; R1–3: retrovirus; S1–3: Sendai virus.
*The CATSPERG gene has two nonsynonymous high-confidence mutations in this sample.
Functional impacts as calculated by ANOVA contrasts using a one versus all approach.
| Contrast | M1 | M2 | M3 | R1 | R2 | R3 | S1 | S2 | S3 | P value |
|---|---|---|---|---|---|---|---|---|---|---|
| Non-integrating mRNA versus all others: transcription factor EZH2-binding sites | 13 | 11 | 6 | 1 | 7 | 2 | 3 | 0 | 3 | 0.008 |
| Retrovirus versus all others: damaging mutations assessed by Condel, Polyphen or SIFT | 5 | 3 | 1 | 5 | 9 | 13 | 5 | 3 | 2 | 0.014 |
| Sendai virus versus all others: mutations in coding regions | 27 | 16 | 9 | 16 | 20 | 24 | 12 | 9 | 5 | 0.044 |
ANOVA, analysis of variance; mRNA, messenger RNA.
ANOVA contrasts were set up to compare one reprogramming method against the other two. For each reprogramming method, the most significant difference is presented. The EZH2-binding site overlaps were determined by ENCODE annotations (Methods).
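A one-versus-all contrast of this kind can be sketched as below, using the EZH2-binding-site counts from the table above as input. This is an illustrative sketch only: the contrast weights (1, -0.5, -0.5), the function names `contrast_t` and `contrast_permutation_p`, and the use of a two-sided permutation test to obtain a P value are assumptions, not details taken from the Methods, so the output is not expected to reproduce the tabulated P values.

```python
import math
import random


def contrast_t(groups, weights):
    """t statistic for a linear contrast of group means in a one-way ANOVA."""
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_within = ss_within / (n_total - len(groups))
    estimate = sum(w * m for w, m in zip(weights, means))
    if ms_within == 0:
        # Degenerate case: no within-group variance.
        return 0.0 if estimate == 0 else math.copysign(math.inf, estimate)
    scale = sum(w ** 2 / len(g) for w, g in zip(weights, groups))
    return estimate / math.sqrt(ms_within * scale)


def contrast_permutation_p(groups, weights, n_permutations=10000, seed=0):
    """Two-sided permutation P value for the contrast (label shuffling)."""
    rng = random.Random(seed)
    observed = abs(contrast_t(groups, weights))
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if abs(contrast_t(shuffled, weights)) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# EZH2-binding-site counts for M1-3, R1-3 and S1-3, taken from the table.
ezh2_counts = [[13, 11, 6], [1, 7, 2], [3, 0, 3]]
# mRNA versus the average of the other two methods (assumed weighting).
mrna_vs_rest = [1.0, -0.5, -0.5]
t_obs = contrast_t(ezh2_counts, mrna_vs_rest)
p_value = contrast_permutation_p(ezh2_counts, mrna_vs_rest)
print(f"t = {t_obs:.2f}, permutation P = {p_value:.3f}")
```

The weights sum to zero, so the contrast tests whether the mean of one method differs from the average of the other two while pooling the within-group variance across all three.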
Figure 3. A 228.8-kb deletion at Xp22.11 in sample M3 detected by BioNano genome mapping.
Each assembly is compared with the GRCh37 reference genome. Black vertical marks show the position of the fluorescently labelled seven-base motif. For the M3 sample, observed individual DNA molecules and their labels are represented, showing the support for two haplotypes, one with the deletion at Xp22.11.