Bernt Popp1, Mandy Krumbiegel1, Janina Grosch2, Annika Sommer3, Steffen Uebe1, Zacharias Kohl2, Sonja Plötz2, Michaela Farrell3, Udo Trautmann1, Cornelia Kraus1, Arif B Ekici1, Reza Asadollahi4, Martin Regensburger3, Katharina Günther5, Anita Rauch4, Frank Edenhofer5, Jürgen Winkler2, Beate Winner3, André Reis6.
Abstract
Genetic integrity of induced pluripotent stem cells (iPSCs) is essential for their validity as disease models and for potential therapeutic use. We describe the comprehensive analysis in the ForIPS consortium: an iPSC collection from donors with neurological diseases and healthy controls. Characterization included pluripotency confirmation, fingerprinting, and conventional and molecular karyotyping in all lines. In the majority, somatic copy number variants (CNVs) were identified. A subset with available matched donor DNA was selected for comparative exome sequencing. We identified single nucleotide variants (SNVs) at different allelic frequencies in each clone with high variability in mutational load. Low frequencies of variants in parental fibroblasts highlight the importance of germline samples. Somatic variant number was independent of reprogramming, cell type and passage. Comparison with disease genes and prediction scores suggests biological relevance for some variants. We show that high-throughput sequencing has value beyond SNV detection and highlight the requirement to evaluate each clone individually.
Year: 2018 PMID: 30464253 PMCID: PMC6249203 DOI: 10.1038/s41598-018-35506-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic summary of the study and nomenclature of genetic aberrations. (A) Culturing and QC steps. Step 1: genetic fingerprinting and conventional karyotyping. Step 2: high-resolution CMA. Step 3: exome sequencing. (B) Graph showing the age distribution (x-axis) and phenotype of all donors. Fibroblast cultures are plotted as symbols (white = unrelated individuals; blue, red, green = related individuals) on the grey timeline (male = square; female = circle). The three-letter code in each symbol represents the individual’s donor ID (see also Fig. S1C). The passages of the derived RiPSC (above) and SiPSC (below) cultures are plotted as circles connected to the respective fibroblast (y-axis; scattered for visualization). Derived NPCs are connected to the RiPSC they originated from. Red bars below the fibroblast symbols mark individuals with available PBLs who were selected for exome sequencing. See also File S1 for additional information. (C) Standardized nomenclature for variants/aberrations depending on the cell they arose in. The scheme compares the evolutionary history of a cancer cell (box “selection”), which is subject to strong selective pressure, with that of a cultured cell (box “genetic drift”), which is mainly subject to random genetic drift.
Figure 2. Summary of somatic CNVs identified in iPSC cultures by chromosomal microarray analysis (CMA). Box- and scatterplots for (A) the total number of somatic CNVs detected per analyzed cell culture sample (grey dots), (B) the genomic length (hg19) in kb of all detected somatic CNVs and (C) the number of affected genes (GenBank) within identified somatic CNVs (red dots = copy number loss, blue dots = copy number gain). SiPSC and RiPSC are separated by a grey dashed line. In the NPCs derived from the RiPSCs no new somatic CNVs were identified. No significant differences regarding number, size and gene content of somatic CNVs between RiPSC (n = 49) and SiPSC clones (n = 23) were detected (two-sided Wilcoxon signed-rank test). Aneuploidies are not included, and CNV outliers (one in SiPSC and one in RiPSC) sized over 5000 kb are excluded from panels B and C. (D) The average number of CNVs in all iPSCs grouped per individual and passage number plotted vs. the passage number. The dashed blue line represents the linear regression model fit (R² = 0.021, p-value = 0.264). (E) The average number of CNVs in all iPSCs grouped per individual plotted vs. the donor age in years at biopsy. The dashed blue line represents the linear regression model fit (R² = 0.049, p-value = 0.309). Diamonds in D and E mark the respective average CNV count and are intersected by a standard error bar where applicable. (F) Circos plot showing the genomic (hg19) distribution of somatic CNVs in RiPSC (orange) and SiPSC (blue) clones. NS, not significant.
Figure 3. Examples of CNVs detected by SNP-based CMA. (A) Copy number analysis identified a chromosome 17q terminal gain not detectable with conventional karyotyping in the SiPSC line “CT1-S1-010”. (B) FISH analysis showing the unbalanced translocation 14p/17q in this clone (left = metaphase, right = interphase). (C) Conventional karyotyping and copy number analysis of chromosome 9 of the SiPSC line “i82A-S1-004” revealed unremarkable results (Log2Ratio, top), but the SNP allele peak distribution (xAllelePeaks, bottom) uncovered a copy-neutral allelic imbalance on the long arm of chromosome 9 (4 bands), while the short arm (left) shows a normal allelic distribution (3 bands) (see also Fig. S2). (D) Two independent overlapping intragenic deletions in the CTNNA3 gene detected in the RiPSC lines “i88H-R1-001” (green bottom) and “iO3H-R1-001” (blue bottom) and absent from their fibroblast cultures “f88H-X-001” (green top) and “iO3H-X-001” (blue top).
Figure 4. Summary of somatic SNVs/indels identified in iPSC cultures by exome sequencing. (A) Box- and scatterplot comparing the total number of fixed somatic SNVs/indels in independently reprogrammed SiPSCs (n = 6) and RiPSCs (n = 8) from four donors (“82A” = grey, “88H” = orange, “AY6” = blue, “PX7” = green). (B) Box- and scatterplot comparing the total number of fixed somatic variants in RiPSCs and derived NPCs from donors “82A” (grey) and “AY6” (blue). No significant differences in somatic SNV/indel numbers were detected either between RiPSC and SiPSC clones or between RiPSCs and their derived NPCs (two-sided Wilcoxon signed-rank test). Certain cultures have a much higher variant load (“82A” = grey, “88H” = orange). NPCs have the same variant profile as their progenitor cells. (C) Number of variants in four RiPSC lines (“i82A-R1-002” = grey, “i82A-R1-001” = yellow, “iAY6-R1-003” = blue, “iAY6-R1-004” = red) from donors “82A” and “AY6” cultured to higher passages vs. passage number. Diamonds mark the respective average SNV/indel count grouped by cell culture passage number (low passage numbers between 7 and 15 are considered as one group) intersected by a standard error bar. The dashed blue line represents the linear regression model fit using the actual passage number of the cells in the low group and the average of passage 30 and 40 (R² = 0.036, p-value = 0.718). Note again the high spread influenced by the two cultures from individual “82A”. (D) The number of variants in all iPSC lines (RiPSC = ocher and SiPSC = lilac) from the four donors (n = 4 for “82A” and “88H”, n = 3 for “AY6” and “PX7”) plotted vs. the donor age. Diamonds mark the respective average SNV/indel count grouped by donor intersected by a standard error bar. The dashed blue line represents the linear regression model fit (R² = 0.976, p-value = 0.012). NS, not significant.
Figure 5. Mutational characteristics of somatic variants identified in iPSC cultures by exome sequencing. Stacked bar chart for the 14 primary RiPSC and SiPSC cultures from 4 individuals with passage numbers between 7 and 15 showing the relative number of variants partitioned (A) by variant impact group as annotated with the SnpEff software (HIGH = green, MODERATE = blue, LOW = light green), (B) by variant type (SNV = light green, MNP = blue, indel = green) and (C) by mutational subtype (transitions in brownish, transversions in greyish turquoise) of the SNVs in each iPSC sample. For A and B absolute variant counts are in the bars. (D) Distribution of three different SNV classifier scores represented as violin plots with median and quartiles. The red line represents the respective cutoff values (CADD = 20, M-CAP = 0.025, REVEL = 0.5). (E) Dot-plot showing the distribution of allele fraction (AF) in the analyzed iPSC cultures (x-axis) and their corresponding fibroblast culture (y-axis), with each point representing a variant shaded by read coverage in the iPSC exome (bright = low, dark = high read coverage at the respective variant position). Dotted vertical lines mark the expected AF for a heterozygous fixed variant (0.5) and the typical variability seen in short-read sequencing (0.3 to 0.7). (F) Dot-plot showing the relation between read coverage in the analyzed iPSC cultures and AF in the corresponding fibroblast culture. Dots are grouped and colored by fibroblast AF (no evidence in fibroblast = grey, ≤5.0% = blue, ≤10.0% = orange, >10% = green). The blue line represents the linear regression model fit (formula y ~ log(x); R² = 0.202, p-value < 2.2e-16). The black line represents the theoretical AF in the fibroblast culture that is detectable at the respective coverage with a probability of 0.426 (variants with no evidence in fibroblast = 546, variants with at least 1 read in fibroblast = 405; 405/(405 + 546) ≈ 0.426) under a simple binomial draw model in which one read is considered sufficient evidence in the fibroblast. The red dotted line marks a read coverage of 20, below which a high sampling variance is expected.
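The binomial draw model mentioned for panel F can be illustrated with a short sketch. It assumes that each read independently carries the variant with probability equal to the fibroblast AF, so that the chance of seeing at least one variant read at coverage n is 1 − (1 − AF)^n. The function names and the example coverages are illustrative assumptions and are not taken from the study's code.

```python
def detection_probability(allele_fraction: float, coverage: int) -> float:
    """P(at least one variant read) when reads ~ Binomial(coverage, allele_fraction)."""
    return 1.0 - (1.0 - allele_fraction) ** coverage


def detectable_allele_fraction(coverage: int, probability: float) -> float:
    """Allele fraction at which >= 1 variant read is seen with the given probability.

    Obtained by solving 1 - (1 - AF)**coverage = probability for AF.
    """
    return 1.0 - (1.0 - probability) ** (1.0 / coverage)


# Share of variants with at least one supporting fibroblast read (counts from the legend).
P_DETECT = 405 / (405 + 546)

if __name__ == "__main__":
    # Hypothetical coverages chosen only to show the shape of the curve.
    for cov in (20, 50, 100, 200):
        af = detectable_allele_fraction(cov, P_DETECT)
        print(f"coverage {cov:>3}: AF detectable with p = {P_DETECT:.3f} is {af:.4f}")
```

Under this model the detectable AF falls as coverage rises, which is the qualitative behaviour the black line in panel F describes.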
Figure 6. Exome sequencing enables multiple cellular analyses. (A) Box- and scatterplots of the relative mitochondrial genome ratio for all samples. Average read coverage for the mitochondrial genome (chrM) was normalized to the targeted regions of chromosome 1 (chr1). The level of significance is annotated by asterisks or as not significant (NS) (two-sided Wilcoxon signed-rank test). Fibroblast (FI) and RiPSC/SiPSC cultures show a higher mitochondrial genome dosage than PBLs (BL = blood samples from individuals in this study; BL-CNT = blood samples from 53 in-house control samples) and than NPC cultures. (B) Telomere content of all 16 RiPSC samples from the 4 individuals estimated from off-target telomeric reads by two different algorithms, telomerecat (upper panel) and telomerehunter (lower panel), plotted vs. the passage number. While both plots show a negative correlation of telomere content with higher passage number (telomerecat: Pearson’s r = −0.483, R² = 0.233, p-value = 0.058; telomerehunter: Pearson’s r = −0.251, R² = 0.062, p-value = 0.349), the results are not significant (see also Fig. S5). (C) Comparison of the read coverage profile at the KLF4 gene locus of different materials from individual “82A” (blood = brown, fibroblast = tan, SiPSC = green, RiPSC = blue). The sudden breaks at the exon-intron boundaries indicate multiple integrations of a plasmid with a KLF4 transcription factor insert, which has no introns (see also Fig. S6). (D) Example of a somatic deletion in the DLG2 gene called from the exome data of the NPC sample (“p82A-R1-002” = dark blue) and absent in the corresponding fibroblast culture (“f82A-X-001” = green). Dots represent target or anti-target coverage bins (y-axis = log2 ratio) and the orange line marks the copy number call by the CNVkit algorithm[32] for each segment. Note that the deletion was only called in the NPC and not in the RiPSC (“i82A-R1-002” = light blue), although the deletion had been previously confirmed in both samples by CMA (see also Fig. S6). NS, not significant; “***”, 0.001; “**”, 0.01; “*”, 0.05.
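The normalization described for panel A (average chrM coverage relative to the targeted regions of chr1) can be sketched as a simple ratio of mean depths. This is a minimal illustration only: the function name, the input layout (one list of per-position or per-target depths for each region) and the example depths are assumptions, not the pipeline used in the study.

```python
from statistics import mean
from typing import Sequence


def relative_mito_ratio(chrm_depths: Sequence[float],
                        chr1_target_depths: Sequence[float]) -> float:
    """Mean chrM read depth divided by mean read depth over chr1 target regions."""
    if not chrm_depths or not chr1_target_depths:
        raise ValueError("both depth lists must be non-empty")
    chr1_mean = mean(chr1_target_depths)
    if chr1_mean == 0:
        raise ValueError("chr1 target coverage is zero; ratio is undefined")
    return mean(chrm_depths) / chr1_mean


if __name__ == "__main__":
    # Hypothetical depth values, used only to demonstrate the call.
    example_chrm = [200, 400]
    example_chr1 = [50, 150]
    print(relative_mito_ratio(example_chrm, example_chr1))
```

Dividing by the chr1 target coverage corrects for differences in overall sequencing depth between samples, so that ratios from fibroblast, iPSC, NPC and blood samples can be compared on the same scale.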
Fixed variants with predicted loss-of-function effect in known cancer associated genes according to the COSMIC database (CGC), known disease genes (OMIM) or genes highly expressed in the brain according to the Human Protein Atlas (HPA).
| Sample | Gene | HGVS | List | OMIM-G | OMIM-P | Phenotype | Inh. | pLI |
|---|---|---|---|---|---|---|---|---|
| i82A-S1-022 | | c.1372+1G>T, p.? | OMIM, HPA | *300206 | #300143 | Mental retardation, XLR 21/34 | XLR | 1.00 |
| p82A-R1-001 | | c.485G>A, p.(Trp162*) | OMIM | *615302 | #615286 | Mental retardation, AR 36 | AR | 0.00 |
| i82A-R1-001 | | c.3949C>T, p.(Gln1317*) | OMIM | *603937 | #180100 | Retinitis pigmentosa 1 | AR, AD | 0.00 |
| i82A-S1-022 | | c.1381dupA, p.(Thr461Asnfs*5) | OMIM | *604629 | #612529 | Amelogenesis imperfecta, type IIA2 | AR | 0.00 |
| i82A-R1-001 | | c.747_748delinsTT, p.(Arg250*) | HPA | na | na | na | na | 0.02 |
| i88H-R1-002 | | c.981-2A>G, p.? | CGC, OMIM | *164757 | #115150, #613707, #613706 | Cardiofaciocutaneous syndrome; LEOPARD syndrome 3; Noonan syndrome 7 | AD, AD, AD | 1.00 |
| i88H-R1-002 | | c.3G>T, p.? | OMIM | *607556 | #200110, #209885, #227260 | Ablepharon-macrostomia syndrome; Barber-Say syndrome; Focal facial dermal dysplasia 3, Setleis type | AD, AD, AR | 0.44 |
| i88H-R1-002 | | c.285_288del, p.(Ser96Argfs*44) | OMIM | *614336 | #613320 | Spondylometaphyseal dysplasia, Megarbane-Dagher-Melike type | AR | 0.14 |
| i88H-R1-001 | | c.352-1G>A, p.? | OMIM | *605525 | #613804 | Meier-Gorlin syndrome 4 | AR | 0.00 |
| i88H-R1-002 | | c.869_870del, p.(Val290Glyfs*13) | OMIM | *150240 | #615191 | Lissencephaly 5 | AR | 0.00 |
| i88H-R1-002 | | c.1497del, p.(Trp499Cysfs*3) | OMIM | *300415 | #310400 | Myotubular myopathy, XLR | XLR | 1.00 |
| i88H-R1-001 | | c.3060_3072+6del, p.? | OMIM | *603122 | #616433 | Immunodeficiency 40 | AR | 1.00 |
| i88H-R1-002 | | c.894_897del, p.(Phe299Serfs*16) | HPA | *604087 | na | na | na | 0.75 |
| i88H-R1-002 | | c.541G>T, p.(Glu181*) | HPA | *311030 | na | na | na | 0.94 |
| iAY6-R1-003 | | c.723+1G>A, p.? | OMIM, HPA | *138244 | #611092 | Mental retardation, AR, 6 | AR | 0.99 |
| iAY6-R1-003 | | c.4051C>T, p.(Gln1351*) | OMIM | *608442 | #612999 | Emery-Dreifuss muscular dystrophy 5 | AD | 0.00 |
| iAY6-R1-003 | | c.1726_1730+2delinsC, p.? | OMIM | *615944 | #615948 | Orofaciodigital syndrome XIV | AR | 0.00 |
| iAY6-R1-003 | | c.5759_5763delinsG, p.(Thr1920Argfs*42) | OMIM | *611192 | #148050 | KBG syndrome | AD | 1.00 |
| iPX7-R1-001 | | c.214C>T, p.(Arg72*) | CGC, OMIM | *607210 | #616452, #615206, #617638 | B-cell expansion with NFKB and T-cell anergy; Immunodeficiency 11 A; Immunodeficiency 11B | AD, AR, AD | 1.00 |
| iPX7-R1-001 | | c.32C>A, p.(Ser11*) | OMIM | *607905 | #616228 | Myasthenic syndrome, congenital, 14, with tubular aggregates | AR | 0.02 |
| iPX7-S1-004 | | c.2049del, p.(Gln684Lysfs*7) | OMIM | *603335 | #608644 | Ciliary dyskinesia, primary, 3, with or without situs inversus | AR | 0.00 |
| iPX7-S1-004 | | c.540dup, p.(Cys181Leufs*7) | OMIM | *606409 | #613385 | Autoimmune disease, multisystem, with facial dysmorphism | AR | 1.00 |
| iPX7-S1-004 | | c.2492_2495del, p.(Leu831Glnfs*5) | OMIM | *118661 | #143200 | Wagner syndrome 1 | AD | 1.00 |
| iPX7-S1-004 | | c.1923T>G, p.(Tyr641*) | HPA | *605488 | na | na | na | 0.00 |
| iPX7-S1-004 | | c.1081-1_1081delinsAA, p.? | HPA | *615137 | na | na | na | 0.10 |
Inh., inheritance mode (“AD”: autosomal dominant, “AR”: autosomal recessive, “XLR”: X-linked recessive); HGVS, Human Genome Variation Society nomenclature (“c.”: coding DNA change, “p.”: protein change; “p.?”: consequence of the variant at protein level cannot be predicted without further functional assays); OMIM-G, OMIM (https://omim.org/) gene number; OMIM-P, OMIM phenotype number; CGC, COSMIC cancer gene census[56] gene list; HPA, human protein atlas[57] brain elevated gene set (File S6); pLI, probability of loss-of-function intolerance[58]; “na”: not available.