Kaivalya Molugu1,2, Giovanni A Battistini2, Tiffany M Heaster3,4, Jacob Rouw2,3, Emmanuel C Guzman4, Melissa C Skala3,4, Krishanu Saha2,3. 1. Biophysics Graduate Program, University of Wisconsin-Madison, Madison, Wisconsin, USA; Madison, Wisconsin, USA. 2. Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, Wisconsin, USA; Madison, Wisconsin, USA. 3. Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, Wisconsin, USA; and Madison, Wisconsin, USA. 4. Morgridge Institute for Research, Madison, Wisconsin, USA.
The derivation of donor-specific induced pluripotent stem cells (iPSCs) from somatic cells through reprogramming generates a unique self-renewing cell source for disease modeling, drug discovery, toxicology, and personalized cell therapies.[1-3] These cells carry the donor's genome, facilitating elucidation of the genetic causes of disease, and are immunologically matched to the donor, facilitating the engraftment of cells derived from iPSCs.[4-6]With several clinical trials underway,[7] there has been significant progress in developing iPSC-based cell therapies in recent years. However, several challenges remain in the field.[8] First, the derivation of high-quality iPSCs must be efficient, rapid, and cost-effective to ensure that patients receive their treatments in a timely manner with autologous iPSC-derived products. Second, reprogramming to make iPSCs for both allogeneic and autologous cell therapy could benefit from standardized manufacturing processes to overcome the inconsistencies arising from variability in human material sources, reagents, delivery of reprogramming factors, microenvironmental fluctuations, or inherent stochasticity in epigenetic processes underpinning reprogramming.[9]Typical assays currently used for quality control of good manufacturing practices (GMPs)-grade iPSCs include testing for cell line identity (short tandem repeat analysis, single nucleotide polymorphism analysis, genomic sequencing), genomic instability (G-banding, chromosomal microarray, NanoString technology), pluripotency (marker expression analysis through flow cytometry or immunocytochemistry, embryoid body analysis, teratoma assays, Pluritest™, TaqMan Scorecard™ Assay), and residual expression of reprogramming factors (polymerase chain reaction or immunocytochemistry).[10-13] Each of these methods can be low throughput, labor intensive, time consuming, and require destructive processing.Nondestructive strategies such as automated machine learning can be used to identify various cell types and structures in cell cultures from bright-field images.[14-18] However, such automated methods to identify iPSCs, in particular, have had limited success in the field.[19-21] Deep learning has recently been employed to analyze monoclonal cell cultures of established iPSC lines,[22] but reprogramming cultures involve a higher number of cell fate transitions that have yet to be analyzed through deep learning pipelines. Hence, in complex cultures, like those in reprogramming, new standardized platforms with robust analytical methods for identifying high-quality iPSCs are still needed.Our strategy to identify iPSCs exploits metabolic and nuclear changes during reprogramming. Somatic cells primarily utilize mitochondrial oxidative phosphorylation (OXPHOS) to support cell proliferation,[23] while pluripotent stem cells favor glycolysis in a manner reminiscent of the Warburg effect in cancer cells.[23,24] During reprogramming, somatic cells thus undergo a metabolic shift from OXPHOS to glycolysis,[25,26] accompanied by a transient OXPHOS burst, resulting in the initiation and progression of reprogramming.[27-29]Recent evidence also indicates that this metabolic shift occurs before changes in gene expression and that the modulation of glycolytic metabolism or OXPHOS alters reprogramming efficiency.[24,30,31] High-resolution imaging of reprogramming cells has also identified that nuclear geometry is dramatically altered during reprogramming.[32-34] Therefore, simultaneous monitoring metabolic and nuclear changes during reprogramming could identify various cell states within reprogramming cultures.Optical metabolic imaging (OMI) is a noninvasive and label-free two-photon microscopy technique that provides dynamic measurements of cellular metabolism at a single-cell level. OMI is based on the endogenous fluorescence of metabolic coenzymes, reduced form of nicotinamide adenine dinucleotide (NADH) and flavin adenine dinucleotide (FAD),[35] that are both used across several cellular metabolic processes. NADH and reduced form of nicotinamide adenine dinucleotide phosphate (NADPH) have overlapping fluorescence properties and are collectively referred to as reduced form of nicotinamide adenine dinucleotide (phosphate) (NAD(P)H).[36] The optical redox ratio, defined as the ratio of NAD(P)H intensity to the sum of NAD(P)H and FAD intensity, provides a measure of the relative oxidation–reduction state of the cell [INAD(P)H/(INAD(P)H + IFAD)].[37,38]Fluorescence lifetime imaging microscopy of NAD(P)H and FAD provides additional information specific to protein binding activity. The two-component decays of NAD(P)H and FAD measure the short (τ1) and long (τ2) fluorescence lifetimes that correspond to the free or bound states of these coenzymes,[39-41] along with fractional contributions of short (α1) and long (α2) lifetimes. Since NAD(P)H and FAD are found predominantly in the cytoplasm, the lack of fluorescence signal in images can also be used to identify cell borders and nuclei.[42] Thus, OMI provides multiple readouts for cell metabolism and nuclear morphometry to track metabolic and nuclear changes of cells undergoing reprogramming.In this study, we address some of the challenges associated with the biomanufacturing of iPSCs by developing a microcontact printed (μCP) platform[34,43,44] to noninvasively monitor metabolic and nuclear changes over 22 days of reprogramming of human erythroid progenitor cells (EPCs) to iPSCs. We demonstrate that OMI is sensitive to the metabolic and nuclear differences during reprogramming, provide accurate identification of reprogramming status of cells using machine learning models, and subsequently build reprogramming trajectories at the single-cell level.[45] Our label-free, nondestructive, rapid, and scalable method to track reprogramming provides novel insights into human cell reprogramming and could enable the development of new technologies for biomanufacturing high-quality iPSCs.
Results
Reprogramming on patterned substrates
We first designed a μCP substrate to spatially control the adhesion of EPCs undergoing reprogramming.[34,43,46] The μCP substrate is formed by coating circular regions of 300 μm radius, referred to as μFeatures, with Matrigel on a 35-mm ibiTreat dish that allows for cell adhesion. The remaining regions of the dish are then backfilled with polycationic graft copolymer, poly(l-lysine)-graft-poly(ethylene glycol) (PLL-g-PEG), that resists protein adsorption and prevents cell adhesion in these regions (Supplementary Fig. S1A).[47,48] The ibiTreat dishes are made of gas-permeable material, enabling maintenance of carbon dioxide or oxygen exchange during cell culture and have high optical quality.These properties make the dishes suitable for two-photon microscopy during reprogramming. To verify proper coating of the circular μFeature regions, we immunostained for laminin, a major component of Matrigel.[49] Fluorescence imaging showed laminin consistently within the circular μFeatures indicating uniform patterning of Matrigel (Fig. 1A; left). We next assessed the ability of the μCP substrates to enable cell attachment by seeding two different cell types: human dermal fibroblasts (HDFs) and human embryonic stem cells (hESCs). We observed that both HDFs and hESCs remained viable, attached, and confined to the circular μFeatures indicating that the μCP substrates enable spatial control of cell adhesion (Supplementary Fig. S1B).
FIG. 1.
NAD(P)H and FAD autofluorescence imaging reveal metabolic differences during reprogramming.
(A) Left: Matrigel-coated μFeatures on the ibiTreat dish visualized with an anti-laminin antibody (red) show good fidelity in the transfer from the Matrigel-coated polydimethylsiloxane mold. Scale bar, 100 μm. Right: Representative images of the progression of EPCs on a single circular μFeature (300 μm radius) through a reprogramming time course.
(B) Left: Image analysis pipeline to identify metabolic and nuclear parameters using ilastik and CellProfiler software. Right: Schematic representation of cell metabolism with NADH and FAD highlighted as the fluorescent molecules in the diagram, and molecules in bold indicate the net direction of the reaction.
(C) Representative optical redox ratio, NAD(P)H τm, and FAD τm images (3 μFeatures selected from 36 μFeatures acquired from three different donors) for EPC, IM, and iPSC. Color bars indicated on the right are a representation of the values optical redox ratio, NAD(P)H τm, and FAD τm. Scale bar, 100 μm.
Single-cell quantitative analysis of (D) metabolic parameters: optical redox ratio, NAD(P)H τm, FAD τm; (E) nuclear parameters: area, perimeter, and mean radius (n = 561, 990, and 586 for EPC, IM, and iPSC, respectively). Data are presented as median with interquartile range for each cell type. Statistical significance was determined by one-way ANOVA using the Kruskal–Wallis test for multiple comparisons; ns for p ≥ 0.05, * for p < 0.05, ** for p < 0.01, *** for p < 0.001, **** for p < 0.0001. ANOVA, analysis of variance; EPC, erythroid progenitor cells; FAD, flavin adenine dinucleotide; IM, intermediate; iPSC, induced pluripotent stem cell; NADH, reduced form of nicotinamide adenine dinucleotide; NAD(P)H, reduced form of nicotinamide adenine dinucleotide (phosphate); ns, nonsignificant.
NAD(P)H and FAD autofluorescence imaging reveal metabolic differences during reprogramming.(A) Left: Matrigel-coated μFeatures on the ibiTreat dish visualized with an anti-laminin antibody (red) show good fidelity in the transfer from the Matrigel-coated polydimethylsiloxane mold. Scale bar, 100 μm. Right: Representative images of the progression of EPCs on a single circular μFeature (300 μm radius) through a reprogramming time course.(B) Left: Image analysis pipeline to identify metabolic and nuclear parameters using ilastik and CellProfiler software. Right: Schematic representation of cell metabolism with NADH and FAD highlighted as the fluorescent molecules in the diagram, and molecules in bold indicate the net direction of the reaction.(C) Representative optical redox ratio, NAD(P)H τm, and FAD τm images (3 μFeatures selected from 36 μFeatures acquired from three different donors) for EPC, IM, and iPSC. Color bars indicated on the right are a representation of the values optical redox ratio, NAD(P)H τm, and FAD τm. Scale bar, 100 μm.Single-cell quantitative analysis of (D) metabolic parameters: optical redox ratio, NAD(P)H τm, FAD τm; (E) nuclear parameters: area, perimeter, and mean radius (n = 561, 990, and 586 for EPC, IM, and iPSC, respectively). Data are presented as median with interquartile range for each cell type. Statistical significance was determined by one-way ANOVA using the Kruskal–Wallis test for multiple comparisons; ns for p ≥ 0.05, * for p < 0.05, ** for p < 0.01, *** for p < 0.001, **** for p < 0.0001. ANOVA, analysis of variance; EPC, erythroid progenitor cells; FAD, flavin adenine dinucleotide; IM, intermediate; iPSC, induced pluripotent stem cell; NADH, reduced form of nicotinamide adenine dinucleotide; NAD(P)H, reduced form of nicotinamide adenine dinucleotide (phosphate); ns, nonsignificant.Next, peripheral blood mononuclear cells (PBMCs) were isolated from healthy human donors and further enriched for EPCs before the delivery of reprogramming factors. We examined the enrichment of EPCs by flow cytometry with erythroid cell surface marker CD71.[50] Flow cytometry confirmed the presence of enriched EPCs showing that >98% of the cells expressed CD71 on day 10 of PBMC culture (Supplementary Fig. S1C).To initiate reprogramming, we electroporated EPCs with four episomal reprogramming plasmids[51,52]—encoding OCT4, shRNA knockdown of p53, SOX2, KLF4, L-MYC, LIN28, and miR302–367 cluster—and seeded them onto μCP substrates. We assessed the ability of the μCP substrates to sustain long-term reprogramming studies by performing high-content imaging to track individual μFeatures (>30 μFeatures per 35-mm dish) longitudinally at multiple time points over the ∼3 weeks of reprogramming time course. Day 22 was chosen as the reprogramming endpoint because there were several μFeatures with at least one iPSC colony at this time point without significant outgrowth beyond the boundaries of the μFeatures.Although the starting EPCs (day 0) are nonadherent, EPCs undergoing reprogramming start adhering to the μCP substrates on day 8. Cells in the middle of reprogramming (day 8–day 17) that lack EPC markers or iPSC markers are broadly termed as intermediates (IMs), consistent with prior nomenclature in the field.[53] Moreover, endpoint iPSCs on day 22 remain adhered to the μCP substrates within each circular μFeature (Fig. 1A; right), indicating that μCP substrates can support the full reprogramming of EPCs. Overall, the μCP platform provides unique spatial control over cells and enables high-content quantitative imaging of reprogramming.
OMI reveals metabolic states during reprogramming
Cellular metabolism plays an important role in regulating reprogramming and pluripotency of iPSCs,[54-58] and can be noninvasively monitored through OMI. NAD(P)H is an electron donor and FAD is an electron acceptor. Both are present in all cells as coenzymes and provide energy for metabolic reactions. For example, glycolysis in the cytoplasm generates NADH and pyruvate, whereas OXPHOS consumes NADH and produces FAD (Fig. 1B; right). Autofluorescence imaging of NAD(P)H and FAD can thus detect the oxidation–reduction state of a cell and is influenced by many biochemical reactions.[35,59]We tracked the autofluorescence dynamics of NAD(P)H and FAD by performing OMI on cells attached to μCP substrates at different time points during EPC reprogramming. In these images, the nucleus remains dark as NAD(P)H is primarily located in cytosol and mitochondria, and FAD is primarily located in mitochondria. The NAD(P)H images were used as inputs for an image analysis software, ilastik,[60] to identify the nuclei. The identified nuclei were then used as an input for a high-content image analysis pipeline in CellProfiler software[61] to segment the cytoplasm, and measure various metabolic and nuclear parameters (Fig. 1B; left).Altogether, 11 metabolic parameters (NAD(P)H intensity, INAD(P)H; NAD(P)H α1; NAD(P)H τ1; NAD(P)H τ2; NAD(P)H mean lifetime, τm = α1τ1 + α2τ2; FAD intensity, IFAD; FAD α1; FAD τ1; FAD τ2; FAD τm; optical redox ratio, INAD(P)H/[INAD(P)H + IFAD]) and 8 nuclear parameters[34] (area; perimeter; mean radius [MeanRad]; nuclear shape index [NSI]; solidity; extent; number of neighbors [#Neigh]; distance to closest neighbor [1stNeigh]) were measured by the analysis pipeline. Supplementary Figure S1D provides further details of each parameter.By fixing the cultures at these time points, we verified the cell type by immunofluorescent staining: EPCs (CD71+, Nanog−), IMs (CD71−, Nanog−), and iPSCs (CD71−, Nanog+) (Supplementary Fig. S1E). NAD(P)H and FAD autofluorescence imaging revealed metabolic differences between starting EPCs, IMs, and iPSCs (Supplementary Figs. S2 and S3).We observed a significant increase in the optical redox ratio (iPSC>IM>EPC) during reprogramming (Fig. 1C), indicating that individual EPCs are more oxidized than individual IMs and iPSCs (Fig. 1D). In addition, we noted that patterned IMs and iPSCs have significantly higher optical redox ratios than their nonpatterned counterparts (Supplementary Fig. S2K). This observation is consistent with previous studies that show that mechanical cues can regulate their relative use of glycolysis.[62-65]Next, we observed that NAD(P)H and FAD lifetime components undergo biphasic changes during the progress of reprogramming. FAD lifetime components undergo a more significant and pronounced change relative to the NAD(P)H components (Fig. 1D and Supplementary Fig. S2A–J). On average, the fraction of protein-bound FAD (FAD α1) first decreases from its levels in EPCs to those in IMs and then increases during the IM to iPSC transition, which could be reflective of the OXPHOS burst (Supplementary Fig. S2H).[27-29] FAD τm (Fig. 1D) is inversely related to FAD α1 and, therefore, undergoes a biphasic change that is opposite to that of FAD α1 (Supplementary Fig. S2H). Similar biphasic changes occur in nuclear parameters during reprogramming, which is consistent with our previous study (Fig. 1E and Supplementary Fig. S3).[34]We compared these measurements on cells undergoing reprogramming to established cell lines and primary cell populations. Both pluripotent stem cell lines—established hESCs and iPSC lines—have similar values for most metabolic and nuclear parameters, as expected. The significant differences observed between hESCs and iPSCs in some metabolic and nuclear parameters (Supplementary Figs. S2 and S3) may be because iPSCs have not undergone any passages in contrast to the hESC line.[24]Fibroblasts from human donors (HDFs) had metabolic parameters significantly different from those of EPCs (Supplementary Fig. S2). This difference could be because (1) fibroblasts are adherent whereas starting EPCs are nonadherent and (2) fibroblasts and EPCs have different proliferation rates and energy needs. Taken together, autofluorescence imaging of NAD(P)H and FAD revealed significant dynamic changes for various cell populations during reprogramming.
OMI enables the identification of iPSCs with high accuracy
To visualize cell states within the entire metabolic and nuclear morphometry data set, Uniform Manifold Approximation and Projection (UMAP)[66] was employed on the multidimensional measurements already described. Neighbors were defined through the Jaccard similarity coefficient computed across the metabolic and nuclear parameters. UMAP was chosen over t-distributed stochastic neighbor embedding (t-SNE) since UMAP (Fig. 2A) yielded more distinct clusters for two different known cell types—EPCs and iPSCs—than t-SNE (Supplementary Fig. S4A). Furthermore, the UMAP algorithm took less time than the t-SNE algorithm to implement with our data set. In addition, UMAP can include nonmetric distance functions while preserving the global structure of the data.
FIG. 2.
Optical metabolic imaging enables the classification of cells based on their reprogramming status.
UMAP dimensionality reduction was performed on (A) all 11 metabolic and 8 nuclear parameters, (B) only 11 metabolic parameters, and (C) only 8 nuclear parameters for each cell, projected onto two-dimensional space and enables separation of different cell types (EPC, IM, and iPSC). Each color corresponds to a different cell type. Data are from three different donors. Each dot represents a single cell, and n = 561, 990, and 586 cells for EPC, IM, and iPSC, respectively.
(D) Model performance of the different classifiers (random forest, simple logistic, k-nearest neighbor [IBk], naive Bayes) for iPSCs was evaluated by ROC curves using all 11 metabolic and 8 nuclear parameters. The AUC is provided for each classifier as indicated in the legend.
(E) Parameter weights for random forest classification of EPCs, IMs, and iPSCs using the gain ratio method. Analysis was performed at a single-cell level using three different donors.
(F) Classification accuracy with respect to number of parameters was evaluated based on the gain ratio parameter selection with the random forest model (parameters added from highest to lowest gain ratio in (E). The number of parameters included in the random forest model is indicated on the x-axis.
(G) Model performance of the random forest classifier for iPSCs was evaluated by ROC curves using different metabolic and nuclear parameter combinations as labeled. AUC is provided for each parameter combination as indicated in the legend.
(H) Imaging time (left y-axis) and accuracy score (right y-axis) evaluation of the random forest classifier for different metabolic and nuclear parameter combinations as labeled. AUC, area under the curve; OMI, optical metabolic imaging; ROC, receiver operating characteristic; UMAP, Uniform Manifold Approximation and Projection.
Optical metabolic imaging enables the classification of cells based on their reprogramming status.UMAP dimensionality reduction was performed on (A) all 11 metabolic and 8 nuclear parameters, (B) only 11 metabolic parameters, and (C) only 8 nuclear parameters for each cell, projected onto two-dimensional space and enables separation of different cell types (EPC, IM, and iPSC). Each color corresponds to a different cell type. Data are from three different donors. Each dot represents a single cell, and n = 561, 990, and 586 cells for EPC, IM, and iPSC, respectively.(D) Model performance of the different classifiers (random forest, simple logistic, k-nearest neighbor [IBk], naive Bayes) for iPSCs was evaluated by ROC curves using all 11 metabolic and 8 nuclear parameters. The AUC is provided for each classifier as indicated in the legend.(E) Parameter weights for random forest classification of EPCs, IMs, and iPSCs using the gain ratio method. Analysis was performed at a single-cell level using three different donors.(F) Classification accuracy with respect to number of parameters was evaluated based on the gain ratio parameter selection with the random forest model (parameters added from highest to lowest gain ratio in (E). The number of parameters included in the random forest model is indicated on the x-axis.(G) Model performance of the random forest classifier for iPSCs was evaluated by ROC curves using different metabolic and nuclear parameter combinations as labeled. AUC is provided for each parameter combination as indicated in the legend.(H) Imaging time (left y-axis) and accuracy score (right y-axis) evaluation of the random forest classifier for different metabolic and nuclear parameter combinations as labeled. AUC, area under the curve; OMI, optical metabolic imaging; ROC, receiver operating characteristic; UMAP, Uniform Manifold Approximation and Projection.UMAP was next used to visualize subsets of the entire data set to investigate which measurements were associated with different cell states. Distinct cell populations could be derived from data sets built exclusively from the 11 metabolic parameters (Fig. 2B) and data sets built exclusively from the 8 nuclear parameters (Fig. 2C). Although these UMAP representations revealed some distinct clusters of EPCs, IMs, and iPSCs, the UMAP generated using both metabolic and nuclear parameters displayed less overlap of cell clusters among EPCs, IMs, and iPSCs (Fig. 2A).We also plotted a heatmap representation of the z-score of metabolic and nuclear parameters at the donor level (each row is the mean of a single donor and cell type) to examine heterogeneity arising from individual donors (Supplementary Fig. S4B). Despite some donor-to-donor heterogeneity, EPCs and iPSCs could be distinguished visually (Fig. 2A) based on a combination of 11 metabolic and 8 nuclear parameters.Next, classification models were developed based on 11 metabolic and 8 nuclear parameters to predict the reprogramming status of cells, that is, as either EPCs, IMs, or iPSCs. Supervised machine learning classification (naive Bayes, K-nearest neighbor) and regression algorithms (logistic regression and random forest)[67] were implemented to test the prediction accuracy for iPSCs properly when all the metabolic and nuclear parameters are used. To protect against overfitting, various classification methods were trained using 15-fold cross-validation on single-cell data from three different donors with reprogramming status assigned based on morphological characteristics.Furthermore, we tested the various classification methods on data collected from CD71 and Nanog immunofluorescence staining with the same cells from three donors (completely independent and nonoverlapping observations). Receiver operating characteristic (ROC; one-vs-rest) curves of the test data revealed highest classification accuracy for predicting iPSCs (area under the curve, AUC = 0.993), IMs (AUC = 0.993), and EPCs (AUC = 0.999) when a random forest model was used (Fig. 2D and Supplementary Fig. S4C, D). We thus used the random forest classification model for further analysis in this study.Gain ratio analysis on the decision tree within this random forest model revealed that FAD lifetime components, FAD α1, FAD τ1, and FAD τm, are the most important parameters for classifying the reprogramming status of cells (Fig. 2E). This result is consistent with the observation that FAD lifetime components are significantly different among EPCs, IMs, and iPSCs (Fig. 1 and Supplementary Fig. S2). We then plotted the accuracy score as a function of the number of parameters, wherein the parameters with highest gain ratio values (Fig. 2E) were chosen (one parameter means FADα1; two parameters means FADα1, FADτ1, and so on).This plot revealed that the accuracy score increases with the number of parameters until eight parameters (FADα1, FADτ1, FADτm, perimeter, redox ratio, area, FADτ2, NAD(P)Hτ1) and plateaus thereafter (Fig. 2F). Notably, high classification accuracy can be achieved for predicting iPSCs (AUC = 0.944), IMs (AUC = 0.968), and EPCs (AUC = 0.987) when using only FAD lifetime variables (FAD τm, τ1, τ2, α1; collected in the FAD channel alone) (Fig. 2G and Supplementary Fig. S4E, F).Imaging using only FAD lifetime parameters requires a minimal time of 2.5 min per μFeature (Fig. 2H) without relying on intensity parameters that are associated with higher variability. Such variability can arise from confounding factors associated with intensity levels, such as laser power and detector gain. Hence, FAD lifetime parameters alone are sufficient to classify accurately the reprogramming status of cells.
Pseudotemporal ordering of single cells resolves cellular transitions
By sampling a process over a time course, profiles at a single-cell level can be used to order cells along a “pseudotemporal” continuum. Such ordering can resolve cellular transitions during complex processes such as organismal development.[45,68] Here we used 11 metabolic and 8 nuclear parameters to construct pseudotime trajectories of cellular reprogramming at a single-cell level using the Monocle3 algorithm.[45,69] Monocle3 is a trajectory inference method that learns combinatorial changes that each cell must go through as a part of a process and subsequently places each cell at its inferred location in the trajectory.The inferred pseudotime trajectories built on our entire data set consisted of EPCs, IMs, and iPSCs distributed across 10 clusters, 4 branching events, and 1 disconnected branch (Fig. 3A–C). The primary trajectory—colored by pseudotime and actual reprogramming time points—exhibited transitions from EPCs to IMs to iPSCs as expected (Fig. 3B and Supplementary Fig. S5A).
FIG. 3.
Inference of reprogramming trajectories at the single-cell level reveals heterogeneity during reprogramming.
Trajectory analysis of reprogramming EPCs constructed from the metabolic and nuclear parameters based on UMAP dimension reduction using Monocle3 revealed four branch points, colored by (A) cell type and (B) pseudotime.
(C) Monocle UMAP plots showing clustering of reprogramming EPCs. Samples were grouped into 10 clusters. Cells colored by cluster.
(D) Heatmap representing the metabolic and nuclear parameters of 10 clusters. Each column is a separate cell group based on the generated clusters, and each row represents a single metabolic or nuclear parameter. Z-score = (μobserved − μrow)/σrow, where μobserved is the mean value of each parameter for each cell; μrow is the mean value of each parameter for all cells together, and σrow is the standard deviation of each parameter across all cells. Dot plots indicating the expression of (E) FAD τ1, (F) FAD α1, (G) FAD τm,
(H) NAD(P)H τ2, (I) NAD(P)H α1, and (J) NAD(P)H τ2 along the pseudotime. Smooth lines are composed of multiple dots representing the mean expression level at each pseudotime, regardless of the cell type. Four branch points are labeled on the smooth lines.
Inference of reprogramming trajectories at the single-cell level reveals heterogeneity during reprogramming.Trajectory analysis of reprogramming EPCs constructed from the metabolic and nuclear parameters based on UMAP dimension reduction using Monocle3 revealed four branch points, colored by (A) cell type and (B) pseudotime.(C) Monocle UMAP plots showing clustering of reprogramming EPCs. Samples were grouped into 10 clusters. Cells colored by cluster.(D) Heatmap representing the metabolic and nuclear parameters of 10 clusters. Each column is a separate cell group based on the generated clusters, and each row represents a single metabolic or nuclear parameter. Z-score = (μobserved − μrow)/σrow, where μobserved is the mean value of each parameter for each cell; μrow is the mean value of each parameter for all cells together, and σrow is the standard deviation of each parameter across all cells. Dot plots indicating the expression of (E) FAD τ1, (F) FAD α1, (G) FAD τm,(H) NAD(P)H τ2, (I) NAD(P)H α1, and (J) NAD(P)H τ2 along the pseudotime. Smooth lines are composed of multiple dots representing the mean expression level at each pseudotime, regardless of the cell type. Four branch points are labeled on the smooth lines.Trajectory inference indicated that the starting EPCs were heterogeneous and occupied three clusters (Fig. 3C; clusters: 1, 2, and 3). Although cluster 2 consists of starting EPCs that undergo reprogramming, clusters 1 and 3 constituted the disconnected branch with EPCs that failed to progress through reprogramming. iPSCs predominantly occupied two clusters (clusters 7 and 10) irrespective of the reprogramming time point, whereas IMs belonged to several clusters (clusters 4, 5, 6, 8, and 9) with clusters 6 and 8 concentrated at the unsuccessful reprogramming branches (Fig. 3C).Overall, these various trajectories provide a detailed map of several cases of reprogramming heterogeneity within human cells. For example, cells that advance right at branch points 1, 2, and 3 (Fig. 3A, B) completely reprogram to iPSCs within 25 days while cells that proceed left at branch points 1 and 3 (Fig. 3A, B) remain as IMs.Cellular clusters that are adjacent to each other on the reprogramming pseudotime axis exhibit high correlation in their parameter values: that is, EPCs (cluster: 2) have a high correlation with early IMs (cluster: 4), whereas late IMs (clusters: 5, 6, 8, and 9) demonstrate high correlation with iPSCs (cluster: 7) (Fig. 3D). Moderate correlation of cells (cluster: 10) with the starting EPC cluster 1 could indicate more incompletely or partially reprogrammed cells in cluster 10 relative to the iPSCs in cluster 7. When we compared IMs that undergo reprogramming (cluster: 9) and the IMs that do not reprogram to iPSCs (cluster: 6), we noted differences in their NAD(P)H lifetime components, indicating that these parameters might play a role in identifying reprogramming cell fate.To further examine the parameters that distinguished the cell clusters, we performed another correlation analysis using Moran's I,[70] which is a statistic that reports whether cells at nearby positions on a trajectory will have similar (or dissimilar) expression levels for a given parameter (Supplementary Fig. S5B). When the parameters were ranked by Moran's I, FAD lifetime parameters (FAD τ1, τ2, τm) were most important in distinguishing clusters followed by NAD(P)H lifetime parameters (NAD(P)H τ2, α1, τ1), in agreement with expression level maps (Supplementary Fig. S5C–H).This result is consistent with high gain ratio values for FAD lifetime parameters (Fig. 2E) and the observation FAD lifetime parameters are significantly different among EPCs, IMs, and iPSCs (Fig. 1D). When we plotted the identified important metabolic parameters as a function of pseudotime, we observed that they undergo biphasic changes during reprogramming that could be representative of the OXPHOS burst (Fig. 3E–J). These pseudotime trajectories complement the UMAP visualizations (Fig. 2A–C) by providing higher temporal resolution of changes occurring during reprogramming.
Isolation of high-quality iPSCs
Although visualizing reprogramming heterogeneity at a high temporal resolution and single-cell resolution with our methods can be insightful, the terminal goal of any reprogramming platform is to successfully isolate iPSCs that can be used for downstream applications. As proof of concept, we used a combination of OMI, μCP platform, and machine learning models developed in this study to isolate high-quality iPSCs (Fig. 4). First, we tracked the metabolic and nuclear parameters of μFeatures throughout the reprogramming time course using OMI (Fig. 4A). Second, we employed our random forest classification model to predict the reprogramming status of the tracked μFeatures (Fig. 4B).
FIG. 4.
In situ OMI of μFeatures aids in the identification and isolation of iPSCs.
(A) Representative optical redox ratio images of a single μFeature at different days through the reprogramming time course. Color bar is indicated on the right. Scale bar, 100 μm.
(B) Stacked column bar graph showing the variation in distribution of cell types during reprogramming as classified by the random forest model using all metabolic and nuclear parameters. The color of the bar corresponds to the cell type and the height of the bar represents the percentage of the cell type.
(C) Violin plots showing the distribution of reprogramming pseudotime of single cells within a μFeature as a function of the actual reprogramming time point. Middle solid line indicates median and upper, and bottom solid lines indicate the interquartile range. Statistical significance was determined by one-way ANOVA using the Kruskal–Wallis test for multiple comparisons; ns for p ≥ 0.05, * for p < 0.05, ** for p < 0.01, *** for p < 0.001, **** for p < 0.0001).
(D) Representative images of cell subpopulations on μFeatures at different days through the reprogramming time course, stained with Hoechst (blue), TRA-1-60 (white), Nanog (green), and CD71 (magenta). Scale bar, 100 μm.
(E) Representative image of iPSC colony isolated from a μFeature, stained with Hoechst (nuclear dye), TRA-1-60, and Nanog (pluripotency markers). Scale bar, 50 μm.
(F) iPSCs derived from μCP substrates show normal karyotypes suggesting that no major chromosome abnormality was present within the cells after reprogramming.
In situ OMI of μFeatures aids in the identification and isolation of iPSCs.(A) Representative optical redox ratio images of a single μFeature at different days through the reprogramming time course. Color bar is indicated on the right. Scale bar, 100 μm.(B) Stacked column bar graph showing the variation in distribution of cell types during reprogramming as classified by the random forest model using all metabolic and nuclear parameters. The color of the bar corresponds to the cell type and the height of the bar represents the percentage of the cell type.(C) Violin plots showing the distribution of reprogramming pseudotime of single cells within a μFeature as a function of the actual reprogramming time point. Middle solid line indicates median and upper, and bottom solid lines indicate the interquartile range. Statistical significance was determined by one-way ANOVA using the Kruskal–Wallis test for multiple comparisons; ns for p ≥ 0.05, * for p < 0.05, ** for p < 0.01, *** for p < 0.001, **** for p < 0.0001).(D) Representative images of cell subpopulations on μFeatures at different days through the reprogramming time course, stained with Hoechst (blue), TRA-1-60 (white), Nanog (green), and CD71 (magenta). Scale bar, 100 μm.(E) Representative image of iPSC colony isolated from a μFeature, stained with Hoechst (nuclear dye), TRA-1-60, and Nanog (pluripotency markers). Scale bar, 50 μm.(F) iPSCs derived from μCP substrates show normal karyotypes suggesting that no major chromosome abnormality was present within the cells after reprogramming.Third, we inferred the pseudotimes during the reprogramming time course to monitor the progress of the μFeatures along the reprogramming trajectory (Fig. 4C). Finally, we performed immunostaining on the μFeatures. The reprogramming status predictions made by the machine learning models correlated well with the cell markers detected by immunostaining (Fig. 4D).Finally, we isolated iPSCs from the μCP culture platform based on the predictions made by the random forest classification model. The physical separation of μFeatures from one another, combined with a high fraction of predicted iPSCs, even up to 100% throughout the μFeature, resulted in easy picking and isolation of iPSCs. We further confirmed that the isolated iPSC lines expressed pluripotency markers (Fig. 4E) and showed no genomic abnormalities (Fig. 4F), indicating that our approach supports the generation of genetically stable iPSC lines.
Discussion
In this study, we report a noninvasive, high-throughput, quantitative, and label-free imaging platform to predict the reprogramming outcome of EPCs by combining micropatterning, live-cell autofluorescence imaging, and automated machine learning. We can predict the reprogramming status of EPCs at any time point during reprogramming with an accuracy of ∼95% and model performance of ∼0.99 (AUC of ROC) using a random forest classification model with 11 metabolic parameters and 8 nuclear parameters (Fig. 2G and Supplementary Fig. S4C–F). In addition, we provide a single-cell roadmap of EPC reprogramming, which reveals diverse cell fate trajectories of individual reprogramming cells (Fig. 3).Recent evidence indicates that metabolic changes during reprogramming include decreasing OXPHOS and increasing glycolysis,[26,71] along with a transient hyperenergetic metabolic state, called OXPHOS burst. This OXPHOS burst occurs at an early stage of reprogramming and shows characteristics of both high OXPHOS and high glycolysis, which could be a regulatory cue for the overall shift of reprogramming.[28,29,72,73] These changes are accompanied by alterations in the amounts of corresponding metabolites and have been confirmed by genome-wide analyses of gene expression, protein levels, and metabolomic profiling.[53,74-76]The shifts in cellular metabolism affect enzymes that control epigenetic configuration,[77] which can impact chromatin reorganization and provide a basis for changes in nuclear morphology as well as gene expression during reprogramming.[34,78-81] Consistent with these studies, the redox ratio increases during reprogramming (Fig. 1D), which could be indicative of increased glycolysis during reprogramming.[82]The changes in NAD(P)H and FAD lifetime parameters that occur during reprogramming (Fig. 1 and Supplementary Fig. S2) could reflect changes in quencher concentrations, such as oxygen, tyrosine, or tryptophan, or changes in local temperature and pH.[35,83,84] Specifically, the biphasic changes in the metabolic and nuclear parameters could be due to the increased production of reactive oxygen species (ROS) by mitochondria[35,59,85,86] during the OXPHOS burst. The generated ROS could further serve as a signal to activate nuclear factor (erythroid derived 2)-like-2 (NRF-2), which then induces hypoxia-inducible factors (HIFs) that promote glycolysis during reprogramming by increasing the expression levels of the glycolysis-related genes.[25,73,75]Moreover, the importance of FAD parameters for distinguishing various reprogramming cell types (Figs. 1D and 2E) could point to the significant changes in the mitochondrial environment during reprogramming. The differences in NAD(P)H lifetime parameters among IMs that successfully undergo reprogramming and those that do not (Fig. 3D) may suggest a role for molecular pathways involving NAD(P)H in impacting reprogramming barriers and thus the end cell fate of reprogramming cells.The classification models trained on all 11 metabolic and 8 nuclear parameters had the highest accuracy. Random forest models using only FAD lifetime parameters, in particular, yielded comparatively high ROC AUC values (Fig. 2G and Supplementary Fig. S4E, F). In addition, FAD lifetime parameters were more accurate for predicting reprogramming status than using nuclear parameters alone, which can be obtained using wide-field or confocal fluorescence microscopy.Imaging only FAD lifetime parameters instead of imaging all the parameters significantly reduced the time of imaging from 7 to 2.5 min per μFeature (Fig. 2H). This reduction in imaging time is especially helpful when assessing multiple μFeatures for iPSC quality at larger scales, and lifetime measurements benefit from fewer confounding factors and less variability compared with intensity measurements.Prior studies on the heterogeneity of reprogramming relied on bulk analysis[87-90] or single-cell analysis[91-99] techniques. Prior bulk analyses encountered challenges in characterizing the variability in both the starting cell population and during fate conversion, owing to the variable kinetics and low efficiency of reprogramming, and single-cell techniques disrupted the cellular microenvironment, resulting in significant changes in the biophysical properties of cells undergoing reprogramming. Our methods overcome these challenges with the combination of a μCP culture platform, OMI, and machine learning.First, the μCP platform dissects cell cultures undergoing reprogramming into hundreds of cell subpopulations that enables easy tracking of cells undergoing reprogramming with imaging time as low as 2.5 min per μFeature. Circular and isolated μFeatures, however, are not necessary to implement OMI. Moreover, the μCP platform ensures an intact microenvironment for reprogramming cells while enabling single-cell analysis. Second, OMI provides single-cell measurements nondestructively to assess the influence of neighboring cells and provides high temporal resolution for time-course studies of reprogramming. Finally, machine learning with trajectory inference is applied here to a new type of cellular measurement, single cell metabolism. These methods excel in analyzing time course data containing asynchronous processes within cells—as seen in prior studies with flow cytometry and gene expression data.[45,74,97,98,100] Machine learning here overcomes the problems of reprogramming trajectories built based on absolute time points. Overall, these methods could aid in the identification of somatic cells or early reprogramming cells that are refractory toward reprogramming and thus increase the success rate of iPSC generation from patient-derived primary cells.Our methods could be adapted for industrial-scale GMP-compliant manufacturing system. First, the μCP platform involves direct extracellular matrix printing onto optically clear substrates (Supplementary Fig. S1A) and does not involve any gold coating, unlike traditional microcontact printing methods.[101,102] Therefore, the μCP platform is cost-effective and relatively simple because it does not require cleanroom access. Plus, our μCP platform utilizes prior advances in the field with xeno-free and feeder-free medium, thus preventing reprogramming inconsistencies arising from the undefined nature of xeno components. Second, we used EPCs isolated from peripheral blood as the starting cell type for reprogramming due to their lack of genomic rearrangements and demonstrated reprogramming ability.[103,104] Moreover, peripheral blood collection is a routine laboratory procedure. The use of EPCs with episomal reprogramming plasmids is likely to generate genetically stable iPSCs devoid of reprogramming factors, as seen in prior studies with EPCs[103,104] and episomal plasmids.[51,52,105,106] Third, the autofluorescence imaging technique is label-free, unlike other common methods to study metabolism such as electron microscopy, immunocytochemistry, and colorimetric metabolic assays. Autofluorescence imaging provides nondestructive real-time monitoring of live cells with lower sample phototoxicity compared with single-photon excitation.[107] Taken together, the processes of μCP platform fabrication, reprogramming, cell culture, autofluorescence imaging, iPSC identification based on machine learning models, and iPSC isolation can all be automated and extended to different reprogramming methods[108] (e.g., mRNA, Sendai virus), to other starting cell types (e.g., fibroblasts, keratinocytes), to other parameters (e.g., cell morphology,[19-21] mitochondrial structure[71,76,109]), and to other processes (e.g., differentiation[110-113]).Limitations of our current approach include imaging resolution, culture duration, and per-μFeature image analysis. First, comprehensive three-dimensional imaging of each μFeature could provide maps at higher resolution to further dissect the metabolic and nuclear changes occurring throughout the entire depth of the reprogramming cultures. Second, any discrepancy in the z-plane during OMI and confocal fluorescence image acquisition on the same sample could lead discrepancies in the data that lead to poor classification of cell types by the random forest models. We mitigated this issue by programming a fixed z-plane for both OMI and confocal fluorescence image acquisition. Third, there is a limited duration of culture before cells overgrow within the μFeature. This could also result in cell detachment from the μFeature, which is difficult to image with OMI. For the reprogramming experiments described here, circular features with 300 μm radius have been used for ∼22 days of culture, although the cell seeding density or micropatterned geometry[34,43,101] could be easily changed. Fourth, because our primary focus was on identifying iPSCs that could be isolated to establish cell lines, this study did not distinguish the diverse cell states of IMs between day 8 and day 17 of reprogramming and subtle changes in iPSC state. For instance, the intensity levels of NANOG and other key pluripotency transcription factors were not accounted for in our analyses, making it difficult to distinguish between partially reprogrammed cells, early and late stage iPSCs, and competing cells in culture.[100,114-116] Higher resolution staining for NANOG along with additional stem cell markers (e.g., TRA-1-60, MYC, OCT4, E-CADHERIN) could provide further insights into the differences between cell clusters identified in our analysis. Finally, imaging analysis was performed at different reprogramming time points on a per-μFeature basis. Tracking single cells within the μFeature during reprogramming using cell tracking algorithms[117] could provide insights into metabolic and nuclear changes during reprogramming at a higher resolution.Overall, we developed a high-throughput, noninvasive, rapid, and quantitative method to predict the reprogramming status of cells and study reprogramming heterogeneity. Our studies indicate that OMI can predict the reprogramming status of cells, which could enable real-time monitoring during iPSC manufacturing, thereby aiding in the identification of high-quality iPSCs in a timely and cost-effective manner. Similar technologies could impact other areas of cell manufacturing such as direct reprogramming, differentiation,[110] and cell line development.
The Bigger Picture
Cell cultures undergoing epigenetic reprogramming are complex, as cells span a broad spectrum of cell states. In this study, we use two-photon microscopy to acquire high-dimensional data on the nuclear and metabolic characteristics of cells and develop machine learning models to predict reprogramming outcomes. This analysis can offer an orthogonal approach to investigating human somatic cell reprogramming that is complementary to the current studies of transcript and protein expression within reprogramming cells.With further development, strategies using these imaging techniques could enable a rapid, automated, and standardized method to isolate iPSCs. This proof-of-principle study indicates that an image-based approach in conjunction with trajectory inference methods can elucidate cellular and subcellular changes that accompany human cell fate transitions.
Materials and Methods
EPC isolation and cell culture
EPCs were isolated from fresh peripheral human blood that was obtained from healthy donors (Interstate Blood Bank, Memphis, TN). Research using purchased de-identified blood specimens are not considered human subjects research under the US Common Rule. Blood was processed within 24 h of collection, where hematopoietic progenitor cells were extracted from whole blood using negative selection (RosetteSep; STEMCELL Technologies) and cultured in polystyrene tissue culture plates in erythroid expansion medium (STEMCELL Technologies) for 10 days to enrich for EPCs.Enriched EPCs from day 10 were examined by staining with APC antihuman CD71 antibody (1:100; 334107; Biolegend) and incubating for 1 h at room temperature (RT). Data were collected on Attune Nxt flow cytometer and analyzed with FlowJo.
Micropattern design and polydimethylsiloxane stamp production
First, a template with the feature designs was created in AutoCAD (Autodesk). The template was then sent to the Advance Reproductions Corporation, MA, for the fabrication of a photomask, and a 6-inch (0.15 m) patterned silicon (Si) wafer was fabricated by the Microtechnology Core, University of Wisconsin-Madison, WI.[118] Using soft photolithography techniques, the Si wafer was spin coated with an SU-8 negative photoresist (MICRO CHEM) and exposed to UV light. The Si mold was then developed for 45 min in SU-8 developer (Sigma) that yielded features with a height of 150 μm. The Si mold was then washed with acetone and isopropyl alcohol.Elastomeric stamps used for microcontact printing were generated by standard soft lithographic techniques. The Si mold was rendered inert by overnight exposure in vapors of (tridecafluoro-1, 1, 2, 2-tetrahydrooctyl) trichlorosilane. Polydimethylsiloxane (PDMS; Sylgard 184 silicone elastomer base, 3097366-1004; Dow Corning) was prepared at a ratio of 1:10 curing agent (Sylgard 184 silicone elastomer curing agent, 3097358-1004; Dow Corning) and degassed in a vacuum for 30 min. The PDMS was then poured over the SU-8 Si mold on a hot plate and baked at 60°C overnight to create the PDMS stamp.
μCP well plate construction
μCP substrates were constructed based on previous studies.[47,48,119] In brief, PDMS stamps with 300 μm radius circular features were coated with Matrigel (WiCell Research Institute) for 24 h. After 24 h, the Matrigel-coated PDMS stamp was dried with nitrogen and placed onto 35 mm cell culture-treated ibiTreat dishes (81156; Ibidi). A 50 g weight was added on top of the PDMS stamps to ensure even pattern transfer from the Matrigel-coated PDMS stamp to the ibiTreat dish.This setup was incubated for 2 h at 37°C. The 35 mm ibiTreat dish was then backfilled with PLL (20 kDa)-g-(3.5)-PEG (2 kDa) (Susos), a graft polymer solution with a 20 kDa PLL backbone with 2 kDa PEG side chains, and a grafting ratio of 3.5 (mean PLL monomer units per PEG side chain), by using 0.1 mg/mL solution in 10 mM HEPES buffer for 30 min at RT. The ibiTreat dish was then washed with phosphate-buffered saline (PBS) and exposed to UV light for 15 min for sterilization to yield the micropatterned substrate.
Reprogramming
On day 10, EPCs were electroporated with four episomal reprogramming plasmids encoding OCT4, shRNA knockdown of p53 (#27077; Addgene); SOX2, KLF4 (#27078; Addgene); L-MYC, LIN28 (#27080; Addgene); miR302-367 cluster (#98748; Addgene), using the P3 Primary Cell 4D-Nucleofector Kit (Lonza) and the EO-100 program.[51,52] Electroporated EPCs were seeded onto micropatterned substrates with erythroid expansion medium (STEMCELL Technologies) at a seeding density of 2000k cells per dish. Cells were supplemented with ReproTeSR (STEMCELL Technologies) on alternate days starting from day 3 without removing any medium from the well. On day 9, the medium was entirely switched to ReproTeSR, and the ReproTeSR medium was changed daily starting from day 10.
Isolation of iPSCs
To isolate high-quality iPSC lines, candidate colonies were picked from micropatterns using a 200 μL micropipette tip and transferred to Matrigel-coated polystyrene tissue culture plates in mTeSR1 medium (WiCell Research Institute). If additional purification was required, one additional manual picking step with a 200 μL micropipette tip was performed. During picking and subsequent passaging, the culture medium was often supplemented with the Rho kinase inhibitor Y-27632 (Sigma-Aldrich) at a 10 μM concentration to encourage cell survival and establish clonal lines. iPSCs obtained from EPCs were maintained in mTeSR1 medium on Matrigel-coated polystyrene tissue culture plates and passaged with ReLeSR (STEMCELL Technologies) every 3–5 days. All cells were maintained at 37°C and 5% CO2.
Antibodies and staining
All cells were fixed for 15 min with 4% paraformaldehyde in PBS (Sigma-Aldrich) and permeabilized with 0.5% Triton-X (Sigma-Aldrich) for >4 h at RT before staining. Hoechst (H1399; Thermo Fisher Scientific, Waltham, MA) was used at 5 μg/mL with 15 min incubation at RT to stain nuclei. Primary antibodies were applied overnight at 4°C in a blocking buffer of 5% donkey serum (Sigma-Aldrich) at the following concentrations: anti-laminin (L9393; Sigma-Alrich) 1:500; TRA-1-60 (MAB4360; EMD Millipore, Burlington, MA) 1:100; NANOG (AF1997; R&D Systems) 1:200; CD71 (334107; Biolegend) 1:100.Secondary antibodies were obtained from Thermo Fisher Scientific and applied in a blocking buffer of 5% donkey serum for 1 h at RT at concentrations of 1:400 to 1:800. A Nikon Eclipse Ti epifluorescence microscope was used to acquire single 10 × images of each micropattern, and a Nikon AR1 confocal microscope was used to acquire 60 × stitched images of each micropattern using the z-plane closest to the micropatterned substrate for reprogramming studies. In brief, EPCs are identified as CD71+; Nanog−, IMs are indicated as CD71−; Nanog−, and iPSCs are indicated as CD71−; Nanog+.
Autofluorescence imaging of NAD(P)H and FAD
Fluorescence lifetime imaging was performed at different time points during reprogramming by an Ultima two-photon microscope (Bruker) composed of an ultrafast tunable excitation laser source (Insight DS+; Spectra-Physics) coupled to a Nikon Ti-E inverted microscope with time-correlated single-photon counting electronics (SPC-150; Becker & Hickl). The laser source enables sequential excitation of NAD(P)H at 750 nm and FAD at 890 nm. NAD(P)H and FAD images were acquired through 440/80 and 550/100 nm bandpass filters (Chroma), respectively, using gallium arsenide phosphide photomultiplier tubes (H7422; Hamamatsu).The laser power at the sample was ∼3.5 mW for NAD(P)H and 6 mW for FAD. Lifetime imaging using time-correlated single-photon counting electronics (SPC-150; Becker & Hickl) was performed within Prairie View Atlas Mosaic Imaging (Bruker Fluorescence Microscopy) to capture the entire μFeature. Fluorescence lifetime decays with 512-time bins were acquired across 512 × 512-pixel images with a pixel dwell time of 4.8 μs and an integration period of 60 s. Photon count rates were ∼1–5 × 105 and monitored during image acquisition to ensure that no photobleaching occurred.All samples were placed on a stage-op incubator and illuminated through a 40 × /1.15 NA objective (Nikon). The short lifetime of red blood cell fluorescence at 890 nm was used as the instrument response function and had a full-width half maximum of 240 ps. A yellow green (YG) fluorescent bead (YG microspheres, Polysciences Inc.; τ = 2.13 ± 0.03 ns, n = 6) was imaged daily as a fluorescence lifetime standard.[35,120]
Image analysis
Fluorescence lifetime decays were analyzed to extract fluorescence lifetime components through SPCImage software (Becker & Hickl). A threshold was used to exclude pixels with low fluorescence signals (i.e., background). A bin of 3 × 3 pixels was used to maintain spatial resolution, the fluorescence lifetime decay curve was convolved with the instrument response function and fit to a two-component exponential decay model, I(t) = α1e−1 + α2e−2 + C, where I(t) is the fluorescence intensity as a function of time t after the laser pulse, α1 and α2 are the fractional contributions of the short and long lifetime components, respectively (i.e., α1 + α2 = 1), τ1 and τ2 are the short and long lifetime components, respectively, and C accounts for background light.Both NAD(P)H and FAD can exist in quenched (short lifetime) and unquenched (long lifetime) configurations[39,40]; the fluorescence decays of NAD(P)H and FAD are, therefore, fit to two components. Fluorescence intensity images were generated by integrating photon counts over the per-pixel fluorescence decays.Images were analyzed at the single-cell level to evaluate cellular heterogeneity.[121]A pixel classifier was trained on 15 images using ilastik[60] software to identify the pixels within the nuclei in NAD(P)H images. An object classifier was then used to identify the nuclei in NAD(P)H images using the pixel classifier along with the following parameters: Method = Simple, Threshold = 0.3, Smooth = 1, Size Filter Min = 15 pixels, Size Filter Max = 500 pixels. A customized CellProfiler[61] pipeline was then used to obtain metabolic and nuclear parameters. The CellProfiler pipeline applied the following steps: Primary objects (nuclei) were inputted from ilastik. Secondary objects (cells) were then identified in the NAD(P)H intensity image by outward propagation of the primary objects.Cytoplasm masks were determined by subtracting the nucleus mask from the cell mask. Cytoplasm masks were applied to all images to determine single-cell redox ratio and NAD(P)H and FAD lifetime parameters. A total of 11 metabolic parameters were analyzed for each cell cytoplasm (Supplementary Fig. S1D): NAD(P)H intensity (INAD(P)H), NAD(P)H α1, NAD(P)H τ1, NAD(P)H τ2, NAD(P)H mean lifetime (τm = α1τ1 + α2τ2), FAD intensity (IFAD), FAD α1, FAD τ1, FAD τ2, FAD τm, optical redox ratio [INAD(P)H/(INAD(P)H + IFAD)]. A total of eight nuclear parameters were analyzed for each nucleus: area, perimeter, MeanRad, NSI, solidity, extent, number of neighbors (#Neigh), and distance to closest neighbor (1stNeigh).Representative images of the optical redox ratio, NAD(P)H τm and FAD τm, were computed using the Fiji software.
UMAP clustering
Clustering of cells across EPCs, IMs, and iPSCs was represented using UMAP. UMAP dimensionality reduction[66] was implemented using R on all 11 OMI parameters (optical redox ratio, NAD(P)H τm, τ1, τ2, α1, α2; FAD τm, τ1, τ2, α1, α2) and/or all 8 nuclear parameters (area, perimeter, MeanRad, NSI, solidity, extent, #Neigh, 1stNeigh) for projection in two-dimensional space. The following parameters were used for UMAP visualizations: “n _neighbors”: 20; “min_dist”: 0.3, “metric”: Jaccard, “n_components”: 2.
z-Score hierarchical clustering
z-Score of each metabolic and nuclear parameter for each cell was calculated. z-Score = (μobserved − μrow)/σrow, where μobserved is the mean value of each parameter for each cell; μrow is the mean value of each parameter for all cells together, and σrow is the standard deviation of each parameter across all cells. Heatmaps of z-scores for all OMI variables were generated to visualize differences in each parameter between different cells. Dendrograms show clustering based on the similarity of average Euclidean distances across all variable z-scores. Heatmaps and associated dendrograms were generated in Python.
Classification methods
Random forest, simple logistic, k-nearest neighbor (IBk), and naive Bayes classification methods were trained to classify reprogramming cells into EPCs, IMs, and iPSCs using Weka software.[122] All data were randomly partitioned into training and test data sets using 15-fold cross-validation for training and test proportions of 93.3% (1994 cells) and 6.7% (143 cells), respectively. Cross-validation of EPCs, IMs, and iPSCs was performed using immunostaining data. Each model was replicated 100 times; new training and test data were generated before each iteration.Parameter weights for metabolic and nuclear parameters were extracted using the GainRatioAttributeEval function in Weka to determine the contribution of each variable to the trained classification models. One-vs-rest ROC curves were generated to evaluate the classification model performance on the classification of test set data and are the average of 100 iterations of data that was randomly selected from training and test sets. All the ROC curves displayed were constructed from the test data sets using the model generated from the training data sets.
Karyotyping
Cells cultured for at least five passages were grown to 60–80% confluence and shipped for karyotype analysis to WiCell Research Institute, Madison, WI. G-banded karyotyping was performed using standard cytogenetic protocols.[123] Metaphase preparations were digitally captured with Applied Spectral Imaging software and hardware. For each cell line, 20 GTL-banded metaphases were counted, of which a minimum of 5 was analyzed and karyotyped. Results were reported in accordance with guidelines established by the International System for Cytogenetic Nomenclature 2016.[124]
Statistics
p-Values were calculated using the nonparametric Kruskal–Wallis test for multiple unmatched comparisons with GraphPad Prism software. Statistical tests were deemed significant at α ≤ 0.05. Technical replicates are defined as distinct μFeatures within an experiment. Biological replicates are experiments performed with different donors. No a priori power calculations were performed.
Pseudotime trajectories
To order cells by pseudotime, EPCs were designated the starting point. We used Monocle3 to define a pseudo-reprogramming time trajectory, termed pseudotime, where cells are linearly ordered relative to their progress or change in metabolic and nuclear parameters relative to the starting EPC population. The lengths of the trajectory between each branch point were used to define state by the Monocle3 algorithm implemented in Rstudio Version 1.3.1073 as described in CodeS1.R file.
Authors: Stephen Sullivan; Glyn N Stacey; Chihiro Akazawa; Naoki Aoyama; Ricardo Baptista; Patrick Bedford; Annelise Bennaceur Griscelli; Amit Chandra; Ngaire Elwood; Mathilde Girard; Shin Kawamata; Tadaaki Hanatani; Theodoros Latsis; Stephen Lin; Tenneille E Ludwig; Tamara Malygina; Amanda Mack; Joanne C Mountford; Scott Noggle; Lygia V Pereira; Jack Price; Michael Sheldon; Alok Srivastava; Harald Stachelscheid; Shaji R Velayudhan; Natalie J Ward; Marc L Turner; Jacqueline Barry; Jihwan Song Journal: Regen Med Date: 2018-09-12 Impact factor: 3.806
Authors: Kaivalya Molugu; Ty Harkness; Jared Carlson-Stevermer; Ryan Prestil; Nicole J Piscopo; Stephanie K Seymour; Gavin T Knight; Randolph S Ashton; Krishanu Saha Journal: Biophys J Date: 2019-10-22 Impact factor: 4.033
Authors: Clifford D L Folmes; Timothy J Nelson; Almudena Martinez-Fernandez; D Kent Arrell; Jelena Zlatkovic Lindor; Petras P Dzeja; Yasuhiro Ikeda; Carmen Perez-Terzic; Andre Terzic Journal: Cell Metab Date: 2011-08-03 Impact factor: 27.287
Authors: Yosef Buganim; Dina A Faddah; Albert W Cheng; Elena Itskovich; Styliani Markoulaki; Kibibi Ganz; Sandy L Klemm; Alexander van Oudenaarden; Rudolf Jaenisch Journal: Cell Date: 2012-09-14 Impact factor: 41.582
Authors: Kate E Hawkins; Shona Joy; Juliette M K M Delhove; Vassilios N Kotiadis; Emilio Fernandez; Lorna M Fitzpatrick; James R Whiteford; Peter J King; Juan P Bolanos; Michael R Duchen; Simon N Waddington; Tristan R McKay Journal: Cell Rep Date: 2016-02-18 Impact factor: 9.423
Authors: Sara E Howden; John P Maufort; Bret M Duffin; Andrew G Elefanty; Edouard G Stanley; James A Thomson Journal: Stem Cell Reports Date: 2015-11-12 Impact factor: 7.765