
Simple, fast, and flexible framework for matrix completion with infinite width neural networks.

Adityanarayanan Radhakrishnan^1,2, George Stefanakis^1,2, Mikhail Belkin^3, Caroline Uhler^1,2,4

Abstract

Matrix completion problems arise in many applications including recommendation systems, computer vision, and genomics. Increasingly larger neural networks have been successful in many of these applications but at considerable computational costs. Remarkably, taking the width of a neural network to infinity allows for improved computational performance. In this work, we develop an infinite width neural network framework for matrix completion that is simple, fast, and flexible. Simplicity and speed come from the connection between the infinite width limit of neural networks and kernels known as neural tangent kernels (NTK). In particular, we derive the NTK for fully connected and convolutional neural networks for matrix completion. The flexibility stems from a feature prior, which allows encoding relationships between coordinates of the target matrix, akin to semisupervised learning. The effectiveness of our framework is demonstrated through competitive results for virtual drug screening and image inpainting/reconstruction. We also provide an implementation in Python to make our framework accessible on standard hardware to a broad audience.

Entities: Chemical

Keywords: drug response imputation; image inpainting; infinite width neural networks; matrix completion; neural tangent kernel


Year: 2022 PMID: 35412891 PMCID: PMC9169779 DOI: 10.1073/pnas.2115064119

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779

Matrix completion is a fundamental problem in machine learning, arising in a variety of applications from collaborative filtering to virtual drug screening and image inpainting/reconstruction. Given a matrix Y with only a subset of coordinates observed, the goal of matrix completion is to impute the unobserved entries in Y. For example, in collaborative filtering (Fig. 1A), matrix completion is used to infer the interests of a user from the interests of other users. A prominent example is the Netflix challenge of inferring movie preferences from sparsely populated matrices of user ratings (1). For virtual drug screening (Fig. 1B), matrix completion is used to predict the effect of a drug on a cell type/state given other drug and cell type/state combinations. For image inpainting (Fig. 1C) and image reconstruction (Fig. 1D), matrix completion is used to restore missing pixels in a corrupted image.

Fig. 1.

An overview of matrix completion applications. (A) Collaborative filtering example (the Netflix problem), where the goal is to predict how a user would rate (on a scale of 1 to 5) an unseen movie. (B) Virtual drug screening, where the problem is to predict the gene expression profile for an unobserved drug/cell type combination. In this application, entire columns are unobserved. (C and D) Image inpainting and reconstruction involve reconstructing a corrupted region of an image (shown as black pixels). Question marks in A and B and zero (black) pixels in C and D represent unobserved entries. (E) Our NTK matrix completion framework is easily adapted to solve all of the above problems by selecting a feature prior that represents an embedding of application-specific metadata.

Standard approaches to matrix completion such as nuclear norm minimization (2–4) or deep matrix factorization (5) aim for a completion that yields a low-rank matrix. While such methods can be effective in applications like collaborative filtering, where low rank can capture user similarity, this objective can lead to ineffective solutions for applications such as drug response imputation, image inpainting, or image reconstruction. For example, in drug response imputation, imputing a new drug involves predicting an entirely missing vector of gene responses (in contrast to the Netflix problem, which involves imputing single scalar entries of the matrix). In this case, a low-rank reconstruction would fill every missing column with the same constant column, leading to poor predictive performance. Similarly, for image inpainting and reconstruction, a low-rank completion is generally ineffective since it does not take into account local image structure (6, 7). Thus, there is a need for a more general approach to matrix completion that can easily adapt to the structures in different applications.
In this work, we provide a simple, fast, and flexible framework for matrix completion. To accomplish this, we view matrix completion as an inverse problem: given a matrix $Y \in \mathbb{R}^{k \times n}$ such that a subset $S$ of coordinates is observed and the other entries are missing, we aim to construct $\hat{Y} \in \mathbb{R}^{k \times n}$ such that $\hat{Y}_{i,j} = Y_{i,j}$ for all observed coordinates $(i, j) \in S$. We use neural networks to model the observations in Y and use gradient descent to minimize

$$\mathcal{L}(W) = \sum_{(i,j) \in S} \left( Y_{i,j} - \left[ W^{(d)} \phi\left( W^{(d-1)} \cdots \phi\left( W^{(1)} Z \right) \cdots \right) \right]_{i,j} \right)^2, \qquad (1)$$

where $W = \{W^{(l)}\}_{l=1}^{d}$ are the weights of a neural network with each $W^{(l)} \in \mathbb{R}^{k_l \times k_{l-1}}$, $k_0 = m$, and $k_d = k$; $\phi$ is a fixed element-wise nonlinearity; and $Z \in \mathbb{R}^{m \times n}$ is a fixed application-dependent matrix, which we call the feature prior (described in detail in the section Flexibility through Feature Prior). The completed matrix is then obtained using the forward model with the trained weights, i.e., $\hat{Y} = W^{(d)} \phi\left( W^{(d-1)} \cdots \phi\left( W^{(1)} Z \right) \cdots \right)$. The main contribution of this work is showing that minimizing the loss in Eq. 1 when the width of the neural network tends to infinity gives rise to a simple, fast, and flexible framework for matrix completion suitable for a range of applications. Superficially, the formulation in Eq. 1 appears similar to that of traditional supervised learning, where a neural network is trained to map data (which would correspond to Z in our formulation) to corresponding labels Y. However, in our formulation, Z can be independent of the observations Y (Z could, for example, be the identity matrix or a random matrix). Thus, Z should be interpreted as a prior that can be chosen in an application-dependent manner. We will discuss the effect of this prior as well as how to choose it for very different applications like virtual drug screening and image inpainting.
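As a concrete reading of Eq. 1, the following minimal NumPy sketch evaluates the forward model and the loss restricted to observed coordinates. It assumes a ReLU nonlinearity and represents the observed set S as a boolean mask with the same shape as Y; the function names `forward` and `masked_loss` are hypothetical illustrations rather than part of the authors' software, and the gradient-descent training loop is omitted.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def forward(weights, Z, phi=relu):
    """Compute W^(d) phi(W^(d-1) ... phi(W^(1) Z)); each column of Z is mapped independently."""
    h = Z
    for W in weights[:-1]:
        h = phi(W @ h)
    return weights[-1] @ h


def masked_loss(weights, Z, Y, mask, phi=relu):
    """Eq. 1: squared error summed over observed coordinates, where mask[i, j] marks (i, j) in S."""
    residual = (Y - forward(weights, Z, phi))[mask]
    return float(np.sum(residual ** 2))


# Toy usage: Z is m x n, W^(1) is k_1 x m, W^(2) is k x k_1, and Y is k x n.
rng = np.random.default_rng(0)
m, n, k, width = 5, 8, 4, 16
Z = rng.standard_normal((m, n))
weights = [rng.standard_normal((width, m)), rng.standard_normal((k, width))]
mask = rng.random((k, n)) < 0.5
Y = np.where(mask, forward(weights, Z), np.nan)  # unobserved entries are never read
print(masked_loss(weights, Z, Y, mask))  # 0.0, since Y agrees with the network on S
```

Only entries selected by the mask enter the loss, so the unobserved entries can hold any placeholder (NaN here) without affecting training.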

Simple and Fast Algorithm for Matrix Completion through Infinite Width Networks

A trend for improving neural network performance is to make models larger (in multiple respects) (8–11). Underscoring this trend, several recent works have empirically demonstrated the advantage of larger (in particular, wider) networks with respect to generalization and performance for classification and representation learning tasks (12–15). There is also an emerging theoretical understanding of the benefit of larger models (16–18). In this paper, we consider the extreme case in which network width approaches infinity, in the setting of matrix completion. While larger neural networks generally require more computational resources for training, the limit as network width approaches infinity can, counterintuitively, yield computational savings. Namely, it was recently shown that training infinite width networks is equivalent to solving kernel regression with a particular kernel known as the neural tangent kernel (NTK) (19). For fully connected networks, the NTK can be computed efficiently in closed form (19), and thus, training an infinite width network reduces to solving a linear system. While this may still be computationally expensive when the number of examples is large, we use recent preconditioner methods (20–22) to overcome this limitation. For convolutional networks, no efficient method for computing the NTK (the so-called CNTK) was previously known (23–25). A major contribution of this work is a memory- and runtime-efficient algorithm for computing the exact CNTK for matrix completion for a class of practical neural network architectures. As a consequence, our framework can be used to inpaint or reconstruct high-resolution images with hundreds of thousands of pixels. We also provide software for constructing the CNTK as well as precomputed kernels. The simplicity and speed of our framework are reflected in the fact that most of the results in this work require only a central processing unit (CPU) and can be run efficiently on a laptop.

Flexibility through Feature Prior

The matrix Z in Eq. 1 is key to making our framework easily adaptable to different applications. Unlike traditional supervised learning, where the goal is to learn a mapping from data X to labels Y, the matrix Z in our framework can be independent of the observations in Y. We refer to Z as a feature prior since, as we will see, by minimizing the loss in Eq. 1, the entries of Z encode structure between the coordinates of Y (Fig. 1E). We will demonstrate the flexibility of our framework by using it in two very different applications, namely, drug response imputation and image inpainting/reconstruction. For drug response imputation, we will select feature priors that encode information about cell and drug type combinations. For image inpainting and reconstruction, we will select feature priors that encode information about image coordinates. In addition to being flexible, we will show that our approach is competitive in terms of speed and accuracy with prior approaches that were specifically developed for drug response imputation (26, 27) or image inpainting/reconstruction (28–30).

Matrix Completion with the NTK

In this section, we derive the NTK for matrix completion when using fully connected networks. Our derivation provides a principled method for selecting the feature prior, Z; namely, we show that Z should be an embedding of coordinate metadata, i.e., information describing the coordinates of Y. For example, in drug response imputation, each column of Z could correspond to a different drug, and two columns of Z should be similar if the drug metadata are similar (e.g., if the molecular structures are similar). The resulting method is then equivalent to performing semisupervised learning to map from the columns of Z to observed entries in each row of Y. In Virtual Drug Screening with the NTK, we utilize this theoretical result to select an effective feature prior for virtual drug screening. Since the NTK forms the backbone of our framework, we start with the definition of the NTK (19) and briefly review how solving kernel regression with the NTK connects to training infinitely wide neural networks. Let $f(w; x)$ denote a neural network with parameters $w$. The corresponding NTK, $K$, is a symmetric, continuous, positive definite function given by

$$K(x, x') = \left\langle \nabla_w f(w^{(0)}; x), \nabla_w f(w^{(0)}; x') \right\rangle,$$

where $w^{(0)}$ are the network parameters at initialization. For a review of kernel regression and kernel functions, see ref. 31. Given training data $(x^{(i)}, y^{(i)}) \in \mathbb{R}^{m} \times \mathbb{R}$ for $i \in \{1, \ldots, n\}$, solving kernel regression with the NTK involves minimizing the loss

$$\mathcal{L}(\alpha) = \left\| y - \alpha K_n \right\|_2^2,$$

where $\alpha \in \mathbb{R}^{1 \times n}$, $y = [y^{(1)}, \ldots, y^{(n)}] \in \mathbb{R}^{1 \times n}$, and $K_n \in \mathbb{R}^{n \times n}$ with $(K_n)_{i,j} = K(x^{(i)}, x^{(j)})$. The work of ref. 19 established that using kernel regression with the NTK is equivalent (under mild assumptions) to training a neural network to map $x^{(i)}$ to $y^{(i)}$ using the mean squared error, in the limit as the network width tends to infinity. Throughout this work, we assume that $\|x^{(i)}\|_2 = 1$ and that the nonlinearity $\phi$ in Eq. 1 is homogeneous (which includes, for example, the rectified linear unit [ReLU], a widely used nonlinearity), so that the NTK corresponding to a fully connected network can be computed efficiently in closed form (19, 32, 33); see SI Appendix for a short review of the relevant literature and notation.
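The kernel regression step can be sketched in a few lines. Because $K_n$ is symmetric, when it is invertible a zero-loss minimizer of $\|y - \alpha K_n\|_2^2$ is obtained by solving the linear system $K_n \alpha^\top = y^\top$. The sketch below uses a dense direct solve plus an optional ridge term (an assumption, zero by default); for large n, the preconditioned solvers mentioned earlier (refs. 20–22) would take the place of `np.linalg.solve`. Any positive definite kernel matrix can be plugged in, and the Gaussian kernel in the usage lines is only a stand-in for the NTK. The helper names are hypothetical.

```python
import numpy as np


def kernel_regression_fit(K_train, y, ridge=0.0):
    """Return alpha with alpha @ K_train = y (K_train symmetric, so solve K_train alpha = y)."""
    n = K_train.shape[0]
    return np.linalg.solve(K_train + ridge * np.eye(n), y)


def kernel_regression_predict(alpha, K_test_train):
    """Prediction at test point t: sum_i alpha_i K(x^(i), x_t); rows of K_test_train index test points."""
    return K_test_train @ alpha


def gaussian(A, B):
    """Stand-in kernel for illustration only (not the NTK)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists)


rng = np.random.default_rng(0)
X_train, X_test = rng.standard_normal((6, 3)), rng.standard_normal((2, 3))
y = rng.standard_normal(6)
alpha = kernel_regression_fit(gaussian(X_train, X_train), y)
y_test = kernel_regression_predict(alpha, gaussian(X_test, X_train))
```

With `ridge=0`, the fitted model interpolates the training labels, which mirrors the behavior of an infinitely wide network trained to zero loss.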

Feature Prior Provides a Flexible Approach for Matrix Completion through Connection with Semisupervised Learning.

A natural approach for imputing missing entries in a matrix, Y, is to first obtain an embedding of the coordinates of Y [e.g., a map from coordinates (i, j) to vectors $z^{(i,j)} \in \mathbb{R}^{m}$] and then learn a map from the coordinate embedding to the observed entries in Y (e.g., a map from $z^{(i,j)}$ to $Y_{i,j}$) (see also ref. 34, chap. 1). For example, for virtual drug screening, one could first embed the drugs based on their molecular properties and then learn a map from this embedding to the measured output, such as gene expression. Such an approach, in which a map is learned from an embedding to the observed samples, is referred to as semisupervised learning (ref. 35, chap. 15). In this section, we prove that minimizing the loss in Eq. 1 is equivalent to using a semisupervised learning approach for matrix completion. Namely, we show that the columns of Z represent an embedding of the coordinates of Y and that the NTK is used to map from the columns of Z to the entries in Y. It is a priori unclear how to compute the NTK for matrix completion, since the NTK is defined on training examples and labels, neither of which appears explicitly in Eq. 1. To address this, we note the following equivalent formulation of Eq. 1:

$$\mathcal{L}(W) = \sum_{(i,j) \in S} \left( Y_{i,j} - \left\langle f_W(Z), M^{(i,j)} \right\rangle \right)^2,$$

where $f_W(Z) = c\, W^{(d)} \phi\left( W^{(d-1)} \cdots \phi\left( W^{(1)} Z \right) \cdots \right)$ for a constant $c$; $\langle A, B \rangle = \operatorname{tr}(A^\top B)$ denotes the trace inner product; and $M^{(i,j)} \in \{0, 1\}^{k \times n}$ is an indicator matrix, i.e., it has a 1 in the (i, j) entry and zeros everywhere else. To ease notation, we will use M to denote the indicator matrix $M^{(i,j)}$ (and $M'$ to denote $M^{(i',j')}$). This formulation shows that we can view matrix completion as a problem where the training examples are indicator matrices M and the labels are the corresponding entries $Y_{i,j}$. It yields the following closed form for the NTK for matrix completion, where $\check{\phi}$ denotes the dual activation function (36) to $\phi$ and $\dot{\check{\phi}}$ denotes its derivative. To keep notation simple, we here provide the theorem when $\phi$ is the ReLU activation function, but this result holds generally for homogeneous nonlinearities (SI Appendix).

Theorem 1. Assume $Z \in \mathbb{R}^{m \times n}$, where each column is normalized with $\|Z_{:,j}\|_2 = 1$. Let $f_W$ be a d layer fully connected network as in Eq. 1 with nonlinearity $\phi(x) = \sqrt{2}\max(0, x)$. Then, as widths $k_1, \ldots, k_{d-1} \to \infty$, the NTK for matrix completion with $M = M^{(i,j)}$ and $M' = M^{(i',j')}$ is given by

$$K(M, M') = \mathbb{1}_{\{i = i'\}}\, K^{(d)}\left(Z_{:,j}, Z_{:,j'}\right),$$

where $K^{(1)}(z, z') = \Sigma^{(1)}(z, z') = z^\top z'$, and

$$\Sigma^{(l)}(z, z') = \check{\phi}\left(\Sigma^{(l-1)}(z, z')\right), \qquad K^{(l)}(z, z') = \Sigma^{(l)}(z, z') + K^{(l-1)}(z, z')\, \dot{\check{\phi}}\left(\Sigma^{(l-1)}(z, z')\right)$$

for $l \in \{2, \ldots, d\}$, with $\check{\phi}(\xi) = \frac{1}{\pi}\left( \xi (\pi - \arccos \xi) + \sqrt{1 - \xi^2} \right)$.

The proof as well as an example showing how Theorem 1 can be used in practice to compute the NTK for matrix completion are presented in SI Appendix. Since the kernel value between M and M' is a function of columns j and j' of Z, Theorem 1 implies that the NTK for matrix completion maps columns of Z to entries $Y_{i,j}$, and thus, the columns of Z encode structure between the coordinates of Y. By varying the nonlinearity $\phi$, depth d, and feature prior Z, our framework encapsulates a variety of semisupervised learning approaches. To provide a nontrivial example, we prove in SI Appendix that our framework for matrix completion generalizes Laplacian-based semisupervised learning (37). This connection between our framework for matrix completion and semisupervised learning is the backbone of a simple and competitive approach to virtual drug screening described in Virtual Drug Screening with the NTK.
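The closed-form NTK for matrix completion can be sketched in NumPy as follows. This sketch assumes unit-norm columns of Z, the scaled ReLU $\phi(x) = \sqrt{2}\max(0, x)$ with dual activation $\check{\phi}(\xi) = \frac{1}{\pi}(\xi(\pi - \arccos\xi) + \sqrt{1 - \xi^2})$ and derivative $(\pi - \arccos\xi)/\pi$, and the standard fully connected NTK recursion $\Sigma^{(1)} = K^{(1)} = z^\top z'$, $\Sigma^{(l)} = \check{\phi}(\Sigma^{(l-1)})$, $K^{(l)} = \Sigma^{(l)} + K^{(l-1)}\dot{\check{\phi}}(\Sigma^{(l-1)})$. Because the kernel between $M^{(i,j)}$ and $M^{(i',j')}$ vanishes whenever $i \ne i'$, each row of Y is an independent kernel regression over columns; the hypothetical helper `impute_missing_columns` exploits this when entire columns are missing, as in virtual drug screening. The names are illustrative and not taken from the authors' released code.

```python
import numpy as np


def relu_dual(xi):
    """Dual activation of phi(x) = sqrt(2) * max(0, x) at correlation xi in [-1, 1]."""
    xi = np.clip(xi, -1.0, 1.0)
    return (xi * (np.pi - np.arccos(xi)) + np.sqrt(1.0 - xi ** 2)) / np.pi


def relu_dual_derivative(xi):
    xi = np.clip(xi, -1.0, 1.0)
    return (np.pi - np.arccos(xi)) / np.pi


def column_ntk(Z, depth):
    """n x n matrix of K^(depth)(Z[:, j], Z[:, j']) for unit-norm columns of Z."""
    sigma = Z.T @ Z          # Sigma^(1)
    ntk = sigma.copy()       # K^(1)
    for _ in range(2, depth + 1):
        next_sigma = relu_dual(sigma)                         # Sigma^(l)
        ntk = next_sigma + ntk * relu_dual_derivative(sigma)  # K^(l) uses Sigma^(l-1)
        sigma = next_sigma
    return ntk


def impute_missing_columns(Y, Z, observed, depth=2):
    """Fill the columns of Y (k x n) where the boolean vector `observed` is False."""
    K = column_ntk(Z, depth)
    K_oo = K[np.ix_(observed, observed)]
    K_om = K[np.ix_(observed, ~observed)]
    # One kernel regression per row of Y; all rows share the same observed columns.
    alpha = np.linalg.solve(K_oo, Y[:, observed].T)  # shape (n_observed, k)
    Y_hat = Y.copy()
    Y_hat[:, ~observed] = (K_om.T @ alpha).T
    return Y_hat
```

With this normalization, $\check{\phi}(1) = 1$, so every diagonal entry of the depth-d kernel equals d, which is a quick sanity check on an implementation. Missing columns of Y may hold NaN; they are overwritten and never read.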

Virtual Drug Screening with the NTK

The Connectivity Map (CMAP) is a prominent, large-scale, publicly available drug screen that considers 20,413 different compounds and 72 different cell lines (38). Experiments in CMAP were performed on a subset of 201,484 drug/cell line pairs; for each of these pairs, the gene expression profile of 978 landmark genes was measured. CMAP has been an important resource for computational approaches to drug discovery and drug repurposing (38–40). In these applications, the goal is to use a subset of observed drug/cell type pairs to predict the gene expression profiles of new drug/cell type pairs. These profiles are then used to identify drug candidates of interest that can be tested experimentally (41, 42). The CMAP dataset can be viewed as a three-dimensional tensor (drugs, cell lines, and genes) in which many of the entries are missing. In the following, we use the same preprocessing of the data as in ref. 26 to filter out drug/cell line combinations with very few or inconsistent samples; a description of and a link to the dataset are provided in SI Appendix. The resulting drug/cell line combinations are shown in Fig. 2A. The three-dimensional tensor can be flattened into a matrix whose columns correspond to drug/cell line combinations and whose rows represent genes (Fig. 2); i.e., following the notation of Eq. 1, entry $Y_{i,j}$ of the resulting flattened matrix is a real-valued number quantifying the expression of gene i under drug and cell type combination j. This matrix has a missing column for every missing drug/cell line combination. Classical low-rank matrix factorization methods would prove ineffective in this setting since they would replace each missing column by the same constant column. On the other hand, the closed-form NTK for matrix completion derived above suggests an effective way of imputing the missing gene expression profiles: select the feature prior Z such that two columns of Z are similar if they correspond to similar drug/cell line pairs.
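The flattening step can be sketched as below, assuming the tensor is stored as a NumPy array of shape (drugs, cell lines, genes) in which an unmeasured drug/cell line combination is an all-NaN gene vector. The helper name and the NaN convention are assumptions for illustration, and the preprocessing of ref. 26 is not reproduced here.

```python
import numpy as np


def flatten_drug_screen(tensor):
    """(drugs, cell lines, genes) -> Y of shape (genes, drugs * cell lines) plus column bookkeeping."""
    n_drugs, n_cells, n_genes = tensor.shape
    Y = tensor.reshape(n_drugs * n_cells, n_genes).T   # column j = d * n_cells + c
    drug_of_column = np.repeat(np.arange(n_drugs), n_cells)
    cell_of_column = np.tile(np.arange(n_cells), n_drugs)
    observed = ~np.isnan(Y).all(axis=0)                # a missing combination is an all-NaN column
    return Y, drug_of_column, cell_of_column, observed
```

Keeping the drug and cell line label of every column makes it straightforward to build feature priors and baselines that group columns by drug or by cell line.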
In the following, we discuss three different feature priors for this application; for a full description of these priors, see SI Appendix.

Fig. 2.

Our infinite width neural network framework outperforms DNPP (26), FaLRTC (27), and mean over cell types for drug response imputation on CMAP. (A) We visualize the availability of cell type and drug combinations of the subset from ref. 26. (B) Our method corresponds to first providing an embedding of cell type and drug combinations as the feature prior and then applying the NTK. We show that 1) using a feature prior consisting of one-hot vectors for drugs corresponds to imputation by performing mean across observations for each cell type and 2) using a feature prior that captures similarity between drugs and cell types is effective for imputation. (C and D) Our infinite width neural network framework (denoted NTK) outperforms DNPP and mean over cell type across three evaluation metrics. We use five rounds of 10-fold cross-validation to determine that the difference between our method and the next best method, DNPP, is statistically significant (P < ).

Feature Prior Corresponding to the Mean over Cell Type Baseline.

A simple baseline is to impute the gene expression profile for each missing drug in a given cell line by the mean over all observed drugs for this cell line. Quite surprisingly, this simple approach gives rise to a strong baseline (26, 43), since cell type is the dominant factor, while drugs have subtle effects on gene expression. While it is generally nontrivial to improve upon this baseline without constructing a specialized algorithm (26, 44–46), our NTK framework provides an easy way of doing so. In particular, our framework makes it evident that the feature prior corresponding to the mean over cell type baseline is trivial, since it corresponds to an embedding in which drugs are encoded via mutually orthogonal one-hot vectors that carry no information about drug similarity (SI Appendix). Thus, to improve upon this baseline, we can select any feature prior that captures similarities between drugs.
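The mean over cell type baseline itself is short to state in code. The sketch below assumes the flattened genes × (drug, cell line) layout with a per-column cell line label and a boolean mask of observed columns (hypothetical names); it fills each missing column with the average of the observed columns from the same cell line.

```python
import numpy as np


def mean_over_cell_type(Y, cell_of_column, observed):
    """Impute each missing column by the mean observed profile of its cell line."""
    Y_hat = Y.copy()
    for j in np.flatnonzero(~observed):
        same_cell = observed & (cell_of_column == cell_of_column[j])
        # Assumes at least one observed column for this cell line; otherwise the mean is NaN.
        Y_hat[:, j] = Y[:, same_cell].mean(axis=1)
    return Y_hat
```

Every missing drug in a given cell line receives the same profile, which is exactly the behavior a drug-aware feature prior is meant to improve on.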

Feature Prior Corresponding to Previous Algorithms.

We now demonstrate that our framework provides a direct approach to improve on previous methods for virtual drug screening by using the output of previous methods as a feature prior in our framework. Namely, if a method produces an imputation $\hat{Y}$, then the columns of $\hat{Y}$ should represent an embedding of drug and cell type combinations that captures their similarity. Hence, we can use $\hat{Y}$ as the feature prior in our method. For illustration, we apply this approach to two state-of-the-art methods for virtual drug screening: 1) drug neighbor profile prediction (DNPP) (26), which is a weighted nearest neighbor scheme, and 2) fast low-rank tensor completion (FaLRTC) (27), which involves low-rank matrix completion along each slice of the CMAP tensor. We show that our framework using these feature priors yields an improvement over the individual methods (SI Appendix).
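Reusing an earlier imputation as the feature prior amounts to taking the completed matrix $\hat{Y}$ (genes × drug/cell line combinations) and, because the closed-form NTK assumes unit-norm columns of Z, rescaling each column. A minimal sketch, assuming $\hat{Y}$ has no all-zero columns (the helper name is hypothetical):

```python
import numpy as np


def prior_from_imputation(Y_hat):
    """Use a previous method's completed matrix as Z, normalizing each column to unit length."""
    norms = np.linalg.norm(Y_hat, axis=0, keepdims=True)
    return Y_hat / norms
```

The resulting Z can be passed directly to an NTK-based imputation over columns.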

Proposed Feature Prior for Drug Response Imputation.

Observing the pattern of data availability in Fig. 2A, it is apparent that a subset of cell lines have observations for many (>150) drugs (dense regime), while many cell lines have observations for only a few (≤150) drugs (sparse regime). While previous methods such as DNPP are quite effective in the dense regime, they are not as effective in the sparse regime (Fig. 2 and SI Appendix). This can be explained by the fact that in the sparse regime, DNPP roughly reduces to the simple mean over cell type baseline. For effective drug response imputation in the sparse regime, our framework can be used to construct a simple feature prior by concatenating embeddings for cell types and drugs. In particular, we can use the gene expression values for a reference cell type with many drug observations (e.g., MCF7 in CMAP) as the embedding of drugs and the mean gene expression across all observations for a given cell type as the embedding of cell type. Fig. 2 shows that the NTK with this simple feature prior outperforms mean over cell type, FaLRTC, and DNPP in the sparse regime. We compare across Pearson’s r value, mean R², and mean cosine similarity. A description of all evaluation metrics is provided in SI Appendix. By combining our feature prior for the sparse regime with the FaLRTC-based feature prior for the dense regime, we obtain a drug imputation method that significantly outperforms DNPP, FaLRTC, and mean over cell type on the full dataset (Fig. 2) (P < based on five rounds of 10-fold cross-validation, with an improvement on every fold of every round across all metrics; SI Appendix).
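The proposed sparse-regime feature prior can be sketched as follows. For each drug/cell line column, the drug embedding is the observed profile of the same drug in a reference cell line (MCF7 in CMAP), and the cell embedding is the mean observed profile of the column's cell line; the two are concatenated and normalized to unit length. The helper name, the normalization step, and the requirement that every drug is observed in the reference cell line are assumptions of this sketch; the authors' exact construction is described in SI Appendix.

```python
import numpy as np


def concatenated_prior(Y, drug_of_column, cell_of_column, observed, reference_cell):
    """Return Z with one unit-norm column [drug embedding; cell embedding] per drug/cell line column."""
    columns = []
    for j in range(Y.shape[1]):
        # Drug embedding: observed profile(s) of the same drug in the reference cell line.
        in_reference = observed & (cell_of_column == reference_cell) & (drug_of_column == drug_of_column[j])
        # Cell embedding: mean observed profile of this column's cell line.
        same_cell = observed & (cell_of_column == cell_of_column[j])
        z = np.concatenate([Y[:, in_reference].mean(axis=1), Y[:, same_cell].mean(axis=1)])
        columns.append(z / np.linalg.norm(z))
    return np.stack(columns, axis=1)
```

Two columns are then similar when their drugs act similarly in the reference cell line and their cell lines have similar average profiles, which is the kind of structure the NTK maps to missing entries.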

Matrix Completion with the Convolutional NTK

While we have thus far derived and applied the NTK for matrix completion using fully connected networks, these architectures are not nearly as effective as convolutional networks for matrix completion tasks in which the target matrix is an image. Similar to the case of fully connected networks, a closed form for the NTK corresponding to convolutional networks (the so-called CNTK) is known in the regression setting (23), but it has not been considered in the setting of matrix completion. Moreover, the runtime for computing the CNTK for regression scales quadratically with each image dimension. In this section, we derive the CNTK for matrix completion and provide a computationally efficient method for computing it for a class of feature priors that are effective for image inpainting and reconstruction. We begin by deriving the CNTK for matrix completion for a simple class of convolutional networks with no downsampling or upsampling layers. We show that in this setting, the CNTK for matrix completion can be computed using terms from the CNTK for classification. In the following proposition (proof in SI Appendix), $\Theta^{(d)}$ denotes the tensor corresponding to the CNTK of a d layer convolutional network in the classification setting (ref. 23, section 4).

Proposition 1. Let $f_W$ be a d layer convolutional network used to map from the feature prior, $Z \in \mathbb{R}^{m \times p \times q}$, to the target matrix, $Y \in \mathbb{R}^{p \times q}$. Then, as the number of convolutional filters per layer approaches infinity, the CNTK of $f_W$ is given by

$$K\left(M^{(i,j)}, M^{(i',j')}\right) = \left[\Theta^{(d)}(Z, Z)\right]_{i,j,i',j'},$$

where $M^{(i,j)}, M^{(i',j')} \in \{0, 1\}^{p \times q}$ denote indicator matrices.

CNTK Performs Semisupervised Learning Using Image Coordinate Features.

In Matrix Completion with the NTK, we established a connection between semisupervised learning and matrix completion using the NTK. We now establish a similar connection between semisupervised learning and matrix completion with the CNTK for a class of feature priors defined in SI Appendix. This class includes feature priors that are heavily used in image inpainting applications, namely, priors whose channels are drawn independently and identically distributed (i.i.d.) from a stationary distribution (24, 30). The following theorem (proof in SI Appendix), which is analogous to the closed form for the NTK in Matrix Completion with the NTK, implies that using the CNTK for matrix completion is equivalent to mapping from coordinate features to observed entries in the target matrix Y.

Theorem 2. Consider a convolutional network of depth d with homogeneous activation $\phi$ in which all filters have size q and circular padding. Let the feature prior $Z$, with m channels, satisfy

$$\frac{1}{m}\sum_{l=1}^{m} Z_{l,i,j}\, Z_{l,i',j'} = \psi(i - i', j - j')$$

for some function $\psi$ with maximum at (0, 0), and let q be odd. Then, as the number of convolutional filters per layer goes to infinity, the CNTK simplifies to

$$K\left(M^{(i,j)}, M^{(i',j')}\right) = \kappa\left((i, j), (i', j')\right),$$

where $\kappa$ is a function that can be computed from ψ (a recursive formula is provided in SI Appendix). Since the function $\kappa$ depends only on the positions of the coordinates, Theorem 2 shows that the CNTK for matrix completion is equivalent to semisupervised learning using kernels on features corresponding to coordinates.
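The stationarity assumption on the feature prior can be checked empirically. Assuming the condition takes the form $\frac{1}{m}\sum_{l} Z_{l,i,j}Z_{l,i',j'} = \psi(i - i', j - j')$, the sketch below estimates ψ at a given offset by averaging channel-wise products over all pixel pairs with that offset (with circular wrap-around, matching circular padding). For i.i.d. Uniform(0, 0.1) entries, a range assumed here only for illustration, ψ(0, 0) ≈ 0.01/3 and ψ ≈ 0.05² = 0.0025 at every nonzero offset, so ψ peaks at (0, 0). The helper name is hypothetical.

```python
import numpy as np


def empirical_psi(Z, offset):
    """Average of Z[l, i, j] * Z[l, i + a, j + b] over channels l and pixels (i, j), indices mod (p, q)."""
    a, b = offset
    shifted = np.roll(Z, shift=(-a, -b), axis=(1, 2))  # shifted[l, i, j] = Z[l, i + a, j + b]
    return float(np.mean(Z * shifted))


rng = np.random.default_rng(0)
Z = rng.uniform(0.0, 0.1, size=(256, 32, 32))  # m = 256 channels on a 32 x 32 grid
for offset in [(0, 0), (1, 0), (0, 3), (5, 7)]:
    print(offset, round(empirical_psi(Z, offset), 5))
```

The estimate is nearly flat away from the origin, which is the stationarity that lets the kernel depend only on coordinate positions rather than on the particular draw of Z.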

Closed Form for the CNTK of Modern Architectures for Matrix Completion.

Unlike the convolutional networks considered thus far, state-of-the-art architectures for unsupervised image inpainting such as those in refs. 24 and 30 incorporate a variety of layer structures, including strided convolution, nearest neighbor and bilinear upsampling, skip connections, and batch normalization. We derive (in SI Appendix) the CNTK for matrix completion using convolutional networks with the following layer structures: 1) downsampling through strided convolution, 2) nearest neighbor upsampling, and 3) bilinear upsampling.

Efficient Computation of the CNTK of Modern Architectures for Matrix Completion.

A key insight that we use to speed up the computation of the CNTK is that the CNTK for matrix completion depends only on the feature prior and not on the values of the observed pixels in an image. Hence, the CNTK need only be computed once for all images of a given resolution. This enables a drastic speedup over recomputing the kernel for every new image, as is currently required in classification. However, using such a direct approach to compute the CNTK is still computationally prohibitive for high-resolution images. In particular, computing the CNTK for a network with d convolutional layers to complete an image of size $p \times q$ requires storing a kernel with $p^2 q^2$ entries, with a runtime that grows at least as quickly. In order to overcome these limitations, prior work (25) used the Nyström method (47) to approximate the kernel. Instead of relying on such approximations, we here present an algorithm for computing the exact CNTK in a memory- and runtime-efficient manner for any convolutional neural network with circular padding, strided convolution, and nearest neighbor upsampling layers, when using a feature prior with i.i.d. random entries. Such networks and feature priors are heavily used for image completion tasks (30). The main insight that enables such an algorithm is that for convolutional networks with strided convolution and nearest neighbor upsampling layers, the CNTK for low-resolution images can be expanded to the CNTK for high-resolution images for any feature prior with i.i.d. random entries. In particular, if a neural network with s downsampling and upsampling layers is used to inpaint images of resolution $p \times q$, our algorithm requires only an array of size $4^s \times p \times q$, while storing the full CNTK requires an array of size $p \times q \times p \times q$. In practice, s is exponentially smaller than p and q, so our method is significantly more memory efficient; see the following specific example. In addition, since our method only requires computing the CNTK for low-resolution images, its runtime is far lower than that of a direct computation at full resolution.
A detailed description and proof of our expansion algorithm is presented in .

Example.

Let $f$ represent a convolutional neural network with circular padding, three layers of strided convolution with a stride size of 2 in each direction, and three nearest neighbor upsampling layers, with a feature prior $Z$ with m channels satisfying

$$\frac{1}{m}\sum_{l=1}^{m} Z_{l,i,j}\, Z_{l,i',j'} = \begin{cases} a & \text{if } (i, j) = (i', j'), \\ b & \text{otherwise}, \end{cases}$$

where a and b are constants. Suppose f is used to inpaint high-resolution images. Then, by computing the CNTK $K'$ for low-resolution images, we can expand $K'$ up to the exact CNTK for the high-resolution images. Computing $K'$ takes roughly 11 s when using a CPU with 1 thread and uses less than 100 MB of memory with floating point precision. On the other hand, even storing the true kernel would require roughly 256 GB of memory when using floating point precision. This is twice the amount of random-access memory (RAM) available on our server and 16 times the amount of RAM available on most laptops.
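The memory figures in this example can be checked with back-of-the-envelope arithmetic. The sketch below assumes 512 × 512 images and 4-byte single-precision entries (our assumption, chosen because it reproduces the roughly 256 GB quoted for the full kernel), and it assumes the expansion stores an array of $4^s \cdot p \cdot q$ entries, i.e., one full-image kernel row for each of the $2^s \times 2^s$ offsets within a stride block. For s = 3 this gives 64 MiB, in line with the stated budget of less than 100 MB. The function names are hypothetical.

```python
def full_kernel_bytes(p, q, bytes_per_entry=4):
    """Dense CNTK over all pixel pairs of a p x q image: (p * q) ** 2 entries."""
    return (p * q) ** 2 * bytes_per_entry


def expanded_kernel_bytes(p, q, s, bytes_per_entry=4):
    """Assumed storage for the expansion: 4 ** s rows of length p * q."""
    return 4 ** s * p * q * bytes_per_entry


GiB, MiB = 2 ** 30, 2 ** 20
print(full_kernel_bytes(512, 512) / GiB)         # 256.0
print(expanded_kernel_bytes(512, 512, 3) / MiB)  # 64.0
```

The ratio between the two is $pq / 4^s$, so the savings grow with image resolution for a fixed number of downsampling layers.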

Image Inpainting and Reconstruction with the CNTK

We now use the results of Matrix Completion with the Convolutional NTK to perform large hole image inpainting and reconstruction. As illustrated in Fig. 1 C and D, large hole inpainting involves imputing a large contiguous region in an image, while image reconstruction involves imputing randomly missing pixels in an image. Recent work (30) demonstrated that using convolutional neural networks with downsampling and upsampling layers to impute the missing pixels in images leads to competitive results for these applications. The methods from ref. 30 are a special case of our framework in Eq. 1, namely, using convolutional layers and letting the feature prior, Z, be a tensor with i.i.d. uniform random entries. Thus, we can use our framework to perform image completion tasks, and instead of training deep networks, we can simply solve kernel regression with the CNTK. We will demonstrate that this gives rise to a simple, fast, flexible, and competitive alternative to training deep networks for high-resolution image completion problems. Moreover, we will demonstrate that our framework can be used to identify the role of architecture and feature prior in image completion problems and to aid in identifying effective architectures and feature priors.
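Given a precomputed pixel kernel, completing an image reduces to one kernel regression from observed to missing pixels. The sketch below assumes pixels are flattened in row-major order and that K is a (pq × pq) kernel matrix over pixel coordinates. The Gaussian kernel on pixel coordinates in the helper is only a hypothetical stand-in for the CNTK (which the framework would compute once per resolution and feature prior), and the small ridge term is an assumption for numerical stability.

```python
import numpy as np


def kernel_inpaint(image, observed, K, ridge=1e-8):
    """Fill pixels where `observed` is False by kernel regression on the observed pixels."""
    y, obs = image.ravel(), observed.ravel()       # row-major pixel order
    K_oo = K[np.ix_(obs, obs)]
    K_mo = K[np.ix_(~obs, obs)]
    alpha = np.linalg.solve(K_oo + ridge * np.eye(K_oo.shape[0]), y[obs])
    filled = y.copy()
    filled[~obs] = K_mo @ alpha
    return filled.reshape(image.shape)


def coordinate_gaussian_kernel(p, q, lengthscale=1.0):
    """Stand-in kernel (not the CNTK): Gaussian similarity between pixel coordinates."""
    rows, cols = np.meshgrid(np.arange(p), np.arange(q), indexing="ij")
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    sq_dists = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * lengthscale ** 2))


rng = np.random.default_rng(0)
image = rng.random((8, 8))
observed = rng.random((8, 8)) > 0.3
restored = kernel_inpaint(image, observed, coordinate_gaussian_kernel(8, 8))
```

Since the kernel does not depend on pixel values, the same K (and even the same factorization of `K_oo` for a fixed mask) can be reused across all images of one resolution.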

Application 1: Large Hole Inpainting with the CNTK.

We apply the CNTK to the large hole inpainting tasks from refs. 24 and 30. We compute the CNTK for the architecture used in ref. 24 with six downsampling and nearest neighbor upsampling layers, using a feature prior Z with i.i.d. uniform random entries. We compute the CNTK on low-resolution images and then expand it to the CNTK for high-resolution images via our expansion technique in Matrix Completion with the Convolutional NTK. We compare our method against neural networks of the same architecture using the training procedures from refs. 24 and 30 (see SI Appendix for details). We also compare our method against inpainting with biharmonic functions (28), which is currently the default inpainting method in scikit-image (29). Fig. 3A shows examples of the resulting reconstructions, and Fig. 3B shows the peak signal-to-noise ratio (PSNR) across all methods. On average, our method outperforms both inpainting with finite width neural networks and inpainting with biharmonic functions. In SI Appendix, we show that our method also outperforms the other methods in terms of structural similarity index measure (SSIM) and that the runtime is comparable (within 2 min on average) across all methods in this setting. The reconstructions across all images and methods are provided in SI Appendix.
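For reference, PSNR compares a reconstruction with the ground-truth image through the mean squared error, $\mathrm{PSNR} = 10\log_{10}(R^2/\mathrm{MSE})$, where R is the dynamic range of the pixel values. A minimal sketch, assuming pixel values in [0, 1] (so R = 1) and an MSE taken over the whole image; whether the evaluation here averages over the whole image or only the masked region is not specified in this section. scikit-image provides an equivalent `skimage.metrics.peak_signal_noise_ratio`.

```python
import numpy as np


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in decibels."""
    mse = np.mean((np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))


print(psnr(np.zeros((4, 4)), np.full((4, 4), 0.1)))  # MSE = 0.01, so PSNR is 20 dB (up to rounding)
```

Higher is better, and each 10 dB corresponds to a tenfold reduction in MSE.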

Fig. 3.

Large hole inpainting using 1) the CNTK, 2) neural networks with sigmoid last layer and batch normalization layers that are trained with Adam, and 3) biharmonic functions. (A) Qualitative comparison of inpainting results across the three methods. Results for all images are provided in SI Appendix, Fig. S5. (B) Comparison of PSNR across three methods with the CNTK providing the highest average PSNR. Runtime and SSIM for the three methods are provided in SI Appendix, Fig. S4.

Application 2: Image Reconstruction with the CNTK.

We next analyze the performance of the CNTK on the image reconstruction tasks considered in ref. 30. While the networks considered in refs. 24 and 30 make use of skip connections for image reconstruction, we only consider architectures without skip connections, for which we can derive the CNTK exactly (see SI Appendix for details). We again compare the CNTK to neural networks of the same architecture and to biharmonic inpainting. For this comparison, we use networks with 128 filters per layer, as is done in refs. 24 and 30. In SI Appendix, we show that our model performs comparably to inpainting with biharmonic functions and outperforms neural networks of the same architecture. We additionally show (SI Appendix) that our method performs comparably to biharmonic inpainting in terms of SSIM and that it is up to 10 times faster than small width neural networks on the same hardware. While our method performs comparably to inpainting with biharmonic functions in this application, our framework is more flexible, since we can adjust the architecture and feature prior, and it outperforms inpainting with biharmonic functions for the problem of large hole inpainting (see Application 1: Large Hole Inpainting with the CNTK). Since methods such as Adam with Langevin dynamics (24) have enabled performance boosts for neural networks (SI Appendix), an interesting direction for future work could be to incorporate such techniques into image completion with the CNTK.
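The two corruption patterns compared in these applications can be generated with simple boolean masks (True marks an observed pixel). The missing fraction and hole geometry below are placeholders, not the settings used in refs. 24 and 30, and the helper names are hypothetical.

```python
import numpy as np


def random_pixel_mask(shape, missing_fraction, rng):
    """Image reconstruction setting: drop round(missing_fraction * size) pixels uniformly at random."""
    size = int(np.prod(shape))
    mask = np.ones(size, dtype=bool)
    drop = rng.choice(size, size=int(round(missing_fraction * size)), replace=False)
    mask[drop] = False
    return mask.reshape(shape)


def large_hole_mask(shape, top, left, height, width):
    """Large hole inpainting setting: a single contiguous missing rectangle."""
    mask = np.ones(shape, dtype=bool)
    mask[top:top + height, left:left + width] = False
    return mask


rng = np.random.default_rng(0)
reconstruction_mask = random_pixel_mask((64, 64), 0.5, rng)
hole_mask = large_hole_mask((64, 64), top=16, left=16, height=32, width=32)
```

With random missing pixels, nearly every missing pixel has observed neighbors close by; with a large hole, pixels near its center do not, which is why the receptive region of the kernel matters much more in the large hole setting.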

Using Our Framework to Select Feature Prior and Architecture for Image Completion.

In the following, we demonstrate that our framework provides a theoretical underpinning for understanding how a given architecture and feature prior influence image completion. In particular, we use our framework to explain why the uniform random feature prior and architectures with downsampling and upsampling layers are effective for image completion, whereas other feature priors, such as the identity feature prior, are ineffective for this application. The key observation enabling this interpretability is that, for kernel methods, every prediction (a missing pixel value) is a linear combination of training examples (observed pixel values). Hence, for each imputed pixel, the CNTK can be used to produce a heat map showing which observed pixels are weighted most heavily in the linear combination. To generate such heat maps, we reshape the CNTK into a four-dimensional tensor. Namely, given a CNTK $K \in \mathbb{R}^{mn \times mn}$ for an $m \times n$ image, we reshape $K$ into a tensor $\tilde{K} \in \mathbb{R}^{m \times n \times m \times n}$, where $\tilde{K}_{i,j,k,\ell} = K_{(i-1)n + j,\,(k-1)n + \ell}$. To generate a heat map for a given coordinate (i, j), we visualize the matrix $\tilde{K}_{i,j,:,:} \in \mathbb{R}^{m \times n}$. This visualization allows us to decipher how the architecture and feature prior change the resulting imputation from a neural network.
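The reshaping step can be sketched in a few lines of NumPy, assuming the kernel is stored as an mn × mn array whose rows and columns list pixel coordinates in row-major order (NumPy's default reshape order, with 0-based indices). The Gaussian kernel on pixel positions is a hypothetical stand-in for a CNTK, and `toy_kernel` and `heat_map` are illustrative names, not functions from the authors' codebase:

```python
import numpy as np


def toy_kernel(m, n, bandwidth=2.0):
    """Hypothetical stand-in for a CNTK: Gaussian kernel on pixel positions, shape (m*n, m*n)."""
    ii, jj = np.meshgrid(np.arange(m), np.arange(n), indexing="ij")
    coords = np.stack([ii.ravel(), jj.ravel()], axis=1).astype(float)  # row-major order
    sq_dists = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def heat_map(K, m, n, i, j):
    """Return the m x n map of kernel values between pixel (i, j) and every pixel."""
    K4 = K.reshape(m, n, m, n)  # K4[i, j, k, l] == K[i*n + j, k*n + l] (0-based indices)
    return K4[i, j]


m, n = 8, 6
K = toy_kernel(m, n)
H = heat_map(K, m, n, 3, 2)
print(H.shape)  # (8, 6)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(H), H.shape))
print(peak)     # (3, 2): this toy kernel peaks at the pixel itself
```

With a real CNTK, `H` would be plotted as an image; the toy kernel is local by construction, so it only illustrates the indexing, not the behavior of any particular feature prior.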

The Uniform Random Feature Prior and Modern Architectures Are Effective for Image Completion.

In Fig. 4A, we visualize the kernel values computed for a 128 × 128 image while varying the number of downsampling and upsampling layers as well as the feature prior Z. Namely, we consider the cases where Z is the identity, the mesh grid from ref. 30, or the uniform random tensor used in the large hole inpainting experiments of ref. 30. A key observation is that, for the uniform random feature prior, the kernel values are highest around the coordinate of interest regardless of the amount of downsampling and upsampling, in stark contrast to the other feature priors. This implies that neighboring pixels are weighted most heavily when imputing with the uniform random feature prior (see SI Appendix for additional visualizations). Moreover, with the uniform random feature prior, increasing the number of downsampling and upsampling layers increases the size of the region considered for imputation by powers of 2 (see the first row of Fig. 4A). These heat maps identify the minimum amount of downsampling necessary for large hole inpainting: the region considered for imputation must be large relative to the region of missing pixels, and since this region grows by powers of 2, the number of downsampling layers needed grows logarithmically with the size of the hole. With too few layers, a pixel inside a large hole is filled in as an average of all other pixels. This result explains the observation from ref. 30 that neural networks with four or fewer downsampling and upsampling layers performed worse at inpainting images with large missing regions.
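The powers-of-2 growth can be illustrated with a deliberately simplified linear pipeline: d rounds of 2 × 2 average pooling followed by d rounds of nearest neighbor upsampling, with no convolutions, nonlinearities, or feature prior. This toy is an assumption for illustration, not the architecture of ref. 24; it only shows how the set of input pixels that can influence one output pixel grows to a 2^d × 2^d block:

```python
import numpy as np


def downsample(x):
    """2 x 2 average pooling (assumed downsampling operator)."""
    m, n = x.shape
    return x.reshape(m // 2, 2, n // 2, 2).mean(axis=(1, 3))


def upsample(x):
    """Nearest neighbor upsampling by a factor of 2."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def dependency_region(size, depth, i, j):
    """Boolean mask of input pixels that affect output pixel (i, j) after
    `depth` downsampling steps followed by `depth` upsampling steps."""
    mask = np.zeros((size, size), dtype=bool)
    for k in range(size):
        for l in range(size):
            x = np.zeros((size, size))
            x[k, l] = 1.0  # unit impulse at input pixel (k, l)
            for _ in range(depth):
                x = downsample(x)
            for _ in range(depth):
                x = upsample(x)
            mask[k, l] = x[i, j] != 0.0
    return mask


for depth in range(1, 4):
    region = dependency_region(16, depth, 5, 9)
    print(depth, int(region.sum()))  # 4**depth pixels, i.e. a block of side 2**depth
```

In this toy the region is a fixed, grid-aligned block; the convolutional layers of the real architecture also enlarge and reshape it.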

Fig. 4.

We use the CNTK to understand the impact of architecture and input on image inpainting. (A) Heat map visualizations of the CNTK when varying the number of downsampling/upsampling layers and input. The visualization makes clear that the uniform random feature prior, unlike other feature priors, results in kernels that use the region surrounding a missing pixel value for imputation regardless of the number of downsampling layers. (B) The heat map visualizations of the CNTK make transparent which observed pixels are being used to inpaint a given missing pixel when using the identity feature prior. (C) A comparison between inpainting a 128 × 128 resolution image of a rabbit with a finite width neural network and with the CNTK when the feature prior is the identity. The CNTK is able to accurately predict the unexpected behavior of the neural network.

The Identity Feature Prior Is Ineffective for Image Completion.

The standard feature prior for matrix completion is obtained by choosing Z to be the identity matrix (3, 5, 48). As shown in Fig. 4, unlike the uniform random feature prior, the identity feature prior uses pixel observations from nonlocal regions for completion. We therefore expect this feature prior to be ineffective for image completion tasks. Fig. 4C shows the result of using the CNTK for a network with six downsampling and upsampling layers and the identity feature prior to impute a rabbit image. With the identity feature prior, the imputation visually appears to translate observed pixels from a nonlocal region. The translated regions are precisely those indicated by the corresponding heat maps; e.g., the upper right quadrant is imputed using the lower left quadrant in Fig. 4. Our framework also accurately predicts the behavior of finite width neural networks used for image inpainting. In Fig. 4C, we show the result of using a neural network with six downsampling and upsampling layers, a sigmoid activation on the last layer, and the identity feature prior. The neural network completes the image by translating observed pixels, similarly to the imputation produced by the corresponding CNTK. This example highlights the power of our framework for rapidly prototyping feature priors and architectures for image inpainting tasks.

Discussion

In this work, we presented a simple, fast, and flexible framework for matrix completion using the infinite width limit of neural networks, i.e., the NTK. Below, we highlight the aspects of our framework that enable this simplicity, speed, and flexibility. Our framework is conceptually simple since we use kernels to learn a map from features of coordinates, (i, j), to entries of the target matrix, $Y_{i,j}$. It is computationally simple since solving kernel regression amounts to solving a linear system of equations. It is naturally fast when using the NTK of fully connected networks for matrix completion, owing to the simple closed form of the kernel. We developed a memory- and runtime-efficient algorithm to compute and use the NTK of convolutional networks (the CNTK) for matrix completion (Matrix Completion with the Convolutional NTK). Our framework is flexible because it is easily adapted to various applications through the choice of the feature prior. Moreover, we provided a principled approach for selecting the feature prior by establishing a connection with semisupervised learning and by visualizing the effect of the feature prior (Image Inpainting and Reconstruction with the CNTK). The simplicity and speed of our framework are illustrated by the fact that many of our results (including inpainting high-resolution images) can be run on a CPU and even on a laptop (see Materials and Methods for a link to our code). We demonstrated the flexibility of our framework by using it to achieve competitive results for virtual drug screening (Virtual Drug Screening with the NTK) and image inpainting/reconstruction (Image Inpainting and Reconstruction with the CNTK). We envision that our work provides a simple and accessible framework for producing strong baselines for several matrix completion applications. We conclude with a discussion of possible future extensions and applications.
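The point that the framework is computationally simple can be made concrete with a minimal sketch of kernel regression over matrix coordinates. It assumes a precomputed kernel K between all coordinates (i, j), flattened in row-major order; a Gaussian kernel on pixel positions is used purely as a hypothetical stand-in for the NTK or CNTK, and `kernel_complete` is an illustrative helper, not a function from the authors' library:

```python
import numpy as np


def kernel_complete(K, Y, observed):
    """Impute unobserved entries of Y by kernel regression over coordinates.

    K        : (m*n, m*n) kernel between coordinates (i, j), flattened row-major.
    Y        : (m, n) matrix; entries where `observed` is False are ignored.
    observed : (m, n) boolean mask, True for observed entries.
    """
    y = Y.ravel().astype(float)
    obs = observed.ravel()
    K_oo = K[np.ix_(obs, obs)]             # kernel among observed coordinates
    alpha = np.linalg.solve(K_oo, y[obs])  # solve the linear system K_oo @ alpha = y_obs
    y_hat = K[:, obs] @ alpha              # each prediction is a weighted sum of observed values
    y_out = y.copy()
    y_out[~obs] = y_hat[~obs]              # keep observed entries, fill in the rest
    return y_out.reshape(Y.shape)


# Toy example: a Gaussian kernel on pixel positions stands in for the NTK/CNTK.
m, n = 6, 6
ii, jj = np.meshgrid(np.arange(m), np.arange(n), indexing="ij")
coords = np.stack([ii.ravel(), jj.ravel()], axis=1).astype(float)
sq_dists = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists / 2.0)
Y = np.sin(ii / 2.0) + np.cos(jj / 3.0)
observed = np.random.default_rng(0).random((m, n)) < 0.7
Y_hat = kernel_complete(K, Y, observed)
```

The sketch interpolates the observed entries exactly (no ridge term); adding a small multiple of the identity to `K_oo` would turn it into kernel ridge regression instead.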

Future Applications of Our Framework.

In this work, we demonstrated the flexibility of our framework by constructing feature priors for two different applications, namely, virtual drug screening and image completion. An interesting future direction is the extension of our framework to other modalities such as tensors, video, or audio data. For example, by using a feature prior that captures the structure of coordinates in three-dimensional images, we could apply our framework to impute missing regions in three-dimensional data.

Efficient Computation of the CNTK.

In classification and regression settings, a major hindrance to using the CNTK in practice is the computational cost of computing the kernel for a large image dataset. In this work, we presented an expansion technique to efficiently compute and store the exact CNTK for inpainting high-resolution images, which was previously considered infeasible (24, 25). We envision that, by understanding the properties of the CNTK that make it effective for image problems, similar techniques could be applied to produce efficient kernel machines for image classification.

Developing Techniques to Improve the Performance of the NTK.

While many techniques, such as skip connections and batch normalization, have been developed to improve the performance of neural networks, such techniques have yet to be adapted to improve the performance of kernels. The simplicity and effectiveness of the NTK and CNTK for the simple architectures considered in this work motivate the development of techniques to further boost the performance of the NTK and of kernel methods in general.

Materials and Methods

For solving kernel regression with the NTK, we use the direct linear system solver from ref. 49 when the number of equations is fewer than 30,000, and EigenPro (20, 22) otherwise. For training neural networks, we use the PyTorch library (50). All methods requiring a graphics processing unit (GPU) are run on a single NVIDIA Titan RTX GPU. Our experiments are run on a shared server with 4 Titan RTX GPUs, 128 GB CPU RAM, and 64 threads. For the virtual drug screening experiments, we use the subset of the CMAP dataset (38) provided in ref. 26. A detailed description of all methods (including random seeds and hyperparameters for DNPP and FaLRTC) and of the evaluation metrics for the virtual drug screening experiments is provided in SI Appendix. A description of the t test used to determine the significance of our virtual drug screening results is presented in SI Appendix. We provide code to replicate our results for the virtual drug screening experiments with the NTK, DNPP, FaLRTC, and mean over cell type at https://github.com/uhlerlab/ntk_matrix_completion. We use the codebase from ref. 26 to perform imputation with FaLRTC. For the image completion applications, we use the datasets from refs. 24, 30. The rabbit image used in Fig. 4 is from ref. 51 and is provided in our codebase (linked above). For the neural network and NTK methods used in our image inpainting and reconstruction experiments, we describe all architectures and training hyperparameters in SI Appendix. We provide a library for computing and using the CNTK for image inpainting and reconstruction in the codebase linked above. Our library lets the user define a custom neural network (similarly to network definitions in PyTorch) and provides a function to compute the CNTK for the given architecture. Our method for computing the CNTK runs entirely on the CPU and supports parallelization across CPU threads.
Our library includes functions for computing the CNTK for networks with nearest neighbor and bilinear upsampling layers, which are not readily available in the Neural Tangents library (52). We additionally provide functions to solve kernel regression using the CNTK via a linear system solver or EigenPro. A full description of the library and an example of how to use our library for image inpainting is provided in Jupyter notebooks in our linked code. We additionally release several precomputed kernels that can be used for high-resolution inpainting and reconstruction.
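The solver choice described above can be sketched as follows. This is a hedged illustration, not the authors' code: `numpy.linalg.solve` stands in for the direct solver of ref. 49, and the iterative branch is plain Richardson iteration (gradient descent on the kernel regression objective), a simplified stand-in for EigenPro, which additionally uses a spectral preconditioner. Only the 30,000 threshold comes from the text; `solve_kernel_system` and `largest_eigenvalue` are hypothetical names:

```python
import numpy as np

DIRECT_SOLVER_LIMIT = 30_000  # below this many equations, use a direct solve (from the text)


def largest_eigenvalue(K, iters=50, seed=0):
    """Estimate the top eigenvalue of a symmetric PSD matrix by power iteration."""
    v = np.random.default_rng(seed).standard_normal(K.shape[0])
    for _ in range(iters):
        v = K @ v
        v /= np.linalg.norm(v)
    return float(v @ K @ v)


def solve_kernel_system(K, y, direct_limit=DIRECT_SOLVER_LIMIT, max_iter=1000, tol=1e-10):
    """Solve K @ alpha = y for a symmetric positive definite kernel matrix K."""
    if K.shape[0] < direct_limit:
        # Small systems: dense direct solve (stand-in for the solver of ref. 49).
        return np.linalg.solve(K, y)
    # Large systems: Richardson iteration alpha <- alpha + eta * (y - K @ alpha),
    # which converges for 0 < eta < 2 / lambda_max (simplified stand-in for EigenPro).
    eta = 1.0 / largest_eigenvalue(K)
    alpha = np.zeros(K.shape[0])
    y_norm = np.linalg.norm(y)
    for _ in range(max_iter):
        residual = y - K @ alpha
        if np.linalg.norm(residual) <= tol * y_norm:
            break
        alpha = alpha + eta * residual
    return alpha


# Usage: force the iterative branch on a small, well-conditioned system.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50))
K = A @ A.T / 50 + np.eye(50)  # symmetric positive definite
y = rng.standard_normal(50)
alpha_direct = solve_kernel_system(K, y)
alpha_iter = solve_kernel_system(K, y, direct_limit=10)
print(np.max(np.abs(alpha_direct - alpha_iter)))  # should be very small
```

The plain iteration converges slowly when K is ill-conditioned, which is the problem EigenPro's preconditioning is designed to address.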

14 in total

1. Tensor completion for estimating missing values in visual data.

Authors: Ji Liu; Przemyslaw Musialski; Peter Wonka; Jieping Ye
Journal: IEEE Trans Pattern Anal Mach Intell Date: 2013-01 Impact factor: 6.226

2. scikit-image: image processing in Python.

Authors: Stéfan van der Walt; Johannes L Schönberger; Juan Nunez-Iglesias; François Boulogne; Joshua D Warner; Neil Yager; Emmanuelle Gouillart; Tony Yu
Journal: PeerJ Date: 2014-06-19 Impact factor: 2.984

3. Reconciling modern machine-learning practice and the classical bias-variance trade-off.

Authors: Mikhail Belkin; Daniel Hsu; Siyuan Ma; Soumik Mandal
Journal: Proc Natl Acad Sci U S A Date: 2019-07-24 Impact factor: 11.205

4. Depth Image Inpainting: Improving Low Rank Matrix Completion With Low Gradient Regularization.

Authors: Hongyang Xue; Shengming Zhang; Deng Cai
Journal: IEEE Trans Image Process Date: 2017-06-21 Impact factor: 10.856

5. Benign overfitting in linear regression.

Authors: Peter L Bartlett; Philip M Long; Gábor Lugosi; Alexander Tsigler
Journal: Proc Natl Acad Sci U S A Date: 2020-04-24 Impact factor: 11.205

6. A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles.

Authors: Aravind Subramanian; Rajiv Narayan; Steven M Corsello; David D Peck; Ted E Natoli; Xiaodong Lu; Joshua Gould; John F Davis; Andrew A Tubelli; Jacob K Asiedu; David L Lahr; Jodi E Hirschman; Zihan Liu; Melanie Donahue; Bina Julian; Mariya Khan; David Wadden; Ian C Smith; Daniel Lam; Arthur Liberzon; Courtney Toder; Mukta Bagul; Marek Orzechowski; Oana M Enache; Federica Piccioni; Sarah A Johnson; Nicholas J Lyons; Alice H Berger; Alykhan F Shamji; Angela N Brooks; Anita Vrcic; Corey Flynn; Jacqueline Rosains; David Y Takeda; Roger Hu; Desiree Davison; Justin Lamb; Kristin Ardlie; Larson Hogstrom; Peyton Greenside; Nathanael S Gray; Paul A Clemons; Serena Silver; Xiaoyun Wu; Wen-Ning Zhao; Willis Read-Button; Xiaohua Wu; Stephen J Haggarty; Lucienne V Ronco; Jesse S Boehm; Stuart L Schreiber; John G Doench; Joshua A Bittker; David E Root; Bang Wong; Todd R Golub
Journal: Cell Date: 2017-11-30 Impact factor: 41.582

7. Drug repurposing: progress, challenges and recommendations (Review).

Authors: Sudeep Pushpakom; Francesco Iorio; Patrick A Eyers; K Jane Escott; Shirley Hopper; Andrew Wells; Andrew Doig; Tim Guilliams; Joanna Latimer; Christine McNamee; Alan Norris; Philippe Sanseau; David Cavalla; Munir Pirmohamed
Journal: Nat Rev Drug Discov Date: 2018-10-12 Impact factor: 84.694

8. Predicting drug-induced transcriptome responses of a wide range of human cell lines by a novel tensor-train decomposition algorithm.

Authors: Michio Iwata; Longhao Yuan; Qibin Zhao; Yasuo Tabei; Francois Berenger; Ryusuke Sawada; Sayaka Akiyoshi; Momoko Hamano; Yoshihiro Yamanishi
Journal: Bioinformatics Date: 2019-07-15 Impact factor: 6.937

9. Causal network models of SARS-CoV-2 expression and aging to identify candidates for drug repurposing.

Authors: Anastasiya Belyaeva; Louis Cammarata; Adityanarayanan Radhakrishnan; Chandler Squires; Karren Dai Yang; G V Shivashankar; Caroline Uhler
Journal: Nat Commun Date: 2021-02-15 Impact factor: 14.919

10. Drug repurposing for Alzheimer's disease based on transcriptional profiling of human iPSC-derived cortical neurons.

Authors: Gareth Williams; Ariana Gatt; Earl Clarke; Jonathan Corcoran; Patrick Doherty; David Chambers; Clive Ballard
Journal: Transl Psychiatry Date: 2019-09-06 Impact factor: 6.222