| Literature DB >> 29921923 |
Can Li1, Daniel Belkin1,2, Yunning Li1, Peng Yan1,3, Miao Hu4,5, Ning Ge6, Hao Jiang1, Eric Montgomery4, Peng Lin1, Zhongrui Wang1, Wenhao Song1, John Paul Strachan4, Mark Barnell7, Qing Wu7, R Stanley Williams4, J Joshua Yang8, Qiangfei Xia9.
Abstract
Memristors with tunable resistance states are emerging building blocks of artificial neural networks. However, in situ learning on a large-scale multiple-layer memristor network has yet to be demonstrated because of challenges in device property engineering and circuit integration. Here we monolithically integrate hafnium oxide-based memristors with a foundry-made transistor array into a multiple-layer neural network. We experimentally demonstrate in situ learning capability and achieve competitive classification accuracy on a standard machine learning dataset, which further confirms that the training algorithm allows the network to adapt to hardware imperfections. Our simulation using the experimental parameters suggests that a larger network would further increase the classification accuracy. The memristor neural network is a promising hardware platform for artificial intelligence with high speed-energy efficiency.Entities:
Year: 2018 PMID: 29921923 PMCID: PMC6008303 DOI: 10.1038/s41467-018-04484-2
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1Memristive platform for in situ learning. a An optical image of a wafer with transistor arrays. b Close-up of chip image showing arrays of various sizes. c Microscope image showing the 1T1R (one transistor one memristor) structure of the cell. Scale bar, 10 µm. d Cross-sectional scanning electron microscopic image of an individual 1T1R cell, which is cut in a focused ion beam microscope from the dashed line in c. Scale bar, 2 µm. e Cross-sectional transmission electron microscopic image of the integrated Ta/HfO2/Pt memristor. Scale bar, 2 nm. f All responsive devices over 20 potentiation/depression epochs of 200 pulses each. g Evolution of conductance during 20 cycles of full potentiation and depression for a single cell with 200 pulses per cycle, showing low cycle-to-cycle variability. More results are shown in Supplementary Fig. 1. h Evolution of conductance over one 200-pulse cycle of full potentiation and depression for all responsive devices in the array, with median conductance indicated by the yellow line. i Conductance of a 128 × 64 array after single-pulse conductance writing of the discrete Fourier transform matrix. Several stuck devices are visible (in yellow)
Fig. 2In situ training algorithm. a Schematic diagram of a two-layer neural network. Each neuron computes a weighted sum of its inputs and applies a nonlinear activation function. b The implementation of the network with a set of memristor crossbars. Each synaptic weight (arrows in a) corresponds to the conductance difference between two memristors (as illustrated by the orange columns). Each crossbar computes weighted sums of its input voltages. Between the crossbars is a layer of circuits that read the current from each wire, convert it to a voltage, and apply the activation function. The activation function was implemented in software in this work. c Flow chart of the in situ training. Steps in green boxes were implemented in hardware in this work, while those in yellow boxes were computationally expensive steps that can be accomplished with circuits integrated onto the chip in the future. The algorithm is described in detail in Methods
Fig. 3In situ online training and inference experiments on Modified National Institute of Standards and Technology (MNIST) handwritten digit recognition. a Typical handwritten digits from the MNIST database. b Photo of the integrated 128 × 64 array during measurement. The array was partitioned into two parts for the first and second layers, respectively. In all, 54 hidden neurons were used, so the first layer weight matrix is 64 × 54 (implemented using 6912 memristors) and the second layer matrix is 54 × 10 (implemented using 1080 memristors). The blue and green false-colored areas are the positive and negative parts of the differential pairs. c Minibatch accuracy increases over the course of training. Experimental data followed the defect-free simulation closely, with a consistent 2–4% gap. d The conductance-gate voltage relation extracted from data collected during training. The conductance was read using the scheme described in the Methods. The conductance includes the effects of sneak-paths and wire resistance, which makes the measured values smaller and the variance larger than those in Fig. 1b, c. The dashed line indicates the mean conductance, while the error bars show a 95% confidence interval for the measured conductance. The real-time online training accuracy with the readout weight values is shown in an animation in Supplementary Movie 1. e–g Typical correctly classified digit “9” and h–j misclassified digit “8” after the in situ training. e, h Images of the actual digits from the MNIST test set used as the input to the network. f, i The raw current measured from the output layer neurons. The neuron representing the digit “9” has the highest output current, indicating a correct classification. g, j The corresponding Bayesian probability of each digit, as calculated by a softmax function. More inference samples are shown in Supplementary Fig. 7 and Supplementary Movies 2 and 3
Fig. 4Analysis based on experimental-calibrated simulation. a The experimental classification error (accuracy shown in Supplementary Fig. 10) matches the simulated accuracy. The simulation considers experiment parameters, including 11% devices stuck at 10 μS, 2% conductance update variation, limited conductance dynamic range, etc. The simulation on defect-free assumption shows an accuracy approaching that from TensorFlow. Each data point is the classification error rate on the complete testing set (10 000 images) after 500 images (simulation or TensorFlow) or 5000 images (experiment). b The impact of non-responsive devices on the inference accuracy with in situ and ex situ training approach. The non-responsive device was stuck in a very low-conductance state (10 µS), which is the typical defect device value observed in the experiment. The result shows that the in situ training process adapts to the defects, providing a much higher defect tolerance compared with pre-loading ex situ training weights into the network. With 50% stuck OFF devices, the network can still achieve over 60% accuracy. The error bar shows the s.d. over 10 simulations. c The multilayer network also helps with the defect tolerance. If one device is stuck, the associated hidden neuron will adjust the connections accordingly. The error bar shows the s.d. over 10 simulations. d The simulation of a larger network constructed on a larger memristor crossbar (1024 × 512) with experimental parameters (e.g., 11% defect rate) could achieve accuracy above 97%, which suggests a large memristor network could narrow the accuracy performance gap from the conventional CMOS hardware. The network architecture is shown in Supplementary Fig. 14