
A photosensor employing data-driven binning for ultrafast image recognition.

Lukas Mennel¹, Aday J Molina-Mendoza¹, Matthias Paur¹, Dmitry K Polyushkin¹, Dohyun Kwak¹, Miriam Giparakis², Maximilian Beiser², Aaron Maxwell Andrews², Thomas Mueller³.

Abstract

Pixel binning is a technique, widely used in optical image acquisition and spectroscopy, in which adjacent detector elements of an image sensor are combined into larger pixels. This reduces the amount of data to be processed as well as the impact of noise, but comes at the cost of a loss of information. Here, we push the concept of binning to its limit by combining a large fraction of the sensor elements into a single "superpixel" that extends over the whole face of the chip. For a given pattern recognition task, its optimal shape is determined from training data using a machine learning algorithm. We demonstrate the classification of optically projected images from the MNIST dataset on a nanosecond timescale, with enhanced dynamic range and without loss of classification accuracy. Our concept is not limited to imaging alone but can also be applied in optical spectroscopy or other sensing applications.

Entities: Chemical

MeSH: Algorithms

Year: 2022 PMID: 36002539 PMCID: PMC9402579 DOI: 10.1038/s41598-022-18821-5

Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996

Introduction

With the recent advances in machine vision and its applications, there is a growing demand for sensor hardware that is faster, more energy-efficient, and more sensitive than frame-based cameras, such as charge-coupled devices (CCDs) or complementary metal–oxide–semiconductor (CMOS) imagers[1,2]. Beyond event-based cameras (silicon retinas)[3,4], which rely on conventional CMOS technology and have reached a high level of maturity, there is now increasing research on novel types of image acquisition and data pre-processing techniques[5-18], many of which emulate certain neurobiological functions of the human visual system. One image pre-processing technique that has been in use for decades is pixel binning. Binning is the process of combining the electric signals from K adjacent detector elements into one larger pixel. This offers benefits such as (1) an increased frame rate due to a K-fold reduction in the amount of output data, and (2) an up to K-fold improvement in signal-to-noise ratio (SNR) at low light levels or short exposure times[19]. The latter follows from the fact that dark noise is collected for every detector element in normal mode, but only once per K elements in binned mode. Binning, however, comes at the expense of reduced spatial resolution or, more generally, a loss of information. In pattern recognition applications, this reduces the accuracy of the results even if the SNR is high. Here, we push the concept of binning to its limit by combining a large fraction of the sensor elements into a “superpixel” whose optimal shape is determined from training data using a machine learning algorithm. We demonstrate the classification of optically projected images on an ultrashort timescale, with enhanced dynamic range and without loss of classification accuracy.
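The two benefits of conventional binning listed above can be illustrated numerically. The following Python/NumPy sketch is not part of the original work; it assumes a uniform, dim 28 × 28 image, read-out noise of fixed standard deviation added once per output value, and non-overlapping b × b blocks (so that K = b² elements are combined). The helper name bin_image and the illumination and noise values are hypothetical.

```python
import numpy as np

def bin_image(img: np.ndarray, b: int) -> np.ndarray:
    """Conventional binning: sum non-overlapping b x b blocks of a 2D image."""
    h, w = img.shape
    if h % b or w % b:
        raise ValueError("image size must be divisible by the bin size b")
    # Axes of the reshaped array: (block row, row in block, block col, col in block)
    return img.reshape(h // b, b, w // b, b).sum(axis=(1, 3))

img = np.full((28, 28), 0.1)   # hypothetical uniform, dim illumination (arb. units)
sigma = 1.0                    # hypothetical read-out noise per output value

for b in (1, 2, 4):
    binned = bin_image(img, b)
    K = b * b                  # detector elements combined per binned pixel
    data_reduction = img.size / binned.size
    snr = binned.mean() / sigma  # noise is added once per output value
    print(f"{b}x{b}: {binned.shape}, data reduction {data_reduction:.0f}x (K = {K}), SNR {snr:.2f}")
```

Because the noise in this toy model enters once per output value while the signal adds up over the K combined elements, both the data reduction and the per-output SNR scale with K. The loss of spatial detail that accompanies this gain is exactly the drawback discussed above.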

Results and discussion

Pixel binning

In Fig. 1 we schematically depict different types of binning and their impact on the classification accuracy of an artificial neural network (ANN). Besides the conventional approach described above (orange lines), we also illustrate our concept of data-driven binning (green line). Here, a substantial fraction of the pixels is combined into a “superpixel” that extends over the whole face of the chip, thus forming a large-area photodetector with a complex geometrical structure that is determined from training data. For multi-class classification with one-hot encoding, one such superpixel is required for each class. As with conventional binning, the system becomes more resilient to noise and its dynamic range increases. However, in contrast to the conventional case, there is no loss of classification accuracy at large light intensities and hence no compromise in performance. These benefits come at the cost of reduced flexibility, as a custom configuration/design is required for each specific application.

Figure 1

ANN classification accuracy for different types of binning. Simulated light-intensity-dependent classification accuracy (MNIST, digits ‘0’…‘9’, additive Gaussian noise) of the ANN in Fig. 4a. Blue line: without binning (28 × 28 pixels). Orange lines: with conventional 2 × 2 binning (14 × 14 pixels) and 4 × 4 binning (7 × 7 pixels). Green line: data-driven binning. The network has been retrained in each case. Conventional binning extends the dynamic range towards lower light intensities, but comes at the expense of reduced accuracy at large illumination intensities. Data-driven binning does not suffer from this drawback.

Figure 4

ANN photosensor. (a) Sketch of the ANN with weight and bias constraints. (b) Confusion matrix for the ANN sensor. (c) Relative difference between the highest and all other output currents. The ANN exhibits a larger spread in output currents than the NB classifier. (d) Superpixel shapes for the ANN.

Photosensor implementation

Figure 2a shows a schematic of our photosensor employing data-driven binning. A microscope photograph of the actual device implementation is shown in Fig. 2b. For details regarding the fabrication, we refer to the “Methods” section. The device consists of $N$ pixels arranged in a two-dimensional array. Each pixel is divided into at most $M$ subpixels that are connected (binned) together to form the $M$ superpixels, whose output currents are measured. Each detector element is a GaAs Schottky photodiode (Fig. 2c) that is operated under short-circuit conditions (Fig. 2d) and exhibits a photoresponsivity $R = I_{\mathrm{ph}}/P$ of 0.1 A/W, where $I_{\mathrm{ph}}$ is the photocurrent and $P$ the incident optical power. GaAs was chosen because of its short absorption and diffusion lengths, both of which reduce undesired cross-talk between adjacent pixels; with some minor modifications, the sensor can also be realized using Si instead of GaAs. The design parameters, which depend on the specific classification task and are determined from training data, are the geometrical fill factors $f_{mn} = A_{mn}/A$ of the subpixels, where $A_{mn}$ denotes the area of the $m$-th subpixel in the $n$-th pixel and $A$ is the total area of each pixel. From Fig. 2a, we find for the output currents $I_m = R \sum_{n=1}^{N} f_{mn} P_n$, or

$$\mathbf{i} = R\,\mathbf{F}\,\mathbf{p}, \qquad (1)$$

with $\mathbf{p}$ being a vector that represents the optical image projected onto the chip (the optical power $P_n$ incident on each pixel), $\mathbf{i}$ the output current vector, and $\mathbf{F}$ a fill factor matrix that depends on the specific application. The $m$-th row of $\mathbf{F}$ is a vector $\mathbf{f}_m$ that represents the geometrical shape of the $m$-th superpixel.
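Eq. (1) is a single matrix-vector product. The sketch below (Python/NumPy, not from the original work) evaluates it for a hypothetical toy sensor with N = 4 pixels and M = 2 superpixels, using the responsivity R = 0.1 A/W quoted above. It also checks the physical requirements that the fill factors are non-negative and that the subpixels of each pixel occupy at most the full pixel area. The function name output_currents and all numbers other than R are illustrative assumptions.

```python
import numpy as np

R = 0.1  # photoresponsivity in A/W, as quoted in the text

def output_currents(F, p, R=R):
    """Superpixel output currents i = R F p (Eq. (1)).

    F: (M, N) fill factor matrix; row m is the shape of superpixel m, F[m, n] = A_mn / A.
    p: (N,) optical power incident on each of the N pixels, in W.
    """
    F = np.asarray(F, dtype=float)
    # Physical constraints: subpixel areas cannot be negative, and the
    # subpixels of one pixel cannot occupy more than the pixel area.
    if np.any(F < 0) or np.any(F.sum(axis=0) > 1 + 1e-12):
        raise ValueError("fill factors must be >= 0 and sum to <= 1 in each pixel")
    return R * F @ np.asarray(p, dtype=float)

# Hypothetical toy sensor: N = 4 pixels, M = 2 superpixels.
F = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.2, 0.5, 0.5]])
p = np.array([1e-6, 1e-6, 0.0, 0.0])   # 1 uW on each of the first two pixels
i = output_currents(F, p)
print(i, "-> superpixel with the highest current:", int(np.argmax(i)))
```

Superpixel 0 overlaps more strongly with the illuminated pixels and therefore delivers the larger current (100 nA versus 20 nA in this toy case).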

Figure 2

Photosensor implementation. (a) Schematic illustration of the photosensor. Each pixel is divided into subpixels with fill factors $f_{mn}$ that are connected together to form superpixels, whose output currents are measured. (b) Microscope image of an NB classifier for MNIST classification with $N$ pixels and 10 output channels. Scale bar, 500 μm. Inset: microscope image of the $n$-th pixel showing its subpixels. (c) Cross section of a GaAs Schottky photodiode with two metal layers for routing of the electrical signals. The band diagram is presented in Supplementary Figure S1. (d) Current–voltage characteristic for one of the detector elements under optical illumination. $I_{\mathrm{sc}}$ is the short-circuit photocurrent.

Naïve Bayes photosensor

Let us now discuss how to design the fill factor matrix $\mathbf{F}$ for a specific image recognition problem. As an instructive example, we present the classification of handwritten digits (‘0’, ‘1’, …, ‘9’) from the MNIST dataset[20] by evaluating the posterior $P(y \mid \mathbf{x})$ (the probability of an image $\mathbf{x}$ being a particular digit $y$) for all classes and selecting the most probable outcome. By applying Bayes' theorem and further assuming that the features (pixels) $x_n$ are conditionally independent, one can derive a predictor of the form $\hat{y} = \arg\max_y P(y) \prod_n P(x_n \mid y)$, known as the Naïve Bayes (NB) classifier[21,22]. We use a multinomial event model, $P(\mathbf{x} \mid y) \propto \prod_n p_{yn}^{x_n}$, where $p_{yn}$ is the probability that the $n$-th pixel for a given class $y$ exhibits a certain brightness, and express the result in log-space to obtain a linear discriminant function

$$\hat{y} = \arg\max_y \Big( b_y + \sum_n w_{yn} x_n \Big) \qquad (2)$$

with weights $w_{yn} = \log p_{yn}$. The bias terms $b_y = \log P(y)$ are identical for all classes, since all classes are equiprobable, and can therefore be omitted ($b_y = 0$). The similarity to Eq. (1) allows us to map the algorithm onto our device architecture: the image $\mathbf{x}$ corresponds to $\mathbf{p}$, and the weight matrix $\mathbf{W}$ to $\mathbf{F}$. To match the calculated $w$-value range to the physical constraints of the hardware implementation (non-negative fill factors whose sum within each pixel cannot exceed one), we normalize the weights according to Eq. (3).

In Fig. 3a we illustrate the working principle of the photosensor. A sample from the MNIST dataset is optically projected onto the chip using the measurement setup shown in Fig. 3b (see “Methods” section for experimental details). Each of the $M$ superpixels generates a photocurrent proportional to the inner product $\mathbf{f}_m \cdot \mathbf{p}$. If we visualize $\mathbf{f}_m$ for each class (Fig. 3c), we obtain an intuitive result: the shape of each superpixel resembles that of the average-looking digit of the respective class. Consequently, the superpixel with the largest spatial overlap with the image delivers the highest photocurrent.
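A minimal sketch of the NB design flow in Python/NumPy, assuming a toy two-class dataset of noisy 3 × 3 bar images instead of MNIST and Laplace smoothing (α = 1) of the class-conditional pixel probabilities. Because the exact form of Eq. (3) is not reproduced here, the function weights_to_fill_factors uses a hypothetical normalization: a common shift that makes all weights non-negative, followed by a positive scaling so that the fullest pixel has a total fill factor of one. A common shift adds the same term, proportional to the total incident power, to every output current, and a positive scaling preserves their order, so neither step changes which superpixel delivers the highest current. All function names and data are illustrative.

```python
import numpy as np

def train_multinomial_nb(X, y, n_classes, alpha=1.0):
    """Log-probability weights w[m, n] = log p_mn of a multinomial NB model.

    X: (S, N) non-negative pixel intensities; y: (S,) integer class labels.
    alpha: Laplace smoothing (an assumption; the text does not specify it).
    """
    N = X.shape[1]
    counts = np.stack([X[y == m].sum(axis=0) for m in range(n_classes)])
    p = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * N)
    return np.log(p)

def weights_to_fill_factors(w):
    """Hypothetical normalization (the exact Eq. (3) is not given here): shift
    all weights by a common constant so they are >= 0, then scale so the
    fullest pixel has a total fill factor of 1. Argmax over classes is kept."""
    shifted = w - w.min()
    return shifted / shifted.sum(axis=0).max()

# Toy data (hypothetical): noisy 3 x 3 images of a vertical and a horizontal bar.
protos = np.array([[0, 1, 0, 0, 1, 0, 0, 1, 0],    # class 0: vertical bar
                   [0, 0, 0, 1, 1, 1, 0, 0, 0]],   # class 1: horizontal bar
                  dtype=float)
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = np.clip(protos[y] + rng.normal(0.0, 0.2, size=(200, 9)), 0.0, None)

F = weights_to_fill_factors(train_multinomial_nb(X, y, n_classes=2))
pred = np.argmax(protos @ F.T, axis=1)   # superpixel with the largest "current"
print(F.round(2))
print(pred)
```

Reshaping each row of F to 3 × 3 should show, in miniature, the observation of Fig. 3c that each superpixel resembles the average image of its class.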

Figure 3

Naïve Bayes photosensor. (a) Schematic illustration of the working principle. An image from the MNIST dataset is projected onto the chip and detected by each superpixel. The channel with the largest output current is selected. We perform this operation in the digital domain; in the analogue domain it could be realized by a winner-take-all circuit[23]. (b) Sketch of the experimental setup. (c) Superpixel shapes for the NB classifier as determined from the MNIST training dataset. (d) Calculated confusion matrix. (e) Measured photoresponsivity maps. (f) Experimental confusion matrix, determined by optically projecting 10,000 digits one after the other and comparing the known (true) class labels with the labels predicted by the sensor (channel with the highest output current).

Figure 3e shows experimental photocurrent maps for the device in Fig. 2b. Here, each pixel of the sensor is illuminated individually and the output currents are recorded. The currents are proportional to the designed fill factors in Fig. 3c (apart from device imperfections such as broken lithographic connections), confirming negligible cross-talk between neighbouring subpixels. To evaluate the performance, we projected all 10,000 digits from the MNIST test dataset and recorded the sensor’s predictions. The classification results are presented as a confusion matrix in Fig. 3f. The chip classifies digits with an accuracy that closely matches the theoretical result in Fig. 3d.
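The evaluation described above (projecting test digits, taking the channel with the highest output current as the prediction, and comparing with the true labels) amounts to building a confusion matrix. A short Python/NumPy sketch with made-up currents for four projections and three channels; the row/column convention (rows: true class, columns: predicted class) and the function names are our assumptions.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with rows = true class and columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # handles repeated pairs
    return cm

def accuracy(cm):
    return float(np.trace(cm) / cm.sum())

# Made-up output currents (arb. units) for 4 projected images and 3 channels.
currents = np.array([[3.0, 1.0, 0.5],
                     [0.2, 2.5, 2.0],
                     [1.0, 0.9, 1.1],
                     [2.0, 2.2, 0.1]])
y_true = np.array([0, 1, 2, 0])
y_pred = currents.argmax(axis=1)          # channel with the highest output current
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)
print(f"accuracy = {accuracy(cm):.2f}")   # 3 of 4 correct -> 0.75
```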

Artificial neural network photosensor

Beyond the instructive example of NB, the same device structure also allows the implementation of other, more accurate classifiers. Specifically, we present the design and simulation results for a single-layer ANN[21] for the same MNIST classification task as discussed before. The architecture of the network is shown in Fig. 4a. It makes its predictions according to

$$\hat{\mathbf{y}} = \sigma(\mathbf{W}\mathbf{x} + \mathbf{b}). \qquad (4)$$

Note the similarity to Eq. (2), apart from the nonlinearity $\sigma$, which can readily be implemented in either the analogue or the digital domain using external electronics. We choose a softmax activation function for $\sigma$. Again, due to the physical constraints of the sensor hardware, we train the network with zero bias ($\mathbf{b} = 0$) using a categorical cross-entropy loss. In order to obey Eq. (3), we further introduce a constraint that enforces a non-negative weight matrix by performing the following regularization after each training step:

$$\mathbf{W} \leftarrow \mathbf{W} \circ \Theta(\mathbf{W}),$$

with $\circ$ denoting the Hadamard product and $\Theta$ the Heaviside step function, i.e., negative weights are set to zero. This leads to a penalty of less than 1% in accuracy. The fill factor matrix $\mathbf{F}$, plotted in Fig. 4d, is directly related to $\mathbf{W}$ by a geometrical scaling factor. Although the superpixel shapes do not clearly resemble the handwritten digits, the ANN performs better than the NB classifier, as demonstrated by the confusion matrix in Fig. 4b. In addition, the ANN shows a larger spread between the highest and all other output currents (Fig. 4c), which makes it more robust against noise (Supplementary Figure S2). A number of other machine learning algorithms can be described by an equation of the same form as Eq. (4) and can be implemented in a similar fashion.
The realization of an all-analogue deep-learning network is also feasible by feeding the sensor output into a memristor crossbar array[24,25].
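The training procedure described above (softmax output, zero bias, categorical cross-entropy, and clipping of negative weights after every step) can be sketched in Python/NumPy as follows. The optimizer (plain full-batch gradient descent), learning rate, number of epochs, and toy dataset of noisy 3 × 3 bar images are not given in the text and are our assumptions. After every gradient step the weights are multiplied element-wise by Θ(W), which sets negative entries to zero. The final conversion to fill factors, dividing by the largest per-pixel column sum, is one hypothetical choice of the geometrical scaling factor.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_nonneg_ann(X, y, n_classes, lr=0.5, epochs=300, seed=0):
    """Single-layer softmax classifier with bias fixed to zero, trained by
    full-batch gradient descent on the mean categorical cross-entropy.
    After every step, W <- W * Theta(W) sets negative weights to zero."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.0, 0.01, size=(n_classes, X.shape[1]))
    Y = np.eye(n_classes)[y]                 # one-hot targets, shape (S, M)
    for _ in range(epochs):
        P = softmax(X @ W.T)                 # class probabilities, shape (S, M)
        W -= lr * (P - Y).T @ X / len(X)     # gradient of the mean cross-entropy
        W *= np.heaviside(W, 0.0)            # Hadamard product with the step function
    return W

# Toy data (hypothetical): noisy 3 x 3 images of three bar orientations.
protos = np.array([[0, 1, 0, 0, 1, 0, 0, 1, 0],   # class 0: vertical bar
                   [0, 0, 0, 1, 1, 1, 0, 0, 0],   # class 1: horizontal bar
                   [1, 0, 0, 0, 1, 0, 0, 0, 1]],  # class 2: diagonal
                  dtype=float)
rng = np.random.default_rng(1)
y = rng.integers(0, 3, size=300)
X = np.clip(protos[y] + rng.normal(0.0, 0.2, size=(300, 9)), 0.0, None)

W = train_nonneg_ann(X, y, n_classes=3)
F = W / W.sum(axis=0).max()                  # hypothetical geometrical scaling to fill factors
train_acc = np.mean(np.argmax(X @ F.T, axis=1) == y)
print(f"training accuracy {train_acc:.3f}, zeroed weights {int(np.sum(W == 0))}")
```

Since softmax is monotonic, the predicted class equals the argmax of $\mathbf{W}\mathbf{x}$, and hence of $\mathbf{F}\mathbf{x}$. The sensor therefore only needs to deliver the linear part, and the nonlinearity can be left to external electronics.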

Benefits of data-driven binning

In Fig. 5 we demonstrate the benefits of data-driven binning. Reading out the $M$ superpixel signals clearly requires less time, fewer resources, and less energy than reading out the whole image in a conventional image sensor. In fact, the photodiode array itself does not consume any energy at all, as the photodiodes are operated under short-circuit conditions; energy is consumed only by the electronic circuit that selects the highest photocurrent. Pattern recognition and classification occur in real time and are limited only by the physics of photocurrent generation and/or the electrical bandwidth of the data acquisition system. This is demonstrated in Fig. 5a, where we show the correct classification of an image on a nanosecond timescale, limited by the bandwidth of the amplifier used.

Figure 5

Evaluation of the device performance. (a) Demonstration of the high-speed capabilities of the sensor, measured with a 40-ns pulsed laser source. A ‘1’ is projected onto the device and the currents of all superpixels of the NB classifier are recorded with an oscilloscope. The channel corresponding to the correct digit produces the highest output current. (b) Experimental (symbols) and calculated (lines) light-intensity-dependent accuracies for the NB classifier (blue) and a reference device without binning (red).

Furthermore, it is known that binning K detector elements can offer an up to K-fold improvement in SNR[19]. In our case, a substantial fraction (≈0.6 for NB) of all sensor pixels is binned together, with each pixel being split into $M$ elements. Together, this results in a substantial SNR gain over the unbinned case. To characterize the noise performance, we performed binary image classification (NB, MNIST, ‘0’ versus ‘1’) at different light intensities. For the reference measurements, we projected the images sequentially, pixel by pixel, onto a single GaAs Schottky photodetector (fabricated on the same wafer and with an area identical to that of two subpixels), recorded the photocurrents, and performed the classification task on a computer. In the simulations, Gaussian noise was added by drawing random samples from a normal distribution with zero mean and standard deviation $\sigma_{\mathrm{n}}$. The noise was added once per superpixel in the data-driven case and once per pixel in the reference case; $\sigma_{\mathrm{n}}$ was used as the single fitting parameter to reproduce all experimental results. The results are presented in Fig. 5b. The classification accuracy is affected by the amplifier noise. For large intensities, the system operates with its designed accuracy. As the intensity decreases, the classification accuracy drops and eventually, when the noise dominates over the signal, reaches the baseline of random guessing.
Our device, employing data-driven binning, can perform this task at lower light intensities than the reference device without binning.
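The noise simulation described above can be sketched as follows. This Python/NumPy example is a deliberately simplified stand-in, not the authors' code. It replaces the MNIST ‘0’/‘1’ task with two hypothetical 28 × 28 classes (left or right half illuminated) and assumes the same signal per pixel in both read-out schemes. It varies only where the zero-mean Gaussian noise of standard deviation sigma is added: once per superpixel output for data-driven binning, or once per pixel read-out for the unbinned reference, whose weighted sum is then formed in software. The function name simulate_accuracy and all parameter values are illustrative.

```python
import numpy as np

def simulate_accuracy(F, protos, intensity, sigma, per_pixel, n_trials=2000, seed=0):
    """Fraction of correctly classified noisy projections (argmax of the scores).

    per_pixel=False: data-driven binning, noise added once per superpixel output.
    per_pixel=True:  reference without binning, noise added once per pixel
                     read-out, weighted sum then formed in software with F.
    """
    rng = np.random.default_rng(seed)
    M, N = F.shape
    labels = rng.integers(0, len(protos), size=n_trials)
    signal = intensity * protos[labels]                          # (trials, N)
    if per_pixel:
        scores = (signal + rng.normal(0.0, sigma, size=(n_trials, N))) @ F.T
    else:
        scores = signal @ F.T + rng.normal(0.0, sigma, size=(n_trials, M))
    return float(np.mean(scores.argmax(axis=1) == labels))

# Two hypothetical 28 x 28 classes: left half or right half illuminated.
side = 28
left = np.zeros((side, side))
left[:, : side // 2] = 1.0
protos = np.stack([left.ravel(), (1.0 - left).ravel()])
F = protos / protos.sum(axis=0).max()    # fill factors (here simply 0 or 1)

for intensity in (1e-1, 1e-2, 1e-3, 1e-4):
    acc_bin = simulate_accuracy(F, protos, intensity, sigma=1.0, per_pixel=False)
    acc_ref = simulate_accuracy(F, protos, intensity, sigma=1.0, per_pixel=True)
    print(f"intensity {intensity:.0e}: binned {acc_bin:.3f}, reference {acc_ref:.3f}")
```

In this toy setting each score in the reference case sums 392 independently noisy read-outs, so its noise is about √392 ≈ 20 times larger than the single read-out noise of a superpixel. The binned sensor therefore keeps its accuracy down to roughly 20 times lower intensities, after which both curves fall towards the 50% guessing baseline.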

Conclusions

We conclude with proposed routes for future research. The main limitation of our current device implementation is its lack of reconfigurability. While this may be appropriate in some cases (e.g., a dedicated spectroscopic application), reconfigurability of the sensor would in general be preferred. This may, for example, be achieved by employing photodetectors with tunable responsivities, or a programmable network based on a nonvolatile memory material[26-28] to bin individual pixels together. Encoding schemes other than standard one-hot encoding may save hardware resources and extend the dynamic range further. Possible applications of our technology include industrial image recognition systems that require high-speed identification of simple objects or patterns, as well as optical spectroscopy, where the incoming light is dispersed into its different colors and the sensor is trained to recognize certain spectral features. In both cases, classical machine learning algorithms will provide sufficient complexity and sophistication for approximating the dataset.

Methods

Device fabrication

Device fabrication started with the growth of a 400 nm thick n-doped GaAs epilayer by molecular beam epitaxy on a highly n-doped (n+) GaAs substrate. An ohmic contact on the n+ side was defined by evaporation of Ge/Au/Ni/Au (15 nm/30 nm/14 nm/300 nm) and sample heating at 440 °C for 30 s. On the n-GaAs epilayer we deposited a 20 nm thick Al2O3 insulating layer by atomic layer deposition (ALD). We then defined a first metal layer (M1) by electron-beam lithography (EBL) and Ti/Au (3 nm/25 nm) evaporation. In the next step we deposited a 30 nm thick Al2O3 layer by ALD. We then defined an etch mask for the via holes, which connect metal layers M1 and M2, by EBL and etched the Al2O3 with a 30% potassium hydroxide (KOH) aqueous solution. We then wrote an etch mask for the pixel windows via EBL and etched the combined 50 nm thick Al2O3 stack with a 30% KOH aqueous solution in two steps. Inside the pixel windows, we defined the subpixels with EBL, removed the native oxide on the GaAs with a 37% hydrochloric acid (HCl) aqueous solution, and evaporated 7 nm thick semitransparent Au. Finally, we defined the M2 metal layer with EBL and Ti/Au (5 nm/80 nm) evaporation. The continuity and integrity of the device were confirmed by scanning electron microscopy and electrical measurements.

Experimental setup

A schematic of the experimental setup is shown in Fig. 3b. A light-emitting diode (LED) source (625 nm wavelength) illuminates a spatial light modulator (SLM) through a linear polarizer. The SLM is operated in intensity-modulation mode and changes the polarization of the reflected light according to the displayed image. The reflected light is then filtered by a second linear polarizer, and the image is projected onto the chip. The photocurrents generated by the sensor are probed with a needle array, selected by a Keithley switch matrix, and measured with a Keithley source measurement unit. For time-resolved measurements, a pulsed laser source (522 nm wavelength, 40 ns pulses) is used. Here, the output signals are amplified with a high-bandwidth (20 MHz) transimpedance amplifier. The pulsed laser source is triggered by a signal generator, and an oscilloscope is used to record the time trace.

16 in total

1. An Artificial Flexible Visual Memory System Based on an UV-Motivated Memristor.

Authors: Shuai Chen; Zheng Lou; Di Chen; Guozhen Shen
Journal: Adv Mater Date: 2018-01-08 Impact factor: 30.849

2. An Oxide Schottky Junction Artificial Optoelectronic Synapse.

Authors: Shuang Gao; Gang Liu; Huali Yang; Chao Hu; Qilai Chen; Guodong Gong; Wuhong Xue; Xiaohui Yi; Jie Shang; Run-Wei Li
Journal: ACS Nano Date: 2019-02-12 Impact factor: 15.881

3. Memristive devices for computing.

Authors: J Joshua Yang; Dmitri B Strukov; Duncan R Stewart
Journal: Nat Nanotechnol Date: 2013-01 Impact factor: 39.213

4. Digital cameras with designs inspired by the arthropod eye.

Authors: Young Min Song; Yizhu Xie; Viktor Malyarchuk; Jianliang Xiao; Inhwa Jung; Ki-Joong Choi; Zhuangjian Liu; Hyunsung Park; Chaofeng Lu; Rak-Hwan Kim; Rui Li; Kenneth B Crozier; Yonggang Huang; John A Rogers
Journal: Nature Date: 2013-05-02 Impact factor: 49.962

5. Ultrafast machine vision with 2D material neural network image sensors.

Authors: Lukas Mennel; Joanna Symonowicz; Stefan Wachter; Dmitry K Polyushkin; Aday J Molina-Mendoza; Thomas Mueller
Journal: Nature Date: 2020-03-04 Impact factor: 49.962

6. A Ferroelectric/Electrochemical Modulated Organic Synapse for Ultraflexible, Artificial Visual-Perception System.

Authors: Hanlin Wang; Qiang Zhao; Zhenjie Ni; Qingyuan Li; Hongtao Liu; Yunchang Yang; Lifeng Wang; Yang Ran; Yunlong Guo; Wenping Hu; Yunqi Liu
Journal: Adv Mater Date: 2018-09-25 Impact factor: 30.849

7. Artificial optic-neural synapse for colored and color-mixed pattern recognition.

Authors: Seunghwan Seo; Seo-Hyeon Jo; Sungho Kim; Jaewoo Shim; Seyong Oh; Jeong-Hoon Kim; Keun Heo; Jae-Woong Choi; Changhwan Choi; Saeroonter Oh; Duygu Kuzum; H-S Philip Wong; Jin-Hong Park
Journal: Nat Commun Date: 2018-11-30 Impact factor: 14.919

8. A flexible ultrasensitive optoelectronic sensor array for neuromorphic vision systems.

Authors: Qian-Bing Zhu; Bo Li; Dan-Dan Yang; Chi Liu; Shun Feng; Mao-Lin Chen; Yun Sun; Ya-Nan Tian; Xin Su; Xiao-Mu Wang; Song Qiu; Qing-Wen Li; Xiao-Ming Li; Hai-Bo Zeng; Hui-Ming Cheng; Dong-Ming Sun
Journal: Nat Commun Date: 2021-03-19 Impact factor: 14.919

9. Sparse pixel image sensor.

Authors: Lukas Mennel; Dmitry K Polyushkin; Dohyun Kwak; Thomas Mueller
Journal: Sci Rep Date: 2022-04-05 Impact factor: 4.379