
Harnessing optoelectronic noises in a photonic generative network.

Changming Wu¹, Xiaoxuan Yang², Heshan Yu³, Ruoming Peng¹, Ichiro Takeuchi³, Yiran Chen², Mo Li¹,⁴.

Abstract

Integrated optoelectronics is emerging as a promising platform for neural network accelerators, affording efficient in-memory computing and high-bandwidth interconnectivity. The inherent optoelectronic noises, however, make photonic systems error-prone in practice. It is thus imperative to devise strategies to mitigate and, if possible, harness noises in photonic computing systems. Here, we demonstrate a photonic generative network as part of a generative adversarial network (GAN). This network is implemented with a photonic core consisting of an array of programmable phase-change memory cells that performs four-element vector-vector dot multiplication. The GAN can generate a handwritten number ("7") in experiments and all 10 digits in simulation. We realize an optical random number generator, apply noise-aware training by injecting additional noise, and demonstrate the network's resilience to hardware nonidealities. Our results suggest the resilience and potential of more complex photonic generative networks based on large-scale, realistic photonic hardware.


Year: 2022 PMID: 35061531 PMCID: PMC8782447 DOI: 10.1126/sciadv.abm2956

Source DB: PubMed Journal: Sci Adv ISSN: 2375-2548 Impact factor: 14.136

INTRODUCTION

The current rate of improvement in digital electronics’ energy efficiency is lagging behind the fast-growing computational load spurred by the widespread implementation of large-scale artificial neural networks for machine learning and artificial intelligence. Because of its advantages in power efficiency, communication bandwidth, and parallelism, analog optical computing based on integrated optoelectronic processors is once again in focus as a hardware accelerator for neural networks. Photonic neural networks reported to date are predominantly hybrid optoelectronic systems in which the photonic components are used for linear multiplication and interconnect, while nonlinear functions and feedback control are implemented electronically. Compared to electronic neural networks using digital processors, photonic neural networks have higher inaccuracy and error rates due to their analog nature and the abundance of optoelectronic noises in the hardware. The accumulation of computational errors in large-scale photonic neural networks could severely impair their performance, limiting their computational effectiveness and scalability. Although several offline noise-aware training schemes, including injecting noise into layer inputs, synaptic weights, and preactivations, have been proposed to mitigate analog hardware nonidealities, those schemes only address discriminative models. In another study, a diffractive optics-based network is trained with carefully drafted parametric randomness to be robust against optical nonidealities. Noise in the analog hardware has also been used to facilitate various machine learning algorithms. In contrast to discriminative models, generative neural network models can automatically discover and learn regularities or patterns from the training data to generate plausible new instances.
So far, a photonic generative network has not been reported, and the corresponding noise mitigation strategies have not been explored. Here, we demonstrate a generative network based on a photonic computing core consisting of an array of programmable phase-change metasurface mode converters (PMMCs). This photonic generative network is combined with a discriminator to realize a generative adversarial network (GAN) that is trained to generate handwritten numbers. We show that the photonic GAN can harness and mitigate optoelectronic noises and errors in three ways. First, we use the amplified spontaneous emission (ASE) noise to realize an optical true random number generator (RNG), which is used as the input to the GAN. This optical RNG efficiently generates random numbers at high speed in multiple wavelength channels by slicing the ASE spectrum. Second, we analyze error sources originating from the components in the photonic GAN and propose noise-aware training approaches that augment noise during the training process, which improves the network’s performance and robustness. Last, we validate the training approaches through experiment and simulation and demonstrate that the photonic GAN can benefit from the inevitable random errors in practical implementation. Unexpectedly, the images generated by nonideal photonic hardware show even higher quality than those generated by ideal, errorless counterparts (i.e., the software baseline). Our results demonstrate the feasibility and resilience of more complex photonic GANs using nonideal optoelectronic hardware. Because the proposed noise-aware training approaches are generic, they can be applied to various types of optoelectronic neuromorphic computing hardware.

RESULTS

Photonic generative networks in a GAN architecture

A GAN consists of two subnetwork models (Fig. 1A): a generator and a discriminator. These two models compete against each other in a zero-sum game: The discriminator strives to distinguish the instances produced by the generator (labeled as the “fake” instances) from the real instances in the training dataset (labeled as the “real” instances); the generator aims to fool the discriminator by producing instances that imitate the real instances. The competition drives both networks to improve their capabilities until an equilibrium state is reached, i.e., when the discriminator can no longer distinguish the fake instances from the real ones, at which point the generator is deemed well trained to generate plausible new instances. In this work, we design a prototype photonic generator to produce images of the handwritten number “7” using a noise-aware offline training configuration: We first train the generator model on a computer and then implement it on the photonic platform (Fig. 1B). Here, we focus only on realizing the photonic generator because photonic discriminators have been demonstrated previously by many groups, including ours. As shown in Fig. 1C, in each layer of the generator, the input data are encoded in the power of the optical signals through multiple wavelength channels and processed by the PMMC photonic tensor core (Fig. 1D), in which the kernel matrices are stored. The results are detected by the photodetector arrays. Electronic postprocessing is then performed to apply nonlinear functions. The results are reencoded into the optical signals and relayed to the next photonic layer. In such an optical network, various noises (including optical and electrical noises of the optical sources, modulators, and photodetectors) accumulate through the processes of programming (i.e., writing) the kernel matrices, data encoding, and data transferring (i.e., reading) between the layers of the network.

Fig. 1.

Photonic GAN network with optoelectronic noises.

(A) A GAN architecture is composed of two subnetwork models: a generator and a discriminator. The generator competes with the discriminator during training and produces new instances after it is trained. (B) The offline noise-aware training and inference process flow of the generator. The process of mapping the trained weights to the hardware during implementation inevitably introduces optoelectronic noise. (C) Decomposition of the generator into individual layers. In each layer, the input signals pass through the photonic tensor core and are converted to the electrical domain by photodetectors (PDs). After postprocessing, the data are converted back into the optical domain and transferred to the next layer. EOM, electro-optic modulator. (D) Optical microscopic image of the photonic tensor core consisting of four input channels. The optical RNG is input to the photonic tensor core through O/E and E/O conversion in our experiment. Potentially, the optical RNG can be directly sent into the tensor core using WDM schemes. DEMUX, demultiplexers. (E) The false-colored scanning electron microscopy (SEM) image of the photonic tensor core. The Si₃N₄ waveguide, the GST metasurface, and the Al₂O₃ protection layer are colored green, red, and blue, respectively. Scale bar, 10 μm. Inset: The zoomed-in SEM image of the phase-gradient metasurface on the waveguide. Scale bar, 2 μm.


Optical RNG

One key component of the photonic generator is the optical RNG that produces the random input. To realize it, we use the ASE noise from erbium-doped fiber amplifiers (EDFAs), which are ubiquitous in fiber-optic communication systems, to generate random optical signals at high rates in four parallel channels, as shown schematically in Fig. 2A. Here, the ASE noise is first filtered with wavelength division multiplexing (WDM) demultiplexers and then detected with photodetectors. The baseband electrical currents generated by beating between different frequency components are the so-called “ASE-ASE beat noises”. The DC photocurrent is filtered by a DC block, passing only the stochastic photocurrent fluctuations to a sampling oscilloscope to generate random numbers (see the Supplementary Materials for the theory of the optical RNG). Figure 2 (B and C) plots the statistical histogram and a representative trace of the random numbers (in voltage) generated in a single WDM channel, respectively. The probability density function is well approximated by a zero-mean Gaussian distribution with a standard deviation (SD) of 0.2 V [i.e., N(0, 0.2)]. We further calculate the correlation coefficient of a sequence of $N = 5 \times 10^{4}$ numbers (Fig. 2D), which reaches the limit of $1/\sqrt{N}$ (red line in Fig. 2D), supporting the randomness of the number sequence. Because of the limited size of the photonic tensor core, we need to measure and record the random numbers from the RNG and repeatedly input them to the generator during the experiment (see Fig. 1D). In future full-scale systems, the filtered ASE noise can be directly used as random optical inputs to the GAN without electrical sampling (the dashed box in Fig. 1D) and detected after the first layer of the network is performed.
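The randomness check described above can be illustrated in software. The following is a minimal sketch, not the authors' procedure: a NumPy pseudorandom generator stands in for the ASE source, the function name `autocorrelation` and the choice of 100 lags are hypothetical, and the floor of 1/√N is assumed to be the usual finite-sample limit for an uncorrelated sequence of length N.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Normalized autocorrelation coefficient of x for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the mean (the "DC block")
    denom = np.dot(x, x)                  # zero-lag energy used for normalization
    return np.array(
        [np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)]
    )

# Software stand-in for one WDM channel of the optical RNG (assumption):
# zero-mean Gaussian samples with SD 0.2 V, as quoted in the text.
rng = np.random.default_rng(seed=0)
N = 50_000                                # sequence length quoted in the text
samples = rng.normal(loc=0.0, scale=0.2, size=N)

rho = autocorrelation(samples, max_lag=100)
floor = 1.0 / np.sqrt(N)                  # assumed finite-sample floor

print(f"max |rho| over 100 lags: {np.abs(rho).max():.4f}")
print(f"1/sqrt(N):               {floor:.4f}")
```

For an uncorrelated sequence, the coefficients at nonzero lag should scatter around zero with a spread comparable to the floor; a measured sequence with residual correlation would show coefficients well above it.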

Fig. 2.

Optical RNG and kernel programming errors.

(A) Schematic of the optical RNG. The ASE noise is spectrally sliced into four wavelength channels using DEMUX and then detected with photodetectors. After a DC block, the random electrical signals are sampled by an oscilloscope. a.u., arbitrary units. (B and C) Statistical histograms (B) and a representative trace (C) of the generated random numbers. The generated random numbers follow a Gaussian distribution. (D) Correlation coefficient as a function of lag for the random number sequence. A random number sequence with length $N = 5 \times 10^{4}$ has a correlation coefficient (blue dots) around the lower limit (red line). (E) Process of programming the mode contrast of a kernel element using optical pulses. The target Γ values are −0.7, 0, and 0.7, respectively. (F) Histogram of Γ value distribution when the kernels are repeatedly set to −0.7, 0, and 0.7, respectively. The SD for each setting is 0.37, 0.67, and 0.68%, respectively. (G) Histograms of the error distribution in the experimental measurement (solid) and the simulation (hashed) when assuming that ΔΓ follows a Gaussian distribution with an SD of 5%. Inset: Measured MVM accuracy for 4900 MVM operations in the first layer of the network.


Photonic tensor core error analysis

The other key component of the photonic generator is the photonic tensor core, which optically performs matrix-vector multiplication (MVM). The inset in Fig. 1C shows the schematic of one PMMC kernel element of the core, which computes the multiply-accumulate (MAC) operation $x \rightarrow x \cdot w + b$, the fundamental operation of MVM. The PMMC consists of an array of Ge₂Sb₂Te₅ (GST) nanoantennas with tapering widths (see Fig. 1E for the scanning electron microscopy images), forming a phase-gradient metasurface patterned on a silicon nitride waveguide. The input vector element x is encoded in the power of the input optical signal. The corresponding kernel element weight w is represented using the TE0/TE1 mode contrast $\Gamma = \beta_{\mathrm{TE0}} - \beta_{\mathrm{TE1}}$, set to multiple intermediate levels within [−1, 1], where $\beta_{\mathrm{TE0}} = P_{\mathrm{TE0}}/(P_{\mathrm{TE0}} + P_{\mathrm{TE1}})$ and $\beta_{\mathrm{TE1}} = P_{\mathrm{TE1}}/(P_{\mathrm{TE0}} + P_{\mathrm{TE1}})$ are the mode purities and $P_{\mathrm{TE0}}$ ($P_{\mathrm{TE1}}$) is the power of the TE0 (TE1) mode component in the waveguide. Thus, the MAC computation is simplified to an incoherent optical transmission measurement and can be performed over a broad bandwidth. Figure 2E shows the evolution of Γ during the programming process of using optical control pulses to set negative (−0.7), zero (0.0), and positive (0.7) values, respectively. We implement the network model on a 2 × 2 tensor core with four PMMCs (Fig. 1D). The kernel weight W is mapped to the corresponding mode contrast as $\Gamma = W\,\Gamma_{\max}/W^{l}_{\max}$, where $\Gamma_{\max}$ is the maximum absolute mode contrast and $W^{l}_{\max}$ is the maximum absolute kernel weight of layer l. Given the limited number of PMMCs on a chip, we repeatedly reset the kernel elements on the same devices, which bottlenecks the computing speed. With a sufficiently large tensor core in a photonic crossbar array architecture, one could directly map the full kernel matrices to the hardware and thereby greatly accelerate the computation.
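The weight encoding described above can be sketched numerically. This is an idealized model under stated assumptions, not the device physics: each input power is assumed to split losslessly into TE0 and TE1 parts according to Γ, the weight-to-contrast map is assumed to be linear, the bias term b is omitted, and the names `mode_contrast`, `weights_to_contrast`, `photonic_mvm`, and the value `GAMMA_MAX = 0.7` are hypothetical.

```python
import numpy as np

def mode_contrast(p_te0, p_te1):
    """Mode contrast: Gamma = beta_TE0 - beta_TE1, with beta = P_mode / (P_TE0 + P_TE1)."""
    p_te0 = np.asarray(p_te0, dtype=float)
    p_te1 = np.asarray(p_te1, dtype=float)
    return (p_te0 - p_te1) / (p_te0 + p_te1)

def weights_to_contrast(W, gamma_max):
    """Assumed linear map from the weights of one layer to mode contrast."""
    W = np.asarray(W, dtype=float)
    return W * gamma_max / np.abs(W).max()

def photonic_mvm(x, gamma):
    """Idealized incoherent MVM: sum the TE0 and TE1 powers per output row
    and return their difference, which equals gamma @ x in this model."""
    p_te0 = x * (1.0 + gamma) / 2.0       # TE0 share of each input power
    p_te1 = x * (1.0 - gamma) / 2.0       # TE1 share of each input power
    return p_te0.sum(axis=1) - p_te1.sum(axis=1)

GAMMA_MAX = 0.7                           # hypothetical maximum |Gamma|
W = np.array([[0.5, -1.0],
              [2.0, 0.25]])               # example 2 x 2 kernel
x = np.array([0.3, 0.8])                  # input powers (arbitrary units)

gamma = weights_to_contrast(W, GAMMA_MAX)
w_max = np.abs(W).max()
y = photonic_mvm(x, gamma) * w_max / GAMMA_MAX   # rescale back to weight units

print("optical result:", y)
print("reference W @ x:", W @ x)
```

In this model, negative weights need no separate subtraction circuit: a contrast below zero simply routes more power to the TE1 port than to the TE0 port.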
The analog nature of weight programming, data encoding, and data transferring in the photonic neural network limits the precision of MVM calculations and makes the computation error-prone. The computation errors would accumulate through the layers of the network and impair the final results. Because the computation errors in realistic experiments stem from various optoelectronic noises in the system, we use the terms noise and error interchangeably. To quantify the noises and errors in our system, we repeatedly program different fixed Γ values and estimate the short-term inaccuracy by measuring the variation ΔΓ. Figure 2F shows that the SD of 15 programming operations is less than 0.7%, corresponding to 6 bits in resolution, which is one order of magnitude larger than the input encoding error (see the Supplementary Materials for more detailed error analysis). Thus, the short-term programming inaccuracy ΔΓ (write error), limited by the inaccuracy of the programming optical pulses, is one of the dominant error sources. Another error source is the long-term measurement fluctuations (read error), including the noise of photodetectors, the variation of the O/E and E/O conversions, and the thermo-optic fluctuation of the phase-change material (PCM). These errors collectively contribute to an effective error ΔW on the kernel element weight, which is treated as arising from an equivalent total write error ΔΓ. To estimate the computation error of the overall system, Fig. 2G compares the measured MVM error distributions with the simulation, which assumes a Gaussian distribution of error. The result estimates the overall error to be 5%, corresponding to more than 3 bits in resolution, which we subsequently use in the noise-aware training and simulation. Unlike the discriminative network, where the input regularities or patterns are well defined, the generator network takes random numbers as the input.
It would therefore be more susceptible to the effective weight setting noise ΔW, which could degrade the quality of the generated new instances. To reveal the noise effect on the GAN, we emulate the noisy hardware with a GAN model that is trained using a noiseless offline training approach, adding a random error ΔW [introduced by ΔΓ with a Gaussian distribution N(0, 0.05)] when using it to generate images. Figure 3A plots 49 images of 14 × 14 pixels generated from simulation using random inputs produced by the optical RNG. These images show the handwritten number 7 but with very noisy backgrounds, demonstrating that a model trained with the noise-free algorithm is impaired by the practical weight setting noise (see the Supplementary Materials for the detailed comparison between inference results using accurate and inaccurate kernels).
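The hardware emulation described above amounts to perturbing the stored mode contrast at inference time. The sketch below is one way to express that, with several assumptions that are not from the paper: a linear weight-to-contrast map, an absolute SD of 0.05 in units of Γ, clipping to the physical range [−1, 1], and the hypothetical names `noisy_weights` and `GAMMA_MAX`.

```python
import numpy as np

GAMMA_MAX = 0.7                            # hypothetical maximum |Gamma|

def noisy_weights(W, sigma_gamma, gamma_max, rng):
    """Return weights as a noisy device would realize them.

    The weights are mapped to mode contrast, perturbed by Gaussian
    write noise, clipped to [-1, 1], and mapped back to weight units.
    """
    W = np.asarray(W, dtype=float)
    w_max = np.abs(W).max()
    gamma = W * gamma_max / w_max                       # assumed linear map
    gamma = gamma + rng.normal(0.0, sigma_gamma, size=W.shape)
    gamma = np.clip(gamma, -1.0, 1.0)                   # contrast is bounded
    return gamma * w_max / gamma_max

rng = np.random.default_rng(seed=1)
W = rng.normal(size=(64, 4))               # stand-in kernel, four inputs per row
x = rng.uniform(0.0, 1.0, size=4)          # stand-in optical input powers
ideal = W @ x

# Repeat the same MVM with freshly drawn write noise each time.
errors = np.array(
    [noisy_weights(W, 0.05, GAMMA_MAX, rng) @ x - ideal for _ in range(200)]
)
print(f"SD of MVM error: {errors.std():.4f}")
print(f"SD of ideal output: {ideal.std():.4f}")
```

Comparing the error spread with the spread of the ideal outputs gives a rough sense of how much a 5% contrast error matters for a given kernel and input range.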

Fig. 3.

Generating handwritten numbers with GAN.


(A to C) Forty-nine images (size, 14 × 14 pixels) generated by (A) NF-GAN, (B) IC-GAN, and (C) WC-GAN under effective kernel weight setting error (introduced by 5% Gaussian random error ΔΓ) and using random inputs ~N(0, 0.2) produced by the optical RNG. (A) is generated by simulation, and (B) and (C) are from the experiments. (D) The FIDs of the generated images, assuming the network is trained using various approaches and is implemented either on the ideal (solid bars) or noisy hardware (hashed bars). The FIDs obtained from the experimental results are labeled as stars. (E) The difference of FID (ΔFID) in (D). The ΔFIDs from the experimentally generated images are denoted by the red lines.

Therefore, it is necessary to consider hardware noise during training to realize a GAN that is resilient to realistic noises. Theoretically, it has been proven that adding noise to the training data of a neural network is equivalent to adding an extra regularization term to the error function, which can significantly improve hardware noise tolerance in a discriminative neural network. Meanwhile, it was shown that introducing noise on kernel weights during training enhances the robustness of multilayer perceptrons against weight perturbations, such that inference accuracy close to the software baseline could be achieved. However, previous demonstrations of noise-aware solutions are limited to discriminative networks. For GANs, theoretical, simulation, and experimental validations of effective noise-aware solutions are still lacking and require further investigation.

Noise-aware training of the photonic generative model

For our photonic GAN, we propose and experimentally validate two noise-aware training approaches, namely, the input-compensatory approach (IC-GAN) and the kernel weight-compensatory approach (WC-GAN), to improve the network’s tolerance to the effective weight setting noise ΔW. The IC-GAN approach inflates the SD of the random signal input from the experimental value of 0.2 to 0.5 V during training. The WC-GAN approach adds ΔΓ with 5% SD to the corresponding weight at each forward-propagation pass but performs noiseless gradient descent in the back-propagation pass (see Fig. 1B and the Supplementary Materials for the training procedure of these noise-aware training approaches). Figure 3 (B and C) shows the experimentally generated images of handwritten number 7 by the photonic GAN trained using both approaches. For a fair comparison, the random number inputs used for inferences are produced by the same optical RNG. Compared to the images generated by the noise free–trained GAN (NF-GAN) (Fig. 3A), the images generated using both noise-aware approaches display much clearer patterns with lower background noise, validating the noise tolerance of the IC-GAN and WC-GAN. Furthermore, we observe that the images generated by the WC-GAN (Fig. 3C) have richer handwritten-like features than those by the IC-GAN (Fig. 3B), with more diverse variations in styles. Therefore, we conclude that the WC-GAN is advantageous for practical implementation using nonideal analog hardware.
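The two training approaches can be illustrated on a toy problem. The sketch below is not the GAN training procedure from the Supplementary Materials; it replaces the generator with a single linear layer fitted by least squares so that the mechanics stay visible. The names `sample_latent` and `train_weight_noise`, the learning rate, and the use of an absolute noise SD in weight units are all assumptions.

```python
import numpy as np

def sample_latent(n, dim, training, rng):
    """Input-compensatory idea: draw wider random inputs (SD 0.5) during
    training than the optical RNG delivers at inference (SD 0.2)."""
    sd = 0.5 if training else 0.2
    return rng.normal(0.0, sd, size=(n, dim))

def train_weight_noise(X, Y, sigma, lr=0.05, steps=500, seed=0):
    """Weight-compensatory idea on a linear layer: perturb the weights in
    the forward pass, then apply a plain gradient step to the clean weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], Y.shape[1]))
    for _ in range(steps):
        W_noisy = W + rng.normal(0.0, sigma, size=W.shape)  # emulated write error
        err = X @ W_noisy - Y          # forward pass sees the noisy weights
        grad = X.T @ err / len(X)      # gradient of the mean squared error
        W -= lr * grad                 # the update itself carries no added noise
    return W

rng = np.random.default_rng(seed=2)
X = rng.normal(size=(200, 4))          # toy inputs
W_true = rng.normal(size=(4, 3))       # toy target mapping
Y = X @ W_true

W_fit = train_weight_noise(X, Y, sigma=0.05)
print("largest weight deviation:", np.abs(W_fit - W_true).max())
```

On this convex toy problem the noise only adds a small jitter around the solution; the regularizing effect reported in the text concerns the nonlinear generator, which this sketch does not attempt to reproduce.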

DISCUSSION

To quantitatively compare the GAN performance, we use the standard metric of Fréchet inception distance (FID), which evaluates both the fidelity and diversity of the generated images by comparing the feature distribution of the generated images with that of images from the training dataset. The lower the FID score, the better the performance of the GAN. In Fig. 3D, the FIDs of the images generated by the NF-GAN, the IC-GAN, and the WC-GAN, respectively, are compared, assuming either ideal (FIDideal) or noisy (FIDnoisy) hardware (see the Supplementary Materials for detailed steps to calculate the FID). The FIDnoisy (hashed bars in Fig. 3D) is the lowest for the WC-GAN and the highest for the NF-GAN, consistent with the observation in Fig. 3 (A to C). The impact of hardware noise, ΔFID = FIDnoisy − FIDideal, is plotted in Fig. 3E. The noise-aware WC-GAN and IC-GAN show two notable benefits. First, the FIDideal (solid bars in Fig. 3D) for the WC-GAN is lower than that for the NF-GAN (i.e., the software baseline), indicating that introducing noise during training helps the GAN learn better. Such a gain is absent in discriminative networks, where the inference accuracies of the noise-aware trained model cannot exceed the software baseline. Second, unexpectedly, the noise impact results (Fig. 3E) show that, unlike the NF-GAN, the WC-GAN and IC-GAN implemented on the photonic hardware with practical noise (hashed bars in Fig. 3D) perform even better in inference than on the noiseless hardware (solid bars in Fig. 3D). In contrast, a discriminative network’s inference accuracy always decreases as the hardware becomes noisier. This unexpected gain in performance suggests photonic neural networks’ potential in generative models despite the inevitable optoelectronic noises and errors.
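For reference, the FID is the Fréchet distance between two Gaussians fitted to feature vectors of the real and generated images: the squared distance between the means plus Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). The sketch below computes that distance for arbitrary feature arrays. It is not the authors' pipeline (their steps are in the Supplementary Materials): in practice the features come from a pretrained Inception network, whereas here they are random stand-ins, and the function name `frechet_distance` is hypothetical.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is an array of shape (n_samples, n_features).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and nonnegative for
    # covariance matrices; tiny negative values from rounding are clipped.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

rng = np.random.default_rng(seed=3)
real_feats = rng.normal(size=(500, 8))       # stand-in "real" features
fake_feats = real_feats + 1.0                # same spread, shifted mean

print("identical sets:", frechet_distance(real_feats, real_feats))
print("shifted set:   ", frechet_distance(real_feats, fake_feats))
```

The mean term penalizes generated images whose features sit in the wrong place, while the covariance term penalizes a spread that is too narrow or too wide, which is why a single score can reflect both fidelity and diversity.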
Optical computation in this work is performed at a low speed of 4000 operations per second (4 KOPS), limited by the use of low-speed variable optical attenuators (VOAs) to encode data and by the small-scale 2 × 2 tensor core. However, state-of-the-art integrated photonic transmitters and photodetectors can drive the system at tens of Gbits/s. The size of the photonic core can be further scaled up to a much larger array. Assuming a moderate data rate of 10 Gbits/s and four WDM channels, the computing density of a photonic tensor core can reach an upper-bound value of 25 TOPS/mm² (tera operations per second per square millimeter), significantly higher than that of state-of-the-art digital electronics. To predict whether the performance gain of the noise-aware approaches is scalable, we train, in simulation, a larger-scale GAN to generate images of all 10 digits using ideal or noise-aware approaches under various levels of write error. Figure 4A shows the FID score of the results as a function of ΔΓ. Here, the curvature regularization approach (CR-GAN), which evolves from the WC-GAN, is used to improve the GAN robustness further (see the Supplementary Materials for more details about the CR-GAN). The comparison shows that the CR-GAN performs better than the NF-GAN at every error level. Note that, under our present realistic noise level of 5% (Fig. 2G), the FID of the CR-GAN is still below the software baseline, whereas the NF-GAN’s FID is higher than the baseline. For both approaches, with increasing noise level, the FID first drops until reaching a minimum at ~2.5% noise and then increases. To explain this, we further examine the images generated by the CR-GAN at three noise levels: 0, 5, and 10% in Fig. 4 (B to D).
The comparison shows that increasing hardware noise in the GAN improves the diversity [evaluated by the SD of the percentage of each number class in the generated images; see the Supplementary Materials for more details] but, at the same time, reduces the fidelity of the generated images. The trade-off results in a minimal FID at ~2.5% noise, as shown in Fig. 4A. Throughout the full range of noise levels, the noise-aware approach consistently improves the GAN over the noiseless approach.

Fig. 4.

Scalability of noise-aware training.


(A) The FID of the images generated by the NF-GAN and the CR-GAN, respectively, under various effective mode contrast setting noise with SD ranging from 0 to 10%. The shaded region indicates the range of FID over five individual tests. The FID is lower for the CR-GAN at every noise level. At the measured noise level of 5% (black dashed line), the FID for the CR-GAN is below the software baseline (solid green line), while the FID for the NF-GAN is above it. (B to D) Fifty images (size, 14 × 14) generated by the CR-GAN assuming effective mode contrast setting noise of (B) 0%, (C) 5%, and (D) 10%.

In conclusion, we demonstrate a photonic generative network based on phase-change photonics, which is used to form a GAN and harnesses the intrinsic noise sources in the photonic system. Unlike the previously demonstrated discriminative networks that suffer from hardware noise, our experimental and simulation results show that the photonic generative network not only can tolerate but also can benefit from a certain level of hardware noise when trained with noise-aware approaches. Our finding expands the current implementation of photonic neural networks to generative models, in which the inevitable and ubiquitous optoelectronic noises and errors can be mitigated and even leveraged in intelligent ways. We emphasize that the proposed noise-aware training approaches are generic and thus applicable to various types of optoelectronic neuromorphic computing hardware. The improved noise resilience of the model also implies its scalability to large-scale photonic neural networks with tightly cointegrated electronics and photonics.

MATERIALS AND METHODS

PMMC design and fabrication

The PMMC consists of a phase-gradient metasurface made of GST thin film on silicon nitride waveguides. The metasurface is designed to convert the incident TE0 mode into the TE1 mode when GST is in the crystalline phase while maintaining the TE0 mode when GST is in the amorphous phase. The PMMC is fabricated by depositing a 30-nm-thick GST film using a sputtering tool on an oxidized silicon substrate with 330-nm-thick silicon nitride film. The GST film is then patterned into the metasurface using standard electron beam lithography and inductively coupled plasma etching processes. A 218-nm-thick Al2O3 layer is deposited with atomic layer deposition to cap the GST conformally.

Measurement setup

The measurement setup used to operate the photonic tensor core is shown in fig. S2. The input optical signals are carried by four different wavelengths using four tunable continuous-wave lasers. The signal amplitudes are controlled by VOAs with a 1-kHz operation speed. An additional control laser coupled with a 1 × 4 optical switch is used to optically program the kernel weight into each GST PMMC. The control pulses are generated with a 12-GHz electro-optic modulator and amplified by a low-noise EDFA. The energy of each control pulse is further tuned using another VOA. The input signals and the control pulses are coupled into the photonic device via integrated grating couplers with a coupling efficiency of ~20%. The input signals propagate forward through each input channel, while the control pulses propagate in the opposite direction through the TE1 detection waveguides. The optical power in the TE0 mode is combined on-chip using integrated Y junctions and detected. The optical power in the TE1 mode is collected and combined off-chip. The mode power contrast is measured to give the MVM results.


