| Literature DB >> 35379848 |
Angelina Totovic, George Giamougiannis, Apostolos Tsakyridis, David Lazovsky, Nikos Pleros.
Abstract
Neuromorphic photonics has so far relied either solely on coherent or on Wavelength-Division-Multiplexing (WDM) designs to enable dot-product or vector-by-matrix multiplication, which has led to an impressive variety of architectures. Here, we go a step further and employ WDM to enrich the layout with parallelization capabilities across the fan-in and/or weighting stages, rather than serving the computational purpose, and present, for the first time, a neuron architecture that combines coherent optics with WDM towards a multifunctional programmable neural network platform. Our reconfigurable platform accommodates four different operational modes over the same photonic hardware, supporting multi-layer, convolutional, fully-connected and power-saving layers. We mathematically validate successful performance in all four operational modes, taking into account crosstalk, channel spacing and the spectral dependence of the critical optical elements, concluding in reliable operation with a MAC relative error of [Formula: see text].
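The core operation the abstract refers to, vector-by-matrix multiplication built from dot-product (MAC) chains, together with the MAC relative-error figure of merit, can be sketched numerically. This is a minimal toy model, not the paper's photonic simulation; the function names and the 1% multiplicative perturbation standing in for hardware deviations are illustrative assumptions.

```python
import numpy as np

def vector_by_matrix(x, W):
    """Vector-by-matrix multiplication: the core operation a photonic
    neural layer implements, each output being one MAC (dot-product) chain."""
    return W @ x

def mac_relative_error(measured, target, eps=1e-12):
    """Element-wise relative error of a measured (optical) value with
    respect to its target, the accuracy metric quoted in the abstract."""
    return np.abs(measured - target) / (np.abs(target) + eps)

# Toy example: a layer with fan-in 4 and 3 outputs.
x = np.array([0.2, 0.5, 0.1, 0.7])
W = np.random.default_rng(0).uniform(-1, 1, size=(3, 4))
y_ideal = vector_by_matrix(x, W)
# Stand-in for hardware non-idealities (crosstalk, spectral dependence):
# a small multiplicative perturbation of the ideal outputs.
y_meas = y_ideal * (1 + 0.01 * np.random.default_rng(1).standard_normal(3))
err = mac_relative_error(y_meas, y_ideal)
```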
Year: 2022 PMID: 35379848 PMCID: PMC8980092 DOI: 10.1038/s41598-022-09370-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. (a) Schematic representation of the PPNN showing M laser diodes (LDs), a MUX, a 3 dB X-splitter followed by a bias branch and a reconfigurable OLAU encompassing a 1-to-N splitting stage, input and weight modulator banks and an N-to-1 combiner stage, whose output is brought to interfere with the bias signal within a 3 dB X-coupler and sent to the DEMUX. Closer look into (b) the 1-to-N splitting stage and (d) its rotated N-to-1 coupling stage. Zoom-in into (c) the bias-branch wavelength-selective weight and phase modulators and (e) an axon of the OLAU consisting of switches for signal routing and modulators for inputs and weights.
PPNN modes of operation and the corresponding switch states.
| Mode | Layer type | Switch 1 | Switch 2 | Switch 3 |
|---|---|---|---|---|
| #1 | Multi-neuron | 1 (up) | 1 (bar) | 1 (up) |
| #2 | Convolutional | 1 (up) | 0 (cross) | 0 (down) |
| #3 | Fully-connected | 0 (down) | 0 (cross) | 1 (up) |
| #4 | Power-saving | 0 (down) | 1 (bar) | 0 (down) |
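The mode table above is a pure lookup from mode name to switch configuration, which can be transcribed directly. The switch keys `s1`–`s3` are placeholders, since the record does not preserve the original column labels.

```python
# Switch-state lookup for the four PPNN operational modes, transcribed
# from the table above. Keys s1-s3 are placeholder names for the three
# routing switches; the original column labels are not preserved here.
PPNN_MODES = {
    "multi-neuron":    {"s1": "up",   "s2": "bar",   "s3": "up"},
    "convolutional":   {"s1": "up",   "s2": "cross", "s3": "down"},
    "fully-connected": {"s1": "down", "s2": "cross", "s3": "up"},
    "power-saving":    {"s1": "down", "s2": "bar",   "s3": "down"},
}

def switch_states(mode):
    """Return the switch configuration for a named PPNN mode."""
    return PPNN_MODES[mode]
```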
Figure 2. (a) Simplified CNN inspired by LeNet-5, employed in image classification. (b) Schematic of a convolutional layer with color-coded input/output pairs and (c) its implementation over the PPNN in mode #2, where each channel m corresponds to one input/output pair.
Figure 3. (b) Schematic of an autoencoder and (a), (c) its two FC layers implemented over the PPNN in mode #3, where channels correspond to unique weight vectors and outputs. Based on the connectivity graph from (b), the implementation assumes the use of (a) 4 branches and 2 wavelengths in the first layer and (c) 2 branches and 4 wavelengths in the second. If the number of available branches N is greater than needed, all excess branches have their inputs set to 0 (observe the Nth branch in (a), (c), where this condition is imposed). Index n in implementation (a) denotes that the lit nth branch carries a non-zero input. Similarly, if the number of available wavelengths M exceeds the number required, the excess LDs are powered off.
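The zero-padding rule in the Fig. 3 caption, excess branches driven with zero input when the layer fan-in is smaller than the number of physical branches N, amounts to a simple padding step. A minimal sketch, with an assumed helper name:

```python
import numpy as np

def pad_to_branches(x, N):
    """Map a layer's input vector onto N physical branches: branches
    beyond the layer fan-in are driven with zero input, as in Fig. 3(a),(c)."""
    x = np.asarray(x, dtype=float)
    if len(x) > N:
        raise ValueError("layer fan-in exceeds available branches")
    padded = np.zeros(N)
    padded[:len(x)] = x
    return padded

# Example: a 4-input FC layer mapped onto a PPNN with N = 6 branches;
# the last two branches carry zero input.
x_layer1 = pad_to_branches([0.3, 0.8, 0.1, 0.5], N=6)
```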
Input and weight matrices of the nth axon.
| Mode | Input matrix | Weight matrix |
|---|---|---|
| #1 | diag | diag |
| #2 | diag | |
| #3 | diag | |
| #4 | | |
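The "diag" entries in the table reflect that, per axon, the M WDM channels act element-wise: per-wavelength inputs and weights multiply channel by channel, which is equivalently written with diagonal matrices. The exact matrix expressions are not preserved in this record, so the following is a hedged sketch of that equivalence only.

```python
import numpy as np

# Sketch of the diagonal-matrix formulation: with M wavelength channels
# on one axon, per-channel inputs x and weights w act element-wise, i.e.
# diag(w) @ diag(x) @ 1 gives the vector of per-channel products. The
# specific matrix contents from the paper's table are not preserved here.
M = 4
rng = np.random.default_rng(42)
x = rng.uniform(0, 1, M)   # per-wavelength input amplitudes
w = rng.uniform(-1, 1, M)  # per-wavelength weights
ones = np.ones(M)
per_channel = np.diag(w) @ np.diag(x) @ ones  # element-wise products w*x
axon_output = per_channel.sum()               # summed on combining
```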
Figure 4. Comparison between the convolutional (#2, left-hand side) and the fully-connected (#3, right-hand side) modes of PPNN operation with M channels and N axons. Channel-wise color-coded 2-D scatter plots of the targeted matrix element against (a), (b) the magnitude and (c), (d) the argument of the experimental matrix element, and (e), (f) the magnitude of the absolute deviation of the experimental from the targeted matrix element, all with univariate kernel probability density plots displayed on the corresponding horizontal and vertical axes of the scatter plots.
Figure 5. Mean relative errors of the matrix element (given in percent) with confidence bounds for the (a), (b) multi-neuron, (c), (d) convolutional, and (e), (f) FC modes of operation, as functions of (a), (c), (e) the channel spacing and (b), (d), (f) the AWG crosstalk.
Figure 6. Relative-error 5–95% confidence interval (given in %) versus the neuron fan-in N for the (a) convolutional and (b) fully-connected mode.
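The summary statistic of Figs. 5 and 6, mean relative error with 5–95% confidence bounds, can be computed from samples with percentiles. A minimal sketch on synthetic data; the ~1% multiplicative noise is an illustrative assumption, not the paper's hardware model.

```python
import numpy as np

def error_statistics(measured, target):
    """Mean relative error (in %) with 5th/95th-percentile bounds,
    the summary statistic reported in Figs. 5 and 6."""
    rel = 100 * np.abs(measured - target) / np.abs(target)
    return rel.mean(), np.percentile(rel, 5), np.percentile(rel, 95)

# Synthetic data: targets perturbed by ~1% multiplicative noise as a
# stand-in for the simulated photonic non-idealities.
rng = np.random.default_rng(7)
target = rng.uniform(0.1, 1.0, 10_000)
measured = target * (1 + 0.01 * rng.standard_normal(10_000))
mean_err, lo, hi = error_statistics(measured, target)
```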