Yutaro Iiyama1, Gianluca Cerminara2, Abhijay Gupta2, Jan Kieseler2, Vladimir Loncar2,3, Maurizio Pierini2, Shah Rukh Qasim2,4, Marcel Rieger2, Sioni Summers2, Gerrit Van Onsem2, Kinga Anna Wozniak2,5, Jennifer Ngadiuba6, Giuseppe Di Guglielmo7, Javier Duarte8, Philip Harris9, Dylan Rankin9, Sergo Jindariani10, Mia Liu10, Kevin Pedro10, Nhan Tran10,11, Edward Kreinar12, Zhenbin Wu13.
Abstract
Graph neural networks have been shown to achieve excellent performance for several crucial tasks in particle physics, such as charged particle tracking, jet tagging, and clustering. An important domain for the application of these networks is the FPGA-based first layer of real-time data filtering at the CERN Large Hadron Collider, which has strict latency and resource constraints. We discuss how to design distance-weighted graph networks that can be executed with a latency of less than 1 μs on an FPGA. To do so, we consider a representative task associated with particle reconstruction and identification in a next-generation calorimeter operating at a particle collider. We use a graph network architecture developed for such purposes, and apply additional simplifications to match the computing constraints of Level-1 trigger systems, including weight quantization. Using the hls4ml library, we convert the compressed models into firmware to be implemented on an FPGA. Performance of the synthesized models is presented both in terms of inference accuracy and resource usage.
Keywords: deep learning; fast inference; field-programmable gate arrays; graph network; imaging calorimeter
Year: 2021 PMID: 33791596 PMCID: PMC8006281 DOI: 10.3389/fdata.2020.598927
Source DB: PubMed Journal: Front Big Data ISSN: 2624-909X
FIGURE 1. Processing flow of the modified GarNet algorithm: (A) The input features of each vertex are processed by a linear network, which returns a new set of features and the vertex's distances from the S aggregators. (B) A graph is built in the learned space, using these distances. (C) Each aggregator gathers a message as a weighted sum of the new features across the vertices, with distance-based weights. (D) A message from each aggregator is passed back to each vertex, with the same weight. (E) The aggregated outputs of each vertex are given as input to a neural network, which returns the learned representation.
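Steps (A) to (E) of the caption above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `garnet_like_layer`, the Gaussian weight `exp(-d^2)`, the normalization by the vertex count, and the single `tanh` output layer are all choices made here for illustration, because the caption does not specify them.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer without activation: y[k] = sum_j weights[k][j] * x[j] + bias[k]."""
    return [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(weights, bias)]


def garnet_like_layer(vertices, w_feat, b_feat, w_dist, b_dist, w_out, b_out):
    """Illustrative aggregator-based layer following steps (A)-(E) of the caption."""
    n_vert = len(vertices)
    n_agg = len(w_dist)    # number of aggregators (S in the caption)
    n_feat = len(w_feat)   # number of learned features per vertex

    # (A) Linear maps give each vertex new features and one distance per aggregator.
    feats = [linear(v, w_feat, b_feat) for v in vertices]
    dists = [linear(v, w_dist, b_dist) for v in vertices]

    # (B) Edge weights between vertices and aggregators, built from the distances.
    # ASSUMPTION: Gaussian potential exp(-d^2); the caption does not give the form.
    edge = [[math.exp(-d * d) for d in d_v] for d_v in dists]

    # (C) Each aggregator gathers a weighted sum of the features over all vertices.
    # ASSUMPTION: the sum is divided by the vertex count (a weighted mean).
    agg = [
        [sum(edge[v][a] * feats[v][i] for v in range(n_vert)) / n_vert
         for i in range(n_feat)]
        for a in range(n_agg)
    ]

    out = []
    for v in range(n_vert):
        # (D) Each aggregator's message returns to the vertex with the same weight.
        returned = [edge[v][a] * agg[a][i] for a in range(n_agg) for i in range(n_feat)]
        # (E) ASSUMPTION: the final network is a single dense layer with tanh.
        out.append([math.tanh(y) for y in linear(returned, w_out, b_out)])
    return out


def random_matrix(rows, cols, rng):
    """Hypothetical helper: a rows x cols matrix of uniform random weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_in, n_feat, n_agg, n_out = 4, 3, 2, 5
hits = random_matrix(8, n_in, rng)  # 8 vertices with 4 input features each
result = garnet_like_layer(
    hits,
    random_matrix(n_feat, n_in, rng), [0.0] * n_feat,
    random_matrix(n_agg, n_in, rng), [0.0] * n_agg,
    random_matrix(n_out, n_agg * n_feat, rng), [0.0] * n_out,
)
print(len(result), len(result[0]))  # 8 vertices, 5 output features each
```

Because every vertex exchanges information only with the S aggregators, the cost of steps (C) and (D) grows with the product of the vertex and aggregator counts rather than with the square of the vertex count, which is one reason this kind of layer suits a fixed-latency budget.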
FIGURE 2. Schematics of the high-granularity and low-granularity regions of the (A) electromagnetic and (B) hadron layers.
FIGURE 3. Examples of electron (A), (C) and pion (B), (D) events. Values in parentheses in the graph titles are the respective energy depositions contained in the cluster around the seed hit. Points represent hits in the detector, with their coordinates at the center of the corresponding detector cells and the size of the markers proportional to the square root of the hit energy. Opaque points are within the cluster, while the translucent ones are not. In (A) and (B), the point color scale from blue to red corresponds to the primary fraction (see Section 5.1 for definition). In (C) and (D), the color scale from blue to green corresponds to a quantity that indicates the importance the neural network model places on individual hits for energy regression. See Section 5.3 for details.
FIGURE 4. Classification (A) and regression (B) inference performance of the continuous and quantized GarNet-based models and the reference algorithms. Results from the Keras and HLS implementations are shown for the GarNet-based models. The classification performance is quantified with a ROC curve of electron identification efficiency vs. pion rejection efficiency. The inset in (A) shows a close-up view of the efficiency range 0.90–0.96 for both axes. The regression performance is quantified as the response in 10 GeV energy bins. The horizontal line in the box corresponds to the median of the distribution, the top and bottom of the box to the upper and lower quartiles, and the upper and lower ends of the whiskers to the 95th and 5th percentiles.
Summary of the latency, initiation interval (II), FPGA resource usage, and inference accuracy metrics of the synthesized firmware. The reported resource usage numbers are the synthesis estimates from Vivado HLS. The target FPGA is a Xilinx Kintex UltraScale FPGA (part number xcku115-flvb2104-2-i), which has 5,520 DSPs, 663,360 LUTs, 1,326,720 FFs, and 77.8 Mb of BRAM (Xilinx, 2020). The utilized percentages of the target FPGA resources are given in square brackets.
| Model |  |  | Latency (cycles) | Interval (cycles) | DSP (10³) | LUT (10³) | FF (10³) | BRAM (Mb) | ROC AUC | Response RMS |
|---|---|---|---|---|---|---|---|---|---|---|
| Continuous | 128 | 32 | 155 | 55 | 3.1 [56%] | 57 [9%] | 39 [2.9%] | 1.8 [2.3%] | 0.98 | 0.23 |
| Quantized | 128 | 32 | 148 | 50 | 1.6 [29%] | 70 [11%] | 41 [3.1%] | 1.9 [2.4%] | 0.98 | 0.24 |
| Quantized | 64 | 16 | 99 | 34 | 1.6 [29%] | 63 [9%] | 38 [2.9%] | 1.8 [2.3%] | 0.96 | 0.24 |
| Quantized | 32 | 8 | 75 | 26 | 1.4 [25%] | 52 [8%] | 33 [2.5%] | 1.8 [2.3%] | 0.86 | 0.37 |
| Quantized | 16 | 4 | 63 | 22 | 1.5 [27%] | 57 [9%] | 37 [2.8%] | 1.8 [2.3%] | 0.64 | 0.36 |
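The bracketed percentages in the table can be cross-checked against the device totals quoted in the caption. The short sketch below does this for the first row (continuous model), reading the DSP, LUT, and FF entries as thousands. It also converts a cycle count into a latency; the 5 ns clock period used there is a hypothetical value chosen only for illustration, since this excerpt does not state the clock frequency. The function names are likewise illustrative.

```python
# Device totals quoted in the table caption for the xcku115-flvb2104-2-i part.
TOTALS = {"DSP": 5_520, "LUT": 663_360, "FF": 1_326_720, "BRAM_Mb": 77.8}


def utilization_percent(used, total):
    """Share of a device resource that is in use, in percent."""
    return 100.0 * used / total


def cycles_to_microseconds(cycles, clock_period_ns):
    """Convert a latency in clock cycles to microseconds for a given clock period."""
    return cycles * clock_period_ns / 1000.0


# First table row (continuous model); DSP, LUT, and FF entries are in thousands.
row = {"DSP": 3.1e3, "LUT": 57e3, "FF": 39e3, "BRAM_Mb": 1.8}
usage = {name: utilization_percent(row[name], TOTALS[name]) for name in row}

for name, percent in usage.items():
    print(f"{name}: {percent:.1f}%")

# ASSUMPTION: a 5 ns clock period (200 MHz); not stated in this excerpt.
ASSUMED_CLOCK_PERIOD_NS = 5.0
latency_us = cycles_to_microseconds(155, ASSUMED_CLOCK_PERIOD_NS)
print(f"155 cycles at {ASSUMED_CLOCK_PERIOD_NS} ns per cycle: {latency_us:.3f} us")
```

The computed utilizations round to the bracketed values in the first row, which confirms that the DSP, LUT, and FF columns are expressed in thousands of units.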