Ekaterina Govorkova, Ema Puljak, Thea Aarrestad, Maurizio Pierini, Kinga Anna Woźniak, Jennifer Ngadiuba.
Abstract
In the particle detectors at the Large Hadron Collider, hundreds of millions of proton-proton collisions are produced every second. If one could store the whole data stream produced in these collisions, tens of terabytes of data would be written to disk every second. The general-purpose experiments ATLAS and CMS reduce this overwhelming data volume to a sustainable level by deciding in real time whether each collision event should be kept for further analysis or discarded. We introduce a dataset of proton collision events that emulates a typical data stream collected by such a real-time processing system, pre-filtered by requiring the presence of at least one electron or muon. This dataset could be used to develop novel event selection strategies and assess their sensitivity to new phenomena. In particular, we intend to stimulate a community-based effort towards the design of novel algorithms for performing unsupervised new physics detection, customized to fit the bandwidth, latency and computational resource constraints of the real-time event selection system of a typical particle detector.
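The pre-filter described in the abstract (keep an event only if it contains at least one electron or muon) can be illustrated with a minimal sketch. The event layout used here, a dictionary mapping object type to a list of transverse momenta, is a hypothetical stand-in chosen for readability and is not the dataset's actual storage format; the function names are likewise illustrative.

```python
from typing import Dict, List

# Hypothetical event layout (an assumption, not the dataset's real format):
# each event maps an object type to the transverse momenta of its objects.
Event = Dict[str, List[float]]


def has_lepton(event: Event) -> bool:
    """Return True if the event contains at least one electron or muon."""
    n_electrons = len(event.get("electrons", []))
    n_muons = len(event.get("muons", []))
    return n_electrons + n_muons > 0


def prefilter(events: List[Event]) -> List[Event]:
    """Keep only the events that pass the single-lepton requirement."""
    return [event for event in events if has_lepton(event)]


# Toy events with made-up values, for illustration only.
events = [
    {"electrons": [31.2], "muons": [], "jets": [55.0, 40.1]},
    {"electrons": [], "muons": [], "jets": [120.4]},
    {"electrons": [], "muons": [27.8], "jets": []},
]
kept = prefilter(events)
print(len(kept))
```

A realistic selection would typically also apply kinematic thresholds to the leptons; those are omitted here because the abstract does not state them.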
Year: 2022 PMID: 35351897 PMCID: PMC9070018 DOI: 10.1038/s41597-022-01187-8
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Fig. 1 The real-time data processing flow of the ATLAS and CMS experiments: millions of collisions are produced every second and processed by the hardware-based event selection system, consisting of algorithms implemented as logic circuits on custom electronic boards. Of these events, 100k events/s are accepted and passed to the second selection stage, the high-level trigger (HLT), which selects about 1000 events/s for offline physics studies.
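As a rough illustration of the rates quoted in the Fig. 1 caption, the sketch below computes the reduction factor of the second selection stage from its input and output rates. Only the two rates stated in the caption (100k events/s and about 1000 events/s) are used; the function and constant names are illustrative assumptions.

```python
def reduction_factor(rate_in: float, rate_out: float) -> float:
    """Ratio of input to output event rate for one selection stage."""
    if rate_in <= 0 or rate_out <= 0:
        raise ValueError("rates must be positive")
    return rate_in / rate_out


# Rates taken from the Fig. 1 caption, in events per second.
HARDWARE_STAGE_OUTPUT_RATE = 100_000  # accepted by the hardware-based stage
HLT_OUTPUT_RATE = 1_000               # approximate rate kept by the HLT

hlt_reduction = reduction_factor(HARDWARE_STAGE_OUTPUT_RATE, HLT_OUTPUT_RATE)
hlt_kept_fraction = 1.0 / hlt_reduction

print(f"HLT reduction factor: {hlt_reduction:.0f}")
print(f"Fraction of events kept by the HLT: {hlt_kept_fraction:.1%}")
```

With these numbers the second stage keeps roughly one event in a hundred of those it receives.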
Fig. 2 The reference system used to describe the momentum coordinates of the particles in the dataset.
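The momentum coordinates used in the dataset (pT, η, ϕ) can be related to Cartesian components with a short sketch. It assumes the standard collider convention, with the z axis along the beam line, ϕ the azimuthal angle in the transverse plane and η the pseudorapidity; the figure itself is not reproduced here, so this convention is an assumption, and the function names are illustrative.

```python
import math
from typing import Tuple


def to_cartesian(pt: float, eta: float, phi: float) -> Tuple[float, float, float]:
    """Convert (pT, eta, phi) to (px, py, pz), assuming z is the beam axis."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    return px, py, pz


def to_collider(px: float, py: float, pz: float) -> Tuple[float, float, float]:
    """Convert (px, py, pz) back to (pT, eta, phi); requires pT > 0."""
    pt = math.hypot(px, py)
    if pt == 0.0:
        raise ValueError("pseudorapidity is undefined for pT = 0")
    eta = math.asinh(pz / pt)
    phi = math.atan2(py, px)  # result lies in (-pi, pi]
    return pt, eta, phi


# Round trip on an arbitrary example vector (values chosen for illustration).
px, py, pz = to_cartesian(25.0, 1.2, -2.0)
pt, eta, phi = to_collider(px, py, pz)
print(pt, eta, phi)
```

The round trip recovers the input values up to floating-point precision, provided ϕ lies in the interval (-π, π].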
The names and corresponding Zenodo reference for each dataset, the total number of collision events and the dataset type (S for signal and B for background).
| Sample name | Number of events | Type |
|---|---|---|
| SM processes | 4,000,000 | B |
| | 340,544 | S |
| | 55,969 | S |
| | 691,283 | S |
| | 760,272 | S |
| | 4,210,492 | S + B |
Fig. 3 Distribution of the pT (left), ϕ (center) and η (right) coordinates of the physics objects entering the dataset, for missing transverse energy, MET (top row), electrons (second row), muons (third row) and jets (bottom row).
| Measurement(s) | Simulations of LHC collisions in real-time processing data format |
| Technology Type(s) | PYTHIA, DELPHES, private Python code |
| Factor Type(s) | phi • eta • pT |