A Hamann1, V Dunjko2, S Wölk1,3.
Abstract
In recent years, quantum-enhanced machine learning has emerged as a particularly fruitful application of quantum algorithms, covering aspects of supervised, unsupervised and reinforcement learning. Reinforcement learning offers numerous options for how quantum theory can be applied, and is arguably the least explored from a quantum perspective. Here, an agent explores an environment and tries to find a behavior optimizing some figure of merit. Some of the first approaches investigated settings where this exploration can be sped up by considering quantum analogs of classical environments, which can then be queried in superposition. If the environments have a strictly periodic structure in time (i.e. are strictly episodic), such environments can be effectively converted to conventional oracles encountered in quantum information. However, in general environments, we obtain scenarios that generalize standard oracle tasks. In this work, we consider one such generalization, where the environment is not strictly episodic, which is mapped to an oracle identification setting with a changing oracle. We analyze this case and show that standard amplitude-amplification techniques can, with minor modifications, still be applied to achieve quadratic speed-ups. In addition, we prove that an algorithm based on Grover iterations is optimal for oracle identification even if the oracle changes over time in a way that the "rewarded space" is monotonically increasing. This result constitutes one of the first generalizations of quantum-accessible reinforcement learning.
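The changing-oracle setting described in the abstract can be illustrated with a small classical state-vector simulation. The sketch below is not the authors' algorithm; it only assumes the textbook Grover iteration (oracle phase flip followed by inversion about the mean) and applies it in successive phases whose rewarded sets are nested, so that the rewarded space grows monotonically. The names `grover_step`, `success_probability` and `schedule` are hypothetical and chosen for illustration.

```python
import numpy as np


def grover_step(state, rewarded):
    """Apply one textbook Grover iteration to a real amplitude vector.

    `rewarded` is the set of basis-state indices marked by the current oracle
    (an illustrative stand-in for the "rewarded space").
    """
    state = state.copy()
    state[list(rewarded)] *= -1.0      # oracle: flip the sign of rewarded items
    return 2.0 * state.mean() - state  # diffusion: inversion about the mean


def evolve(n_items, schedule):
    """Start from the equal superposition and run every phase of `schedule`.

    `schedule` is a list of (num_iterations, rewarded_set) pairs; the oracle
    changes whenever a new phase begins.
    """
    state = np.full(n_items, 1.0 / np.sqrt(n_items))
    for iterations, rewarded in schedule:
        for _ in range(iterations):
            state = grover_step(state, rewarded)
    return state


def success_probability(n_items, schedule):
    """Probability of measuring an item rewarded by the final oracle."""
    state = evolve(n_items, schedule)
    final_rewarded = list(schedule[-1][1])
    return float(np.sum(state[final_rewarded] ** 2))


if __name__ == "__main__":
    n = 64
    # Fixed oracle: a single rewarded item for all six iterations.
    fixed = [(6, {3})]
    # Changing oracle: the rewarded set grows from {3} to {3, 10} after phase one.
    changing = [(3, {3}), (3, {3, 10})]
    print("fixed oracle:   ", success_probability(n, fixed))
    print("changing oracle:", success_probability(n, changing))
```

Varying the number of iterations in each phase of `schedule` gives a quick numerical feel for how the success probability depends on when the oracle changes, which is the kind of dependence the figures below compare.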
Keywords: Amplitude amplification; Quantum-classical hybrid agent; Reinforcement learning
Year: 2021 PMID: 34723097 PMCID: PMC8550166 DOI: 10.1007/s42484-021-00049-7
Source DB: PubMed Journal: Quantum Mach Intell ISSN: 2524-4906
Fig. 1 Visualization of the time evolution of |ψ(0)〉 under Grover iterations with a changing oracle, where one oracle is applied for 1 ≤ k ≤ K and a second oracle for K + 1 ≤ k ≤ J. The rewarded space (red plane) is spanned by {|w〉,|w⊥〉}. The equal superposition state |ψ(0)〉 is rotated along the blue circle during the first phase, leading to the state |ψ(K)〉. Subsequently, this state is rotated along the green circle, changing only its component |ϕ〉 but not |w⊥〉
Fig. 2 Comparison of the success probabilities for different numbers of Grover iterations during the first and second phase (blue, green and red curves)
Fig. 3 Minimal expected average cost C of a continuous Grover algorithm with changing oracle with optimized j (red) for fixed ϕ = 0.6 and ε ≈ 0.32. The expected average cost for an interrupted Grover algorithm with fixed oracles is denoted by a blue dashed line for comparison