Show simple item record

dc.contributor.author  Rauber, P
dc.contributor.author  Ummadisingu, A
dc.contributor.author  Mutz, F
dc.contributor.author  Schmidhuber, J
dc.date.accessioned  2021-06-03T13:27:57Z
dc.date.available  2020-08-17
dc.date.available  2021-06-03T13:27:57Z
dc.date.issued  2021-05
dc.identifier.citation  Rauber, Paulo et al. "Reinforcement Learning in Sparse-Reward Environments with Hindsight Policy Gradients". Neural Computation, vol. 33, no. 6, 2021, pp. 1498-1553. MIT Press, doi:10.1162/neco_a_01387. Accessed 3 June 2021.  en_US
dc.identifier.issn  0899-7667
dc.identifier.uri  https://qmro.qmul.ac.uk/xmlui/handle/123456789/72285
dc.description.abstract  A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enabling sample-efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this letter, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.  en_US
dc.publisher  Massachusetts Institute of Technology Press (MIT Press)  en_US
dc.relation.ispartof  Neural Computation
dc.rights  This is a pre-copyedited, author-produced version of an article accepted for publication in Neural Computation following peer review. The version of record is available at https://direct.mit.edu/neco/article/33/6/1498/100578/Reinforcement-Learning-in-Sparse-Reward
dc.title  Reinforcement Learning in Sparse-Reward Environments with Hindsight Policy Gradients  en_US
dc.type  Article  en_US
dc.rights.holder  © 2021 Massachusetts Institute of Technology
pubs.notes  Not known  en_US
pubs.publication-status  Accepted  en_US
dcterms.dateAccepted  2020-08-17
rioxxterms.funder  Default funder  en_US
rioxxterms.identifier.project  Default project  en_US

