Show simple item record

dc.contributor.author    İlhan, EM    en_US
dc.date.accessioned      2022-12-19T17:46:06Z
dc.date.issued           2023
dc.identifier.uri        https://qmro.qmul.ac.uk/xmlui/handle/123456789/83282
dc.description.abstract    Deep Reinforcement Learning (RL) algorithms can successfully solve complex sequential decision-making tasks. However, they suffer from the major drawbacks of poor sample efficiency and long training times, which can often be tackled by knowledge reuse. Action advising is a promising knowledge-exchange mechanism that adopts the teacher-student paradigm to leverage legacy knowledge through a budget-limited number of interactions, in the form of action advice, between peers. In this thesis, we studied action advising techniques, particularly in the Deep RL domain, in both single-agent and multi-agent scenarios. We proposed, for the first time in the literature, a heuristic-based jointly-initiated action advising method suitable for the multi-agent Deep RL setting. By adopting Random Network Distillation (RND), we devised a measure that lets agents assess their confidence in any given state and initiate teacher-student dynamics with no prior role assumptions. We also used RND as an advice-novelty metric to construct more robust student-initiated advice query strategies in single-agent Deep RL. Moreover, we addressed the absence of advice-utilisation mechanisms beyond collection by employing a behavioural cloning module to imitate the teacher's advice. We also proposed a method to automatically tune the relevant hyperparameters of these components on the fly, making our action advising algorithms capable of adapting to any domain with minimal human intervention. Finally, we extended our advice reuse via imitation technique into a unified student-initiated approach that addresses both advice collection and advice utilisation. The experiments we conducted across a range of Deep RL domains showed that our proposals provide significant contributions: our Deep RL-compatible action advising techniques achieved a state-of-the-art level of performance. Furthermore, we demonstrated that their practical attributes make domain adaptation and implementation straightforward, an important step towards applying action advising to real-world problems.    en_US
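The RND-based confidence measure described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it substitutes linear maps for the deep target and predictor networks of real RND, and the class name, dimensions, and learning rate are all illustrative assumptions. The core idea survives the simplification: a fixed random target network embeds states, a predictor is trained to match it on visited states, and the prediction error serves as a novelty (low-confidence) score that could trigger an advice request.

```python
import numpy as np

class RNDNovelty:
    """Illustrative Random Network Distillation novelty score (hypothetical sketch).

    A fixed, randomly initialised target maps states to embeddings; a
    predictor is trained to match it on visited states. Prediction error
    is high for rarely seen states, so it can proxy for low confidence.
    """

    def __init__(self, state_dim, embed_dim=16, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random target (real RND uses a frozen deep network).
        self.target = rng.normal(size=(state_dim, embed_dim))
        # Trainable predictor, initialised differently from the target.
        self.pred = rng.normal(size=(state_dim, embed_dim)) * 0.1
        self.lr = lr

    def novelty(self, s):
        # Mean squared embedding-prediction error for state s.
        err = s @ self.pred - s @ self.target
        return float(np.mean(err ** 2))

    def update(self, s):
        # One SGD step shrinking the prediction error on a visited state.
        err = s @ self.pred - s @ self.target          # (embed_dim,)
        grad = np.outer(s, err) * (2.0 / err.size)     # d(MSE)/d(pred)
        self.pred -= self.lr * grad
```

In a student-initiated advising loop, an agent would call `novelty(s)` on its current state and query the teacher whenever the score exceeds some threshold; repeated `update(s)` calls on visited states drive their novelty down, so familiar states stop consuming the advice budget.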
dc.language.iso          en    en_US
dc.title                 Accelerating Deep Reinforcement Learning via Action Advising    en_US
pubs.notes               Not known    en_US
rioxxterms.funder        Default funder    en_US
rioxxterms.identifier.project    Default project    en_US


This item appears in the following Collection(s)

  • Theses [4235]
    Theses Awarded by Queen Mary University of London
