Reinforcement Learning
Webpage for the ATCI course of the Master in Artificial Intelligence (Spring 2024)
Slides of Lectures:
- Course Information: Presentation and description of the course (updated 16/2/24)
- Slides for Week 1: Definition of the RL framework. Key elements in RL. Finding the optimal policy using Value Iteration and Policy Iteration (updated 16/2/24; a minimal value-iteration sketch appears after the video list below) - Video 1a: RL framework (slides 1-16, 38 mins).
- Slides for Week 2: Introduction to Model-Free approaches. Monte-Carlo, Q-learning, Sarsa, TD(lambda) (updated 29/2/24) - Video 2a: Model-free RL and Monte Carlo policy evaluation (slides 1-16, 45 mins).
- Slides for Week 3: Function approximation (updated 29/2/24) - [Not Optional] Video (Lab) 3a: OpenAI gym (20 mins).
- Slides for Week 4: Deep Reinforcement Learning (DRL) (updated 29/2/24) - Video (Lab) 4a: Linear Function Approximation: Tile coarse coding applied to Q-learning (33 mins).
- Slides for Week 5: Policy gradient methods (updated 7/3/24) - Video 5a: Goal of lecture and probabilistic policies (slides 1-8, 27 mins).
- Slides for Week 6: Modeling Rewards and Inverse Reinforcement Learning (IRL) (NEW - 3/2023) - Video (Lab) 6: Double DQN implemented in PyTorch (43 mins).
- Slides for Weeks 7 and 8: Multi-Agent Reinforcement Learning (MARL) and Zero-sum games (NEW - 4/2023) - Video 7a: Definitions and problems (38 mins).
- Slides for Week 9: Sample Efficiency I: Model Based Reinforcement Learning (MBRL) - Video 9a: Motivation (34 mins).
- Slides for Week 10: Sample Efficiency II: Exploration, Conditioned policies and Curriculum Learning - Video 10a: Motivation and Sparse Rewards (9 mins).
- Slides for Week 11: Extended RL, AGI and applications - Video 11a: Motivation, Transfer Learning and Hierarchical Learning (38 mins).
- Video 1b: RL elements (slides 17-34, 46 mins).
- Video 1c: Value functions (slides 35-42, 17 mins).
- Video 1d: Policy evaluation (slides 43-50, 14 mins).
- [Optional] Video 1e (Lab): Policy evaluation notebook. (17 mins)
- Video 1f: Finding Optimal Policies (slides 51-63, 39 mins).
- [Optional] Video 1g (Lab): Policy iteration and Value Iteration notebook.
- Video 2b: Monte Carlo policy learning (slides 17-25, 19 mins).
- Video 2c: Exploration (slides 26-29, 16 mins).
- Video 2d: TD Learning: Q-learning (slides 30-40, 25 mins).
- [Optional] Video (Lab): Monte Carlo on grid-world notebook. (16 mins)
- Video 2e: TD Extended: n-steps and TD(lambda) policy evaluation (slides 41-48, 18 mins).
- Video 2f: On-policy vs Off-policy learning (slides 49-65, 32 mins).
- Video 2g: Importance Sampling for General Off-policy learning (slides 66-71, 15 mins).
- [Not Optional] Video (Lab) 3b: Q-learning implementation and demonstration (28 mins).
- Video 3c: Need for Function Approximation (slides 1-3, 12 mins).
- Video 3d: Incremental methods and Gradient Descent (slides 4-8, 10 mins).
- Video 3e: Linear Function Approximation (slides 9-21, 27 mins).
- Video 3f: LFA for prediction (slides 22-29, 12 mins).
- Video 3g: LFA for control (slides 30-41, 21 mins).
- Video 3h: Batch methods and LSPI (slides 43-56, 20 mins).
- Video 3i: Fitted NN Q-learning (slides 57-63, 17 mins).
- Video 4b: Deep Q-learning (DQN) (slides 1-18, 52 mins).
- Video 4c: Double DQN (slides 19-22, 9 mins).
- Video 4d: Prioritized Experience Replay (PER) (slides 23-26, 15 mins).
- Video 4e: Dueling Networks (slides 27-33, 12 mins).
- Video 4f: Multi-step learning (slides 34-35, 8 mins).
- Video 4g: Rainbow (slides 36-39, 11 mins).
- Video 4h: Parallelism in RL (slides 40-42, 11 mins).
- Video 4i: Practical Tricks (slides 47-50, 20 mins).
- [Optional] Video 4j: Speeding up by optimality tightening (slides 43-46, 8 mins).
- Video 5b: Gradient Free methods (slides 9-20, 25 mins).
- Video 5c: Policy Gradient criteria (slides 21-31, 24 mins).
- Video 5d: Vanilla Policy Gradient (VPG) (slides 31-35, 14 mins).
- Video 5e: Using temporal structure (slides 36-43, 16 mins).
- Video 5f: Actor Critic Methods (slides 44-54, 28 mins).
- Video 5g: Latest On-policy AC methods (slides 55-62, 20 mins).
- Video 5h: Off-policy AC methods: DDPG and TD3 (slides 63-67, 21 mins).
- Video 5i: Off-policy AC methods: SAC (slides 68-72, 24 mins).
- Video 6a: Definition of the problem (43 mins).
- Video 6b: Formulation and problems (12 mins).
- Video 6c: Linear Programming (32 mins).
- Video 6d: SVM and BIRL (18 mins).
- Video 6e: MaxEnt and GANs (43 mins).
- Video 7b: Mathematical formulation (54 mins).
- Video 7c: Classic algorithms (1h01 mins).
- Video 7d: Stochastic Games (25 mins).
- Video 7e: Summary of last week (24 mins).
- Video 7f: Centralized / Decentralized (25 mins).
- Video 7g: MADDPG (20 mins).
- Video 7h: COMA (20 mins).
- Video 7i: Value Decomposition Network (VDN) (16 mins).
- Video 7j: QMIX (34 mins).
- Video 7k: Conclusions (34 mins).
- Video 9b: Model Definition (49 mins).
- Video 9c: Model Predictive Control-like approaches (21 mins).
- Video 9d: Simulated trajectories (45 mins).
- Video 9e: End-to-end and Conclusions (35 mins).
- Video 10b: Bandits and Exploration (1h 04 mins).
- Video 10c: Advanced Exploration in general RL setting (49 mins).
- Video 10d: Conditioned Policies and Hindsight Experience Replay (52 mins).
- Video 10e: Curriculum Learning (1h 16 mins)
- Video 11b: Multi-Task Learning, Meta-Learning and Lifelong Learning (30 mins).
- Video 11c: Some RL applications (24 mins).
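To make the Week 1 material on finding optimal policies concrete, here is a minimal value-iteration sketch in plain Python (no dependencies). The 4x4 grid world, goal reward and discount factor below are illustrative assumptions, not the setup used in the course notebooks.

```python
# Minimal value iteration on a toy 4x4 grid world (illustrative assumptions only).
# The agent moves up/down/left/right; entering the goal cell gives reward +1.

N = 4                                            # grid is N x N
GOAL = (3, 3)                                    # terminal state
GAMMA = 0.9                                      # discount factor
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

def step(state, action):
    """Deterministic transition: move if possible, otherwise stay in place."""
    if state == GOAL:
        return state, 0.0                        # absorbing terminal state
    r, c = state
    dr, dc = action
    next_state = (max(0, min(N - 1, r + dr)), max(0, min(N - 1, c + dc)))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

# Value iteration: repeat the Bellman optimality backup
#   V(s) <- max_a [ r(s,a) + gamma * V(s') ]
# over all states until the values stop changing.
V = {(r, c): 0.0 for r in range(N) for c in range(N)}
while True:
    delta = 0.0
    for s in V:
        if s == GOAL:
            continue                             # terminal value stays 0
        best = max(step(s, a)[1] + GAMMA * V[step(s, a)[0]] for a in ACTIONS)
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-6:
        break

# Greedy policy extraction from the converged values.
policy = {s: max(ACTIONS, key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]]) for s in V}
print(V[(0, 0)], policy[(0, 0)])
```

Replacing the max over actions with an expectation under a fixed policy gives the policy-evaluation step used inside Policy Iteration (Notebooks 1 and 2).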
Other resources:
- My Mindmap of RL algorithms
Notebooks and software:
- Notebook 1: Policy evaluation
- Notebook 2: Policy iteration and Value Iteration
- Notebook 3: Monte Carlo on grid-world
- Notebook 4: Introduction to OpenAI Gym and Q-learning, and Google Colab version (updated 16/2/24; see the minimal Q-learning sketch after this list)
- Notebook 5: Linear FA with Tile coding and Google Colab version (updated 29/3/24)
- Notebook 6: DQN on Lunar Lander (Colab version) (updated 7/3/24)
- Notebook 7: Actor Critic RL (soon to appear)
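As a companion to Notebook 4 (referenced above), here is a minimal tabular Q-learning loop with epsilon-greedy exploration. To stay dependency-free it uses a tiny hand-rolled corridor MDP instead of an OpenAI Gym environment, and all hyperparameters are illustrative assumptions rather than the notebook's actual settings.

```python
import random

# Tabular Q-learning with epsilon-greedy exploration (Week 2 material).
# A 6-state corridor stands in for the Gym environment used in Notebook 4;
# every number below is an illustrative assumption.

N_STATES = 6                         # states 0..5; episodes start at 0, state 5 is the goal
ACTIONS = [0, 1]                     # 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

Q = [[0.0, 0.0] for _ in range(N_STATES)]        # Q[state][action]

def env_step(state, action):
    """Move along the corridor; reward +1 only when the goal state is reached."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

for episode in range(300):
    state = 0
    for t in range(100):                         # cap episode length
        # Epsilon-greedy: explore with probability EPSILON; break ties randomly
        # so the untrained agent still moves in both directions.
        if random.random() < EPSILON or Q[state][0] == Q[state][1]:
            action = random.choice(ACTIONS)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        next_state, reward, done = env_step(state, action)
        # Q-learning (off-policy TD) update towards the greedy bootstrap target.
        target = reward + (0.0 if done else GAMMA * max(Q[next_state]))
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state
        if done:
            break

print([round(max(q), 3) for q in Q])             # learned value of each corridor state
```

Using the value of the action actually taken next (instead of the max over next actions) as the bootstrap target turns this into Sarsa, the on-policy variant covered in the Week 2 slides.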
Additional links [Note that the slides also have embedded links to the main references!]
- THE book for Reinforcement Learning:
- Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 2nd Edition, 2018
- Blog summary of Reinforcement Learning
- Nice blog description of Gradient methods
Additional recent bibliography (with links to implementations):
- Miguel Morales. Grokking Deep Reinforcement Learning. Manning, 2020. [http]
- Alexander Zai and Brandon Brown. Deep Reinforcement Learning in Action. Manning, 2020. [http]
- Maxim Lapan. Deep Reinforcement Learning Hands-on. Packt Publishing Ltd., 2nd edition, 2020. [http]
Proposed topics for practical projects or paper reviews. See also the last set of slides for topics and more references:
Papers in black are especially recommended.
[Ng and Russell, 2000] | Ng, A. Y. and Russell, S. (2000). Algorithms for Inverse Reinforcement Learning. In Proceedings of the 17th International Conference on Machine Learning, pages 663--670. [ .pdf ] |
[Abbeel and Ng, 2004] | Abbeel, P. and Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. 21st international conference on Machine learning - ICML '04, page 1. [ DOI | arXiv | http ] |
[Ziebart et al., 2008] | Ziebart, B. D., Maas, A., Bagnell, J. A., and Dey, A. K. (2008). Maximum Entropy Inverse Reinforcement Learning. AAAI Conference on Artificial Intelligence, pages 1433--1438. [ arXiv | .pdf ] |
[Dvijotham and Todorov, 2010] | Dvijotham, K. and Todorov, E. (2010). Inverse Optimal Control with Linearly-Solvable MDPs. In International Conference on Machine Learning (ICML), pages 335--342. [ .pdf ] |
[Boularias et al., 2011] | Boularias, A., Kober, J., and Peters, J. (2011). Relative Entropy Inverse Reinforcement Learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 182--189. [ .pdf ] |
[Wulfmeier et al., 2015] | Wulfmeier, M., Ondruska, P., and Posner, I. (2015). Maximum Entropy Deep Inverse Reinforcement Learning. [ arXiv | http ] |
[Alger, 2016] | Alger, M. (2016). Deep Inverse Reinforcement Learning. Technical report. [ .pdf ] |
[Finn et al., 2016] | Finn, C., Levine, S., and Abbeel, P. (2016). Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization. In 33rd International Conference on Machine Learning. [ DOI | arXiv | .html ] |
[Baram et al., 2017] | Baram, N., Anschel, O., Caspi, I., and Mannor, S. (2017). End-to-End Differentiable Adversarial Imitation Learning. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 390--399. [ .html ] |
[Wulfmeier et al., 2017] | Wulfmeier, M., Rao, D., Wang, D. Z., Ondruska, P., and Posner, I. (2017). Large-scale cost function learning for path planning using deep inverse reinforcement learning. International Journal of Robotics Research, 36(10):1073--1087. [ DOI | http ] |
[Henderson et al., 2017] | Henderson, P., Chang, W.-D., Bacon, P.-L., Meger, D., Pineau, J., and Precup, D. (2017). OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning. [ arXiv | http ] |
[Metelli et al., 2017] | Metelli, A. M., Pirotta, M., and Restelli, M. (2017). Compatible Reward Inverse Reinforcement Learning. In Advances in Neural Information Processing Systems. [ .pdf ] |
[Halperin, 2017] | Halperin, I. (2017). Inverse Reinforcement Learning for Marketing. [ arXiv | http ] |
[Fu et al., 2018] | Fu, J., Luo, K., and Levine, S. (2018). Learning Robust Rewards with Adversarial Inverse Reinforcement Learning. In 6th International Conference on Learning Representations, ICLR 2018. [ arXiv | http ] |
[Arora and Doshi, 2018] | Arora, S. and Doshi, P. (2018). A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress. [ arXiv | http ] |
[Le et al., 2018] | Le, H. M., Jiang, N., Agarwal, A., Dudík, M., Yue, Y., and Daumé, H. (2018). Hierarchical Imitation and Reinforcement Learning. In 35th International Conference on Machine Learning. [ arXiv | http ] |
[Reddy et al., 2018] | Reddy, S., Dragan, A. D., and Levine, S. (2018). What Would pi* Do?: Imitation Learning via Off-Policy Reinforcement Learning. Technical report. [ http ] |
[Tucker et al., 2018] | Tucker, A., Gleave, A., and Russell, S. (2018). Inverse reinforcement learning for video games. [ arXiv | http ] |
[Xu et al., 2018] | Xu, K., Ratner, E., Dragan, A., Levine, S., and Finn, C. (2018). Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. [ arXiv | .pdf ] |
[Haug et al., 2018] | Haug, L., Tschiatschek, S., and Singla, A. (2018). Teaching Inverse Reinforcement Learners via Features and Demonstrations. In 32nd Conference on Neural Information Processing Systems (NeurIPS 2018). [ arXiv | .pdf ] |
[Behbahani et al., 2018] | Behbahani, F., Shiarlis, K., Chen, X., Kurin, V., Kasewa, S., Stirbu, C., Gomes, J., Paul, S., Oliehoek, F. A., Messias, J., and Whiteson, S. (2018). Learning from Demonstration in the Wild. [ arXiv | http ] |
[Brown and Niekum, 2018] | Brown, D. S. and Niekum, S. (2018). Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications. [ arXiv | http ] |
[Gao et al., 2018] | Gao, Y., Xu, H., Lin, J., Yu, F., Levine, S., and Darrell, T. (2018). Reinforcement Learning from Imperfect Demonstrations. 35th International Conference on Machine Learning. [ arXiv | http ] |
[Qureshi et al., 2019] | Qureshi, A. H., Boots, B., and Yip, M. C. (2019). Adversarial Imitation via Variational Inverse Reinforcement Learning. In ICLR 2019. [ arXiv | http ] |
[Kinose and Taniguchi, 2019] | Kinose, A. and Taniguchi, T. (2019). Integration of Imitation Learning using GAIL and Reinforcement Learning using Task-achievement Rewards via Probabilistic Generative Model. [ arXiv | http ] |
[Stadie et al., 2015] | Stadie, B. C., Levine, S., and Abbeel, P. (2015). Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models. arXiv, pages 1--11. [ arXiv | http ] |
[Bent et al., 2015] | Bent, O., Rashid, T., and Whiteson, S. (2015). Improving Exploration in Deep Reinforcement Learning. [ .pdf ] |
[Ostrovski et al., 2017] | Ostrovski, G., Bellemare, M. G., Van Den Oord, A., and Munos, R. (2017). Count-Based Exploration with Neural Density Models. In 34th International Conference on Machine Learning. [ arXiv | .pdf ] |
[Aslanides et al., 2017] | Aslanides, J., Leike, J., and Hutter, M. (2017). Universal Reinforcement Learning Algorithms: Survey and Experiments. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pages 1403--1410, California. International Joint Conferences on Artificial Intelligence Organization. [ DOI | arXiv | http ] |
[Fu et al., 2017] | Fu, J., Co-Reyes, J. D., and Levine, S. (2017). EX2: Exploration with Exemplar Models for Deep Reinforcement Learning. In 31st Conference on Neural Information Processing Systems (NIPS 2017). [ arXiv | http ] |
[Tang et al., 2017] | Tang, H., Houthooft, R., Foote, D., Stooke, A., Chen, X., Duan, Y., Schulman, J., De Turck, F., and Abbeel, P. (2017). #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning. In 31st Conference on Neural Information Processing Systems (NIPS 2017). [ arXiv | http ] |
[Martin et al., 2017] | Martin, J., Narayanan S., S., Everitt, T., and Hutter, M. (2017). Count-Based Exploration in Feature Space for Reinforcement Learning. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pages 2471--2478, California. International Joint Conferences on Artificial Intelligence Organization. [ DOI | arXiv | http ] |
[Achiam and Sastry, 2017] | Achiam, J. and Sastry, S. (2017). Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning. [ arXiv | http ] |
[Pathak et al., 2017] | Pathak, D., Agrawal, P., Efros, A. A., and Darrell, T. (2017). Curiosity-Driven Exploration by Self-Supervised Prediction. In Proceedings of the 34th International Conference on Machine Learning, volume 2017-July. [ DOI | arXiv | http ] |
[Kaushik et al., 2018] | Kaushik, R., Chatzilygeroudis, K., and Mouret, J.-B. (2018). Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards. In 2nd Conference on Robot Learning (CoRL 2018). [ arXiv | http ] |
[Colas et al., 2018] | Colas, C., Sigaud, O., and Oudeyer, P.-Y. (2018). GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms. In 35th International Conference on Machine Learning. [ arXiv | http ] |
[Savinov et al., 2018] | Savinov, N., Raichuk, A., Marinier, R., Vincent, D., Pollefeys, M., Lillicrap, T., and Gelly, S. (2018). Episodic Curiosity through Reachability. [ arXiv | http ] |
[Burda et al., 2018] | Burda, Y., Edwards, H., Storkey, A., and Klimov, O. (2018). Exploration by Random Network Distillation. [ arXiv | http ] |
[Osband et al., 2018] | Osband, I., Aslanides, J., and Cassirer, A. (2018). Randomized Prior Functions for Deep Reinforcement Learning. [ arXiv | http ] |
[Junyent et al., 2018] | Junyent, M., Jonsson, A., and Gómez, V. (2018). Improving width-based planning with compact policies. In ICML 2018 workshop: Planning and Learning, pages 1--21. [ .pdf ] |
[Plappert et al., 2018] | Plappert, M., Houthooft, R., Dhariwal, P., Sidor, S., Chen, R. Y., Chen, X., Asfour, T., Abbeel, P., and Andrychowicz, M. (2018). Parameter Space Noise for Exploration. In ICLR 2018. [ arXiv | http ] |
[Shyam et al., 2018] | Shyam, P., Jaśkowski, W., and Gomez, F. (2018). Model-Based Active Exploration. [ arXiv | http ] |
[Azizzadenesheli et al., 2018] | Azizzadenesheli, K., Brunskill, E., and Anandkumar, A. (2018). Efficient Exploration through Bayesian Deep Q-Networks. Technical report. [ arXiv | .pdf ] |
[Haber et al., 2018] | Haber, N., Mrowca, D., Fei-Fei, L., and Yamins, D. L. K. (2018). Learning to Play with Intrinsically-Motivated Self-Aware Agents. [ DOI | arXiv | http ] |
[Gupta et al., 2018] | Gupta, A., Mendonca, R., Liu, Y., Abbeel, P., and Levine, S. (2018). Meta-Reinforcement Learning of Structured Exploration Strategies. [ arXiv | http ] |
[Hong et al., 2018] | Hong, Z.-W., Fu, T.-J., Shann, T.-Y., Chang, Y.-H., and Lee, C.-Y. (2018). Adversarial Exploration Strategy for Self-Supervised Imitation Learning. [ .pdf ] |
[Fortunato et al., 2018] | Fortunato, M., Azar, M. G., Piot, B., Menick, J., Hessel, M., Osband, I., Graves, A., Mnih, V., Munos, R., Hassabis, D., Pietquin, O., Blundell, C., and Legg, S. (2018). Noisy Networks for Exploration. In ICLR 2018. [ arXiv | http ] |
[Moerland et al., 2018] | Moerland, T. M., Broekens, J., and Jonker, C. M. (2018). The Potential of the Return Distribution for Exploration in RL. In 35th International Conference on Machine Learning. [ arXiv | http ] |
[Sukhbaatar et al., 2018] | Sukhbaatar, S., Lin, Z., Kostrikov, I., Synnaeve, G., Szlam, A., and Fergus, R. (2018). Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play. In ICLR 2018. [ arXiv | http ] |
[Taïga et al., 2019] | Taïga, A. A., Fedus, W., Machado, M. C., Courville, A., and Bellemare, M. G. (2019). Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment. In 2nd Exploration in Reinforcement Learning Workshop at the 36th International Conference on Machine Learning. [ arXiv | http ] |
[Beyer et al., 2019] | Beyer, L., Vincent, D., Teboul, O., Gelly, S., Geist, M., and Pietquin, O. (2019). MULEX: Disentangling Exploitation from Exploration in Deep RL. [ arXiv | http ] |
[Ciosek et al., 2019] | Ciosek, K., Vuong, Q., Loftin, R., and Hofmann, K. (2019). Better Exploration with Optimistic Actor-Critic. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ] |
[Mavrin et al., 2019] | Mavrin, B., Zhang, S., Yao, H., Kong, L., Wu, K., and Yu, Y. (2019). Distributional Reinforcement Learning for Efficient Exploration. In 36th International Conference on Machine Learning, ICML 2019, volume 2019-June, pages 7775--7785. International Machine Learning Society (IMLS). [ arXiv | http ] |
[Hare, 2019] | Hare, J. (2019). Dealing with Sparse Rewards in Reinforcement Learning. [ arXiv | http ] |
[Shani et al., 2019] | Shani, L., Efroni, Y., and Mannor, S. (2019). Exploration Conscious Reinforcement Learning Revisited. In 36th International Conference on Machine Learning. [ arXiv | http ] |
[Ecoffet et al., 2019] | Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K. O., and Clune, J. (2019). Go-Explore: a New Approach for Hard-Exploration Problems. [ arXiv | http ] |
[Zhang et al., 2019] | Zhang, J., Wetzel, N., Dorka, N., Boedecker, J., and Burgard, W. (2019). Scheduled Intrinsic Drive: A Hierarchical Take on Intrinsically Motivated Exploration. [ arXiv | http ] |
[Nikolov et al., 2019] | Nikolov, N., Kirschner, J., Berkenkamp, F., and Krause, A. (2019). Information-Directed Exploration for Deep Reinforcement Learning. In ICLR 2019. [ arXiv | http ] |
[Yang et al., 2019] | Yang, H.-K., Chiang, P.-H., Hong, M.-F., and Lee, C.-Y. (2019). Exploration via Flow-Based Intrinsic Rewards. [ arXiv | http ] |
[Badia et al., 2020] | Badia, A. P., Sprechmann, P., Vitvitskyi, A., Guo, D., Piot, B., Kapturowski, S., Tieleman, O., Arjovsky, M., Pritzel, A., Bolt, A., and Blundell, C. (2020). Never Give Up: Learning Directed Exploration Strategies. In ICLR 2020. [ arXiv | http ] |
[Taiga et al., 2020] | Taiga, A. A., Fedus, W., Machado, M. C., Courville, A., and Bellemare, M. G. (2020). On Bonus-Based Exploration Methods in the Arcade Learning Environment. In ICLR 2020. |
[Wawrzyński, 2012] | Wawrzyński, P. (2012). Autonomous reinforcement learning with experience replay for humanoid gait optimization. Procedia Computer Science, 13(February):205--211. [ DOI ] |
[Wawrzyński, 2014] | Wawrzyński, P. (2014). Reinforcement Learning with Experience Replay for Model-Free Humanoid Walking Optimization. International Journal of Humanoid Robotics, 11(03):23. [ DOI | http ] |
[Levine et al., 2015] | Levine, S., Wagener, N., and Abbeel, P. (2015). Learning contact-rich manipulation skills with guided policy search. In 2015 IEEE International Conference on Robotics and Automation (ICRA), volume 2015-June, pages 156--163. IEEE. [ DOI | arXiv | http ] |
[Peng et al., 2016] | Peng, X. B., Berseth, G., and van de Panne, M. (2016). Terrain-adaptive locomotion skills using deep reinforcement learning. ACM Transactions on Graphics, 35(4):1--12. [ DOI | http ] |
[Kim and Pineau, 2016] | Kim, B. and Pineau, J. (2016). Socially Adaptive Path Planning in Human Environments Using Inverse Reinforcement Learning. International Journal of Social Robotics, 8(1):51--66. [ DOI ] |
[Zhu et al., 2016] | Zhu, Y., Mottaghi, R., Kolve, E., Lim, J. J., Gupta, A., Fei-Fei, L., and Farhadi, A. (2016). Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning. [ DOI | arXiv | http ] |
[Večerík et al., 2017] | Večerík, M., Hester, T., Scholz, J., Wang, F., Pietquin, O., Piot, B., Heess, N., Rothörl, T., Lampe, T., and Riedmiller, M. (2017). Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards. pages 1--10. [ arXiv | http ] |
[Gudimella et al., 2017] | Gudimella, A., Story, R., Shaker, M., Kong, R., Brown, M., Shnayder, V., Campos, M., and Berkeley, B. (2017). Deep Reinforcement Learning for Dexterous Manipulation with Concept Networks. Technical report. [ arXiv | .pdf ] |
[Gu et al., 2017] | Gu, S., Holly, E., Lillicrap, T., and Levine, S. (2017). Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. IEEE International Conference on Robotics and Automation, pages 3389--3396. [ DOI | arXiv ] |
[Hwangbo et al., 2017] | Hwangbo, J., Sa, I., Siegwart, R., and Hutter, M. (2017). Control of a Quadrotor with Reinforcement Learning. IEEE Robotics and Automation Letters, 2(644227):1--8. [ DOI | arXiv | http ] |
[Bruce et al., 2017] | Bruce, J., Suenderhauf, N., Mirowski, P., Hadsell, R., and Milford, M. (2017). One-Shot Reinforcement Learning for Robot Navigation with Interactive Replay. (NIPS). [ arXiv | http ] |
[Haarnoja et al., 2018] | Haarnoja, T., Ha, S., Zhou, A., Tan, J., Tucker, G., and Levine, S. (2018). Learning to Walk via Deep Reinforcement Learning. [ arXiv | http ] |
[Goyal et al., 2018] | Goyal, A., Brakel, P., Fedus, W., Lillicrap, T., Levine, S., Larochelle, H., and Bengio, Y. (2018). Recall Traces: Backtracking Models for Efficient Reinforcement Learning. [ arXiv | http ] |
[Lee et al., 2018] | Lee, R., Mou, S., Dasagi, V., Bruce, J., Leitner, J., and Sünderhauf, N. (2018). Zero-shot Sim-to-Real Transfer with Modular Priors. [ arXiv | http ] |
[Mahmood et al., 2018] | Mahmood, A. R., Korenkevych, D., Vasan, G., Ma, W., and Bergstra, J. (2018). Benchmarking Reinforcement Learning Algorithms on Real-World Robots. [ arXiv | http ] |
[Amos et al., 2018] | Amos, B., Dinh, L., Cabi, S., Rothörl, T., Colmenarejo, S. G., Muldal, A., Erez, T., Tassa, Y., de Freitas, N., and Denil, M. (2018). Learning Awareness Models. In ICLR 2018. [ arXiv | http ] |
[Sharma et al., 2019] | Sharma, P., Pathak, D., and Gupta, A. (2019). Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ] |
[Cabi et al., 2019] | Cabi, S., Colmenarejo, S. G., Novikov, A., Konyushkova, K., Reed, S., Jeong, R., Żolna, K., Aytar, Y., Budden, D., Vecerik, M., Sushkov, O., Barker, D., Scholz, J., Denil, M., de Freitas, N., and Wang, Z. (2019). A Framework for Data-Driven Robotics. [ arXiv | http ] |
[Hwangbo et al., 2019] | Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V., and Hutter, M. (2019). Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26):1--14. [ DOI | arXiv | http ] |
[Schaul et al., 2015] | Schaul, T., Horgan, D., Gregor, K., and Silver, D. (2015). Universal Value Function Approximators. 32nd international conference on Machine learning (ICML '15), pages 1312--1320. [ .html ] |
[Higgins et al., 2017] | Higgins, I., Pal, A., Rusu, A. A., Matthey, L., Burgess, C. P., Pritzel, A., Botvinick, M., Blundell, C., and Lerchner, A. (2017). DARLA: Improving Zero-Shot Transfer in Reinforcement Learning. In Proceedings of the 34th International Conference on Machine Learning. [ arXiv | http ] |
[Gupta et al., 2018] | Gupta, A., Eysenbach, B., Finn, C., and Levine, S. (2018). Unsupervised Meta-Learning for Reinforcement Learning. [ arXiv | http ] |
[Xu et al., 2018] | Xu, K., Ratner, E., Dragan, A., Levine, S., and Finn, C. (2018). Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. [ arXiv | .pdf ] |
[Miconi et al., 2018] | Miconi, T., Clune, J., and Stanley, K. O. (2018). Differentiable plasticity: training plastic neural networks with backpropagation. In 35th International Conference on Machine Learning. [ DOI | arXiv | http ] |
[Landolfi et al., 2019] | Landolfi, N. C., Thomas, G., and Ma, T. (2019). A Model-based Approach for Sample-efficient Multi-task Reinforcement Learning. [ arXiv | http ] |
[Nagabandi et al., 2019] | Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., and Finn, C. (2019). Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning. In ICLR 2019. [ arXiv | http ] |
Classical introductions and papers on Hierarchical Reinforcement Learning:
[Barto and Mahadevan, 2003] | Barto, A. G. and Mahadevan, S. (2003). Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems, 13(4):341--379. [ DOI | http ] |
[Dietterich, 1998] | Dietterich, T. G. (1998). The MAXQ Method for Hierarchical Reinforcement Learning. In 15th international conference on machine learning, pages 118--126. [ .pdf ] |
[Dietterich, 2000] | Dietterich, T. G. (2000). Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, 13(6):227--303. [ DOI | http ] |
[Botvinick et al., 2009] | Botvinick, M. M., Niv, Y., and Barto, A. C. (2009). Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective. Cognition, 113(3):262--280. [ DOI | .pdf ] |
[Hengst, 2012] | Hengst, B. (2012). Hierarchical approaches. In Adaptation, Learning, and Optimization, volume 12, pages 293--323. [ DOI | arXiv | http ] |
More recent papers:
[Mankowitz et al., 2016] | Mankowitz, D. J., Mann, T. A., and Mannor, S. (2016). Iterative Hierarchical Optimization for Misspecified Problems (IHOMP). [ arXiv | http ] |
[Kulkarni et al., 2016] | Kulkarni, T. D., Narasimhan, K. R., Saeedi, A., and Tenenbaum, J. B. (2016). Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation. In 30th Conference on Neural Information Processing Systems (NIPS 2016), pages 1--13. [ arXiv | .pdf ] |
[Arulkumaran et al., 2016] | Arulkumaran, K., Dilokthanakul, N., Shanahan, M., and Bharath, A. A. (2016). Classifying Options for Deep Reinforcement Learning. arXiv preprint. [ http ] |
[Rusu et al., 2016] | Rusu, A. A., Gomez Colmenarejo, S., Gulcehre, C., Desjardins, G., Kirkpatrick, J., Pascanu, R., Mnih, V., Kavukcuoglu, K., and Hadsell, R. (2016). Policy Distillation. arXiv, pages 1--12. [ arXiv | http ] |
[Frans et al., 2017] | Frans, K., Ho, J., Chen, X., Abbeel, P., and Schulman, J. (2017). Meta Learning Shared Hierarchies. [ arXiv | http ] |
[Tessler et al., 2017] | Tessler, C., Givony, S., Zahavy, T., Mankowitz, D. J., and Mannor, S. (2017). A deep hierarchical approach to lifelong learning in minecraft. In 31st AAAI Conference on Artificial Intelligence, AAAI 2017, pages 1553--1561. AAAI press. [ arXiv | http ] |
[Florensa et al., 2017] | Florensa, C., Duan, Y., and Abbeel, P. (2017). Stochastic Neural Networks for Hierarchical Reinforcement Learning. In ICLR 2017, pages 1--12. |
[Vezhnevets et al., 2017] | Vezhnevets, A. S., Osindero, S., Schaul, T., Heess, N., Jaderberg, M., Silver, D., and Kavukcuoglu, K. (2017). FeUdal Networks for Hierarchical Reinforcement Learning. arXiv Preprint. [ arXiv | http ] |
[Henderson et al., 2017] | Henderson, P., Chang, W.-D., Bacon, P.-L., Meger, D., Pineau, J., and Precup, D. (2017). OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning. [ arXiv | http ] |
[Gatto, 2018] | Gatto, S. (2018). Extending the Hierarchical Deep Reinforcement Learning framework. Technical Report September. [ .pdf ] |
[Pang et al., 2018] | Pang, Z.-J., Liu, R.-Z., Meng, Z.-Y., Zhang, Y., Yu, Y., and Lu, T. (2018). On Reinforcement Learning for Full-length Game of StarCraft. [ arXiv | http ] |
[Sukhbaatar et al., 2018b] | Sukhbaatar, S., Denton, E., Szlam, A., and Fergus, R. (2018b). Learning Goal Embeddings via Self-Play for Hierarchical Reinforcement Learning. [ arXiv | http ] |
[Rafati and Noelle, 2018] | Rafati, J. and Noelle, D. C. (2018). Learning Representations in Model-Free Hierarchical Reinforcement Learning. [ arXiv | http ] |
[Wei et al., 2018] | Wei, E., Wicke, D., and Luke, S. (2018). Hierarchical Approaches for Reinforcement Learning in Parameterized Action Space. [ arXiv | http ] |
[Haarnoja et al., 2018] | Haarnoja, T., Hartikainen, K., Abbeel, P., and Levine, S. (2018). Latent Space Policies for Hierarchical Reinforcement Learning. In 35th International Conference on Machine Learning. [ arXiv | .pdf ] |
[Song et al., 2018] | Song, Y., Wang, J., Lukasiewicz, T., Xu, Z., and Xu, M. (2018). Diversity-Driven Extensible Hierarchical Reinforcement Learning. In 33rd National Conference on Artificial Intelligence (AAAI 2019). [ arXiv | www: ] |
[Nachum et al., 2018] | Nachum, O., Gu, S., Lee, H., and Levine, S. (2018). Data-Efficient Hierarchical Reinforcement Learning. In 32nd Conference on Neural Information Processing Systems (NIPS 2018), pages 3303--3313. Neural information processing systems foundation. [ arXiv | http ] |
[Tuyen et al., 2018] | Tuyen, L. P., Vien, N. A., Layek, A., and Chung, T. (2018). Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes. [ arXiv | http ] |
[Menashe and Stone, 2018] | Menashe, J. and Stone, P. (2018). Escape Room: A Configurable Testbed for Hierarchical Reinforcement Learning. Technical report. [ arXiv | .pdf ] |
[Co-Reyes et al., 2018] | Co-Reyes, J. D., Liu, Y., Gupta, A., Eysenbach, B., Abbeel, P., and Levine, S. (2018). Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings. In 35th International Conference on Machine Learning. [ arXiv | http ] |
[Keramati et al., 2018] | Keramati, R., Whang, J., Cho, P., and Brunskill, E. (2018). Strategic Object Oriented Reinforcement Learning. [ http ] |
[Le et al., 2018] | Le, H. M., Jiang, N., Agarwal, A., Dudík, M., Yue, Y., and Daumé, H. (2018). Hierarchical Imitation and Reinforcement Learning. In 35th International Conference on Machine Learning. [ arXiv | http ] |
[Sukhbaatar et al., 2018a] | Sukhbaatar, S., Denton, E., Szlam, A., and Fergus, R. (2018a). Learning Goal Embeddings via Self-Play for Hierarchical Reinforcement Learning. [ arXiv | http ] |
[Peterson et al., 2018] | Peterson, E. J., Müyesser, N. A., Verstynen, T., and Dunovan, K. (2018). Keep it stupid simple. [ arXiv | http ] |
[Machado et al., 2018] | Machado, M. C., Rosenbaum, C., Guo, X., Liu, M., Tesauro, G., and Campbell, M. (2018). Eigenoption Discovery through the Deep Successor Representation. In ICLR 2018. [ arXiv | http ] |
[Aubret et al., 2019] | Aubret, A., Matignon, L., and Hassas, S. (2019). A survey on intrinsic motivation in reinforcement learning. [ arXiv | http ] |
[Nachum et al., 2019b] | Nachum, O., Tang, H., Lu, X., Gu, S., Lee, H., and Levine, S. (2019b). Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning? [ arXiv | http ] |
[Sharma et al., 2019a] | Sharma, A., Gu, S., Levine, S., Kumar, V., and Hausman, K. (2019a). Dynamics-Aware Unsupervised Discovery of Skills. [ arXiv | http ] |
[Osa et al., 2019] | Osa, T., Tangkaratt, V., and Sugiyama, M. (2019). Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization. In ICLR 2019. [ arXiv | http ] |
[Levy et al., 2019] | Levy, A., Konidaris, G., Platt, R., and Saenko, K. (2019). Learning Multi-Level Hierarchies with Hindsight. In ICLR 2019. [ arXiv | http ] |
[Wulfmeier et al., 2019] | Wulfmeier, M., Abdolmaleki, A., Hafner, R., Springenberg, J. T., Neunert, M., Hertweck, T., Lampe, T., Siegel, N., Heess, N., and Riedmiller, M. (2019). Regularized Hierarchical Policies for Compositional Transfer in Robotics. [ arXiv | http ] |
[Modhe et al., 2019] | Modhe, N., Chattopadhyay, P., Sharma, M., Das, A., Parikh, D., Batra, D., and Vedantam, R. (2019). Unsupervised Discovery of Decision States for Transfer in Reinforcement Learning. [ arXiv | http ] |
[Harutyunyan et al., 2019] | Harutyunyan, A., Dabney, W., Borsa, D., Heess, N., Munos, R., and Precup, D. (2019). The Termination Critic. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS). [ arXiv | http ] |
[Li et al., 2019] | Li, S., Wang, R., Tang, M., and Zhang, C. (2019). Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ] |
[Jiang et al., 2019] | Jiang, Y., Gu, S., Murphy, K., and Finn, C. (2019). Language as an Abstraction for Hierarchical Deep Reinforcement Learning. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ] |
[Paul et al., 2019] | Paul, S., van Baar, J., and Roy-Chowdhury, A. K. (2019). Learning from Trajectories via Subgoal Discovery. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), pages 1--11. |
[Saleh et al., 2019] | Saleh, A., Jaques, N., Ghandeharioun, A., Shen, J. H., and Picard, R. (2019). Hierarchical Reinforcement Learning for Open-Domain Dialog. [ arXiv | http ] |
[Jain et al., 2019] | Jain, D., Iscen, A., and Caluwaerts, K. (2019). Hierarchical Reinforcement Learning for Quadruped Locomotion. [ arXiv | http ] |
[Christodoulou et al., 2019] | Christodoulou, P., Lange, R. T., Shafti, A., and Faisal, A. A. (2019). Reinforcement Learning with Structured Hierarchical Grammar Representations of Actions. [ arXiv | http ] |
[Sharma et al., 2019b] | Sharma, P., Pathak, D., and Gupta, A. (2019b). Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ] |
[Nachum et al., 2019a] | Nachum, O., Gu, S., Lee, H., and Levine, S. (2019a). Near-Optimal Representation Learning for Hierarchical Reinforcement Learning. In ICLR 2019. [ arXiv | http ] |
[Zhou and Yu, 2020] | Zhou, W.-J. and Yu, Y. (2020). Temporal-adaptive Hierarchical Reinforcement Learning. [ arXiv | http ] |
[Li et al., 2020] | Li, A. C., Florensa, C., Clavera, I., and Abbeel, P. (2020). Sub-policy Adaptation for Hierarchical Reinforcement Learning. In ICLR 2020. [ arXiv | http ] |
Some recent papers on Curriculum Learning and goal generation:
[Frans et al., 2017] | Frans, K., Ho, J., Chen, X., Abbeel, P., and Schulman, J. (2017). Meta Learning Shared Hierarchies. [ arXiv | http ] |
[Forestier et al., 2017] | Forestier, S., Mollard, Y., and Oudeyer, P.-Y. (2017). Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning. [ arXiv | http ] |
[Matiisen et al., 2017] | Matiisen, T., Oliver, A., Cohen, T., and Schulman, J. (2017). Teacher-Student Curriculum Learning. In Deep Reinforcement Learning Symposium, NIPS 2017. [ arXiv | http ] |
[Andrychowicz et al., 2017] | Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., and Zaremba, W. (2017). Hindsight experience replay. In 31st Conference on Neural Information Processing Systems (NIPS 2017), pages 5048--5058. [ arXiv | http ] |
[Florensa et al., 2017] | Florensa, C., Held, D., Wulfmeier, M., Zhang, M., and Abbeel, P. (2017). Reverse Curriculum Generation for Reinforcement Learning. In 1st Conference on Robot Learning (CoRL 2017). [ DOI | arXiv | http ] |
[Czarnecki et al., 2018] | Czarnecki, W. M., Jayakumar, S. M., Jaderberg, M., Hasenclever, L., Teh, Y. W., Osindero, S., Heess, N., and Pascanu, R. (2018). Mix & Match - Agent Curricula for Reinforcement Learning. [ arXiv | http ] |
[Nachum et al., 2018] | Nachum, O., Gu, S., Lee, H., and Levine, S. (2018). Data-Efficient Hierarchical Reinforcement Learning. In 32nd Conference on Neural Information Processing Systems (NIPS 2018), pages 3303--3313. Neural information processing systems foundation. [ arXiv | http ] |
[Resnick et al., 2018] | Resnick, C., Raileanu, R., Kapoor, S., Peysakhovich, A., Cho, K., and Bruna, J. (2018). Backplay: "Man muss immer umkehren". [ arXiv | http ] |
[Lanka and Wu, 2018] | Lanka, S. and Wu, T. (2018). ARCHER: Aggressive Rewards to Counter bias in Hindsight Experience Replay. [ arXiv | www: ] |
[Justesen and Risi, 2018] | Justesen, N. and Risi, S. (2018). Automated Curriculum Learning by Rewarding Temporally Rare Events. [ arXiv | http ] |
[Eppe et al., 2018] | Eppe, M., Magg, S., and Wermter, S. (2018). Curriculum goal masking for continuous deep reinforcement learning. [ arXiv | .pdf ] |
[Ivanovic et al., 2018] | Ivanovic, B., Harrison, J., Sharma, A., Chen, M., and Pavone, M. (2018). BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning. [ arXiv | http ] |
[Sukhbaatar et al., 2018] | Sukhbaatar, S., Lin, Z., Kostrikov, I., Synnaeve, G., Szlam, A., and Fergus, R. (2018). Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play. In ICLR 2018. [ arXiv | http ] |
[Florensa et al., 2018] | Florensa, C., Held, D., Geng, X., and Abbeel, P. (2018). Automatic Goal Generation for Reinforcement Learning Agents. 35th International Conference on Machine Learning. [ DOI | arXiv | http ] |
[Aubret et al., 2019] | Aubret, A., Matignon, L., and Hassas, S. (2019). A survey on intrinsic motivation in reinforcement learning. [ arXiv | http ] |
[Paul et al., 2019] | Paul, S., van Baar, J., and Roy-Chowdhury, A. K. (2019). Learning from Trajectories via Subgoal Discovery. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), pages 1--11. |
[Green et al., 2019] | Green, M. C., Sergent, B., Shandilya, P., and Kumar, V. (2019). Evolutionarily-Curated Curriculum Learning for Deep Reinforcement Learning Agents. Technical report. [ arXiv | http ] |
[Portelas et al., 2019] | Portelas, R., Colas, C., Hofmann, K., and Oudeyer, P.-Y. (2019). Teacher algorithms for curriculum learning of Deep RL in continuously parameterized environments. In 3rd Conference on Robot Learning (CoRL 2019). [ arXiv | http ] |
[Jabri et al., 2019] | Jabri, A., Hsu, K., Eysenbach, B., Gupta, A., Levine, S., and Finn, C. (2019). Unsupervised Curricula for Visual Meta-Reinforcement Learning. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ] |
[Luo et al., 2020] | Luo, S., Kasaei, H., and Schomaker, L. (2020). Accelerating Reinforcement Learning for Reaching using Continuous Curriculum Learning. [ arXiv | http ] |
[Narvekar et al., 2020] | Narvekar, S., Peng, B., Leonetti, M., Sinapov, J., Taylor, M. E., and Stone, P. (2020). Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey. [ arXiv | http ] |
[Wang et al., 2020] | Wang, R., Lehman, J., Rawal, A., Zhi, J., Li, Y., Clune, J., and Stanley, K. O. (2020). Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions. [ arXiv | http ] |