Mario Martin's home page

Universitat Politècnica de Catalunya

Reinforcement Learning

Site for the second part of the URL course of the Master in Artificial Intelligence

Slides of Lectures:

Other resources:

Notebooks and software:

Additional links [Note that the slides also contain embedded links to the main references!]

Additional Recent bibliography (with links to implementations)


Proposed Topics for practical projects or paper reviews. See also the last set of slides for topics and more references:
Papers in black are especially recommended.

    Inverse Reinforcement Learning. Given examples of a policy, obtain the underlying reinforcement function. Useful for learning from examples and for discovering complex reinforcement functions, for instance for driving or walking. A minimal illustrative sketch follows the reference list below. Classical introductions and papers:
    [Ng and Russell, 2000] Ng, A. Y. and Russell, S. (2000). Algorithms for Inverse Reinforcement Learning. In Proceedings of the 17th International Conference on Machine Learning, volume 0, pages 663--670. [ .pdf ]
    [Abbeel and Ng, 2004] Abbeel, P. and Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. 21st international conference on Machine learning - ICML '04, page 1. [ DOI | arXiv | http ]
    [Ziebart et al., 2008] Ziebart, B. D., Maas, A., Bagnell, J. A., and Dey, A. K. (2008). Maximum Entropy Inverse Reinforcement Learning. AAAI Conference on Artificial Intelligence, pages 1433--1438. [ arXiv | .pdf ]
    [Dvijotham and Todorov, 2010] Dvijotham, K. and Todorov, E. (2010). Inverse Optimal Control with Linearly-Solvable MDPs. In International Conference on Machine Learning (ICML), pages 335--342. [ .pdf ]
    [Boularias et al., 2011] Boularias, A., Kober, J., and Peters, J. (2011). Relative Entropy Inverse Reinforcement Learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 182--189. [ .pdf ]
    [Wulfmeier et al., 2015] Wulfmeier, M., Ondruska, P., and Posner, I. (2015). Maximum Entropy Deep Inverse Reinforcement Learning. [ arXiv | http ]
    [Alger, 2016] Alger, M. (2016). Deep Inverse Reinforcement Learning. Technical report. [ .pdf ]
    [Finn et al., 2016] Finn, C., Levine, S., and Abbeel, P. (2016). Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization. In 33rd International Conference on Machine Learning. [ DOI | arXiv | .html ]
    [Baram et al., 2017] Baram, N., Anschel, O., Caspi, I., and Mannor, S. (2017). End-to-End Differentiable Adversarial Imitation Learning. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 390--399. [ .html ]
    [Wulfmeier et al., 2017] Wulfmeier, M., Rao, D., Wang, D. Z., Ondruska, P., and Posner, I. (2017). Large-scale cost function learning for path planning using deep inverse reinforcement learning. International Journal of Robotics Research, 36(10):1073--1087. [ DOI | http ]
    [Henderson et al., 2017] Henderson, P., Chang, W.-D., Bacon, P.-L., Meger, D., Pineau, J., and Precup, D. (2017). OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning. [ arXiv | http ]
    [Metelli et al., 2017] Metelli, A. M., Pirotta, M., and Restelli, M. (2017). Compatible Reward Inverse Reinforcement Learning. In Advances in Neural Information Processing Systems. [ .pdf ]
    [Halperin, 2017] Halperin, I. (2017). Inverse Reinforcement Learning for Marketing. [ arXiv | http ]
    [Fu et al., 2018] Fu, J., Luo, K., and Levine, S. (2018). Learning Robust Rewards with Adversarial Inverse Reinforcement Learning. In 6th International Conference on Learning Representations, ICLR 2018. [ arXiv | http ]
    [Arora and Doshi, 2018] Arora, S. and Doshi, P. (2018). A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress. [ arXiv | http ]
    [Le et al., 2018] Le, H. M., Jiang, N., Agarwal, A., Dudík, M., Yue, Y., and Daumé, H. (2018). Hierarchical Imitation and Reinforcement Learning. In 35th International Conference on Machine Learning. [ arXiv | http ]
    [Reddy et al., 2018] Reddy, S., Dragan, A. D., and Levine, S. (2018). What Would pi* Do?: Imitation Learning via Off-Policy Reinforcement Learning. Technical report. [ http ]
    [Tucker et al., 2018] Tucker, A., Gleave, A., and Russell, S. (2018). Inverse reinforcement learning for video games. [ arXiv | http ]
    [Xu et al., 2018] Xu, K., Ratner, E., Dragan, A., Levine, S., and Finn, C. (2018). Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. [ arXiv | .pdf ]
    [Haug et al., 2018] Haug, L., Tschiatschek, S., and Singla, A. (2018). Teaching Inverse Reinforcement Learners via Features and Demonstrations. In 32nd Conference on Neural Information Processing Systems (NeurIPS 2018). [ arXiv | .pdf ]
    [Behbahani et al., 2018] Behbahani, F., Shiarlis, K., Chen, X., Kurin, V., Kasewa, S., Stirbu, C., Gomes, J., Paul, S., Oliehoek, F. A., Messias, J., and Whiteson, S. (2018). Learning from Demonstration in the Wild. [ arXiv | http ]
    [Brown and Niekum, 2018] Brown, D. S. and Niekum, S. (2018). Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications. [ arXiv | http ]
    [Gao et al., 2018] Gao, Y., Xu, H., Lin, J., Yu, F., Levine, S., and Darrell, T. (2018). Reinforcement Learning from Imperfect Demonstrations. 35th International Conference on Machine Learning. [ arXiv | http ]
    [Qureshi et al., 2019] Qureshi, A. H., Boots, B., and Yip, M. C. (2019). Adversarial Imitation via Variational Inverse Reinforcement Learning. In Iclr 2019. [ arXiv | http ]
    [Kinose and Taniguchi, 2019] Kinose, A. and Taniguchi, T. (2019). Integration of Imitation Learning using GAIL and Reinforcement Learning using Task-achievement Rewards via Probabilistic Generative Model. [ arXiv | http ]
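
    To make the problem concrete, here is a minimal sketch of Maximum Entropy IRL (Ziebart et al., 2008) on a toy tabular MDP, assuming NumPy is available; the one-dimensional chain environment, the hand-made "expert" trajectories and all names are illustrative assumptions, not code from any of the papers above.

    import numpy as np

    n_states, n_actions, horizon, gamma = 5, 2, 10, 0.9

    # Deterministic chain: action 0 moves left, action 1 moves right.
    P = np.zeros((n_states, n_actions), dtype=int)
    for s in range(n_states):
        P[s, 0], P[s, 1] = max(s - 1, 0), min(s + 1, n_states - 1)

    phi = np.eye(n_states)                        # one-hot state features

    def soft_value_iteration(r, n_iters=60):
        """Soft-optimal stochastic policy pi[s, a] for a reward vector r."""
        V = np.zeros(n_states)
        for _ in range(n_iters):
            Q = r[:, None] + gamma * V[P]         # Q[s, a] through the next-state table P
            m = Q.max(axis=1, keepdims=True)      # stabilised log-sum-exp
            V = (m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True))).ravel()
        return np.exp(Q - V[:, None])

    def expected_svf(pi, start=0):
        """Expected state-visitation frequencies over the horizon under policy pi."""
        d = np.zeros(n_states); d[start] = 1.0
        svf = d.copy()
        for _ in range(horizon - 1):
            d_next = np.zeros(n_states)
            for s in range(n_states):
                for a in range(n_actions):
                    d_next[P[s, a]] += d[s] * pi[s, a]
            d, svf = d_next, svf + d_next
        return svf

    # "Expert" demonstrations: trajectories that always move right from state 0.
    demos = [[min(t, n_states - 1) for t in range(horizon)] for _ in range(20)]
    emp_feat = sum(phi[s] for traj in demos for s in traj) / len(demos)

    theta = np.zeros(n_states)                    # reward weights, r(s) = theta . phi(s)
    for _ in range(200):                          # gradient ascent on the MaxEnt log-likelihood
        pi = soft_value_iteration(phi @ theta)
        theta += 0.05 * (emp_feat - expected_svf(pi) @ phi)

    print("recovered reward weights:", np.round(theta, 2))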

      Game theory: Application of RL algorithms to multi-agent systems to learn to cooperate or compete in a game-theoretic framework.
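
      As an illustration of the setting, below is a minimal sketch of two independent Q-learners playing an iterated Prisoner's Dilemma; the payoff matrix, hyperparameters and names are illustrative assumptions rather than a reference implementation.

      import random

      # Payoffs (row player, column player) for actions 0 = cooperate, 1 = defect.
      payoff = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

      q = [[0.0, 0.0], [0.0, 0.0]]     # q[player][action] for the stateless game
      alpha, epsilon = 0.1, 0.1

      def act(player):
          if random.random() < epsilon:
              return random.randrange(2)
          return 0 if q[player][0] >= q[player][1] else 1

      for _ in range(20000):
          a0, a1 = act(0), act(1)
          r0, r1 = payoff[(a0, a1)]
          q[0][a0] += alpha * (r0 - q[0][a0])      # stateless (bandit-style) update
          q[1][a1] += alpha * (r1 - q[1][a1])

      # Defection typically ends up with the higher Q-value for both players,
      # i.e. the learners converge on the Nash equilibrium rather than cooperation.
      print("player 0 Q-values:", q[0], "player 1 Q-values:", q[1])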

        Exploration: Review of different techniques for exploration. Work will consist of a comparative study of the different techniques; a minimal sketch of one of them follows the reference list below.
        [Stadie et al., 2015] Stadie, B. C., Levine, S., and Abbeel, P. (2015). Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models. arXiv, pages 1--11. [ arXiv | http ]
        [Bent et al., 2015] Bent, O., Rashid, T., and Whiteson, S. (2015). Improving Exploration in Deep Reinforcement Learning. [ .pdf ]
        [Ostrovski et al., 2017] Ostrovski, G., Bellemare, M. G., Van Den Oord, A., and Munos, R. (2017). Count-Based Exploration with Neural Density Models. In 34th International Conference on Machine Learning. [ arXiv | .pdf ]
        [Aslanides et al., 2017] Aslanides, J., Leike, J., and Hutter, M. (2017). Universal Reinforcement Learning Algorithms: Survey and Experiments. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pages 1403--1410, California. International Joint Conferences on Artificial Intelligence Organization. [ DOI | arXiv | http ]
        [Fu et al., 2017] Fu, J., Co-Reyes, J. D., and Levine, S. (2017). EX2: Exploration with Exemplar Models for Deep Reinforcement Learning. In 31st Conference on Neural Information Processing Systems (NIPS 2017). [ arXiv | http ]
        [Tang et al., 2017] Tang, H., Houthooft, R., Foote, D., Stooke, A., Chen, X., Duan, Y., Schulman, J., De Turck, F., and Abbeel, P. (2017). #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning. In 31st Conference on Neural Information Processing Systems (NIPS 2017). [ arXiv | http ]
        [Martin et al., 2017] Martin, J., Narayanan S., S., Everitt, T., and Hutter, M. (2017). Count-Based Exploration in Feature Space for Reinforcement Learning. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pages 2471--2478, California. International Joint Conferences on Artificial Intelligence Organization. [ DOI | arXiv | http ]
        [Achiam and Sastry, 2017] Achiam, J. and Sastry, S. (2017). Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning. [ arXiv | http ]
        [Pathak et al., 2017] Pathak, D., Agrawal, P., Efros, A. A., and Darrell, T. (2017). Curiosity-Driven Exploration by Self-Supervised Prediction. In Proceedings of the 34th International Conference on Machine Learning, volume 2017-July. [ DOI | arXiv | http ]
        [Kaushik et al., 2018] Kaushik, R., Chatzilygeroudis, K., and Mouret, J.-B. (2018). Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards. In 2nd Conference on Robot Learning (CoRL 2018). [ arXiv | http ]
        [Colas et al., 2018] Colas, C., Sigaud, O., and Oudeyer, P.-Y. (2018). GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms. In 35th International Conference on Machine Learning. [ arXiv | http ]
        [Savinov et al., 2018] Savinov, N., Raichuk, A., Marinier, R., Vincent, D., Pollefeys, M., Lillicrap, T., and Gelly, S. (2018). Episodic Curiosity through Reachability. [ arXiv | http ]
        [Burda et al., 2018] Burda, Y., Edwards, H., Storkey, A., and Klimov, O. (2018). Exploration by Random Network Distillation. [ arXiv | http ]
        [Osband et al., 2018] Osband, I., Aslanides, J., and Cassirer, A. (2018). Randomized Prior Functions for Deep Reinforcement Learning. [ arXiv | http ]
        [Junyent et al., 2018] Junyent, M., Jonsson, A., and Gómez, V. (2018). Improving width-based planning with compact policies. In ICML 2018 workshop: Planning and Learning, pages 1--21. [ .pdf ]
        [Plappert et al., 2018] Plappert, M., Houthooft, R., Dhariwal, P., Sidor, S., Chen, R. Y., Chen, X., Asfour, T., Abbeel, P., and Andrychowicz, M. (2018). Parameter Space Noise for Exploration. In Iclr 2018. [ arXiv | http ]
        [Shyam et al., 2018] Shyam, P., Jaśkowski, W., and Gomez, F. (2018). Model-Based Active Exploration. [ arXiv | http ]
        [Azizzadenesheli et al., 2018] Azizzadenesheli, K., Brunskill, E., and Anandkumar, A. (2018). Efficient Exploration through Bayesian Deep Q-Networks. Technical report. [ arXiv | .pdf ]
        [Haber et al., 2018] Haber, N., Mrowca, D., Fei-Fei, L., and Yamins, D. L. K. (2018). Learning to Play with Intrinsically-Motivated Self-Aware Agents. [ DOI | arXiv | http ]
        [Gupta et al., 2018] Gupta, A., Mendonca, R., Liu, Y., Abbeel, P., and Levine, S. (2018). Meta-Reinforcement Learning of Structured Exploration Strategies. [ arXiv | http ]
        [Hong et al., 2018] Hong, Z.-W., Fu, T.-J., Shann, T.-Y., Chang, Y.-H., and Lee, C.-Y. (2018). Adversarial Exploration Strategy for Self-Supervised Imitation Learning. [ .pdf ]
        [Fortunato et al., 2018] Fortunato, M., Azar, M. G., Piot, B., Menick, J., Hessel, M., Osband, I., Graves, A., Mnih, V., Munos, R., Hassabis, D., Pietquin, O., Blundell, C., and Legg, S. (2018). Noisy Networks for Exploration. In Iclr 2018. [ arXiv | http ]
        [Moerland et al., 2018] Moerland, T. M., Broekens, J., and Jonker, C. M. (2018). The Potential of the Return Distribution for Exploration in RL. In 35th International Conference on Machine Learning. [ arXiv | http ]
        [Sukhbaatar et al., 2018] Sukhbaatar, S., Lin, Z., Kostrikov, I., Synnaeve, G., Szlam, A., and Fergus, R. (2018). Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play. In Iclr 2018. [ arXiv | http ]
        [Taïga et al., 2019] Taïga, A. A., Fedus, W., Machado, M. C., Courville, A., and Bellemare, M. G. (2019). Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment. In 2nd Exploration in Reinforcement Learning Workshop at the 36th International Conference on Machine Learning. [ arXiv | http ]
        [Beyer et al., 2019] Beyer, L., Vincent, D., Teboul, O., Gelly, S., Geist, M., and Pietquin, O. (2019). MULEX: Disentangling Exploitation from Exploration in Deep RL. [ arXiv | http ]
        [Ciosek et al., 2019] Ciosek, K., Vuong, Q., Loftin, R., and Hofmann, K. (2019). Better Exploration with Optimistic Actor-Critic. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ]
        [Mavrin et al., 2019] Mavrin, B., Zhang, S., Yao, H., Kong, L., Wu, K., and Yu, Y. (2019). Distributional Reinforcement Learning for Efficient Exploration. In 36th International Conference on Machine Learning, ICML 2019, volume 2019-June, pages 7775--7785. International Machine Learning Society (IMLS). [ arXiv | http ]
        [Hare, 2019] Hare, J. (2019). Dealing with Sparse Rewards in Reinforcement Learning. [ arXiv | http ]
        [Shani et al., 2019] Shani, L., Efroni, Y., and Mannor, S. (2019). Exploration Conscious Reinforcement Learning Revisited. In 36th International Conference on Machine Learning. [ arXiv | http ]
        [Ecoffet et al., 2019] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K. O., and Clune, J. (2019). Go-Explore: a New Approach for Hard-Exploration Problems. [ arXiv | http ]
        [Zhang et al., 2019] Zhang, J., Wetzel, N., Dorka, N., Boedecker, J., and Burgard, W. (2019). Scheduled Intrinsic Drive: A Hierarchical Take on Intrinsically Motivated Exploration. [ arXiv | http ]
        [Nikolov et al., 2019] Nikolov, N., Kirschner, J., Berkenkamp, F., and Krause, A. (2019). Information-Directed Exploration for Deep Reinforcement Learning. In Iclr 2019. [ arXiv | http ]
        [Yang et al., 2019] Yang, H.-K., Chiang, P.-H., Hong, M.-F., and Lee, C.-Y. (2019). Exploration via Flow-Based Intrinsic Rewards. [ arXiv | http ]
        [Badia et al., 2020] Badia, A. P., Sprechmann, P., Vitvitskyi, A., Guo, D., Piot, B., Kapturowski, S., Tieleman, O., Arjovsky, M., Pritzel, A., Bolt, A., and Blundell, C. (2020). Never Give Up: Learning Directed Exploration Strategies. In ICLR 2020. [ arXiv | http ]
        [Taiga et al., 2020] Taiga, A. A., Fedus, W., Machado, M. C., Courville, A., and Bellemare, M. G. (2020). On Bonus-Based Exploration Methods in the Arcade Learning Environment. In ICLR 2020.
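
        To fix ideas, here is a minimal sketch of the count-based family (an intrinsic bonus of the form beta / sqrt(N(s)), in the spirit of Tang et al., 2017) on a sparse-reward corridor; the environment, hyperparameters and names are illustrative assumptions.

        import random
        from collections import defaultdict

        size, beta, alpha, gamma, episodes = 10, 0.5, 0.1, 0.95, 500
        Q = defaultdict(float)                    # Q[(state, action)]
        N = defaultdict(int)                      # visit counts per state

        def step(s, a):                           # 1-D corridor with reward only at the far end
            s2 = max(0, min(size - 1, s + (1 if a == 1 else -1)))
            return s2, (1.0 if s2 == size - 1 else 0.0)

        for _ in range(episodes):
            s = 0
            for _ in range(4 * size):
                a = random.randrange(2) if random.random() < 0.05 else \
                    max((0, 1), key=lambda act: Q[(s, act)])
                s2, r_ext = step(s, a)
                N[s2] += 1
                bonus = beta / (N[s2] ** 0.5)     # exploration bonus shrinks with the visit count
                target = r_ext + bonus + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s2

        print("greedy action per state:", [int(Q[(s, 1)] > Q[(s, 0)]) for s in range(size)])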

          POMDP: When the state is not completely observable, the problem is no longer Markovian and RL algorithms usually fail to learn. One way to address this is to extend the MDP formulation to a Partially Observable MDP (POMDP). Work will consist of searching for and reviewing state-of-the-art algorithms that use this approach.
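
          At the core of this approach is the belief update b'(s') ∝ O(o | s') Σ_s T(s' | s, a) b(s), which turns a POMDP into an MDP over belief states. Below is a minimal sketch of that update for a two-state problem, assuming NumPy is available; the transition and observation matrices are illustrative assumptions.

          import numpy as np

          T = np.array([[[0.9, 0.1],      # T[a, s, s']: transition probabilities
                         [0.1, 0.9]],
                        [[0.5, 0.5],
                         [0.5, 0.5]]])
          O = np.array([[0.8, 0.2],       # O[s', o]: observation probabilities
                        [0.3, 0.7]])

          def belief_update(b, a, o):
              b_pred = b @ T[a]           # predict: sum_s T(s' | s, a) b(s)
              b_new = O[:, o] * b_pred    # correct with the observation likelihood
              return b_new / b_new.sum()  # normalise

          b = np.array([0.5, 0.5])        # uniform initial belief
          print(belief_update(b, a=0, o=1))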

            Memory approaches: One solution to the problem of incomplete information about the world is to augment the current perception with information about past perceptions that helps disambiguate the true state of the agent. This can be done in several ways, for instance using a window of the last experiences (as in the Atari games) or using recurrent neural networks (RNN or LSTM) to keep information about past situations in the state.
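
            A minimal sketch of the first option, a wrapper that stacks the k most recent observations into a single input (as done for the Atari games): the environment interface, including the four-value return of step(), is an illustrative assumption and not tied to any particular library.

            from collections import deque
            import numpy as np

            class FrameStack:
                """Return the concatenation of the last k observations as the agent's input."""
                def __init__(self, env, k=4):
                    self.env, self.k = env, k
                    self.frames = deque(maxlen=k)

                def reset(self):
                    obs = self.env.reset()
                    for _ in range(self.k):
                        self.frames.append(obs)          # pad the window with the first frame
                    return np.concatenate(list(self.frames), axis=-1)

                def step(self, action):
                    obs, reward, done, info = self.env.step(action)
                    self.frames.append(obs)              # slide the window forward
                    return np.concatenate(list(self.frames), axis=-1), reward, done, info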

              Deep Learning and RL: Search the current literature for state-of-the-art algorithms that use Deep Learning. Do that for both value-based and actor-critic approaches.

                Robotics: Find and review some successful applications of RL to robotics. Explain the most common problems that appear when applying RL to robotics and the most successful ways to deal with them.
                [Wawrzyński, 2012] Wawrzyński, P. (2012). Autonomous reinforcement learning with experience replay for humanoid gait optimization. Procedia Computer Science, 13(February):205--211. [ DOI ]
                [Wawrzyński, 2014] Wawrzyński, P. (2014). Reinforcement Learning with Experience Replay for Model-Free Humanoid Walking Optimization. International Journal of Humanoid Robotics, 11(03):23. [ DOI | http ]
                [Levine et al., 2015] Levine, S., Wagener, N., and Abbeel, P. (2015). Learning contact-rich manipulation skills with guided policy search. In 2015 IEEE International Conference on Robotics and Automation (ICRA), volume 2015-June, pages 156--163. IEEE. [ DOI | arXiv | http ]
                [Peng et al., 2016] Peng, X. B., Berseth, G., and van de Panne, M. (2016). Terrain-adaptive locomotion skills using deep reinforcement learning. ACM Transactions on Graphics, 35(4):1--12. [ DOI | http ]
                [Kim and Pineau, 2016] Kim, B. and Pineau, J. (2016). Socially Adaptive Path Planning in Human Environments Using Inverse Reinforcement Learning. International Journal of Social Robotics, 8(1):51--66. [ DOI ]
                [Zhu et al., 2016] Zhu, Y., Mottaghi, R., Kolve, E., Lim, J. J., Gupta, A., Fei-Fei, L., and Farhadi, A. (2016). Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning. [ DOI | arXiv | http ]
                [Večerík et al., 2017] Večerík, M., Hester, T., Scholz, J., Wang, F., Pietquin, O., Piot, B., Heess, N., Rothörl, T., Lampe, T., and Riedmiller, M. (2017). Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards. pages 1--10. [ arXiv | http ]
                [Gudimella et al., 2017] Gudimella, A., Story, R., Shaker, M., Kong, R., Brown, M., Shnayder, V., Campos, M., and Berkeley, B. (2017). Deep Reinforcement Learning for Dexterous Manipulation with Concept Networks. Technical report. [ arXiv | .pdf ]
                [Gu et al., 2017] Gu, S., Holly, E., Lillicrap, T., and Levine, S. (2017). Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. IEEE International Conference on Robotics and Automation, pages 3389--3396. [ DOI | arXiv ]
                [Hwangbo et al., 2017] Hwangbo, J., Sa, I., Siegwart, R., and Hutter, M. (2017). Control of a Quadrotor with Reinforcement Learning. IEEE Robotics and Automation Letters, 2(644227):1--8. [ DOI | arXiv | http ]
                [Bruce et al., 2017] Bruce, J., Suenderhauf, N., Mirowski, P., Hadsell, R., and Milford, M. (2017). One-Shot Reinforcement Learning for Robot Navigation with Interactive Replay. (Nips). [ arXiv | http ]
                [Haarnoja et al., 2018] Haarnoja, T., Ha, S., Zhou, A., Tan, J., Tucker, G., and Levine, S. (2018). Learning to Walk via Deep Reinforcement Learning. [ arXiv | http ]
                [Goyal et al., 2018] Goyal, A., Brakel, P., Fedus, W., Lillicrap, T., Levine, S., Larochelle, H., and Bengio, Y. (2018). Recall Traces: Backtracking Models for Efficient Reinforcement Learning. [ arXiv | http ]
                [Lee et al., 2018] Lee, R., Mou, S., Dasagi, V., Bruce, J., Leitner, J., and Sünderhauf, N. (2018). Zero-shot Sim-to-Real Transfer with Modular Priors. [ arXiv | http ]
                [Mahmood et al., 2018] Mahmood, A. R., Korenkevych, D., Vasan, G., Ma, W., and Bergstra, J. (2018). Benchmarking Reinforcement Learning Algorithms on Real-World Robots. [ arXiv | http ]
                [Amos et al., 2018] Amos, B., Dinh, L., Cabi, S., Rothörl, T., Colmenarejo, S. G., Muldal, A., Erez, T., Tassa, Y., de Freitas, N., and Denil, M. (2018). Learning Awareness Models. In Iclr 2018. [ arXiv | http ]
                [Sharma et al., 2019] Sharma, P., Pathak, D., and Gupta, A. (2019). Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ]
                [Cabi et al., 2019] Cabi, S., Colmenarejo, S. G., Novikov, A., Konyushkova, K., Reed, S., Jeong, R., Żolna, K., Aytar, Y., Budden, D., Vecerik, M., Sushkov, O., Barker, D., Scholz, J., Denil, M., de Freitas, N., and Wang, Z. (2019). A Framework for Data-Driven Robotics. [ arXiv | http ]
                [Hwangbo et al., 2019] Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V., and Hutter, M. (2019). Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26):1--14. [ DOI | arXiv | http ]

                  Optimization: RL has been applied to optimization problems, ranging from finding good network architectures to recommendation systems.

                    Transfer learning: One problem with RL is that learning to optimize one goal has to start from scratch, even when the system has already learned to solve another task in the same domain. Transferring knowledge from one behavior to another to speed up the learning process is called transfer learning. A minimal sketch follows the reference list below.
                    [Schaul et al., 2015] Schaul, T., Horgan, D., Gregor, K., and Silver, D. (2015). Universal Value Function Approximators. 32nd international conference on Machine learning (ICML '15), pages 1312--1320. [ .html ]
                    [Higgins et al., 2017] Higgins, I., Pal, A., Rusu, A. A., Matthey, L., Burgess, C. P., Pritzel, A., Botvinick, M., Blundell, C., and Lerchner, A. (2017). DARLA: Improving Zero-Shot Transfer in Reinforcement Learning. In Proceedings of the 34th International Conference on Machine Learning. [ arXiv | http ]
                    [Gupta et al., 2018] Gupta, A., Eysenbach, B., Finn, C., and Levine, S. (2018). Unsupervised Meta-Learning for Reinforcement Learning. [ arXiv | http ]
                    [Xu et al., 2018] Xu, K., Ratner, E., Dragan, A., Levine, S., and Finn, C. (2018). Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. [ arXiv | .pdf ]
                    [Miconi et al., 2018] Miconi, T., Clune, J., and Stanley, K. O. (2018). Differentiable plasticity: training plastic neural networks with backpropagation. In 35th International Conference on Machine Learning. [ DOI | arXiv | http ]
                    [Landolfi et al., 2019] Landolfi, N. C., Thomas, G., and Ma, T. (2019). A Model-based Approach for Sample-efficient Multi-task Reinforcement Learning. [ arXiv | http ]
                    [Nagabandi et al., 2019] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., and Finn, C. (2019). Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning. In Iclr 2019. [ arXiv | http ]
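
                    To illustrate the goal-conditioned idea behind Schaul et al. (2015), the sketch below trains a single value table indexed by (state, goal, action). With a tabular representation it only shows the bookkeeping; the generalisation across goals that actually makes transfer effective comes from a function approximator. Environment, names and hyperparameters are illustrative assumptions.

                    import random
                    from collections import defaultdict

                    size, alpha, gamma = 8, 0.2, 0.95
                    Q = defaultdict(float)               # Q[(state, goal, action)]

                    def step(s, a):
                        return max(0, min(size - 1, s + (1 if a == 1 else -1)))

                    def train_on_goal(goal, episodes=300):
                        for _ in range(episodes):
                            s = random.randrange(size)
                            for _ in range(2 * size):
                                a = random.randrange(2) if random.random() < 0.2 else \
                                    max((0, 1), key=lambda act: Q[(s, goal, act)])
                                s2 = step(s, a)
                                r = 1.0 if s2 == goal else 0.0
                                target = r + gamma * max(Q[(s2, goal, 0)], Q[(s2, goal, 1)])
                                Q[(s, goal, a)] += alpha * (target - Q[(s, goal, a)])
                                s = s2
                                if s == goal:
                                    break

                    train_on_goal(goal=7)                # learn one task...
                    train_on_goal(goal=3, episodes=100)  # ...then train a second goal in the same table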

                      Hierarchical reinforcement learning: One approach to solving very complex tasks consists of breaking the problem into smaller subtasks that can be learned in a hierarchical way. Automatic decomposition of a complex task into subtasks is called hierarchical RL. A minimal sketch follows the reference lists below.
                        Classical introductions and papers:
                      [Barto and Mahadevan, 2003] Barto, A. G. and Mahadevan, S. (2003). Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems, 13(4):341--379. [ DOI | http ]
                      [Dietterich, 1998] Dietterich, T. G. (1998). The MAXQ Method for Hierarchical Reinforcement Learning. In 15th international conference on machine learning, number c, pages 118--126. [ .pdf ]
                      [Dietterich, 2000] Dietterich, T. G. (2000). Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, 13(6):227--303. [ DOI | http ]
                      [Botvinick et al., 2009] Botvinick, M. M., Niv, Y., and Barto, A. C. (2009). Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective. Cognition, 113(3):262--280. [ DOI | .pdf ]
                      [Hengst, 2012] Hengst, B. (2012). Hierarchical approaches. In Adaptation, Learning, and Optimization, volume 12, pages 293--323. [ DOI | arXiv | http ]
                        More recent papers:
                      [Mankowitz et al., 2016] Mankowitz, D. J., Mann, T. A., and Mannor, S. (2016). Iterative Hierarchical Optimization for Misspecified Problems (IHOMP). [ arXiv | http ]
                      [Kulkarni et al., 2016] Kulkarni, T. D., Narasimhan, K. R., Saeedi, A., and Tenenbaum, J. B. (2016). Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation. In 30th Conference on Neural Information Processing Systems (NIPS 2016), pages 1--13. [ arXiv | .pdf ]
                      [Arulkumaran et al., 2016] Arulkumaran, K., Dilokthanakul, N., Shanahan, M., and Bharath, A. A. (2016). Classifying Options for Deep Reinforcement Learning. In Arxiv. [ http ]
                      [Rusu et al., 2016] Rusu, A. A., Gomez Colmenarejo, S., Gulcehre, C., Desjardins, G., Kirkpatrick, J., Pascanu, R., Mnih, V., Kavukcuoglu, K., and Hadsell, R. (2016). Policy Distillation. arXiv, pages 1--12. [ arXiv | http ]
                      [Frans et al., 2017] Frans, K., Ho, J., Chen, X., Abbeel, P., and Schulman, J. (2017). Meta Learning Shared Hierarchies. [ arXiv | http ]
                      [Tessler et al., 2017] Tessler, C., Givony, S., Zahavy, T., Mankowitz, D. J., and Mannor, S. (2017). A deep hierarchical approach to lifelong learning in minecraft. In 31st AAAI Conference on Artificial Intelligence, AAAI 2017, pages 1553--1561. AAAI press. [ arXiv | http ]
                      [Florensa et al., 2017] Florensa, C., Duan, Y., and Abbeel, P. (2017). Stochastic Neural Networks for Hierarchical Reinforcement Learning. In ICLR 2017, pages 1--12.
                      [Vezhnevets et al., 2017] Vezhnevets, A. S., Osindero, S., Schaul, T., Heess, N., Jaderberg, M., Silver, D., and Kavukcuoglu, K. (2017). FeUdal Networks for Hierarchical Reinforcement Learning. arXiv Preprint. [ arXiv | http ]
                      [Henderson et al., 2017] Henderson, P., Chang, W.-D., Bacon, P.-L., Meger, D., Pineau, J., and Precup, D. (2017). OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning. [ arXiv | http ]
                      [Gatto, 2018] Gatto, S. (2018). Extending the Hierarchical Deep Reinforcement Learning framework. Technical Report September. [ .pdf ]
                      [Pang et al., 2018] Pang, Z.-J., Liu, R.-Z., Meng, Z.-Y., Zhang, Y., Yu, Y., and Lu, T. (2018). On Reinforcement Learning for Full-length Game of StarCraft. [ arXiv | http ]
                      [Sukhbaatar et al., 2018b] Sukhbaatar, S., Denton, E., Szlam, A., and Fergus, R. (2018b). Learning Goal Embeddings via Self-Play for Hierarchical Reinforcement Learning. [ arXiv | http ]
                      [Rafati and Noelle, 2018] Rafati, J. and Noelle, D. C. (2018). Learning Representations in Model-Free Hierarchical Reinforcement Learning. [ arXiv | http ]
                      [Wei et al., 2018] Wei, E., Wicke, D., and Luke, S. (2018). Hierarchical Approaches for Reinforcement Learning in Parameterized Action Space. [ arXiv | http ]
                      [Haarnoja et al., 2018] Haarnoja, T., Hartikainen, K., Abbeel, P., and Levine, S. (2018). Latent Space Policies for Hierarchical Reinforcement Learning. In 35th International Conference on Machine Learning. [ arXiv | .pdf ]
                      [Song et al., 2018] Song, Y., Wang, J., Lukasiewicz, T., Xu, Z., and Xu, M. (2018). Diversity-Driven Extensible Hierarchical Reinforcement Learning. In 33rd National Conference on Artificial Intelligence (AAAI 2019). [ arXiv | www: ]
                      [Nachum et al., 2018] Nachum, O., Gu, S., Lee, H., and Levine, S. (2018). Data-Efficient Hierarchical Reinforcement Learning. In 32nd Conference on Neural Information Processing Systems (NIPS 2018), pages 3303--3313. Neural information processing systems foundation. [ arXiv | http ]
                      [Tuyen et al., 2018] Tuyen, L. P., Vien, N. A., Layek, A., and Chung, T. (2018). Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes. [ arXiv | http ]
                      [Menashe and Stone, 2018] Menashe, J. and Stone, P. (2018). Escape Room: A Configurable Testbed for Hierarchical Reinforcement Learning. Technical report. [ arXiv | .pdf ]
                      [Co-Reyes et al., 2018] Co-Reyes, J. D., Liu, Y., Gupta, A., Eysenbach, B., Abbeel, P., and Levine, S. (2018). Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings. In 35th International Conference on Machine Learning. [ arXiv | http ]
                      [Keramati et al., 2018] Keramati, R., Whang, J., Cho, P., and Brunskill, E. (2018). Strategic Object Oriented Reinforcement Learning. [ http ]
                      [Le et al., 2018] Le, H. M., Jiang, N., Agarwal, A., Dudík, M., Yue, Y., and Daumé, H. (2018). Hierarchical Imitation and Reinforcement Learning. In 35th International Conference on Machine Learning. [ arXiv | http ]
                      [Sukhbaatar et al., 2018a] Sukhbaatar, S., Denton, E., Szlam, A., and Fergus, R. (2018a). Learning Goal Embeddings via Self-Play for Hierarchical Reinforcement Learning. [ arXiv | http ]
                      [Peterson et al., 2018] Peterson, E. J., Müyesser, N. A., Verstynen, T., and Dunovan, K. (2018). Keep it stupid simple. [ arXiv | http ]
                      [Machado et al., 2018] Machado, M. C., Rosenbaum, C., Guo, X., Liu, M., Tesauro, G., and Campbell, M. (2018). Eigenoption Discovery through the Deep Successor Representation. In Iclr 2018. [ arXiv | http ]
                      [Aubret et al., 2019] Aubret, A., Matignon, L., and Hassas, S. (2019). A survey on intrinsic motivation in reinforcement learning. [ arXiv | http ]
                      [Nachum et al., 2019b] Nachum, O., Tang, H., Lu, X., Gu, S., Lee, H., and Levine, S. (2019b). Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning? [ arXiv | http ]
                      [Sharma et al., 2019a] Sharma, A., Gu, S., Levine, S., Kumar, V., and Hausman, K. (2019a). Dynamics-Aware Unsupervised Discovery of Skills. [ arXiv | http ]
                      [Osa et al., 2019] Osa, T., Tangkaratt, V., and Sugiyama, M. (2019). Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization. In Iclr 2019. [ arXiv | http ]
                      [Levy et al., 2019] Levy, A., Konidaris, G., Platt, R., and Saenko, K. (2019). Learning Multi-Level Hierarchies with Hindsight. In Iclr 2019. [ arXiv | http ]
                      [Wulfmeier et al., 2019] Wulfmeier, M., Abdolmaleki, A., Hafner, R., Springenberg, J. T., Neunert, M., Hertweck, T., Lampe, T., Siegel, N., Heess, N., and Riedmiller, M. (2019). Regularized Hierarchical Policies for Compositional Transfer in Robotics. [ arXiv | http ]
                      [Modhe et al., 2019] Modhe, N., Chattopadhyay, P., Sharma, M., Das, A., Parikh, D., Batra, D., and Vedantam, R. (2019). Unsupervised Discovery of Decision States for Transfer in Reinforcement Learning. [ arXiv | http ]
                      [Harutyunyan et al., 2019] Harutyunyan, A., Dabney, W., Borsa, D., Heess, N., Munos, R., and Precup, D. (2019). The Termination Critic. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS). [ arXiv | http ]
                      [Li et al., 2019] Li, S., Wang, R., Tang, M., and Zhang, C. (2019). Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ]
                      [Jiang et al., 2019] Jiang, Y., Gu, S., Murphy, K., and Finn, C. (2019). Language as an Abstraction for Hierarchical Deep Reinforcement Learning. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ]
                      [Paul et al., 2019] Paul, S., van Baar, J., and Roy-Chowdhury, A. K. (2019). Learning from Trajectories via Subgoal Discovery. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), pages 1--11.
                      [Saleh et al., 2019] Saleh, A., Jaques, N., Ghandeharioun, A., Shen, J. H., and Picard, R. (2019). Hierarchical Reinforcement Learning for Open-Domain Dialog. [ arXiv | http ]
                      [Jain et al., 2019] Jain, D., Iscen, A., and Caluwaerts, K. (2019). Hierarchical Reinforcement Learning for Quadruped Locomotion. [ arXiv | http ]
                      [Christodoulou et al., 2019] Christodoulou, P., Lange, R. T., Shafti, A., and Faisal, A. A. (2019). Reinforcement Learning with Structured Hierarchical Grammar Representations of Actions. [ arXiv | http ]
                      [Sharma et al., 2019b] Sharma, P., Pathak, D., and Gupta, A. (2019b). Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ]
                      [Nachum et al., 2019a] Nachum, O., Gu, S., Lee, H., and Levine, S. (2019a). Near-Optimal Representation Learning for Hierarchical Reinforcement Learning. In Iclr 2019. [ arXiv | http ]
                      [Zhou and Yu, 2020] Zhou, W.-J. and Yu, Y. (2020). Temporal-adaptive Hierarchical Reinforcement Learning. [ arXiv | http ]
                      [Li et al., 2020] Li, A. C., Florensa, C., Clavera, I., and Abbeel, P. (2020). Sub-policy Adaptation for Hierarchical Reinforcement Learning. In ICLR 2020. [ arXiv | http ]
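
                      As a concrete illustration, below is a minimal tabular sketch of the two-level scheme used in several of the papers above (e.g. Kulkarni et al., 2016; Nachum et al., 2018): a high-level policy picks subgoals and a goal-conditioned low-level policy is rewarded intrinsically for reaching them. The corridor environment, subgoal set and hyperparameters are illustrative assumptions, and the sketch ignores discounting over the duration of each subgoal.

                      import random
                      from collections import defaultdict

                      size, alpha, gamma = 12, 0.2, 0.95
                      subgoals = [3, 7, 11]                       # candidate subgoals for the high level
                      Q_hi = defaultdict(float)                   # Q_hi[(state, subgoal)]
                      Q_lo = defaultdict(float)                   # Q_lo[(state, subgoal, action)]

                      def env_step(s, a):
                          s2 = max(0, min(size - 1, s + (1 if a == 1 else -1)))
                          return s2, (1.0 if s2 == size - 1 else 0.0)   # external reward only at the end

                      def low_level(s, g, max_steps=12):
                          """Goal-conditioned worker: intrinsically rewarded for reaching subgoal g."""
                          r_ext_total = 0.0
                          for _ in range(max_steps):
                              a = random.randrange(2) if random.random() < 0.1 else \
                                  max((0, 1), key=lambda act: Q_lo[(s, g, act)])
                              s2, r_ext = env_step(s, a)
                              r_int = 1.0 if s2 == g else 0.0     # intrinsic reward: reach the subgoal
                              target = r_int + gamma * max(Q_lo[(s2, g, 0)], Q_lo[(s2, g, 1)])
                              Q_lo[(s, g, a)] += alpha * (target - Q_lo[(s, g, a)])
                              r_ext_total += r_ext
                              s = s2
                              if s == g:
                                  break
                          return s, r_ext_total

                      for _ in range(500):                        # high-level episodes
                          s = 0
                          for _ in range(4):                      # a few subgoal choices per episode
                              g = random.choice(subgoals) if random.random() < 0.2 else \
                                  max(subgoals, key=lambda sg: Q_hi[(s, sg)])
                              s2, r_ext = low_level(s, g)
                              target = r_ext + gamma * max(Q_hi[(s2, sg)] for sg in subgoals)
                              Q_hi[(s, g)] += alpha * (target - Q_hi[(s, g)])
                              s = s2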

                        Shaping and curriculum learning: When trying to learn overly complex tasks from scratch, the agent usually does not learn anything. One way to alleviate this problem is to present the agent with a sequence of easier problems (a curriculum) to be learned before facing the most complex one. See also the slides of the last lecture for other links. A minimal sketch follows the reference list below.
                          Some recent papers in the area:
                        [Frans et al., 2017] Frans, K., Ho, J., Chen, X., Abbeel, P., and Schulman, J. (2017). Meta Learning Shared Hierarchies. [ arXiv | http ]
                        [Forestier et al., 2017] Forestier, S., Mollard, Y., and Oudeyer, P.-Y. (2017). Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning. [ arXiv | http ]
                        [Matiisen et al., 2017] Matiisen, T., Oliver, A., Cohen, T., and Schulman, J. (2017). Teacher-Student Curriculum Learning. In Deep Reinforcement Learning Symposium, NIPS 2017. [ arXiv | http ]
                        [Andrychowicz et al., 2017] Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., and Zaremba, W. (2017). Hindsight experience replay. In 31st Conference on Neural Information Processing Systems (NIPS 2017), pages 5048--5058. [ arXiv | http ]
                        [Florensa et al., 2017] Florensa, C., Held, D., Wulfmeier, M., Zhang, M., and Abbeel, P. (2017). Reverse Curriculum Generation for Reinforcement Learning. In 1st Conference on Robot Learning (CoRL 2017). [ DOI | arXiv | http ]
                        [Czarnecki et al., 2018] Czarnecki, W. M., Jayakumar, S. M., Jaderberg, M., Hasenclever, L., Teh, Y. W., Osindero, S., Heess, N., and Pascanu, R. (2018). Mix & Match - Agent Curricula for Reinforcement Learning. [ arXiv | http ]
                        [Nachum et al., 2018] Nachum, O., Gu, S., Lee, H., and Levine, S. (2018). Data-Efficient Hierarchical Reinforcement Learning. In 32nd Conference on Neural Information Processing Systems (NIPS 2018), pages 3303--3313. Neural information processing systems foundation. [ arXiv | http ]
                        [Resnick et al., 2018] Resnick, C., Raileanu, R., Kapoor, S., Peysakhovich, A., Cho, K., and Bruna, J. (2018). Backplay: "Man muss immer umkehren". [ arXiv | http ]
                        [Lanka and Wu, 2018] Lanka, S. and Wu, T. (2018). ARCHER: Aggressive Rewards to Counter bias in Hindsight Experience Replay. [ arXiv | www: ]
                        [Justesen and Risi, 2018] Justesen, N. and Risi, S. (2018). Automated Curriculum Learning by Rewarding Temporally Rare Events. [ arXiv | http ]
                        [Eppe et al., 2018] Eppe, M., Magg, S., and Wermter, S. (2018). Curriculum goal masking for continuous deep reinforcement learning. (Trr 169):2019. [ arXiv | .pdf ]
                        [Ivanovic et al., 2018] Ivanovic, B., Harrison, J., Sharma, A., Chen, M., and Pavone, M. (2018). BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning. [ arXiv | http ]
                        [Sukhbaatar et al., 2018] Sukhbaatar, S., Lin, Z., Kostrikov, I., Synnaeve, G., Szlam, A., and Fergus, R. (2018). Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play. In Iclr 2018. [ arXiv | http ]
                        [Florensa et al., 2018] Florensa, C., Held, D., Geng, X., and Abbeel, P. (2018). Automatic Goal Generation for Reinforcement Learning Agents. 35th International Conference on Machine Learning. [ DOI | arXiv | http ]
                        [Aubret et al., 2019] Aubret, A., Matignon, L., and Hassas, S. (2019). A survey on intrinsic motivation in reinforcement learning. [ arXiv | http ]
                        [Paul et al., 2019] Paul, S., van Baar, J., and Roy-Chowdhury, A. K. (2019). Learning from Trajectories via Subgoal Discovery. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), pages 1--11.
                        [Green et al., 2019] Green, M. C., Sergent, B., Shandilya, P., and Kumar, V. (2019). Evolutionarily-Curated Curriculum Learning for Deep Reinforcement Learning Agents. Technical report. [ arXiv | http ]
                        [Portelas et al., 2019] Portelas, R., Colas, C., Hofmann, K., and Oudeyer, P.-Y. (2019). Teacher algorithms for curriculum learning of Deep RL in continuously parameterized environments. In 3rd Conference on Robot Learning (CoRL 2019). [ arXiv | http ]
                        [Jabri et al., 2019] Jabri, A., Hsu, K., Eysenbach, B., Gupta, A., Levine, S., and Finn, C. (2019). Unsupervised Curricula for Visual Meta-Reinforcement Learning. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ]
                        [Luo et al., 2020] Luo, S., Kasaei, H., and Schomaker, L. (2020). Accelerating Reinforcement Learning for Reaching using Continuous Curriculum Learning. [ arXiv | http ]
                        [Narvekar et al., 2020] Narvekar, S., Peng, B., Leonetti, M., Sinapov, J., Taylor, M. E., and Stone, P. (2020). Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey. [ arXiv | http ]
                        [Wang et al., 2020] Wang, R., Lehman, J., Rawal, A., Zhi, J., Li, Y., Clune, J., and Stanley, K. O. (2020). Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions. [ arXiv | http ]
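
                        To make the idea concrete, here is a minimal sketch of a hand-designed curriculum: the agent starts on short corridors and the task is lengthened whenever its recent success rate is high. The toy environment, thresholds and names are illustrative assumptions, not an implementation of any of the methods above.

                        import random
                        from collections import defaultdict, deque

                        alpha, gamma, max_len = 0.2, 0.95, 20
                        Q = defaultdict(float)                   # Q[(state, action)], shared across levels

                        def run_episode(length):
                            """One episode on a corridor of the given length; returns a success flag."""
                            s = 0
                            for _ in range(3 * length):
                                a = random.randrange(2) if random.random() < 0.1 else \
                                    max((0, 1), key=lambda act: Q[(s, act)])
                                s2 = max(0, min(length - 1, s + (1 if a == 1 else -1)))
                                r = 1.0 if s2 == length - 1 else 0.0
                                target = r + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
                                Q[(s, a)] += alpha * (target - Q[(s, a)])
                                s = s2
                                if r > 0:
                                    return True
                            return False

                        length, recent = 4, deque(maxlen=20)     # start with an easy task
                        for _ in range(2000):
                            recent.append(run_episode(length))
                            if len(recent) == recent.maxlen and sum(recent) / len(recent) > 0.9:
                                length = min(length + 2, max_len) # promote to a harder task
                                recent.clear()
                        print("final curriculum level (corridor length):", length)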