Mario Martin's home page

Universitat Politècnica de Catalunya

Reinforcement Learning

Site for the second part of the URL course of the Master in Artificial Intelligence

Slides of Lectures:

Other resources:

Notebooks and software:

Additional links [Note that the slides also contain embedded links to the main references!]

Additional Recent bibliography (with links to implementations)


Proposed Topics for practical projects or paper reviews. See also the last set of slides for topics and more references:
Papers in black are especially recommended.

    Inverse Reinforcement Learning. Given examples of a policy, obtain the underlying reinforcement function. Useful for learning from examples and for discovering complex reinforcement functions, for instance for driving or walking. A minimal illustrative sketch follows the reference list below. Classical introductions and papers:
    [Ng and Russell, 2000] Ng, A. Y. and Russell, S. (2000). Algorithms for Inverse Reinforcement Learning. In Proceedings of the 17th International Conference on Machine Learning, volume 0, pages 663--670. [ .pdf ]
    [Abbeel and Ng, 2004] Abbeel, P. and Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. 21st international conference on Machine learning - ICML '04, page 1. [ DOI | arXiv | http ]
    [Ziebart et al., 2008] Ziebart, B. D., Maas, A., Bagnell, J. A., and Dey, A. K. (2008). Maximum Entropy Inverse Reinforcement Learning. AAAI Conference on Artificial Intelligence, pages 1433--1438. [ arXiv | .pdf ]
    [Dvijotham and Todorov, 2010] Dvijotham, K. and Todorov, E. (2010). Inverse Optimal Control with Linearly-Solvable MDPs. In International Conference on Machine Learning (ICML), pages 335--342. [ .pdf ]
    [Boularias et al., 2011] Boularias, A., Kober, J., and Peters, J. (2011). Relative Entropy Inverse Reinforcement Learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 182--189. [ .pdf ]
    [Wulfmeier et al., 2015] Wulfmeier, M., Ondruska, P., and Posner, I. (2015). Maximum Entropy Deep Inverse Reinforcement Learning. [ arXiv | http ]
    [Alger, 2016] Alger, M. (2016). Deep Inverse Reinforcement Learning. Technical report. [ .pdf ]
    [Finn et al., 2016] Finn, C., Levine, S., and Abbeel, P. (2016). Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization. In 33rd International Conference on Machine Learning. [ DOI | arXiv | .html ]
    [Baram et al., 2017] Baram, N., Anschel, O., Caspi, I., and Mannor, S. (2017). End-to-End Differentiable Adversarial Imitation Learning. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 390--399. [ .html ]
    [Wulfmeier et al., 2017] Wulfmeier, M., Rao, D., Wang, D. Z., Ondruska, P., and Posner, I. (2017). Large-scale cost function learning for path planning using deep inverse reinforcement learning. International Journal of Robotics Research, 36(10):1073--1087. [ DOI | http ]
    [Henderson et al., 2017] Henderson, P., Chang, W.-D., Bacon, P.-L., Meger, D., Pineau, J., and Precup, D. (2017). OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning. [ arXiv | http ]
    [Metelli et al., 2017] Metelli, A. M., Pirotta, M., and Restelli, M. (2017). Compatible Reward Inverse Reinforcement Learning. In Advances in Neural Information Processing Systems. [ .pdf ]
    [Halperin, 2017] Halperin, I. (2017). Inverse Reinforcement Learning for Marketing. [ arXiv | http ]
    [Fu et al., 2018] Fu, J., Luo, K., and Levine, S. (2018). Learning Robust Rewards with Adversarial Inverse Reinforcement Learning. In 6th International Conference on Learning Representations, ICLR 2018. [ arXiv | http ]
    [Arora and Doshi, 2018] Arora, S. and Doshi, P. (2018). A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress. [ arXiv | http ]
    [Le et al., 2018] Le, H. M., Jiang, N., Agarwal, A., Dudík, M., Yue, Y., and Daumé, H. (2018). Hierarchical Imitation and Reinforcement Learning. In 35th International Conference on Machine Learning. [ arXiv | http ]
    [Reddy et al., 2018] Reddy, S., Dragan, A. D., and Levine, S. (2018). What Would pi* Do?: Imitation Learning via Off-Policy Reinforcement Learning. Technical report. [ http ]
    [Tucker et al., 2018] Tucker, A., Gleave, A., and Russell, S. (2018). Inverse reinforcement learning for video games. [ arXiv | http ]
    [Xu et al., 2018] Xu, K., Ratner, E., Dragan, A., Levine, S., and Finn, C. (2018). Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. [ arXiv | .pdf ]
    [Haug et al., 2018] Haug, L., Tschiatschek, S., and Singla, A. (2018). Teaching Inverse Reinforcement Learners via Features and Demonstrations. In 32nd Conference on Neural Information Processing Systems (NeurIPS 2018). [ arXiv | .pdf ]
    [Behbahani et al., 2018] Behbahani, F., Shiarlis, K., Chen, X., Kurin, V., Kasewa, S., Stirbu, C., Gomes, J., Paul, S., Oliehoek, F. A., Messias, J., and Whiteson, S. (2018). Learning from Demonstration in the Wild. [ arXiv | http ]
    [Brown and Niekum, 2018] Brown, D. S. and Niekum, S. (2018). Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications. [ arXiv | http ]
    [Gao et al., 2018] Gao, Y., Xu, H., Lin, J., Yu, F., Levine, S., and Darrell, T. (2018). Reinforcement Learning from Imperfect Demonstrations. 35th International Conference on Machine Learning. [ arXiv | http ]
    [Qureshi et al., 2019] Qureshi, A. H., Boots, B., and Yip, M. C. (2019). Adversarial Imitation via Variational Inverse Reinforcement Learning. In Iclr 2019. [ arXiv | http ]
    [Kinose and Taniguchi, 2019] Kinose, A. and Taniguchi, T. (2019). Integration of Imitation Learning using GAIL and Reinforcement Learning using Task-achievement Rewards via Probabilistic Generative Model. [ arXiv | http ]
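
    To make the problem concrete, here is a minimal sketch of Maximum Entropy IRL (Ziebart et al., 2008) on a toy tabular MDP, assuming NumPy is available; the one-dimensional chain environment, the hand-made "expert" trajectories and all names are illustrative assumptions, not code from any of the papers above.

    import numpy as np

    n_states, n_actions, horizon, gamma = 5, 2, 10, 0.9

    # Deterministic chain: action 0 moves left, action 1 moves right.
    P = np.zeros((n_states, n_actions), dtype=int)
    for s in range(n_states):
        P[s, 0], P[s, 1] = max(s - 1, 0), min(s + 1, n_states - 1)

    phi = np.eye(n_states)                        # one-hot state features

    def soft_value_iteration(r, n_iters=60):
        """Soft-optimal stochastic policy pi[s, a] for a reward vector r."""
        V = np.zeros(n_states)
        for _ in range(n_iters):
            Q = r[:, None] + gamma * V[P]         # Q[s, a] through the next-state table P
            m = Q.max(axis=1, keepdims=True)      # stabilised log-sum-exp
            V = (m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True))).ravel()
        return np.exp(Q - V[:, None])

    def expected_svf(pi, start=0):
        """Expected state-visitation frequencies over the horizon under policy pi."""
        d = np.zeros(n_states); d[start] = 1.0
        svf = d.copy()
        for _ in range(horizon - 1):
            d_next = np.zeros(n_states)
            for s in range(n_states):
                for a in range(n_actions):
                    d_next[P[s, a]] += d[s] * pi[s, a]
            d, svf = d_next, svf + d_next
        return svf

    # "Expert" demonstrations: trajectories that always move right from state 0.
    demos = [[min(t, n_states - 1) for t in range(horizon)] for _ in range(20)]
    emp_feat = sum(phi[s] for traj in demos for s in traj) / len(demos)

    theta = np.zeros(n_states)                    # reward weights, r(s) = theta . phi(s)
    for _ in range(200):                          # gradient ascent on the MaxEnt log-likelihood
        pi = soft_value_iteration(phi @ theta)
        theta += 0.05 * (emp_feat - expected_svf(pi) @ phi)

    print("recovered reward weights:", np.round(theta, 2))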

      Game theory: Application of RL algorithms to multi-agent systems to learn to cooperate or compete in a game-theoretic framework.
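
      As an illustration of the setting, below is a minimal sketch of two independent Q-learners playing an iterated Prisoner's Dilemma; the payoff matrix, hyperparameters and names are illustrative assumptions rather than a reference implementation.

      import random

      # Payoffs (row player, column player) for actions 0 = cooperate, 1 = defect.
      payoff = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

      q = [[0.0, 0.0], [0.0, 0.0]]     # q[player][action] for the stateless game
      alpha, epsilon = 0.1, 0.1

      def act(player):
          if random.random() < epsilon:
              return random.randrange(2)
          return 0 if q[player][0] >= q[player][1] else 1

      for _ in range(20000):
          a0, a1 = act(0), act(1)
          r0, r1 = payoff[(a0, a1)]
          q[0][a0] += alpha * (r0 - q[0][a0])      # stateless (bandit-style) update
          q[1][a1] += alpha * (r1 - q[1][a1])

      # Defection typically ends up with the higher Q-value for both players,
      # i.e. the learners converge on the Nash equilibrium rather than cooperation.
      print("player 0 Q-values:", q[0], "player 1 Q-values:", q[1])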

        Exploration: Review of different techniques for exploration. Work will consist of a comparative study of the different techniques; a minimal sketch of one of them follows the reference list below.
        [Stadie et al., 2015] Stadie, B. C., Levine, S., and Abbeel, P. (2015). Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models. arXiv, pages 1--11. [ arXiv | http ]
        [Bent et al., 2015] Bent, O., Rashid, T., and Whiteson, S. (2015). Improving Exploration in Deep Reinforcement Learning. [ .pdf ]
        [Ostrovski et al., 2017] Ostrovski, G., Bellemare, M. G., Van Den Oord, A., and Munos, R. (2017). Count-Based Exploration with Neural Density Models. In 34th International Conference on Machine Learning. [ arXiv | .pdf ]
        [Aslanides et al., 2017] Aslanides, J., Leike, J., and Hutter, M. (2017). Universal Reinforcement Learning Algorithms: Survey and Experiments. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pages 1403--1410, California. International Joint Conferences on Artificial Intelligence Organization. [ DOI | arXiv | http ]
        [Fu et al., 2017] Fu, J., Co-Reyes, J. D., and Levine, S. (2017). EX2: Exploration with Exemplar Models for Deep Reinforcement Learning. In 31st Conference on Neural Information Processing Systems (NIPS 2017). [ arXiv | http ]
        [Tang et al., 2017] Tang, H., Houthooft, R., Foote, D., Stooke, A., Chen, X., Duan, Y., Schulman, J., De Turck, F., and Abbeel, P. (2017). #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning. In 31st Conference on Neural Information Processing Systems (NIPS 2017). [ arXiv | http ]
        [Martin et al., 2017] Martin, J., Narayanan S., S., Everitt, T., and Hutter, M. (2017). Count-Based Exploration in Feature Space for Reinforcement Learning. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pages 2471--2478, California. International Joint Conferences on Artificial Intelligence Organization. [ DOI | arXiv | http ]
        [Achiam and Sastry, 2017] Achiam, J. and Sastry, S. (2017). Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning. [ arXiv | http ]
        [Pathak et al., 2017] Pathak, D., Agrawal, P., Efros, A. A., and Darrell, T. (2017). Curiosity-Driven Exploration by Self-Supervised Prediction. In Proceedings of the 34th International Conference on Machine Learning, volume 2017-July. [ DOI | arXiv | http ]
        [Kaushik et al., 2018] Kaushik, R., Chatzilygeroudis, K., and Mouret, J.-B. (2018). Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards. In 2nd Conference on Robot Learning (CoRL 2018). [ arXiv | http ]
        [Colas et al., 2018] Colas, C., Sigaud, O., and Oudeyer, P.-Y. (2018). GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms. In 35th International Conference on Machine Learning. [ arXiv | http ]
        [Savinov et al., 2018] Savinov, N., Raichuk, A., Marinier, R., Vincent, D., Pollefeys, M., Lillicrap, T., and Gelly, S. (2018). Episodic Curiosity through Reachability. [ arXiv | http ]
        [Burda et al., 2018] Burda, Y., Edwards, H., Storkey, A., and Klimov, O. (2018). Exploration by Random Network Distillation. [ arXiv | http ]
        [Osband et al., 2018] Osband, I., Aslanides, J., and Cassirer, A. (2018). Randomized Prior Functions for Deep Reinforcement Learning. [ arXiv | http ]
        [Junyent et al., 2018] Junyent, M., Jonsson, A., and Gómez, V. (2018). Improving width-based planning with compact policies. In ICML 2018 workshop: Planning and Learning, pages 1--21. [ .pdf ]
        [Plappert et al., 2018] Plappert, M., Houthooft, R., Dhariwal, P., Sidor, S., Chen, R. Y., Chen, X., Asfour, T., Abbeel, P., and Andrychowicz, M. (2018). Parameter Space Noise for Exploration. In Iclr 2018. [ arXiv | http ]
        [Shyam et al., 2018] Shyam, P., Jaśkowski, W., and Gomez, F. (2018). Model-Based Active Exploration. [ arXiv | http ]
        [Azizzadenesheli et al., 2018] Azizzadenesheli, K., Brunskill, E., and Anandkumar, A. (2018). Efficient Exploration through Bayesian Deep Q-Networks. Technical report. [ arXiv | .pdf ]
        [Haber et al., 2018] Haber, N., Mrowca, D., Fei-Fei, L., and Yamins, D. L. K. (2018). Learning to Play with Intrinsically-Motivated Self-Aware Agents. [ DOI | arXiv | http ]
        [Gupta et al., 2018] Gupta, A., Mendonca, R., Liu, Y., Abbeel, P., and Levine, S. (2018). Meta-Reinforcement Learning of Structured Exploration Strategies. [ arXiv | http ]
        [Hong et al., 2018] Hong, Z.-W., Fu, T.-J., Shann, T.-Y., Chang, Y.-H., and Lee, C.-Y. (2018). Adversarial Exploration Strategy for Self-Supervised Imitation Learning. [ .pdf ]
        [Fortunato et al., 2018] Fortunato, M., Azar, M. G., Piot, B., Menick, J., Hessel, M., Osband, I., Graves, A., Mnih, V., Munos, R., Hassabis, D., Pietquin, O., Blundell, C., and Legg, S. (2018). Noisy Networks for Exploration. In Iclr 2018. [ arXiv | http ]
        [Moerland et al., 2018] Moerland, T. M., Broekens, J., and Jonker, C. M. (2018). The Potential of the Return Distribution for Exploration in RL. In 35th International Conference on Machine Learning. [ arXiv | http ]
        [Sukhbaatar et al., 2018] Sukhbaatar, S., Lin, Z., Kostrikov, I., Synnaeve, G., Szlam, A., and Fergus, R. (2018). Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play. In Iclr 2018. [ arXiv | http ]
        [Taïga et al., 2019] Taïga, A. A., Fedus, W., Machado, M. C., Courville, A., and Bellemare, M. G. (2019). Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment. In 2nd Exploration in Reinforcement Learning Workshop at the 36th International Conference on Machine Learning. [ arXiv | http ]
        [Beyer et al., 2019] Beyer, L., Vincent, D., Teboul, O., Gelly, S., Geist, M., and Pietquin, O. (2019). MULEX: Disentangling Exploitation from Exploration in Deep RL. [ arXiv | http ]
        [Ciosek et al., 2019] Ciosek, K., Vuong, Q., Loftin, R., and Hofmann, K. (2019). Better Exploration with Optimistic Actor-Critic. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ]
        [Mavrin et al., 2019] Mavrin, B., Zhang, S., Yao, H., Kong, L., Wu, K., and Yu, Y. (2019). Distributional Reinforcement Learning for Efficient Exploration. In 36th International Conference on Machine Learning, ICML 2019, volume 2019-June, pages 7775--7785. International Machine Learning Society (IMLS). [ arXiv | http ]
        [Hare, 2019] Hare, J. (2019). Dealing with Sparse Rewards in Reinforcement Learning. [ arXiv | http ]
        [Shani et al., 2019] Shani, L., Efroni, Y., and Mannor, S. (2019). Exploration Conscious Reinforcement Learning Revisited. In 36th International Conference on Machine Learning. [ arXiv | http ]
        [Ecoffet et al., 2019] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K. O., and Clune, J. (2019). Go-Explore: a New Approach for Hard-Exploration Problems. [ arXiv | http ]
        [Zhang et al., 2019] Zhang, J., Wetzel, N., Dorka, N., Boedecker, J., and Burgard, W. (2019). Scheduled Intrinsic Drive: A Hierarchical Take on Intrinsically Motivated Exploration. [ arXiv | http ]
        [Nikolov et al., 2019] Nikolov, N., Kirschner, J., Berkenkamp, F., and Krause, A. (2019). Information-Directed Exploration for Deep Reinforcement Learning. In Iclr 2019. [ arXiv | http ]
        [Yang et al., 2019] Yang, H.-K., Chiang, P.-H., Hong, M.-F., and Lee, C.-Y. (2019). Exploration via Flow-Based Intrinsic Rewards. [ arXiv | http ]
        [Badia et al., 2020] Badia, A. P., Sprechmann, P., Vitvitskyi, A., Guo, D., Piot, B., Kapturowski, S., Tieleman, O., Arjovsky, M., Pritzel, A., Bolt, A., and Blundell, C. (2020). Never Give Up: Learning Directed Exploration Strategies. In ICLR 2020. [ arXiv | http ]
        [Taiga et al., 2020] Taiga, A. A., Fedus, W., Machado, M. C., Courville, A., and Bellemare, M. G. (2020). On Bonus-Based Exploration Methods in the Arcade Learning Environment. In ICLR 2020.
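
        To fix ideas, here is a minimal sketch of the count-based family (an intrinsic bonus of the form beta / sqrt(N(s)), in the spirit of Tang et al., 2017) on a sparse-reward corridor; the environment, hyperparameters and names are illustrative assumptions.

        import random
        from collections import defaultdict

        size, beta, alpha, gamma, episodes = 10, 0.5, 0.1, 0.95, 500
        Q = defaultdict(float)                    # Q[(state, action)]
        N = defaultdict(int)                      # visit counts per state

        def step(s, a):                           # 1-D corridor with reward only at the far end
            s2 = max(0, min(size - 1, s + (1 if a == 1 else -1)))
            return s2, (1.0 if s2 == size - 1 else 0.0)

        for _ in range(episodes):
            s = 0
            for _ in range(4 * size):
                a = random.randrange(2) if random.random() < 0.05 else \
                    max((0, 1), key=lambda act: Q[(s, act)])
                s2, r_ext = step(s, a)
                N[s2] += 1
                bonus = beta / (N[s2] ** 0.5)     # exploration bonus shrinks with the visit count
                target = r_ext + bonus + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s2

        print("greedy action per state:", [int(Q[(s, 1)] > Q[(s, 0)]) for s in range(size)])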

          POMDP: When the state is not completely observable, the problem is no longer Markovian and RL algorithms usually fail to learn. One way to address this is to extend the MDP formulation to a Partially Observable MDP (POMDP). Work will consist of searching for and reviewing state-of-the-art algorithms that use this approach.
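
          At the core of this approach is the belief update b'(s') ∝ O(o | s') Σ_s T(s' | s, a) b(s), which turns a POMDP into an MDP over belief states. Below is a minimal sketch of that update for a two-state problem, assuming NumPy is available; the transition and observation matrices are illustrative assumptions.

          import numpy as np

          T = np.array([[[0.9, 0.1],      # T[a, s, s']: transition probabilities
                         [0.1, 0.9]],
                        [[0.5, 0.5],
                         [0.5, 0.5]]])
          O = np.array([[0.8, 0.2],       # O[s', o]: observation probabilities
                        [0.3, 0.7]])

          def belief_update(b, a, o):
              b_pred = b @ T[a]           # predict: sum_s T(s' | s, a) b(s)
              b_new = O[:, o] * b_pred    # correct with the observation likelihood
              return b_new / b_new.sum()  # normalise

          b = np.array([0.5, 0.5])        # uniform initial belief
          print(belief_update(b, a=0, o=1))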

            Memory approaches: One solution to the problem of incomplete information about the world is to augment the current perception with information about past perceptions that helps disambiguate the true state of the agent. This can be done in several ways, for instance using a window of the last experiences (as in the Atari games) or using recurrent neural networks (RNN or LSTM) to keep information about past situations in the state.
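
            A minimal sketch of the first option, a wrapper that stacks the k most recent observations into a single input (as done for the Atari games): the environment interface, including the four-value return of step(), is an illustrative assumption and not tied to any particular library.

            from collections import deque
            import numpy as np

            class FrameStack:
                """Return the concatenation of the last k observations as the agent's input."""
                def __init__(self, env, k=4):
                    self.env, self.k = env, k
                    self.frames = deque(maxlen=k)

                def reset(self):
                    obs = self.env.reset()
                    for _ in range(self.k):
                        self.frames.append(obs)          # pad the window with the first frame
                    return np.concatenate(list(self.frames), axis=-1)

                def step(self, action):
                    obs, reward, done, info = self.env.step(action)
                    self.frames.append(obs)              # slide the window forward
                    return np.concatenate(list(self.frames), axis=-1), reward, done, info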

              Deep Learning and RL: Search the current literature for state-of-the-art algorithms that use Deep Learning. Do that for both value-based and actor-critic approaches.

                Robotics: Find and review some successful applications of RL to robotics. Explain the most common problems that appear when applying RL to robotics and the most successful ways to deal with them.
                [Wawrzyński, 2012] Wawrzyński, P. (2012). Autonomous reinforcement learning with experience replay for humanoid gait optimization. Procedia Computer Science, 13(February):205--211. [ DOI ]
                [Wawrzyński, 2014] Wawrzyński, P. (2014). Reinforcement Learning with Experience Replay for Model-Free Humanoid Walking Optimization. International Journal of Humanoid Robotics, 11(03):23. [ DOI | http ]
                [Levine et al., 2015] Levine, S., Wagener, N., and Abbeel, P. (2015). Learning contact-rich manipulation skills with guided policy search. In 2015 IEEE International Conference on Robotics and Automation (ICRA), volume 2015-June, pages 156--163. IEEE. [ DOI | arXiv | http ]
                [Peng et al., 2016] Peng, X. B., Berseth, G., and van de Panne, M. (2016). Terrain-adaptive locomotion skills using deep reinforcement learning. ACM Transactions on Graphics, 35(4):1--12. [ DOI | http ]
                [Kim and Pineau, 2016] Kim, B. and Pineau, J. (2016). Socially Adaptive Path Planning in Human Environments Using Inverse Reinforcement Learning. International Journal of Social Robotics, 8(1):51--66. [ DOI ]
                [Zhu et al., 2016] Zhu, Y., Mottaghi, R., Kolve, E., Lim, J. J., Gupta, A., Fei-Fei, L., and Farhadi, A. (2016). Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning. [ DOI | arXiv | http ]
                [Večerík et al., 2017] Večerík, M., Hester, T., Scholz, J., Wang, F., Pietquin, O., Piot, B., Heess, N., Rothörl, T., Lampe, T., and Riedmiller, M. (2017). Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards. pages 1--10. [ arXiv | http ]
                [Gudimella et al., 2017] Gudimella, A., Story, R., Shaker, M., Kong, R., Brown, M., Shnayder, V., Campos, M., and Berkeley, B. (2017). Deep Reinforcement Learning for Dexterous Manipulation with Concept Networks. Technical report. [ arXiv | .pdf ]
                [Gu et al., 2017] Gu, S., Holly, E., Lillicrap, T., and Levine, S. (2017). Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. IEEE International Conference on Robotics and Automation, pages 3389--3396. [ DOI | arXiv ]
                [Hwangbo et al., 2017] Hwangbo, J., Sa, I., Siegwart, R., and Hutter, M. (2017). Control of a Quadrotor with Reinforcement Learning. IEEE Robotics and Automation Letters, 2(644227):1--8. [ DOI | arXiv | http ]
                [Bruce et al., 2017] Bruce, J., Suenderhauf, N., Mirowski, P., Hadsell, R., and Milford, M. (2017). One-Shot Reinforcement Learning for Robot Navigation with Interactive Replay. (Nips). [ arXiv | http ]
                [Haarnoja et al., 2018] Haarnoja, T., Ha, S., Zhou, A., Tan, J., Tucker, G., and Levine, S. (2018). Learning to Walk via Deep Reinforcement Learning. [ arXiv | http ]
                [Goyal et al., 2018] Goyal, A., Brakel, P., Fedus, W., Lillicrap, T., Levine, S., Larochelle, H., and Bengio, Y. (2018). Recall Traces: Backtracking Models for Efficient Reinforcement Learning. [ arXiv | http ]
                [Lee et al., 2018] Lee, R., Mou, S., Dasagi, V., Bruce, J., Leitner, J., and Sünderhauf, N. (2018). Zero-shot Sim-to-Real Transfer with Modular Priors. [ arXiv | http ]
                [Mahmood et al., 2018] Mahmood, A. R., Korenkevych, D., Vasan, G., Ma, W., and Bergstra, J. (2018). Benchmarking Reinforcement Learning Algorithms on Real-World Robots. [ arXiv | http ]
                [Amos et al., 2018] Amos, B., Dinh, L., Cabi, S., Rothörl, T., Colmenarejo, S. G., Muldal, A., Erez, T., Tassa, Y., de Freitas, N., and Denil, M. (2018). Learning Awareness Models. In Iclr 2018. [ arXiv | http ]
                [Sharma et al., 2019] Sharma, P., Pathak, D., and Gupta, A. (2019). Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ]
                [Cabi et al., 2019] Cabi, S., Colmenarejo, S. G., Novikov, A., Konyushkova, K., Reed, S., Jeong, R., Żolna, K., Aytar, Y., Budden, D., Vecerik, M., Sushkov, O., Barker, D., Scholz, J., Denil, M., de Freitas, N., and Wang, Z. (2019). A Framework for Data-Driven Robotics. [ arXiv | http ]
                [Hwangbo et al., 2019] Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V., and Hutter, M. (2019). Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26):1--14. [ DOI | arXiv | http ]

                  Optimization: RL has been applied to optimization problems, ranging from finding good network architectures to recommendation systems.

                    Transfer learning: One problem with RL is that learning to optimize one goal has to start from scratch, even when the system has already learned to solve another task in the same domain. Transferring knowledge from one behavior to another to speed up the learning process is called transfer learning. A minimal sketch follows the reference list below.
                    [Schaul et al., 2015] Schaul, T., Horgan, D., Gregor, K., and Silver, D. (2015). Universal Value Function Approximators. 32nd international conference on Machine learning (ICML '15), pages 1312--1320. [ .html ]
                    [Higgins et al., 2017] Higgins, I., Pal, A., Rusu, A. A., Matthey, L., Burgess, C. P., Pritzel, A., Botvinick, M., Blundell, C., and Lerchner, A. (2017). DARLA: Improving Zero-Shot Transfer in Reinforcement Learning. In Proceedings of the 34th International Conference on Machine Learning. [ arXiv | http ]
                    [Gupta et al., 2018] Gupta, A., Eysenbach, B., Finn, C., and Levine, S. (2018). Unsupervised Meta-Learning for Reinforcement Learning. [ arXiv | http ]
                    [Xu et al., 2018] Xu, K., Ratner, E., Dragan, A., Levine, S., and Finn, C. (2018). Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. [ arXiv | .pdf ]
                    [Miconi et al., 2018] Miconi, T., Clune, J., and Stanley, K. O. (2018). Differentiable plasticity: training plastic neural networks with backpropagation. In 35th International Conference on Machine Learning. [ DOI | arXiv | http ]
                    [Landolfi et al., 2019] Landolfi, N. C., Thomas, G., and Ma, T. (2019). A Model-based Approach for Sample-efficient Multi-task Reinforcement Learning. [ arXiv | http ]
                    [Nagabandi et al., 2019] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., and Finn, C. (2019). Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning. In Iclr 2019. [ arXiv | http ]
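
                    To illustrate the goal-conditioned idea behind Schaul et al. (2015), the sketch below trains a single value table indexed by (state, goal, action). With a tabular representation it only shows the bookkeeping; the generalisation across goals that actually makes transfer effective comes from a function approximator. Environment, names and hyperparameters are illustrative assumptions.

                    import random
                    from collections import defaultdict

                    size, alpha, gamma = 8, 0.2, 0.95
                    Q = defaultdict(float)               # Q[(state, goal, action)]

                    def step(s, a):
                        return max(0, min(size - 1, s + (1 if a == 1 else -1)))

                    def train_on_goal(goal, episodes=300):
                        for _ in range(episodes):
                            s = random.randrange(size)
                            for _ in range(2 * size):
                                a = random.randrange(2) if random.random() < 0.2 else \
                                    max((0, 1), key=lambda act: Q[(s, goal, act)])
                                s2 = step(s, a)
                                r = 1.0 if s2 == goal else 0.0
                                target = r + gamma * max(Q[(s2, goal, 0)], Q[(s2, goal, 1)])
                                Q[(s, goal, a)] += alpha * (target - Q[(s, goal, a)])
                                s = s2
                                if s == goal:
                                    break

                    train_on_goal(goal=7)                # learn one task...
                    train_on_goal(goal=3, episodes=100)  # ...then train a second goal in the same table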

                      Hierarchical reinforcement learning: One approach to solving very complex tasks consists of breaking the problem into smaller subtasks that can be learned in a hierarchical way. Automatic decomposition of a complex task into subtasks is called hierarchical RL. A minimal sketch follows the reference lists below.
                        Classical introductions and papers:
                      [Barto and Mahadevan, 2003] Barto, A. G. and Mahadevan, S. (2003). Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems, 13(4):341--379. [ DOI | http ]
                      [Dietterich, 1998] Dietterich, T. G. (1998). The MAXQ Method for Hierarchical Reinforcement Learning. In 15th international conference on machine learning, number c, pages 118--126. [ .pdf ]
                      [Dietterich, 2000] Dietterich, T. G. (2000). Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, 13(6):227--303. [ DOI | http ]
                      [Botvinick et al., 2009] Botvinick, M. M., Niv, Y., and Barto, A. C. (2009). Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective. Cognition, 113(3):262--280. [ DOI | .pdf ]
                      [Hengst, 2012] Hengst, B. (2012). Hierarchical approaches. In Adaptation, Learning, and Optimization, volume 12, pages 293--323. [ DOI | arXiv | http ]
                        More recent papers:
                      [Mankowitz et al., 2016] Mankowitz, D. J., Mann, T. A., and Mannor, S. (2016). Iterative Hierarchical Optimization for Misspecified Problems (IHOMP). [ arXiv | http ]
                      [Kulkarni et al., 2016] Kulkarni, T. D., Narasimhan, K. R., Saeedi, A., and Tenenbaum, J. B. (2016). Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation. In 30th Conference on Neural Information Processing Systems (NIPS 2016), pages 1--13. [ arXiv | .pdf ]
                      [Arulkumaran et al., 2016] Arulkumaran, K., Dilokthanakul, N., Shanahan, M., and Bharath, A. A. (2016). Classifying Options for Deep Reinforcement Learning. In Arxiv. [ http ]
                      [Rusu et al., 2016] Rusu, A. A., Gomez Colmenarejo, S., Gulcehre, C., Desjardins, G., Kirkpatrick, J., Pascanu, R., Mnih, V., Kavukcuoglu, K., and Hadsell, R. (2016). Policy Distillation. arXiv, pages 1--12. [ arXiv | http ]
                      [Frans et al., 2017] Frans, K., Ho, J., Chen, X., Abbeel, P., and Schulman, J. (2017). Meta Learning Shared Hierarchies. [ arXiv | http ]
                      [Tessler et al., 2017] Tessler, C., Givony, S., Zahavy, T., Mankowitz, D. J., and Mannor, S. (2017). A deep hierarchical approach to lifelong learning in minecraft. In 31st AAAI Conference on Artificial Intelligence, AAAI 2017, pages 1553--1561. AAAI press. [ arXiv | http ]
                      [Florensa et al., 2017] Florensa, C., Duan, Y., and Abbeel, P. (2017). Stochastic Neural Networks for Hierarchical Reinforcement Learning. In ICLR 2017, pages 1--12.
                      [Vezhnevets et al., 2017] Vezhnevets, A. S., Osindero, S., Schaul, T., Heess, N., Jaderberg, M., Silver, D., and Kavukcuoglu, K. (2017). FeUdal Networks for Hierarchical Reinforcement Learning. arXiv Preprint. [ arXiv | http ]
                      [Henderson et al., 2017] Henderson, P., Chang, W.-D., Bacon, P.-L., Meger, D., Pineau, J., and Precup, D. (2017). OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning. [ arXiv | http ]
                      [Gatto, 2018] Gatto, S. (2018). Extending the Hierarchical Deep Reinforcement Learning framework. Technical Report September. [ .pdf ]
                      [Pang et al., 2018] Pang, Z.-J., Liu, R.-Z., Meng, Z.-Y., Zhang, Y., Yu, Y., and Lu, T. (2018). On Reinforcement Learning for Full-length Game of StarCraft. [ arXiv | http ]
                      [Sukhbaatar et al., 2018b] Sukhbaatar, S., Denton, E., Szlam, A., and Fergus, R. (2018b). Learning Goal Embeddings via Self-Play for Hierarchical Reinforcement Learning. [ arXiv | http ]
                      [Rafati and Noelle, 2018] Rafati, J. and Noelle, D. C. (2018). Learning Representations in Model-Free Hierarchical Reinforcement Learning. [ arXiv | http ]
                      [Wei et al., 2018] Wei, E., Wicke, D., and Luke, S. (2018). Hierarchical Approaches for Reinforcement Learning in Parameterized Action Space. [ arXiv | http ]
                      [Haarnoja et al., 2018] Haarnoja, T., Hartikainen, K., Abbeel, P., and Levine, S. (2018). Latent Space Policies for Hierarchical Reinforcement Learning. In 35th International Conference on Machine Learning. [ arXiv | .pdf ]
                      [Song et al., 2018] Song, Y., Wang, J., Lukasiewicz, T., Xu, Z., and Xu, M. (2018). Diversity-Driven Extensible Hierarchical Reinforcement Learning. In 33rd National Conference on Artificial Intelligence (AAAI 2019). [ arXiv | www: ]
                      [Nachum et al., 2018] Nachum, O., Gu, S., Lee, H., and Levine, S. (2018). Data-Efficient Hierarchical Reinforcement Learning. In 32nd Conference on Neural Information Processing Systems (NIPS 2018), pages 3303--3313. Neural information processing systems foundation. [ arXiv | http ]
                      [Tuyen et al., 2018] Tuyen, L. P., Vien, N. A., Layek, A., and Chung, T. (2018). Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes. [ arXiv | http ]
                      [Menashe and Stone, 2018] Menashe, J. and Stone, P. (2018). Escape Room: A Configurable Testbed for Hierarchical Reinforcement Learning. Technical report. [ arXiv | .pdf ]
                      [Co-Reyes et al., 2018] Co-Reyes, J. D., Liu, Y., Gupta, A., Eysenbach, B., Abbeel, P., and Levine, S. (2018). Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings. In 35th International Conference on Machine Learning. [ arXiv | http ]
                      [Keramati et al., 2018] Keramati, R., Whang, J., Cho, P., and Brunskill, E. (2018). Strategic Object Oriented Reinforcement Learning. [ http ]
                      [Le et al., 2018] Le, H. M., Jiang, N., Agarwal, A., Dudík, M., Yue, Y., and Daumé, H. (2018). Hierarchical Imitation and Reinforcement Learning. In 35th International Conference on Machine Learning. [ arXiv | http ]
                      [Sukhbaatar et al., 2018a] Sukhbaatar, S., Denton, E., Szlam, A., and Fergus, R. (2018a). Learning Goal Embeddings via Self-Play for Hierarchical Reinforcement Learning. [ arXiv | http ]
                      [Peterson et al., 2018] Peterson, E. J., Müyesser, N. A., Verstynen, T., and Dunovan, K. (2018). Keep it stupid simple. [ arXiv | http ]
                      [Machado et al., 2018] Machado, M. C., Rosenbaum, C., Guo, X., Liu, M., Tesauro, G., and Campbell, M. (2018). Eigenoption Discovery through the Deep Successor Representation. In Iclr 2018. [ arXiv | http ]
                      [Aubret et al., 2019] Aubret, A., Matignon, L., and Hassas, S. (2019). A survey on intrinsic motivation in reinforcement learning. [ arXiv | http ]
                      [Nachum et al., 2019b] Nachum, O., Tang, H., Lu, X., Gu, S., Lee, H., and Levine, S. (2019b). Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning? [ arXiv | http ]
                      [Sharma et al., 2019a] Sharma, A., Gu, S., Levine, S., Kumar, V., and Hausman, K. (2019a). Dynamics-Aware Unsupervised Discovery of Skills. [ arXiv | http ]
                      [Osa et al., 2019] Osa, T., Tangkaratt, V., and Sugiyama, M. (2019). Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization. In Iclr 2019. [ arXiv | http ]
                      [Levy et al., 2019] Levy, A., Konidaris, G., Platt, R., and Saenko, K. (2019). Learning Multi-Level Hierarchies with Hindsight. In Iclr 2019. [ arXiv | http ]
                      [Wulfmeier et al., 2019] Wulfmeier, M., Abdolmaleki, A., Hafner, R., Springenberg, J. T., Neunert, M., Hertweck, T., Lampe, T., Siegel, N., Heess, N., and Riedmiller, M. (2019). Regularized Hierarchical Policies for Compositional Transfer in Robotics. [ arXiv | http ]
                      [Modhe et al., 2019] Modhe, N., Chattopadhyay, P., Sharma, M., Das, A., Parikh, D., Batra, D., and Vedantam, R. (2019). Unsupervised Discovery of Decision States for Transfer in Reinforcement Learning. [ arXiv | http ]
                      [Harutyunyan et al., 2019] Harutyunyan, A., Dabney, W., Borsa, D., Heess, N., Munos, R., and Precup, D. (2019). The Termination Critic. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS). [ arXiv | http ]
                      [Li et al., 2019] Li, S., Wang, R., Tang, M., and Zhang, C. (2019). Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ]
                      [Jiang et al., 2019] Jiang, Y., Gu, S., Murphy, K., and Finn, C. (2019). Language as an Abstraction for Hierarchical Deep Reinforcement Learning. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ]
                      [Paul et al., 2019] Paul, S., van Baar, J., and Roy-Chowdhury, A. K. (2019). Learning from Trajectories via Subgoal Discovery. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), pages 1--11.
                      [Saleh et al., 2019] Saleh, A., Jaques, N., Ghandeharioun, A., Shen, J. H., and Picard, R. (2019). Hierarchical Reinforcement Learning for Open-Domain Dialog. [ arXiv | http ]
                      [Jain et al., 2019] Jain, D., Iscen, A., and Caluwaerts, K. (2019). Hierarchical Reinforcement Learning for Quadruped Locomotion. [ arXiv | http ]
                      [Christodoulou et al., 2019] Christodoulou, P., Lange, R. T., Shafti, A., and Faisal, A. A. (2019). Reinforcement Learning with Structured Hierarchical Grammar Representations of Actions. [ arXiv | http ]
                      [Sharma et al., 2019b] Sharma, P., Pathak, D., and Gupta, A. (2019b). Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ]
                      [Nachum et al., 2019a] Nachum, O., Gu, S., Lee, H., and Levine, S. (2019a). Near-Optimal Representation Learning for Hierarchical Reinforcement Learning. In Iclr 2019. [ arXiv | http ]
                      [Zhou and Yu, 2020] Zhou, W.-J. and Yu, Y. (2020). Temporal-adaptive Hierarchical Reinforcement Learning. [ arXiv | http ]
                      [Li et al., 2020] Li, A. C., Florensa, C., Clavera, I., and Abbeel, P. (2020). Sub-policy Adaptation for Hierarchical Reinforcement Learning. In ICLR 2020. [ arXiv | http ]
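
                      As a concrete illustration, below is a minimal tabular sketch of the two-level scheme used in several of the papers above (e.g. Kulkarni et al., 2016; Nachum et al., 2018): a high-level policy picks subgoals and a goal-conditioned low-level policy is rewarded intrinsically for reaching them. The corridor environment, subgoal set and hyperparameters are illustrative assumptions, and the sketch ignores discounting over the duration of each subgoal.

                      import random
                      from collections import defaultdict

                      size, alpha, gamma = 12, 0.2, 0.95
                      subgoals = [3, 7, 11]                       # candidate subgoals for the high level
                      Q_hi = defaultdict(float)                   # Q_hi[(state, subgoal)]
                      Q_lo = defaultdict(float)                   # Q_lo[(state, subgoal, action)]

                      def env_step(s, a):
                          s2 = max(0, min(size - 1, s + (1 if a == 1 else -1)))
                          return s2, (1.0 if s2 == size - 1 else 0.0)   # external reward only at the end

                      def low_level(s, g, max_steps=12):
                          """Goal-conditioned worker: intrinsically rewarded for reaching subgoal g."""
                          r_ext_total = 0.0
                          for _ in range(max_steps):
                              a = random.randrange(2) if random.random() < 0.1 else \
                                  max((0, 1), key=lambda act: Q_lo[(s, g, act)])
                              s2, r_ext = env_step(s, a)
                              r_int = 1.0 if s2 == g else 0.0     # intrinsic reward: reach the subgoal
                              target = r_int + gamma * max(Q_lo[(s2, g, 0)], Q_lo[(s2, g, 1)])
                              Q_lo[(s, g, a)] += alpha * (target - Q_lo[(s, g, a)])
                              r_ext_total += r_ext
                              s = s2
                              if s == g:
                                  break
                          return s, r_ext_total

                      for _ in range(500):                        # high-level episodes
                          s = 0
                          for _ in range(4):                      # a few subgoal choices per episode
                              g = random.choice(subgoals) if random.random() < 0.2 else \
                                  max(subgoals, key=lambda sg: Q_hi[(s, sg)])
                              s2, r_ext = low_level(s, g)
                              target = r_ext + gamma * max(Q_hi[(s2, sg)] for sg in subgoals)
                              Q_hi[(s, g)] += alpha * (target - Q_hi[(s, g)])
                              s = s2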

                        Shaping and curriculum learning: When trying to learn overly complex tasks from scratch, the agent usually does not learn anything. One way to alleviate this problem is to present the agent with a sequence of easier problems (a curriculum) to be learned before facing the most complex one. See also the slides of the last lecture for other links. A minimal sketch follows the reference list below.
                          Some recent papers in the area:
                        [Frans et al., 2017] Frans, K., Ho, J., Chen, X., Abbeel, P., and Schulman, J. (2017). Meta Learning Shared Hierarchies. [ arXiv | http ]
                        [Forestier et al., 2017] Forestier, S., Mollard, Y., and Oudeyer, P.-Y. (2017). Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning. [ arXiv | http ]
                        [Matiisen et al., 2017] Matiisen, T., Oliver, A., Cohen, T., and Schulman, J. (2017). Teacher-Student Curriculum Learning. In Deep Reinforcement Learning Symposium, NIPS 2017. [ arXiv | http ]
                        [Andrychowicz et al., 2017] Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., and Zaremba, W. (2017). Hindsight experience replay. In 31st Conference on Neural Information Processing Systems (NIPS 2017), pages 5048--5058. [ arXiv | http ]
                        [Florensa et al., 2017] Florensa, C., Held, D., Wulfmeier, M., Zhang, M., and Abbeel, P. (2017). Reverse Curriculum Generation for Reinforcement Learning. In 1st Conference on Robot Learning (CoRL 2017). [ DOI | arXiv | http ]
                        [Czarnecki et al., 2018] Czarnecki, W. M., Jayakumar, S. M., Jaderberg, M., Hasenclever, L., Teh, Y. W., Osindero, S., Heess, N., and Pascanu, R. (2018). Mix & Match - Agent Curricula for Reinforcement Learning. [ arXiv | http ]
                        [Nachum et al., 2018] Nachum, O., Gu, S., Lee, H., and Levine, S. (2018). Data-Efficient Hierarchical Reinforcement Learning. In 32nd Conference on Neural Information Processing Systems (NIPS 2018), pages 3303--3313. Neural information processing systems foundation. [ arXiv | http ]
                        [Resnick et al., 2018] Resnick, C., Raileanu, R., Kapoor, S., Peysakhovich, A., Cho, K., and Bruna, J. (2018). Backplay: "Man muss immer umkehren". [ arXiv | http ]
                        [Lanka and Wu, 2018] Lanka, S. and Wu, T. (2018). ARCHER: Aggressive Rewards to Counter bias in Hindsight Experience Replay. [ arXiv | www: ]
                        [Justesen and Risi, 2018] Justesen, N. and Risi, S. (2018). Automated Curriculum Learning by Rewarding Temporally Rare Events. [ arXiv | http ]
                        [Eppe et al., 2018] Eppe, M., Magg, S., and Wermter, S. (2018). Curriculum goal masking for continuous deep reinforcement learning. (Trr 169):2019. [ arXiv | .pdf ]
                        [Ivanovic et al., 2018] Ivanovic, B., Harrison, J., Sharma, A., Chen, M., and Pavone, M. (2018). BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning. [ arXiv | http ]
                        [Sukhbaatar et al., 2018] Sukhbaatar, S., Lin, Z., Kostrikov, I., Synnaeve, G., Szlam, A., and Fergus, R. (2018). Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play. In Iclr 2018. [ arXiv | http ]
                        [Florensa et al., 2018] Florensa, C., Held, D., Geng, X., and Abbeel, P. (2018). Automatic Goal Generation for Reinforcement Learning Agents. 35th International Conference on Machine Learning. [ DOI | arXiv | http ]
                        [Aubret et al., 2019] Aubret, A., Matignon, L., and Hassas, S. (2019). A survey on intrinsic motivation in reinforcement learning. [ arXiv | http ]
                        [Paul et al., 2019] Paul, S., van Baar, J., and Roy-Chowdhury, A. K. (2019). Learning from Trajectories via Subgoal Discovery. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), pages 1--11.
                        [Green et al., 2019] Green, M. C., Sergent, B., Shandilya, P., and Kumar, V. (2019). Evolutionarily-Curated Curriculum Learning for Deep Reinforcement Learning Agents. Technical report. [ arXiv | http ]
                        [Portelas et al., 2019] Portelas, R., Colas, C., Hofmann, K., and Oudeyer, P.-Y. (2019). Teacher algorithms for curriculum learning of Deep RL in continuously parameterized environments. In 3rd Conference on Robot Learning (CoRL 2019). [ arXiv | http ]
                        [Jabri et al., 2019] Jabri, A., Hsu, K., Eysenbach, B., Gupta, A., Levine, S., and Finn, C. (2019). Unsupervised Curricula for Visual Meta-Reinforcement Learning. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). [ arXiv | http ]
                        [Luo et al., 2020] Luo, S., Kasaei, H., and Schomaker, L. (2020). Accelerating Reinforcement Learning for Reaching using Continuous Curriculum Learning. [ arXiv | http ]
                        [Narvekar et al., 2020] Narvekar, S., Peng, B., Leonetti, M., Sinapov, J., Taylor, M. E., and Stone, P. (2020). Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey. [ arXiv | http ]
                        [Wang et al., 2020] Wang, R., Lehman, J., Rawal, A., Zhi, J., Li, Y., Clune, J., and Stanley, K. O. (2020). Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions. [ arXiv | http ]
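
                        To make the idea concrete, here is a minimal sketch of a hand-designed curriculum: the agent starts on short corridors and the task is lengthened whenever its recent success rate is high. The toy environment, thresholds and names are illustrative assumptions, not an implementation of any of the methods above.

                        import random
                        from collections import defaultdict, deque

                        alpha, gamma, max_len = 0.2, 0.95, 20
                        Q = defaultdict(float)                   # Q[(state, action)], shared across levels

                        def run_episode(length):
                            """One episode on a corridor of the given length; returns a success flag."""
                            s = 0
                            for _ in range(3 * length):
                                a = random.randrange(2) if random.random() < 0.1 else \
                                    max((0, 1), key=lambda act: Q[(s, act)])
                                s2 = max(0, min(length - 1, s + (1 if a == 1 else -1)))
                                r = 1.0 if s2 == length - 1 else 0.0
                                target = r + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
                                Q[(s, a)] += alpha * (target - Q[(s, a)])
                                s = s2
                                if r > 0:
                                    return True
                            return False

                        length, recent = 4, deque(maxlen=20)     # start with an easy task
                        for _ in range(2000):
                            recent.append(run_episode(length))
                            if len(recent) == recent.maxlen and sum(recent) / len(recent) > 0.9:
                                length = min(length + 2, max_len) # promote to a harder task
                                recent.clear()
                        print("final curriculum level (corridor length):", length)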