# Reinforcement Learning

Site for the second part of the URL course of the Master in Artificial Intelligence.

Slides of lectures:

- Lecture 0: Presentation and description of the course
- Lecture 1: Definition of the RL framework. Key elements in RL. Finding the optimal policy using Value Iteration and Policy Iteration
- Lecture 2: Introduction to Model-Free approaches. Monte-Carlo, Q-learning, Sarsa, TD(lambda)
- Lecture 3: Function approximation **[fixed typos on 21/03/19]**
- Lecture 4: Deep Learning for RL **[fixed typos on 21/03/19]**
- Lecture 5: Policy gradient methods
- Lecture 6: Other topics **[slides added on 28/03/19]**
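As a taste of what Lecture 1 covers, here is a minimal Value Iteration sketch for a finite MDP. The transition layout `P[s][a] = [(prob, next_state, reward), ...]` is an assumption (a common tabular convention), not the representation used in the course notebooks.

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.9, theta=1e-8):
    """Iterate the Bellman optimality backup until values stop changing."""
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Q-value of each action under the current value estimate
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                 for a in range(n_actions)]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Extract the greedy policy from the converged values
    policy = np.array([
        int(np.argmax([sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                       for a in range(n_actions)]))
        for s in range(n_states)])
    return V, policy
```

Policy Iteration differs only in alternating a full policy-evaluation sweep with a greedy policy-improvement step; both converge to the same optimal policy.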

Other resources:

- My Mindmap of RL algorithms

Notebooks and programs:

- Notebook: Policy evaluation
- Notebook: Policy iteration and Value Iteration
- Notebook: Monte Carlo on grid-world
- Notebook: Introduction to OpenAI and Q-learning
- Notebook: Linear FA with Tile coding
- Notebook: Deep Learning (soon to appear)
- Notebook: Actor Critic RL (soon to appear)

Proposed paper and practical projects:

- **Inverse Reinforcement Learning**: Given examples of a policy, recover the underlying reinforcement function. Useful for learning from examples and for discovering complex reinforcement functions, for instance for driving or walking.
- **Game theory**: Application of RL algorithms to learn to cooperate or compete in a game-theoretical framework.
- **Exploration**: Review of different techniques for exploration. The work will consist of a comparative study of the different techniques.
- **POMDP**: When the state is not completely observable, the problem is no longer Markovian and RL algorithms usually fail to learn. One way to solve this problem is to extend the MDP formulation to a Partially Observable MDP (POMDP). The work will consist of searching for and reviewing state-of-the-art algorithms that use this approach.
- **Memory approaches**: One solution to the problem of incomplete information about the world is to augment the current perception with information about past perceptions that may help disambiguate the true state of the agent. This can be done in several ways, for instance using a window of the last experiences (as in the Atari games) or using recurrent neural networks (RNN or LSTM) to maintain information about past situations in the state.
- **Deep Learning and RL**: Search the current literature for state-of-the-art algorithms using Deep Learning. Do this for both value-based and Actor-Critic approaches.
- **Robotics**: Find and review some successful applications of RL to robotics. Explain the most common problems that appear when applying RL to robotics and the most successful ways to deal with them.
- **Optimization**: RL has been applied to optimization problems, from finding good architectures for RL to recommendation systems.
- **Transfer learning**: One problem with RL is that learning to optimize one goal has to start from scratch, even when the system has already learned to solve another task in the same domain. Transferring knowledge from one behavior to another to speed up the learning process is called transfer learning.
- **Hierarchical reinforcement learning**: One approach to solving very complex tasks consists in breaking the problem into smaller subtasks that can be learned in a hierarchical way. Automatic decomposition of one complex task into subtasks is called hierarchical RL.
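To make the exploration topic concrete, here is a small tabular Q-learning sketch with epsilon-greedy exploration on a hypothetical 5-state chain (the environment and all parameter values are illustrative, not taken from the course material).

```python
import random

def step(s, a):
    """Toy chain: a=0 moves left, a=1 moves right; reward 1 on reaching state 4."""
    s2 = max(0, s - 1) if a == 0 else min(4, s + 1)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

def eps_greedy(q_s, eps):
    """Random action with prob. eps, else greedy with random tie-breaking."""
    if random.random() < eps:
        return random.randrange(len(q_s))
    m = max(q_s)
    return random.choice([a for a, q in enumerate(q_s) if q == m])

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    random.seed(seed)
    Q = [[0.0, 0.0] for _ in range(5)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = eps_greedy(Q[s], eps)
            s2, r, done = step(s, a)
            # Off-policy TD target: bootstrap on the best next action
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

Swapping the target for `r + gamma * Q[s2][a2]`, with `a2` chosen by the same epsilon-greedy rule, turns this into on-policy Sarsa.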
Classical introductions and papers:
- Recent Advances in Hierarchical Reinforcement Learning
- The MAXQ Method for Hierarchical Reinforcement Learning
- Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
- Hengst B. (2012) Hierarchical Approaches. In: Wiering M., van Otterlo M. (eds) Reinforcement Learning. Adaptation, Learning, and Optimization, vol 12. Springer, Berlin, Heidelberg
- A. S. Vezhnevets *et al.*, “FeUdal Networks for Hierarchical Reinforcement Learning,” *arXiv preprint*, 2017.
- S. Sukhbaatar, E. Denton, A. Szlam, and R. Fergus, “Learning Goal Embeddings via Self-Play for Hierarchical Reinforcement Learning,” Nov. 2018.
- C. Florensa, Y. Duan, and P. Abbeel, “Stochastic Neural Networks for Hierarchical Reinforcement Learning,” in *ICLR 2017*, 2017, pp. 1–12.
- O. Nachum, S. Gu, H. Lee, and S. Levine, “Data-Efficient Hierarchical Reinforcement Learning,” in *32nd Conference on Neural Information Processing Systems (NIPS 2018)*, 2018.
- T. Osa, V. Tangkaratt, and M. Sugiyama, “Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization,” in *ICLR 2019*, 2019.
- T. D. Kulkarni, K. R. Narasimhan, A. Saeedi, and J. B. Tenenbaum, “Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation,” *arXiv*, pp. 1–13, 2016.
- Data-Efficient Hierarchical Reinforcement Learning (ArXiv Preprint)
- Learning Multi-Level Hierarchies with Hindsight (ArXiv Preprint)
- Learning and Transfer of Modulated Locomotor Controllers (ArXiv Preprint)
- On Reinforcement Learning for Full-length Game of StarCraft (ArXiv Preprint)
- Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation (ArXiv Preprint)
- Meta Learning Shared Hierarchies (ArXiv Preprint)
- Strategic Attentive Writer for Learning Macro-Actions (ArXiv Preprint)
- A Deep Hierarchical Approach to Lifelong Learning in Minecraft (ArXiv Preprint)
- Planning with Abstract Markov Decision Processes (AAAI Publications)
- Iterative Hierarchical Optimization for Misspecified Problems (IHOMP) (ArXiv Preprint)
- Learning Goal Embeddings via Self-Play for Hierarchical Reinforcement Learning (ArXiv Preprint)
- Learning Representations in Model-Free Hierarchical Reinforcement Learning (ArXiv Preprint)
**Shaping and curriculum learning**: When learning to solve too complex a task from scratch, the agent usually does not learn anything. One way to alleviate this problem is to present the agent with a sequence of easier problems (a curriculum) to be learned before facing the most complex one. Some recent papers in the area:

- T. Matiisen, A. Oliver, T. Cohen, and J. Schulman, “Teacher-Student Curriculum Learning,” in Deep Reinforcement Learning Symposium, NIPS 2017, 2017.
- B. Ivanovic, J. Harrison, A. Sharma, M. Chen, and M. Pavone, “BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning,” Jun. 2018.
- N. Justesen and S. Risi, “Automated Curriculum Learning by Rewarding Temporally Rare Events,” Mar. 2018.
- C. Resnick, R. Raileanu, S. Kapoor, A. Peysakhovich, K. Cho, and J. Bruna, “Backplay: ‘Man muss immer umkehren,’” Jul. 2018.
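The curriculum idea above can be sketched as a simple training loop over progressively harder task variants, warm-starting each stage from the previous policy. All names here (`make_task`, `train`, the difficulty scale) are hypothetical placeholders, not an API from any of the papers listed.

```python
def train_with_curriculum(make_task, train, difficulties=(0.2, 0.5, 1.0)):
    """Train on a sequence of tasks of increasing difficulty.

    make_task(difficulty=d) builds a task variant; train(task, init_policy=p)
    trains on it, warm-started from the previous stage's policy, and returns
    the resulting policy.
    """
    policy = None
    for d in difficulties:
        task = make_task(difficulty=d)
        policy = train(task, init_policy=policy)  # reuse earlier learning
    return policy
```

The per-stage `train` could be any of the algorithms from the lectures; the point is only that each stage starts from the previous solution instead of from scratch.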
**Options**: Similarly to hierarchical learning, the idea is to learn higher-level generic actions (called options) so that a complex task can be decomposed into the execution of these generic actions instead of the elemental ones. Each option is represented as an MDP that solves a sub-goal of the original problem. The most interesting part of this approach is how to learn useful options automatically.
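The standard formalization of an option is the triple (I, pi, beta): an initiation set, an intra-option policy, and a termination condition. A minimal sketch (field names are illustrative):

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    initiation_set: Set[int]             # states I where the option may start
    policy: Callable[[int], int]         # intra-option policy pi(s) -> action
    termination: Callable[[int], float]  # beta(s): prob. of terminating in s

    def can_start(self, s: int) -> bool:
        return s in self.initiation_set
```

An agent then chooses among options (and possibly primitive actions) at decision points, running the selected option's policy until its termination condition fires.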

Nice additional links [note that the slides also have embedded links to the main references!]:

- Nice summary of Gradient methods
- THE book for Deep Neural Networks