Mario Martin's home page

Universitat Politècnica de Catalunya

Reinforcement Learning

Site for the second part of the URL course of the Master in Artificial Intelligence

Slides of Lectures:

Other resources:

Notebooks and programs:

Proposed paper and practical projects:
  1. Inverse Reinforcement Learning: Given examples of a policy, obtain the underlying reward function. Useful for learning from demonstrations and for discovering complex reward functions, for instance in driving or walking.
  2. Game theory: Application of RL algorithms to learn to cooperate or compete in a game-theoretic framework.
  3. Exploration: A review of different techniques for exploration. The work will consist of a comparative study of these techniques.
  4. POMDP: When the state is not completely observable, the problem is no longer Markovian and RL algorithms usually fail to learn. One way to address this is to extend the MDP to a Partially Observable MDP (POMDP). The work will consist of searching for and reviewing state-of-the-art algorithms that use this approach.
  5. Memory approaches: One solution to the problem of incomplete information about the world is to augment the current perception with information about past perceptions that may help disambiguate the agent's true state. This can be done in several ways, for instance using a window of the last experiences (as in the Atari games), or using recurrent neural networks (RNNs or LSTMs) to maintain information about past situations in the state. The work will consist of reviewing these memory-based approaches.
  6. Deep Learning and RL: Survey the current literature for state-of-the-art algorithms using Deep Learning. Do this for both value-based and actor-critic approaches.
  7. Robotics: Find and review some successful applications of RL to robotics. Explain the most common problems that appear when applying RL to robotics and the most successful ways to deal with them.
  8. Optimization: RL has been applied to optimization problems, from finding good network architectures to recommendation systems.
  9. Transfer learning: One problem with RL is that learning to optimize a new goal starts from scratch, even when the system has already learned to solve another task in the same domain. Transferring knowledge from one behavior to another to speed up learning is called transfer learning.
  10. Hierarchical reinforcement learning: One approach to solving very complex tasks consists of breaking the problem into smaller subtasks that can be learned in a hierarchical way. The automatic decomposition of a complex task into subtasks is called hierarchical RL.
  11. Classical introductions and papers: A review of classical introductions and recent papers on the topic.
  12. Shaping and curriculum learning: When learning to solve a very complex task from scratch, the agent usually does not learn anything. One way to alleviate this problem is to present the agent with a sequence of easier problems (a curriculum) to be learned before facing the most complex one. Some recent papers in the area:
  13. Options: Similar to hierarchical learning, this approach learns higher-level generic actions (called options) that can be used to decompose a complex task, so that the agent executes these generic actions instead of the elementary ones. Each option is represented as an MDP that solves a sub-goal of the original problem. The most interesting part of this approach is how to learn useful options automatically.
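As an illustration of what the comparative study in project 3 might cover, here is a minimal sketch of two standard exploration strategies, ε-greedy and softmax (Boltzmann) action selection; the function names and temperature parameter are illustrative, not from a specific library:

```python
import math
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value for the current state.
    return max(range(len(q_values)), key=lambda a: q_values[a])

def softmax_action(q_values, temperature):
    """Sample an action with probability proportional to exp(Q/temperature)."""
    prefs = [math.exp(v / temperature) for v in q_values]
    total = sum(prefs)
    r, cum = random.random() * total, 0.0
    for action, p in enumerate(prefs):
        cum += p
        if cum > r:
            return action
    return len(q_values) - 1  # guard against floating-point rounding
```

A comparative study would typically measure how each strategy trades off regret against the speed of convergence as ε or the temperature is annealed.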
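The "window of last experiences" idea from project 5 can be sketched in a few lines: keep the last k observations and hand their concatenation to the agent as the state. This is a hypothetical minimal version (class and method names are illustrative) of the frame stacking used in the Atari setting:

```python
from collections import deque

class ObservationWindow:
    """Keeps the last k observations and returns them as one stacked state."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_obs):
        # Pad the window with copies of the first observation of the episode.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_obs)
        return tuple(self.frames)

    def step(self, obs):
        # Appending to a bounded deque drops the oldest observation.
        self.frames.append(obs)
        return tuple(self.frames)
```

The recurrent alternative (RNN/LSTM) replaces this fixed window with a learned hidden state, which can in principle remember events arbitrarily far in the past.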

Nice additional links [note that the slides also have embedded links to the main references!]