Seminar in Operational Research (MAE839)

From the Department of Mathematics Wiki

General

School

School of Science

Academic Unit

Department of Mathematics

Level of Studies

Undergraduate

Course Code

ΜΑΕ839

Semester

8

Course Title

Seminar in Operational Research: Markov Decision Processes and Reinforcement Learning

Independent Teaching Activities

Lectures (Weekly Hours: 3, Credits: 6)

Course Type

Special Background

Prerequisite Courses

It is desirable to have an elementary knowledge of probability theory, Markov chains, and linear/dynamic programming.

Language of Instruction and Examinations

Greek

Is the Course Offered to Erasmus Students

No

Course Website (URL)

See eCourse, the Learning Management System maintained by the University of Ioannina.

Learning Outcomes

Learning outcomes

The theory of Markov decision processes (MDPs), also known as sequential decision theory, stochastic control, or stochastic dynamic programming, studies the sequential optimisation of stochastic systems by controlling their transition mechanism over time. In particular, it provides solution methodologies for a wide range of problems involving sequential decisions in a random environment that is modelled statistically by a finite-state Markov chain. The optimal strategy is computed by appropriate algorithms, which are derived and illustrated in the first part of the course. MDPs have applications in many areas, including revenue management (e.g., hotel, airline, and rental car pricing), control of queues, financial engineering, telecommunications, manufacturing, and economics.

MDPs provide the mathematical foundation of Reinforcement Learning (RL). RL is one of the most important emerging branches of Machine Learning, owing to the flexibility of its algorithms in handling large state spaces in problems modelled as MDPs. The second part of the course presents the basic principles of RL, emphasising both the mathematical framework on which it is built and the algorithms themselves, many of which are implemented in R and MATLAB to aid understanding.
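As an illustration of the kind of algorithm that computes an optimal strategy for a finite-state MDP, the following is a minimal sketch of value iteration for the discounted infinite-horizon case. It is written in Python purely for illustration (the course itself refers to R and MATLAB). The function name `value_iteration`, the array layout `P[a][s][t]` and `R[s][a]`, the discount factor, and the two-state example are all assumptions made for this sketch and are not taken from the course material.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Approximate the optimal values and a greedy policy of a finite MDP.

    P[a][s][t] : probability of moving from state s to state t under action a
    R[s][a]    : expected one-step reward for taking action a in state s
    (Both layouts are assumptions of this sketch.)
    """
    n_states = len(R)
    n_actions = len(R[0])

    def q_value(V, s, a):
        # One-step lookahead: immediate reward plus discounted expected value.
        return R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))

    V = [0.0] * n_states
    for _ in range(max_iter):
        # Apply the Bellman optimality operator once.
        V_new = [max(q_value(V, s, a) for a in range(n_actions))
                 for s in range(n_states)]
        gap = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if gap < tol:
            break

    # Greedy policy with respect to the final value estimate.
    policy = [max(range(n_actions), key=lambda a: q_value(V, s, a))
              for s in range(n_states)]
    return V, policy


# Hypothetical two-state example: action 0 stays, action 1 switches state.
# Only "stay in state 1" pays a reward of 1.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R = [[0.0, 0.0], [1.0, 0.0]]

V, policy = value_iteration(P, R)
```

In this toy example the greedy policy switches out of state 0 and then stays in state 1, and the value of state 1 approaches 1 / (1 - 0.9) = 10.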

General Competences

The course aims to enable students to:

  • Become familiar with the general theory and techniques related to MDPs.
  • Become familiar with the basic algorithmic methods for MDPs and with the reinforcement learning setting.

At the end of the course, the student will be able to:

  • evaluate well-known theorems in the field of MDPs,
  • apply algorithmic methods for MDPs to real-world examples using R or MATLAB,
  • implement common RL algorithms in code.

Syllabus

MDPs in discrete time over finite and infinite time horizons. Properties of the Bellman equation: contraction and monotonicity. Policy improvement algorithms, gradient descent, mirror descent, and stochastic gradient descent. Basic principles of reinforcement learning; introduction to multi-armed bandits, a simplified subclass of reinforcement learning problems. Reinforcement learning methods based on successive approximation algorithms (value iteration): Q-learning from a single trajectory, with and without function approximation, in offline and online versions. Reinforcement learning methods based on policy improvement algorithms (policy iteration): policy gradient and natural policy gradient.
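To make the syllabus item on Q-learning from a single trajectory more concrete, here is a minimal tabular sketch without function approximation, using an epsilon-greedy behaviour policy. It is written in Python purely for illustration. The names `q_learning` and `toy_step`, the two-state environment, and all parameter values (step size, exploration rate, discount factor, number of steps) are assumptions of this sketch, not part of the course material.

```python
import random


def q_learning(step, n_states, n_actions, n_steps=20_000,
               gamma=0.9, alpha=0.1, epsilon=0.2, seed=0):
    """Tabular Q-learning along one continuing trajectory.

    step(s, a) must return (next_state, reward); it stands in for the
    unknown environment and is a hypothetical interface for this sketch.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0  # arbitrary initial state
    for _ in range(n_steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda b: Q[s][b])
        s_next, r = step(s, a)
        # Move Q[s][a] towards the sampled Bellman optimality target.
        target = r + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next
    return Q


def toy_step(s, a):
    """Hypothetical deterministic environment with two states.

    Action 0 stays in the current state, action 1 switches state;
    only staying in state 1 yields a reward of 1.
    """
    s_next = s if a == 0 else 1 - s
    r = 1.0 if (s == 1 and a == 0) else 0.0
    return s_next, r


Q = q_learning(toy_step, n_states=2, n_actions=2)
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(2)]
```

Because the update uses the maximum over next-state action values rather than the action actually taken, the method is off-policy: the learned table estimates the optimal action values even though the trajectory is generated by an exploring policy.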

Teaching and Learning Methods - Evaluation

Delivery

Details will be determined by the teaching professor. Methods include presentations given by the students.

Use of Information and Communications Technology

Details will be determined by the teaching professor.

Teaching Methods

Activity | Semester Workload
Study in class | 39
Other activities determined by the teaching professor | 111
Course total | 150

Student Performance Evaluation

Take-home problems plus a presentation. Students work on the problems in groups of two. Other means of evaluation can be determined by the teaching professor.

Attached Bibliography

Bibliography is suggested by the teaching professor, depending on the subject under study.