DATA 37200: Learning, Decisions, and Limits (Winter 2025)
Basic Information
Class Location: JCL 011
Class Time: Tu/Thu 12:30 - 1:50 pm
Instructor: Frederic Koehler and Haifeng Xu
- Office: Searle 203 (Frederic) and Crerar 260 (Haifeng)
- Office Hour: Frederic (Tuesday 4:30 - 5:30 pm); Haifeng (Thursday 4 - 5 pm)
TA:
- Email: adityaprasad AT uchicago.edu
- Office Hours: Wed 2-3 pm at the JCL 2nd floor common area
Learning Objectives: (1) Understand the basic toolkit for online learning and online decision making, as a complement to the offline learning paradigm; (2) prepare students to understand state-of-the-art RL algorithms, such as RLHF and AlphaGo training.
Announcements
- Dec 1: Course website is up!
- Jan 18: Homework 1 is out and due in two weeks on Gradescope (if you haven't joined yet, you can do so via code 4JNR73)
- Feb 14: Homework 2 is available.
Course Description
This is a graduate course on the theory of machine learning. While ML theory has many branches, this course is designed to cover the basics of online learning, along with the basics of reinforcement learning. It aims to establish a foundation for students who are interested in conducting research related to online decision making, learning, and optimization. The course will introduce formal formulations of fundamental problems/models in this space, describe basic algorithmic ideas for solving these models, and rigorously discuss both the performance of these algorithms and the problems' fundamental limits (e.g., minimax lower bounds). En route, we will develop the necessary toolkit for algorithm design and lower-bound proofs.
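Among the algorithmic ideas mentioned above, online gradient descent is simple enough to sketch in a few lines. The following is a minimal illustration and not part of the course materials; the one-dimensional loss and step size below are made up for this example:

```python
def projected_ogd(grad, T, eta, x0=0.0, lo=-1.0, hi=1.0):
    """Projected online gradient descent on the interval [lo, hi].

    grad: callable returning the (sub)gradient of the current round's loss at x.
    Plays x_t, observes a gradient, then takes a step and projects back.
    Returns the sequence of iterates played.
    """
    xs = [x0]
    x = x0
    for _ in range(T):
        g = grad(x)                        # feedback for this round
        x = min(hi, max(lo, x - eta * g))  # gradient step + projection
        xs.append(x)
    return xs

# Illustrative example: every round the environment plays f_t(x) = (x - 0.5)^2,
# so the iterates should drift toward the fixed comparator x* = 0.5.
xs = projected_ogd(lambda x: 2 * (x - 0.5), T=1000, eta=0.01)
```

With a fixed loss like this, the iterates converge to the comparator; the regret guarantees covered in lecture apply even when the losses change adversarially from round to round.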
Topics covered in this course, and tentative syllabus (up to small changes):
- (week 1) Concentration bounds and UCB
- (week 2) Information-theoretic lower bounds via KL divergence and distribution testing
- (weeks 3-4) Online prediction, introduction to contextual bandits, online gradient descent
- (weeks 4-5) Elliptical potential lemma, linear contextual bandits, and alternatives to the UCB method
- (week 6) MDPs and dynamic programming
- (week 6) Policy iteration and value iteration
- (week 7) Reinforcement learning and the optimism principle
- (week 8) Multi-agent RL, equilibria, counterfactual regret minimization, self-play
- (week 9) A sample of recent learning paradigms: RLHF, etc.
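As a preview of the week-1 material, here is a minimal sketch of the UCB1 index rule for stochastic multi-armed bandits. This is an illustrative toy, not course-provided code; the arm means, horizon, and seed below are made up:

```python
import math
import random

def ucb_bandit(means, horizon, seed=0):
    """Simulate UCB1 on a Bernoulli multi-armed bandit.

    means: true success probabilities of each arm (used only to simulate rewards).
    Returns the number of pulls of each arm over the horizon.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # number of pulls per arm
    sums = [0.0] * k    # total observed reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialize
        else:
            # UCB1 index: empirical mean plus a confidence radius
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

# Illustrative run: three arms; UCB should concentrate pulls on the best arm.
counts = ucb_bandit([0.2, 0.5, 0.8], horizon=5000)
```

The confidence radius shrinks as an arm is pulled more often, which is exactly the optimism-driven exploration/exploitation trade-off analyzed in the lectures.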
Lectures and Readings
Lec No. | Lectures | Readings |
---|---|---|
1 (Jan 7) | Intro and MAB [slides] | Various concentration inequalities and concentration for martingales |
2 (Jan 9) | UCB [slides] | Chapter 1 of this MAB book |
3-4 (Jan 14,16) | MAB lower bound [slides] | Chapter 2 of this MAB book |
5-6 (Jan 21) | Lecture 5: Contextual bandits and online regression models. [board] Lecture 6: Online regression via online gradient descent: direct analysis. [board] | Chapters 1 and 3 of these lecture notes. Also, this survey. |
7 (Jan 28) | Lecture 7: Online gradient descent for online linear optimization. [board] | Chapter 2 of the survey linked above. |
8 (Jan 30) | Lecture 8: Reduction to online linear optimization; reduction from contextual bandits to online learning. [board] | Chapter 3 of the lecture notes linked above. |
9-10 (week of Feb 4) | Lecture 9: $\epsilon$-greedy strategy, inverse proportional gap weighting. [board] Lecture 10: Optimism, LinUCB, elliptical potential lemma. [board] | Chapter 3 of the lecture notes linked above. |
11-12 (week of Feb 11) | Lecture 11: Completing the LinUCB analysis; starting RL. [board] Lecture 12: MDPs, Bellman equations, negative results. [board] | Chapter 5 of the lecture notes linked above. |
13 (Feb 18) | Lecture 13: Learning tabular MDPs with unknown transitions. [board] | Notes (Jiang) |
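To accompany the MDP lectures (Bellman equations, value iteration), here is a minimal value-iteration sketch on a made-up two-state MDP. The transition and reward tables are illustrative only and do not come from the course materials:

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a tabular MDP.

    P[s][a] is a list of (next_state, prob) pairs, R[s][a] the immediate reward.
    Returns the (approximately) optimal value function and a greedy policy.
    """
    n = len(P)
    V = [0.0] * n
    while True:
        # One application of the Bellman optimality operator
        V_new = [max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                     for a in range(len(P[s])))
                 for s in range(n)]
        done = max(abs(a - b) for a, b in zip(V_new, V)) < tol
        V = V_new
        if done:
            break
    # Greedy policy with respect to the converged value function
    policy = [max(range(len(P[s])),
                  key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]))
              for s in range(n)]
    return V, policy

# Toy 2-state MDP: action 0 stays put, action 1 swaps states;
# reward 1 is earned only for actions taken in state 1.
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(1, 1.0)], [(0, 1.0)]]]
R = [[0.0, 0.0], [1.0, 1.0]]
V, pi = value_iteration(P, R)
```

Here the optimal policy moves from state 0 to state 1 and then stays, so V[1] = 1/(1-gamma) = 10 and V[0] = gamma * V[1] = 9, matching what the fixed point of the Bellman equations predicts.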
Homework
Due date | Homework | Note |
---|---|---|
01/31 | Homework 1 | Here is a HW solution template in case you need one |
02/27 | Homework 2 | Here is a HW solution template in case you need one |