DATA 37200: Learning, Decisions, and Limits (Winter 2025)
Basic Information
Class Location: JCL 011
Class Time: Tu/Thu 12:30 - 1:50 pm
Instructor: Frederic Koehler and Haifeng Xu
- Office: Searle 203 (Frederic) and Crerar 260 (Haifeng)
- Office Hour: Frederic (Tuesday 4:30 - 5:30 pm); Haifeng (Thursday 4 - 5 pm)
TA:
- Email: adityaprasad AT uchicago.edu
- Office Hours: Wed 2-3 pm at the JCL 2nd floor common area
Learning Objectives: (1) Understand the basic toolkit for online learning and online decision making, as a complement to the offline learning paradigm; (2) prepare students to understand state-of-the-art RL algorithms, such as RLHF and AlphaGo training.
Announcements
- Dec 1: Course website is up!
- Jan 18: Homework 1 is out and due in two weeks on Gradescope (if you haven't joined yet, you can do so via code 4JNR73)
- Feb 14: Homework 2 is available.
Course Description
This is a graduate course on the theory of machine learning. While ML theory has many branches, this course is designed to cover the basics of online learning, along with the basics of reinforcement learning. It aims to establish a foundation for students who are interested in conducting research related to online decision making, learning, and optimization. The course will introduce formal formulations of fundamental problems/models in this space, describe basic algorithmic ideas for solving these models, and rigorously discuss both the performance of these algorithms and the problems' fundamental limits (e.g., minimax lower bounds). En route, we will develop the necessary toolkit for algorithm design and lower-bound proofs.
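Among the algorithmic ideas mentioned above, online gradient descent is simple enough to sketch in a few lines. The following is a minimal illustration and not part of the course materials; the one-dimensional loss and step size below are made up for this example:

```python
def projected_ogd(grad, T, eta, x0=0.0, lo=-1.0, hi=1.0):
    """Projected online gradient descent on the interval [lo, hi].

    grad: callable returning the (sub)gradient of the current round's loss at x.
    Plays x_t, observes a gradient, then takes a step and projects back.
    Returns the sequence of iterates played.
    """
    xs = [x0]
    x = x0
    for _ in range(T):
        g = grad(x)                        # feedback for this round
        x = min(hi, max(lo, x - eta * g))  # gradient step + projection
        xs.append(x)
    return xs

# Illustrative example: every round the environment plays f_t(x) = (x - 0.5)^2,
# so the iterates should drift toward the fixed comparator x* = 0.5.
xs = projected_ogd(lambda x: 2 * (x - 0.5), T=1000, eta=0.01)
```

With a fixed loss like this, the iterates converge to the comparator; the regret guarantees covered in lecture apply even when the losses change adversarially from round to round.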
Topics covered in this course, and tentative syllabus (up to small changes):
- (week 1) Concentration bounds and UCB
- (week 2) Information-theoretic lower bounds via KL divergence and distribution testing
- (weeks 3-4) Online prediction, introduction to contextual bandits, online gradient descent
- (weeks 4-5) Elliptical potential lemma, linear contextual bandits, and alternatives to the UCB method
- (week 6) MDPs and dynamic programming
- (week 6) Policy iteration and value iteration
- (week 7) Reinforcement learning and the optimism principle
- (week 8) Multi-agent RL, equilibria, counterfactual regret minimization, self-play
- (week 9) A sample of recent learning paradigms: RLHF, etc.
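As a preview of the week-1 material, here is a minimal sketch of the UCB1 index rule for stochastic multi-armed bandits. This is an illustrative toy, not course-provided code; the arm means, horizon, and seed below are made up:

```python
import math
import random

def ucb_bandit(means, horizon, seed=0):
    """Simulate UCB1 on a Bernoulli multi-armed bandit.

    means: true success probabilities of each arm (used only to simulate rewards).
    Returns the number of pulls of each arm over the horizon.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # number of pulls per arm
    sums = [0.0] * k    # total observed reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialize
        else:
            # UCB1 index: empirical mean plus a confidence radius
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

# Illustrative run: three arms; UCB should concentrate pulls on the best arm.
counts = ucb_bandit([0.2, 0.5, 0.8], horizon=5000)
```

The confidence radius shrinks as an arm is pulled more often, which is exactly the optimism-driven exploration/exploitation trade-off analyzed in the lectures.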
Lectures and Readings
Lec No. | Lectures | Readings |
---|---|---|
1 (Jan 7) | Intro and MAB [slides] | Various concentration inequalities and concentration for martingales |
2 (Jan 9) | UCB [slides] | Chapter 1 of this MAB book |
3-4 (Jan 14,16) | MAB lower bound [slides] | Chapter 2 of this MAB book |
5-6 (Jan 21) | Lecture 5: Contextual bandits and online regression models. [board] Lecture 6: Online regression via online gradient descent: direct analysis. [board] | Chapters 1 and 3 of these lecture notes. Also, this survey. |
7 (Jan 28) | Lecture 7: Online gradient descent for online linear optimization. [board] | Chapter 2 of the survey linked above. |
8 (Jan 30) | Lecture 8: Reduction to online linear optimization; reduction from contextual bandits to online learning. [board] | Chapter 3 of the lecture notes linked above. |
9-10 (week of Feb 4) | Lecture 9: $\epsilon$-greedy strategy, inverse proportional gap weighting. [board] Lecture 10: Optimism, LinUCB, elliptical potential lemma. [board] | Chapter 3 of the lecture notes linked above. |
11-12 (week of Feb 11) | Lecture 11: Completing the LinUCB analysis; starting RL. [board] Lecture 12: MDPs, Bellman equations, negative results. [board] | Chapter 5 of the lecture notes linked above. |
13 (Feb 18) | Lecture 13: Learning tabular MDPs with unknown transitions. [board] | Notes (Jiang) |
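To accompany the MDP lectures (Bellman equations, value iteration), here is a minimal value-iteration sketch on a made-up two-state MDP. The transition and reward tables are illustrative only and do not come from the course materials:

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a tabular MDP.

    P[s][a] is a list of (next_state, prob) pairs, R[s][a] the immediate reward.
    Returns the (approximately) optimal value function and a greedy policy.
    """
    n = len(P)
    V = [0.0] * n
    while True:
        # One application of the Bellman optimality operator
        V_new = [max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                     for a in range(len(P[s])))
                 for s in range(n)]
        done = max(abs(a - b) for a, b in zip(V_new, V)) < tol
        V = V_new
        if done:
            break
    # Greedy policy with respect to the converged value function
    policy = [max(range(len(P[s])),
                  key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]))
              for s in range(n)]
    return V, policy

# Toy 2-state MDP: action 0 stays put, action 1 swaps states;
# reward 1 is earned only for actions taken in state 1.
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(1, 1.0)], [(0, 1.0)]]]
R = [[0.0, 0.0], [1.0, 1.0]]
V, pi = value_iteration(P, R)
```

Here the optimal policy moves from state 0 to state 1 and then stays, so V[1] = 1/(1-gamma) = 10 and V[0] = gamma * V[1] = 9, matching what the fixed point of the Bellman equations predicts.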
Homework
Due date | Homework | Note |
---|---|---|
01/31 | Homework 1 | Here is a HW solution template in case you need one |
02/27 | Homework 2 | Here is a HW solution template in case you need one |