DATA 37200: Learning, Decisions, and Limits

DATA 37200: Learning, Decisions, and Limits (Winter 2026)

Basic Information

Class Location: JCL 011
Class Time: Tu/Thu 11:00 AM to 12:20 PM
Instructor: Frederic Koehler

Office: Room 303, 5460 S. University Avenue
Office Hours: Wed 3:30-4:30 PM (subject to change)

TA: Joonhyung Shin (email: joons at uofc)
Course Material: There is no official textbook. Slides and links to reading materials will be posted on the course schedule after each lecture. You can see last year's version of the course; this year's class will be updated somewhat from last year.

Learning Objectives: (1) Understand the basic toolkit for online learning and online decision making, as a complement to the offline learning paradigm; (2) Be prepared to understand state-of-the-art RL algorithms, such as those used in RLHF and AlphaGo training.

Course Description

This is a graduate course on the theory of machine learning. While ML theory has multiple branches, this course is designed to cover the basics of online learning, along with the basics of reinforcement learning. It aims to establish a foundation for students who are interested in conducting research related to online decision making, learning, and optimization. The course will introduce formal formulations of fundamental problems and models in this space, describe basic algorithmic ideas for solving them, and rigorously analyze the performance of these algorithms as well as these problems’ fundamental limits (e.g., minimax lower bounds). En route, we will develop the necessary toolkit for algorithm development and lower bound proofs.

This is primarily a lecture-based theory course. Accordingly, we will focus on proofs rather than coding, although there may be a small amount of coding to deepen understanding. Prerequisites include linear algebra (at the level of CMSC 25300 or its equivalent), algorithms (CMSC 27200 or its equivalent), and probability (STAT 25100 or its equivalent). If you are unsure whether you meet them, consult with the instructor. Note that no background in learning theory is required.

Lectures and Readings

Lecture Number	Lectures	Readings
1 (Jan 6)	Intro and MAB (slides)	Bandit Algorithms by Lattimore and Szepesvári is a good reference for multi-armed bandits.
2 (Jan 8)	Concentration inequalities and martingales
3 (Jan 13)	UCB
4 (Jan 15)	Information-theoretic lower bounds
5 (Jan 20)	Bayesian bandits and Thompson sampling	See Chapter 1.1 of Miolane's thesis for a formal statement of the "Nishimori identity"
6 (Jan 22)	Introduction to contextual bandits	These lecture notes are a good reference for contextual bandits and basic RL theory: arxiv
7 (Jan 27)	$\epsilon$-greedy contextual bandits
8 (Jan 29)	Learning from experts
9 (Feb 3)	Online ridge forecaster
10 (Feb 5)	Midterm exam, in class. You may bring a 1-page double-sided cheat sheet; no calculators. The midterm covers lectures 1-8, so (non-exhaustively) it includes: (a) concentration and martingales, (b) ETC, UCB, and Thompson sampling for MAB, (c) basic contextual bandits and the epsilon-greedy strategy, (d) basic information theory (e.g., KL divergence), (e) learning from experts. The relevant chapters of the two references above contain good practice problems.
11 (Feb 10)	Inverse gap weighting, LinUCB
12 (Feb 12)	Motivating LinUCB from a Bayesian perspective	[Li-Chu-Langford-Schapire '10]
13 (Feb 17)	Reinforcement Learning and Markov Decision Processes	Besides the previous references, these notes on RL and control in robotics are interesting to look at. Another interesting set of notes
14 (Feb 19)	Linear Quadratic Regulator	A relevant book: [Bertsekas '17]
15 (Feb 24)	Learning MDPs. Multiplayer RL/two-player zero-sum Markov games.	Rmax algorithm. Two-player Markov games as RL: [Littman '94]
16-18 (Feb 26-Mar 5)	Tentatively: more on games and regret minimization, policy gradient, RLHF.
Final (Mar 10)	The college has scheduled the final exam for this course on Tuesday, March 10, 2026, in JCL 011 from 10 am to noon.

Homework

Due date	Homework	Note
January 30, 9 am	HW1: information theory PDF	Upload solutions to Gradescope
March 2, 9 am	HW2: learning and forecasting PDF	Upload solutions to Gradescope

Requirements and Grading

Grades consist of two components: (1) ~3 mostly proof-based homework assignments (30%); (2) in-person midterm and final (70%). For the exam grade, we will take the better of two options: (2a) midterm 30%, final 40%, or (2b) midterm 20%, final 50%. The midterm will be held in class on Thursday, February 5. The final was scheduled by the registrar (Tuesday, March 10, 10 am to noon). For both the midterm and the final, a 1-page cheat sheet (two-sided) is allowed.
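As a rough illustration of the weighting above, here is a minimal sketch in Python. The function name `course_grade` is hypothetical, and the sketch assumes that the homework assignments have already been combined into a single percentage and that every score is on a 0-100 scale; how individual homeworks are averaged is not specified here.

```python
def course_grade(homework: float, midterm: float, final: float) -> float:
    """Combine scores (each assumed to be a percentage in [0, 100]).

    Homework counts for 30%. The remaining 70% uses whichever exam
    weighting gives the higher total.
    """
    option_a = 0.30 * midterm + 0.40 * final  # option (2a)
    option_b = 0.20 * midterm + 0.50 * final  # option (2b)
    return 0.30 * homework + max(option_a, option_b)


if __name__ == "__main__":
    # Hypothetical scores: the final is stronger than the midterm,
    # so option (2b) is the one that applies.
    print(course_grade(homework=90, midterm=60, final=80))
```

Since the two options differ by 0.10 × (final − midterm), option (2b) applies exactly when the final score is higher than the midterm score.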

Late Homework Policy: Each student is allowed to submit one homework late, by at most two days past the due date. You may choose which homework to use this allowance on (or not use it at all). No additional late homework will be accepted.

Students with disabilities or learning needs

We strive to create a learning experience that is as accessible as possible. If you anticipate any issues related to the format, materials, or requirements of this course, please meet with me outside of class so we can explore potential options. Students with disabilities may also wish to work with UChicago Student Disability Services (SDS) to discuss a range of options for removing barriers in this course, including official accommodations. Please visit their website for information on this process and to apply for services online: disabilities.uchicago.edu. If you have already been approved for accommodations through SDS, please send me your accommodation letter, and meet with me if needed, so we can develop an implementation plan together.