Warren B. Powell
Professor Emeritus, Princeton University
My book, Reinforcement Learning and Stochastic Optimization, synthesizes 15 different fields that work on sequential decision problems, but there is a price: it is an 1100-page book.
This webpage is designed to guide newcomers to the field that I have been calling sequential decision analytics. It starts with video tutorials, then transitions to a book I wrote for my undergraduate course, and then steps into the “big book,” which can be read in stages.
Step 1: Video tutorials
- I recommend starting with the short (40 minute) video introduction:
https://tinyurl.com/sdafieldyoutube/
- I then suggest the four-part tutorial (approximately 100 minutes total), which expands on the introduction above with more examples, a more complete discussion of modeling, and a more complete discussion of all four classes of policies.
- https://tinyurl.com/SDAPartI
- https://tinyurl.com/SDAPartII
- https://tinyurl.com/SDAPartIII
- https://tinyurl.com/SDAPartIV
Step 2: The “beginner” book
I wrote this book for an undergraduate course, but it is ideal for almost anyone who wants to first understand how to think about sequential decision problems. An introduction to the book is available at
Sequential decision analytics and modeling
The book can be downloaded by clicking here.
This book uses a teach-by-example style. Chapter 1 gives an introduction to the universal modeling framework. All the remaining chapters (with the exception of chapter 7) are examples, each written using the same outline, and each chosen to bring out different modeling issues. Chapter 7 pauses to illustrate the key ideas using the examples in the first six chapters. This book is very easy to skim.
Step 3: Reading the “big book”
At this point, you need a copy of Reinforcement Learning and Stochastic Optimization (RLSO). The print version can be purchased on Amazon, but do not buy the Kindle version if you are interested in an electronic version. Instead, get the Wiley e-book version from here – they have done an exceptional job.
Once you have the book, I suggest:
- Read/skim chapter 1, which provides an overview of the book (this will overlap somewhat with chapter 1 of SDAM).
- Pay special attention to “Section 1.8 – How to read this book”. This section outlines the organization of the book, and notes that sections marked with an asterisk (*) can be skipped the first time through the book. This will dramatically shorten the book on a first read.
- Also note that chapter 11 provides an overview of all four classes of policies, and provides guidance on how to choose among them. If you have a specific problem you are trying to solve, this may help you avoid going through chapters 12-19, which cover:
- Chapter 12 – Policy function approximations (PFAs)
- Chapter 13 – Cost function approximations (CFAs)
- Chapters 14-18 – Policies based on value function approximations (VFAs) – This is the material most often identified with “reinforcement learning” or “approximate dynamic programming”
- Chapter 19 – Direct lookahead approximations (DLAs) – This chapter presents two important subclasses of policies:
- Deterministic lookahead models (possibly parameterized)
- Stochastic lookahead models – Here I review how to design policies within the lookahead model.
Step 4: Self-study
Finally, I have a series of suggested ways of teaching this material at
https://tinyurl.com/RLSOcourses/
At the top I have designed a “Weekly seminar series” that outlines how a group of students or professionals could work through the material in the book. These topics can also be used as guidance for individual self-study.
I also make three important chapters available on the book webpage (https://tinyurl.com/RLandSO/): chapter 1, which gives an overview; chapter 9, on modeling; and chapter 11, on the four classes of policies.
Step 5: Other educational material
I maintain a webpage of resources for the field of sequential decision analytics at:
https://tinyurl.com/sdalinks/ (or https://castle.princeton.edu/sdalinks)