Reinforcement Learning and Stochastic Optimization is the first book to unify the diverse communities that study sequential decision problems, which consist of the sequence: decision, information, decision, information. These problems are ubiquitous, spanning business, engineering, the physical and social sciences, economics, health, energy, finance, … essentially any area that involves making decisions as new information arrives. The challenge is to come up with a method for making decisions, where the method is called a policy.
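As a minimal sketch of the decision, information, decision, information sequence, the loop below applies a policy to a toy inventory problem. The inventory setting, the order-up-to rule, and the names `order_up_to_policy`, `simulate`, and `theta` are illustrative assumptions and are not taken from the book.

```python
import random


def order_up_to_policy(state, theta=10):
    """Hypothetical policy: order enough to raise inventory to theta."""
    return max(0, theta - state)


def simulate(policy, horizon=5, seed=0):
    """Run the loop: decision, information, decision, information, ..."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    state = 0                  # assumed starting inventory
    history = []
    for t in range(horizon):
        decision = policy(state)    # decision made from the current state
        demand = rng.randint(0, 8)  # information that arrives afterward
        # Assumed transition: unmet demand is lost, inventory stays >= 0.
        state = max(0, state + decision - demand)
        history.append((t, decision, demand, state))
    return history


if __name__ == "__main__":
    for t, decision, demand, state in simulate(order_up_to_policy):
        print(f"t={t} decision={decision} demand={demand} inventory={state}")
```

The policy here is simply a function that maps what is known at the time (the state) to a decision. Swapping in a different function changes the method for making decisions without changing the loop.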
There are established communities that work in fields such as deterministic optimization. The books to the right all use a common notation, and present similar material. The result is a mature community of students coming from hundreds of academic programs who can take these tools to industry.
There is a much larger community for statistics/machine learning, represented by the books to the left. As with optimization, this family of books presents similar material using an established set of methods that students who take a course in statistics (or machine learning) can be expected to master.
This is not true when we turn to decisions under uncertainty. Unlike optimization and machine learning, there is not a single book that spans the complete range of sequential decision problems. Each of the books to the right deals with particular classes of sequential decision problems. Each represents a different community, featuring roughly eight different notational systems, a variety of modeling frameworks, and an endless collection of algorithms for specialized problems. I call this the “jungle of stochastic optimization.” The books are advanced, and highly biased toward the most complex (and rarely used) methods for making decisions.
Reinforcement Learning and Stochastic Optimization (RLSO) is the first book to put the vast range of sequential decision problems into a single modeling framework. We have collected the diverse (and growing) set of solution approaches into an elegant framework consisting of four classes of policies that cover every method that has been proposed in the literature or used in practice. While most of the books on decisions under uncertainty are written at fairly advanced mathematical levels, RLSO is written for the same audience that is served by the fields of optimization and machine learning, with an emphasis on models and algorithms (just as in the other two fields). Instead of focusing on the most complex tools, we provide a balanced presentation of all four classes of policies, which means we cover the policies that are actually used in practice (typically in an ad hoc way).
I have taught this material in an undergraduate course at Princeton and in a graduate course that attracted students from eight different departments. It is easily adapted to the mathematical backgrounds of the audience. The only prerequisite for most of the material is a course in probability and statistics. The 370 exercises are clearly divided into seven different categories to make it easier to choose exercises appropriate to the audience. The book should serve as the foundation of a new field I am calling sequential decision analytics (click here for a description). For a video introduction, click here.