From time to time I write short introductory/tutorial articles. If you want an easy introduction to ADP from the perspective of operations research, try some of the articles below. All are are written in a tutorial style and most are fairly short.
Clearing the Jungle of Stochastic Optimization, Informs Tutorials, November, 2014 (c) Informs
This is a tutorial article on modeling sequential decision problems ("dynamic programs"), with my most recent summary of the four classes of policies. The article also contains an in-depth discussion of lookahead policies, which is mentioned only briefly in chapter 6 of the ADP book.
Energy and Uncertainty: Models and Algorithms for Complex Energy Systems, AI Magazine, October, 2014.
This article uses the setting of energy systems to describe fundamentals of modeling stochastic dynamic systems, and illustrates the four classes of policies using a simple application.
A Unified Framework for Stochastic Optimization (Informs Computing Society Newsletter article - Fall, 2012)
This is a short article that describes links between stochastic search, dynamic programming and stochastic programming, drawing on the discussions in the longer articles below.
W. B. Powell, H. Simao, B. Bouzaiene-Ayari, “Approximate Dynamic Programming in Transportation and Logistics: A Unified Framework,” European J. on Transportation and Logistics, Vol. 1, No. 3, pp. 237-284 (2012). DOI 10.1007/s13676-012-0015-8.
Using the contextual domain of transportation and logistics, this paper describes the fundamentals of how to model sequential decision processes (dynamic programs), and outlines four classes of policies. A section describes the linkage between stochastic search and dynamic programming, and then provides a step by step linkage from classical statement of Bellman's equation to stochastic programming. The remainder of the paper uses a variety of applications from transportation and logistics to illustrate the four classes of policies.
W.B. Powell and I.O.Ryzhov, "Optimal Learning and Approximate Dynamic Programming," in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, (F. Lewis and D. Liu, eds.)
This chapter covers methods for solving the exploration vs. exploitation problem from two perspectives: policy search (a dynamic program with a pure belief state), and dynamic programming with a physical state. After providing an overview of some of the most popular heuristics, we describe how the knowledge gradient can be used in both settings. The knowledge gradient is illustrated for both offline and online learning, both for policy search and in dynamic programming with a physical state.
W. B. Powell, "Perspectives of Approximate Dynamic Programming," Annals of Operations Research, Annals of Operations Research (2012), DOI 10.1007/s10479-012-1077-6 (c) Springer.
This is an overview of ADP, covering some of the historical evolution, how to model a sequential decision process, and a summary of the four fundamental classes of policies that are used in most of the stochastic optimization literature, spanning reinforcement learning, dynamic programming, stochastic programming, and variations of lookahead policies.
Powell, W. B., "Approximate Dynamic Programming I: Modeling," Encyclopedia of Operations Research and Management Science, (c) John Wiley and Sons..
Powell, W. B., "Approximate Dynamic Programming II: Algorithms," Encyclopedia of Operations Research and Management Science, (c) John Wiley and Sons.
These two short chapters provide yet another brief introduction to the modeling and algorithmic framework of ADP. The first chapter actually has nothing to do with ADP (it grew out of the second chapter). Instead, it describes the five fundamental components of any stochastic, dynamic system. There is also a section that discusses "policies", which is often used by specific subcommunities in a narrow way. I describe nine specific examples of policies. I think this helps put ADP in the broader context of stochastic optimization.
The second chapter provides a brief introduction to algorithms for approximate dynamic programming. In the tight constraints of these chapters for Wiley's Encyclopedia, it is not possible to do a topic like this justice in 20 pages, but if you need a quick peek into ADP, this is one sample.
Powell, W.B., “Merging AI and OR to Solve High-Dimensional Resource Allocation Problems using Approximate Dynamic Programming” Informs Journal on Computing, Vol. 22, No. 1, pp. 2-17 (2010).(c) Informs
This paper addresses four problem classes, defined by two attributes: the number of entities being managed (single or many), and the complexity of the attributes of an entity (simple or complex). All the problems are stochastic, dynamic optimization problems. Single, simple-entity problems can be solved using classical methods from discrete state, discrete action dynamic programs. The AI community often works on problems with a single, complexity entity (e.g. a backgammon board). The OR community tends to work on problems with many simple entities. This paper briefly describes how advances in approximate dynamic programming performed within each of these communities can be brought together to solve problems with multiple, complex entities.
Powell, W. B. “What you should know about approximate dynamic programming,” Naval Research Logistics, Vol. 56, No. 3, pp. 239-249, 2009.
This article is a brief overview and introduction to approximate dynamic programming, with a bias toward operations research. It highlights the major dimensions of an ADP algorithm, some strategies for approximating value functions, and brief discussions of good (and bad) modeling and algorithmic strategies.
Powell, W. B., “Approximate Dynamic Programming: Lessons from the field,” Invited tutorial, Proceedings of the 40th Conference on Winter Simulation, pp. 205-214, 2008.
This is the third in a series of tutorials given at the Winter Simulation Conference. This one has additional practical insights for people who need to implement ADP and get it working on practical applications.
Powell, W. B., “Approximate Dynamic Programming – A Melting Pot of Methods,” Informs Computing Society Newsletter, Fall, 2008 (Harvey Greenberg, ed.)
This article appeared in the Informs Computing Society Newsletter. It provides an easy, high-level overview of ADP, emphasizing the perspective that ADP is much more than an algorithm - it is really an umbrella for a wide range of solution procedures which retain, at their core, the need to approximate the value of being in a state.