Modeling is the mathematical representation of a real problem. Good modeling requires a consistent, parsimonious notational style that is easy to read. Ultimately, the real test of a model involves calibration against real-world data, which means capturing the important dimensions of the application.

A few resources that may prove useful:

Recent tutorial articles:

A Unified Framework for Optimization under Uncertainty – Tutorial to be given at Informs annual meeting, Monday, Nov 14, 2016 in Nashville. This framework unifies the fields in the jungle of stochastic optimization. It also helps to read the earlier tutorial article:

Clearing the Jungle of Stochastic Optimization (tutorial given in 2014) – Discusses the core elements of a stochastic model, discusses state variables, reviews the four classes of policies, and discusses lookahead policies in depth.

A lecture on modeling – This is a power point presentation derived from a series of lectures I have been giving on modeling. Includes a series of slides that helps identify a better definition of a state variable. See also Chapter 5 – Modeling.

A lecture on policies – Similar in style to the modeling lecture, this describes and illustrates the four fundamental classes of policies (click here) for a more detailed tutorial on policies in our tutorial on the “Jungle of Stochastic Optimization.”) The “Unified Framework” tutorial (above) contains a further refinement of these ideas.

ORF 411 – Operations and Information engineering – An undergraduate course we teach in our department at Princeton, with a major theme of modeling.

Major problem classes – An overview of major problem classes along the five fundamental dimensions of sequential decision problems.

Discussions – What is a policy? What is a state variable? How do we bridge communities? … and other topics that seem relevant.

Additional readings … tutorial style articles on modeling and stochastic optimization.

Additional advice on modeling – If you are faculty, staff, post-doc or grad student at Princeton University and you need help thinking through a sequential decision problem, from any field, please contact Warren Powell (powell@princeton.edu) to set up a modeling session.

### Problem classes

Below I am going to try to list dimensions along which the problem classes of computational stochastic optimization can be divided. If you can think of one that I have missed, let me know.

**Static/two-stage/sequential**

Static problems – Make a decision (alternatively, choose a deterministic parameter or fix a design), then observe a random outcome.

Two-stage – Make a decision, observe a random outcome, and then make one more decision (also known as the recourse). This is very close to static problems, but I feel that it deserves its problem class. It has received considerable attention from the stochastic programming community where decisions are vectors.

Sequential decision problems – Also known as multistage stochastic optimization, this is where you make a decision, see a random outcome, make another decision, see a random outcome, and so on (over a finite or infinite horizon).

**States**

Our lack of an accepted definition of a state variable is one of my pet peeves in the field of sequential decision problems. See my discussion of states for my views on a definition of the state of the system. For now, I will offer some variations of state variables:

- Physical state – Many people equate “state” with “physical state,” which can be viewed as a snapshot of the system (how much product is in inventory, the price of a stock, the current estimate of the location/speed/acceleration of an aircraft).
- Information state – This includes everything we
*need*to know about the system to make decisions and model the problem. For example, I might need the price of the stock over the last three time periods to model the price in the next time period. - Belief state (or knowledge state) – This is a probability distribution giving our belief about parameters we cannot measure perfectly.

A “dynamic program” is often equated with a physical state that determines the available actions. But there are learning problems where the state of the system consists purely of a belief state (see, for example, the multiarmed bandit problem – for more, see my work on optimal learning.

Side note: The computer science community makes a distinction between the *state* of a system, and state variables which might be the different components of a state. I don’t think the operations research community makes this same distinction (and I have no idea about the controls community).

**Decisions**

Communities such as computer science (reinforcement learning), control theory and operations research/math programming can be divided roughly in terms of the nature of the decisions each field addresses:

Discrete action spaces – Universal notation is “a” -Discrete action spaces arise in a vast range of applications, often involving the management of a single entity (robots, games, people, animals, …). Problems are nonconvex and discrete.

Continuous controls – Common notation is “u” – Engineers are often flying planes, controlling machines or chemical processes or moving robots. Controls are typically low-dimensional (10-20 dimensions is a lot), where they exploit continuity but not convexity.

Vector-valued actions – Common notation is “x” – The operations research community solves a wide range of problems using high-dimensional vectors x, often with thousands (and sometimes millions) of dimensions. Convexity is a powerful and widely used property.

**Exogenous (random) information**

Exogenous information comes in many different flavors, including:

- Well defined distributions with finite moments (normal, Poisson, uniform, etc. etc.) – These applications generally arise when a problem (which might be static/two-stage or sequential) repeatedly, allowing us to experience the full range of the distribution.
- Heavy-tailed distributions – Some problems (an example is electricity spot prices) exhibit very heavy-tailed behavior, with possibly infinite variance (see the Cauchy distribution as an example).
- Rare events – Imagine planning inventories for large, expensive equipment (say, a jet engine) that is simply not supposed to fail, but sometimes it does.
- Subjective probabilities – Try planning for a new product for which there is simply no history to build a distribution of what might happen in the future.

**Transition functions**

Known as system models, state models, plant models, models, transfer functions, or laws of motion, the transition function describes how the state of the system evolves from one time period to the next. These come in different flavors:

- Systems of linear equations – Useful for certain classes of search algorithms
- Known functions (but without any specific structure)
- Unknown functions – Our system may be the climate, a complex chemical plant, or the behavior of an animal – We simply may not have equations that model the system from one period to the next. We can divide this class between problems where we assume we can estimate the equations (but acknowledge that the estimate is imperfect), or where we do not even have an estimate. In the last class, we assume only that we can obseve the next state.

**Objective functions**

Objective functions can be classified along several dimensions:

Handling uncertainty – For deterministic problems, we just add up costs and contributions. When we introduce uncertainty, we have to decide how to compute our objective. Some choices include:

- Expectations – By far the most popular, we can minimize or maximize an expectation (usually estimated as an average).
- Min/max – The community of robust optimization introduces the idea that if we are minimizing costs, we can minimize the maximum cost, thereby protecting us against the worst that might happen.
- Quantile optimization – Min/max the alpha-quantile of a function. For example, if we are maximizing profits, we might be worried about maximizing the 5th quantile.
- Risk measures – The finance community has a range of risk measures (Var, CVar, …) which generalize the min/max approach of robust optimization.

Additivity – The vast majority of research articles assume that costs/contributions are additive over time, but this is not always the case. In some cases additivity can be restored by designing an appropriate state variable. The use of risk measures can be one source of complicatoin.

Continuity – Objective functions may be Lipschitz continuous, contiinuously differentiable, upper/lower semicontinuous, and CADLAG (google it). If you have none of the above, I hope you are using discrete action spaces where this does not matter.

Convexity – Convexity is a powerful property for vector-valued applications, but there are many problems that cannot assume convexity. The reinforcement learning community works almost exclusively with discrete action spaces, where convexity is not an issue. The engineering controls community widely assumes continuity without convexity (which is how they are always producing those great 3-D Matlab plots!).

Differentiability – We can compute derivatives for some problems (often in the form of stochastic gradients), but there are many problems where derivatives are not available and we have to use derivative-free algorithms.

Computational complexity – Some functions can be evaluated in a fraction of a second, in others a single observation takes a year. This has a big impact on the design and testing of algorithms. A sample of different problem classes include:

- Complex analytic functions – Can be evaluated in a fraction of a second
- Numerical simulations – May take several seconds, several hours, or several days (sometimes even longer)
- Physical experiments – A single experiment might take a day, or a month (or longer)
- Field experiments – Estimating the market response to a change in price might take a month or several months.
- Admission policies for a university – You get one observation per year
- Impact of tax policy on climate change – Just so you know that some stochastic optimization problems are really hard.

### Discussions

I will use this space to carry on a series of discussions about topics in stochastic optimization.

Observations – These are a series of observations about stochastic optimization, some of which run against conventional thinking (at least in some communities).

What is a policy? – It is a rule for making decisons. Here I summarize four basic classes of policies that can be used as a foundadtion for building hybrids.

What is a state? – One of my favorite topics. How did we get this far without agreeing on a definition?

Bridging communities – Sometimes it can be tricky seeing the parallels between communities. Here I discuss:

- Bridging stochastic search to dynamic programming
- Bridging dynamic programming to stochastic programming

### Some observations

I am going to use this section to make a few observations about dynamic programming.

- A dynamic program is a sequential (and for us, stochastic) decision problem.
- Bellman’s equation is a) a way of characterizing an optimal policy and b) the foundation for one of four classes of policies (see below). Dynamic programming is a
*problem*; Bellman’s equation provides the foundation for a class of*methods*. - A state variable is a
*minimally dimensioned function of history that is necessary and sufficient to compute the decision function, cost/reward/contribution function, and the transition function*. This implies that all processes are Markovian (by definition of the state variable). The state variable can be thought of as the state of information, not to be confused with the physical state. See What is a state variable for a continued discussion. - A policy is a mapping from state to action. While normally presented in terms of lookup tables or analytic functions, a policy might involve solving a linear program or computing a neural network. Note that the state variable has to contain all the information (this is the sufficiency requirement) to compute the policy.
- Dynamic programs can be solved using one of four fundamental classes of policies (along with a wide range of hybrids). These include:
- Optimizing a myopic cost function approximation
- Lookahead policies (includes tree search, rolling horizon procedures and stochastic programming)
- Policy function approximations (an analytical mapping from state to action)
- Policies based on value function approximations (which builds on Bellman’s equation).

- Policy search can be used to tune almost any policy.
- Stochastic programming (whether using Benders cuts or direct solution over all scenarios and all time periods) is a lookahead policy which seeks to optimize a lookahead model. The optimal solution of a lookahead model is not an optimal policy for the full model.

### What is a policy?

A policy is any mapping from state S to a feasible action *a* (or *x*, or *u*). For all the differences of opinion about dynamic programs, I have not found anyone who disagrees with this basic statement.

The dynamic programming community sometimes equates a policy with a lookup table (“when in discrete state s, take the discrete action a”). But a policy is any mapping from state to action, which means it could involve solving a linear program (what is the best assignment of resources to tasks right now).

While there are a wide range of policies, I have found that they can be decomposed into four fundamental classes

- Policy function approximations (PFAs) – Analytical mappings from states to actions
- Robust cost function approximatinos (CFAs) – These are parametric modifications of cost functions (and/or constraints) designed to achieve robust behavior.
- Policies based on value function apprroximations (VFAs) – Generally known as approximate dynamic programming
- Lookahead policies – Policies based on optimizing an approximate lookahead model.

A myopic cost function approximation (CFA) might involve solving a linear program to assign resources (such as truck drivers) to tasks (such as loads). We might modify the function to include a bonus for covering late loads (this is the cost function approximation).

Lookahead policies are widely used under different names such as tree search, rolling/receding horizon procedures, model predictive control (popular in engineering), and stochastic programming (which is a rolling horizon procedure which explicitly accounts for uncertainty). Lookahead policies solve approximations of the real problem. This approximation typically involves optimizing over a shorter horizon (instead of optimizing the energy storage device over an entire year, we may feel it is enough to optimize over 24 hours). In addition, it is common to approximate the uncertainty in some way. The simplest approximation is to assume the future is deterministic, optimize this much easier problem, implement the solution, and then observe a random transition (in the real model). The stochastic programming community uses scenario trees, which are sampled versions of the true stochastic process. The same idea is used in the reinforcement learning community under the name Monte Carlo tree search.

Policy function approximations are analytic functions that return an action given a state, without solving an optimization problem.

Policies based on value function approximations look like

Here, we have replaced the value function (actually, the expected value function) with a linear regression. This is the policy most commonly associated with “dynamic programming” and Bellman’s equation.

Policies 1, 3 and 4 all use some form of functional approximation. We can approximate functions in three ways: lookup table, parametric and nonparametric. We can also use a mixture known as semi-parametrics.

In addition, it is possible to create a wide range of problems by using mixtures of the four fundamental classes. A popular strategy, for example, is to use tree search (or a rolling horizon procedure) with value function approximations as a terminal reward. Another powerful strategy is to use policy function approximations, possibly as low dimensional patterns, combined with myopic or lookahead policies (possibly as linear programs).

The concept of a policy is particularly powerful whenever we are representing decisions in the future, with information that is not known now. If we are at time t, a decision at time t’ > t is a random variable. You can always represent this by replacing the decision with the policy that depends on the state at t’ (which is also a random variable).

Additional readings:

Clearing the Jungle of Stochastic Optimization – This will be published in the Informs TutORials series in 2014.

A presentation of the four fundamental classes of policies is given in Chapter 6 of my ADP book.

For an illustration of the four classes of policies in the context of problems in transportation and logistics, see

### What is a state?

It seems surprising that the dynamic programming community has not, in the past 60 years, adopted a standard definition of a state variable. This topic was discussed at an NSF workshop called a “Conversation between AI and OR on Stochastic Optimization.” At this workshop, we posed the question to the attendees which produced 30 responses which can be viewed here. Needless to say, there was quite a bit of variety in the responses.

Two ideas seemed to stand out from the responses. The first was that a state variable should be a *sufficient statistic*, which means it contains all the necessary information. The second was that it should be “efficient,” “parsimonious,” or “minimal.”

Two objections were raised about the use of the term “sufficient statistic.” First, it can be seen as replacing one piece of jargon (“state”) with another, which then needs its own definition. The second is that the term “sufficient statistic” is widely used in statistic where it has its own meaning.

There also seems to be a feeling that a state should only contain necessary information. Most models do this implicitly, but not all. The stochastic programming community routinely uses the entire history, which is certainly a sufficient statistic, but not necessary.

With apologies, my favorite definition is from chapter 5 of my ADP book (you can download this from http://adp.princeton.edu). It reads

A state is the minimally dimensioned function of history that is necessary and sufficient to compute the decision function, the transition function, and the cost/reward function.

This definition is consistent with the concept of a sufficient statistic, but requires that the information be necessary as well (and therefore minimally dimensioned, which is redundant). It also provides clear guidelines for necessary – it is the data needed to compute the decision function (for example, the set of feasible actions might depend on the state), the cost/reward function, and the transition function. We maintain that if information is only needed to compute the transition function, then it should be required to compute the contribution function or decision function at a later point in time.

Examples of information that need to be in the state variable:

- The price of a stock – If we sell a stock, we need to know its price in order to compute the contribution function.
- The feasible region – We need to now how much water is in a reservoir to know how much can be released. Alternatively, if we are assigning technicians to serve customers arriving randomly over time, we need to know the customers that are waiting to be served. This information appears in the feasible region (the characteristics of the customers might also appear in the contribution function).
- The transition function – The rate of evaporation from a water reservoir may depend on temperature, which would then have to be in the state variable. Alternatively, assume that the price of a stock p_{t+1} at time t+1 depends on the price of the stock at times t, t-1 and t-2. The evolution of prices might be described by

In this case, the state variable has to include p_t, p_{t-1} and p_{t-2}, because they are needed to compute the transition function.

### Bridging communities

A real opportunity for a community for computational stochastic optimization is recognizing when the contributions of one community can be used to help solve the problems of another. Below I provide bridges between stochastic search and dynamic prograrmming, followed by a step-by-step roadmap from classical dynamic programming to stochastic programming. This is based on a recent newsletter article that appeared in the **Informs Computing Society newsletter.**

**From stochastic search to dynamic programming**

Stochastic search is itself an umbrella term that encompasses derivative-based search (stochastic gradient methods, stochastic approximation methods), and derivative-free search (which includes a lot of the work in the simulation-optimization community, and the black-box optimization community). Stochastic search is typically written in the generic form

where *x* is a deterministic parameter and *W* is a random variable.

Stochastic search is often viewed as being distinct from dynamic prograrmming because we are choosing a single decision that works well over different outcomes, while sequential problems allow you to adapt decisions over time as new information becomes available. However, a different way to think about sequential decision problems is that you are choosing a fixed *policy* that determines decisions over time. This would be written

where the state variable evolves according to

where *W* is a random variable and *x* is determined by the policy (“decision function”) . When we search over the policies, we mean that we are searching over classes of policies (this might be a lookahead policy, or one that depends on value function approximations), as well as any tunable parameters for the class of policy being considered. For example, we might represent our policy using an approximate value function, using

Here, we have written the dependence of the policy on the regression parameters theta. We might find the best set of parameters theta by solving

We easily see that this is identical to our stochastic search problem. To solve this, we might draw on the tools of infinitessimal perturbation analysis to estimate derivatives. If we cannot find a derivative, we could turn to various derivative-free techniques based on the field of ranking and selection, simulation-optimization and optimal learning.

**From dynamic programming to stochastic programming**

This is harder, and I cannot do the equations on a website. What is important to recognize, even without any math, is that stochastic programming solves an approximate model of the future to determine an action that is implemented now. Once this is done, you step forward in time using a real model (this could be a mathematical model, or an observation from a real physical system), after which you have to do it all over again. This is how you recognize a *lookahead policy*. The approximate model is called the *lookahead model*.

The stochastic programming community uses two fundamental approaches for solving the lookahead model:

- Explicitly solving the problem over a horizon from now (time t) to the end of a planning horizon (t+H) using an approximate model of the underlying stochastic process known as a scenario tree. Often, these problems are quite large.
- For convex optimization problems, a second strategy is to use Bender’s cuts, indexed by the scenario tree. This can be viewed as a form of value function approximation. Although this results in a decision function at time t that uses a value function approximation, it is still a lookahead policy because the entire process has to be repeated once you implement a decision and step forward.

A detailed, step by step derivation which starts with Bellman’s optimality equation and ending with the standard methods used in stochastic programming are given in a 6+ page article in the Informs Computing Society newsletter (Fall, 2012). These equations cannot be replicated in a website.

A much longer article gives the same presentation, and then uses applications in transportation and logistics to illustrate the different classes of policies, is

Alan J. King & Stein W. Wallace (2012).

Modeling with stochastic programming. New York: Springer Verlag – A great discussion of modeling in the context of stochastic programming.Powell, W.B., J. Shapiro and H.P. Simao, “A Representational Paradigm for Dynamic Resource Transformation Problems,” Annals of Operations Research on Modeling (C. Coullard, R. Fourer, and J. H. Owen, eds), Vol. 104, pp. 231-279, 2001. This paper, although a bit dated, laid the foundation for our modeling framework for complex resource allocation problems.

A Unified Framework for Stochastic and Dynamic Programming, Informs Computing Society newsletter, November, 2012 – This six page article provides a brief overview of how to model stochastic, dynamic systems. There is a section which gives a step by step bridge from classical dynamic programming to stochastic programming.

W. B. Powell, H. Simao, B. Bouzaiene-Ayari, “Approximate Dynamic Programming in Transportation and Logistics: A Unified Framework,” European J. on Transportation and Logistics, Vol. 1, No. 3, pp. 237-284 (2012). DOI 10.1007/s13676-012-0015-8. – This is a much longer version of the ICS newsletter article above, illustrated with numerous examples drawn from transportation and logistics.

Yu-Li Chou, Stephen Pollock, H. Edwin, Romeijn, Robert L. Smith, “A Formalism for Dynamic Programming,” Working paper, Department of Industrial and Information Engineering, University of Michigan, October, 2001,” – A thoughtful perspective of dynamic programming by some of the top researchers in the field. Note the implicit definition of a state variable on page 2.

Jack Kleijnen – Simulation-Optimization via Kriging and Bootstrapping: A Survey, November 2012 – A nice survey article on a stochastic search method known as kriging.

For some recent papers that illustrate the modeling of complex sequential problems in different settings, see

An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management: A Case Application

SMART: A Stochastic Multiscale Model for the Analysis of Energy Resources, Technology and Policy