A Modern Approach to Teaching Introduction to Optimization

Warren B. Powell
Professor Emeritus, Princeton University

Retirement brings an opportunity to think about a field from a fresh perspective.  While reviewing the undergraduate course at Princeton taught in operations research (along with the textbooks, both widely used by other schools), I came to the conclusion that we are approaching how we introduce optimization completely backward, especially given the status of optimization solvers.  There is far too much emphasis on sophisticated algorithms for complex problems and not enough emphasis on modeling decision problems that actually arise in practice.

My major observations:

  • Optimization should be the science of making the best decisions we can.
  • We should start with the simplest nontrivial decision problems that students are most familiar with. Examples include machine learning (a form of nonlinear optimization), and an array of sequential decision problems where the decisions may be binary (selling an asset, booking a flight, choosing a webpage design), discrete (choosing the best product to recommend, medical treatment, person to perform a task, …) or a continuous scalar (price, dosage, concentration, time, …).
  • Most optimization courses in industrial engineering, operations research, MBA programs, … emphasize linear programs.  Virtually no one without formal training has even heard of a linear program, and very few students who actually take a course in linear programming ever solve a linear program.  Of course, there are many applications of linear programming (and integer programming), but these are more complex problems, and as a result, should not be the problems that students first see in an optimization course.
  • Linear programs (along with integer and nonlinear programs) are always presented as static problems.  However, the vast majority of optimization problems are used to make decisions that arise over time, which means they are sequential decision problems.  In my approach, I start with classical static models and then illustrate how to view these as methods for making a single decision in a sequential problem.

I have designed a new approach for teaching introductory optimization courses.  This can be downloaded in either of two formats:

I recommend reading the introductory material (sections 1-4). The body of the course is given in section 5 in the form of 11 topics.  This is written for a potential course instructor of an optimization course but will be useful to anyone who has already taken a course in optimization.  Given the length (the entire document is 90 pages), I recommend first skimming section 3 which provides an overview of the different topics, focusing on the pedagogical style.

The course starts by emphasizing problems that are very familiar to students. I start with machine learning problems given the very high level of awareness this field enjoys.  These problems also lay an important foundation for optimizing parameters that will be used throughout the course.

I then transition to sequential decision problems, focusing on simple problems (selling an asset, ordering inventory, learning the best medical treatment) before transitioning to static (and then dynamic) shortest path problems (this is a simple form of linear program).  I emphasize the use of parameterized functions for making decisions (this is what is most widely used in practice) which builds on the parameter tuning we introduced in machine learning.

The sequential decision problems are much simpler than linear programs, but do require handling uncertainty, which we do in the simplest possible way.  Prior training in statistics is not necessary.  The only use of calculus arises in our use of derivatives (or gradients), but I am not sure that this material needs an entire course in calculus.  There is one optional segment that requires linear algebra (where I show a matrix-based version of the simplex method).  I introduce how to generate random numbers using Monte Carlo simulation, but again, this is very basic.

Many people with formal training will recognize a sequential decision problem as a dynamic program, which normally leads to sophisticated material based on Bellman’s equation.  The only time I use Bellman’s equation is to solve a simple deterministic shortest path problem.  Please keep in mind that while the academic community likes to bring a high level of sophistication to optimization problems that involve uncertainty, people deal with uncertainty all the time, and there are methods for solving these problems that are quite simple.

Many of the topics on sequential decision problems are based on material that is available in my book Sequential Decision Analytics and Modeling which can be downloaded for free by clicking here.  Otherwise, I assume anyone teaching this material will have their own favorite references for linear, integer, and nonlinear programming.