Convex optimization studies the problem of minimizing a convex function over a convex set. Convexity, along with its numerous implications, has been used to come up with efficient algorithms for many classes of convex programs. Consequently, convex optimization has broadly impacted several disciplines of science and engineering.

In the last few years, algorithms for convex optimization have revolutionized algorithm design, both for discrete and continuous optimization problems. The fastest known algorithms for problems such as maximum flow in graphs, maximum matching in bipartite graphs, and submodular function minimization involve an essential and nontrivial use of algorithms for convex optimization such as gradient descent, mirror descent, interior point methods, and cutting plane methods. Surprisingly, algorithms for convex optimization have also been used to design fast algorithms for counting problems over discrete objects such as matroids. Simultaneously, algorithms for convex optimization have become central to many modern machine learning applications. The demand for algorithms for convex optimization, driven by larger and increasingly complex input instances, has also significantly pushed the state of the art of convex optimization itself.

The goal of this book is to enable a reader to gain an in-depth understanding of algorithms for convex optimization. The emphasis is to derive key algorithms for convex optimization from first principles and to establish precise running time bounds in terms of the input length. Given the broad applicability of these methods, it is not possible for a single book to cover all of their applications. This book shows applications to fast algorithms for various discrete optimization and counting problems. The applications selected in this book serve the purpose of illustrating a rather surprising bridge between continuous and discrete optimization.

The intended audience includes advanced undergraduate students, graduate students, and researchers from theoretical computer science, discrete optimization, and machine learning.

Get your copy

The book is now available in print and you can order your copy here or here.


Errata

A list of known errata is available here. Further corrections, feedback, and comments are welcome and should be emailed to the author.

Chapters

  • We present the interplay between continuous and discrete optimization. The motivating example is that of the maximum flow problem. We also trace the history of linear programming, from the ellipsoid method to modern interior point methods. We end with some of the recent successes of the ellipsoid method for general convex programming problems such as the maximum entropy problem.

  • We review the mathematical preliminaries required for this book. These include some standard notions and facts from multivariate calculus, linear algebra, geometry, topology, dynamical systems, and graph theory.

  • We introduce convex sets, notions of convexity, and show the power that comes along with convexity: convex sets have separating hyperplanes, subgradients exist, and locally optimal solutions of convex functions are globally optimal.

  • We present the notion of convex optimization and discuss formally what it means to solve a convex program efficiently as a function of the representation length of the input and the desired accuracy.

  • We introduce the notion of Lagrangian duality and show that under a mild condition, called Slater’s condition, strong Lagrangian duality holds. Subsequently, we introduce the Legendre-Fenchel dual that often arises in Lagrangian duality and optimization methods. Finally, we present the Karush-Kuhn-Tucker (KKT) optimality conditions and their relation to strong duality; a standard statement of these conditions appears after this chapter list.

  • We start by presenting the gradient descent method and show how it can be viewed as a steepest descent. Subsequently, we prove a convergence time bound on the gradient descent method when the gradient of the function is Lipschitz continuous. Finally, we use the gradient descent method to derive a fast algorithm for a discrete optimization problem: computing maximum flow in an undirected graph. A minimal code sketch of gradient descent appears after this list.

  • We derive our second algorithm for convex optimization – called the mirror descent method – via a regularization viewpoint. First, the mirror descent algorithm is developed for optimizing convex functions over the probability simplex. Subsequently, we show how to generalize it and, importantly, derive the multiplicative weights update (MWU) method from it. This latter algorithm is then used to develop a fast approximation algorithm for the bipartite matching problem (an MWU sketch appears after this list).

  • We present Nesterov’s accelerated gradient descent algorithm. This algorithm can be viewed as a hybrid of the previously introduced gradient descent and mirror descent methods. We also present an application of the accelerated gradient method to solving a linear system of equations.

  • We begin our journey towards designing algorithms for convex optimization whose number of iterations scales polylogarithmically in the inverse of the error. As a first step, we derive and analyze the classic Newton’s method, which is an example of a second-order method. We argue that Newton’s method can be seen as steepest descent on a Riemannian manifold, which then motivates an affinely invariant analysis of its convergence. A minimal sketch of Newton’s method appears after this list.

  • We build upon Newton’s method and its convergence to derive a polynomial time algorithm for linear programming. Key to this algorithm is a reduction from constrained to unconstrained optimization using the notion of a barrier function and the corresponding central path (written out after this list).

  • We present various generalizations and extensions of the path following IPM for the case of linear programming. As an application, we derive a fast algorithm for the s-t minimum cost flow problem. Subsequently, we introduce the notion of self-concordance and give an overview of barrier functions for polytopes and more general convex sets.

  • We introduce a class of cutting plane methods for convex optimization and present an analysis of a special case, namely, the ellipsoid method. We then show how to use the ellipsoid method to solve linear programs over 0-1 polytopes when we only have access to a separation oracle for the polytope. A minimal sketch of the ellipsoid method with a separation oracle appears after this list.

  • We show how to adapt the ellipsoid method to solve general convex programs. As applications, we present a polynomial time algorithm for submodular function minimization and a polynomial time algorithm to compute maximum entropy distributions over combinatorial polytopes.
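
For readers who want the precise statement behind the duality chapter above, here are the KKT conditions for a convex program minimizing f(x) subject to g_i(x) ≤ 0 and h_j(x) = 0, assuming f and the g_i are differentiable and the h_j are affine; the notation is generic and not necessarily the book's.

```latex
% KKT conditions (requires amsmath). If Slater's condition holds, a feasible
% x* is optimal if and only if there exist multipliers lambda, mu with:
\begin{align*}
  \nabla f(x^\star) + \sum_i \lambda_i \nabla g_i(x^\star)
    + \sum_j \mu_j \nabla h_j(x^\star) &= 0 && \text{(stationarity)} \\
  g_i(x^\star) \le 0, \qquad h_j(x^\star) &= 0 && \text{(primal feasibility)} \\
  \lambda_i &\ge 0 && \text{(dual feasibility)} \\
  \lambda_i \, g_i(x^\star) &= 0 && \text{(complementary slackness)}
\end{align*}
```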
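The gradient descent iteration from the chapter above fits in a few lines of code. The sketch below is a minimal illustration on a least-squares objective, a hypothetical example chosen for brevity; the book's actual application is maximum flow, which requires considerably more machinery.

```python
# Minimal sketch of fixed-step gradient descent, x_{t+1} = x_t - eta * grad f(x_t),
# illustrated on least squares f(x) = 0.5 * ||Ax - b||^2 (hypothetical example).
import numpy as np

def gradient_descent(grad, x0, step_size, num_iters):
    x = x0
    for _ in range(num_iters):
        x = x - step_size * grad(x)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)
# The gradient is L-Lipschitz with L = largest eigenvalue of A^T A,
# so a fixed step size of 1/L guarantees convergence.
L = np.linalg.eigvalsh(A.T @ A).max()
x_hat = gradient_descent(grad, np.zeros(5), 1.0 / L, 500)
```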
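The multiplicative weights update rule derived in the mirror descent chapter can likewise be sketched directly. The loss function and parameters below are placeholders, not the book's bipartite matching application.

```python
# Minimal sketch of the multiplicative weights update (MWU) rule, i.e., mirror
# descent over the probability simplex with the entropy regularizer.
import numpy as np

def mwu(loss_fn, n, eta, num_iters):
    w = np.ones(n)                     # one weight per coordinate ("expert")
    p_avg = np.zeros(n)
    for _ in range(num_iters):
        p = w / w.sum()                # current distribution on the simplex
        loss = loss_fn(p)              # loss vector, assumed bounded
        w = w * np.exp(-eta * loss)    # multiplicative update
        p_avg += p
    return p_avg / num_iters           # averaged iterate

# Placeholder losses that penalize the first three coordinates.
p_hat = mwu(lambda p: np.array([1.0, 1.0, 1.0, 0.0]), n=4, eta=0.1, num_iters=200)
```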
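Newton's method, the starting point for the interior point chapters, is shown below in its basic undamped form; the smooth convex test function is a hypothetical example.

```python
# Minimal sketch of Newton's method: x_{t+1} = x_t - [Hess f(x_t)]^{-1} grad f(x_t).
# Test function (hypothetical): f(x) = exp(x1 + x2) + exp(x1 - x2) + exp(-x1).
import numpy as np

def newton(grad, hess, x0, num_iters):
    x = x0
    for _ in range(num_iters):
        x = x - np.linalg.solve(hess(x), grad(x))
    return x

A = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 0.0]])  # rows a_i of f(x) = sum_i exp(a_i . x)

def grad(x):
    return A.T @ np.exp(A @ x)

def hess(x):
    e = np.exp(A @ x)
    return A.T @ (e[:, None] * A)   # sum_i exp(a_i . x) a_i a_i^T

x_hat = newton(grad, hess, np.zeros(2), num_iters=20)
```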
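The central path underlying the path-following interior point method can also be written out explicitly. In standard notation (not necessarily the book's), for the linear program minimizing c^T x subject to Ax ≥ b, with rows a_i of A:

```latex
% Central path of the LP  min { c^T x : A x >= b }  (requires amsmath):
\[
  x^\star(t) \;=\; \operatorname*{argmin}_{x}
    \Bigl( t\, c^\top x \;-\; \sum_{i=1}^{m} \log\bigl(a_i^\top x - b_i\bigr) \Bigr),
  \qquad t > 0.
\]
% As t grows, x^\star(t) approaches an optimal solution of the LP; path-following
% interior point methods approximately track x^\star(t) while increasing t.
```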
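Finally, the cutting plane chapters rely only on a separation oracle for the feasible set. The sketch below is a minimal central-cut ellipsoid method for the feasibility problem, with a toy box oracle standing in for a real polytope; the oracle, the set, and the parameters are all hypothetical.

```python
# Minimal sketch of the central-cut ellipsoid method for finding a point in a
# convex set K given a separation oracle. The current ellipsoid is
# {x : (x - c)^T P^{-1} (x - c) <= 1}.
import numpy as np

def ellipsoid(oracle, n, R, num_iters):
    c = np.zeros(n)             # center of the current ellipsoid
    P = (R ** 2) * np.eye(n)    # start from a ball of radius R that contains K
    for _ in range(num_iters):
        a = oracle(c)           # None if c is in K; otherwise a vector a such that
        if a is None:           # K is contained in the halfspace {x : a.x <= a.c}
            return c
        b = P @ a / np.sqrt(a @ P @ a)
        c = c - b / (n + 1)
        P = (n ** 2 / (n ** 2 - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(b, b))
    return None

# Toy separation oracle for the box K = [0.2, 0.4]^2 (hypothetical example).
def box_oracle(x):
    for i in range(2):
        if x[i] < 0.2:
            a = np.zeros(2)
            a[i] = -1.0         # constraint -x_i <= -0.2 is violated at x
            return a
        if x[i] > 0.4:
            a = np.zeros(2)
            a[i] = 1.0          # constraint  x_i <=  0.4 is violated at x
            return a
    return None

point = ellipsoid(box_oracle, n=2, R=2.0, num_iters=100)
```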

Download

Download an older version of the entire book as a PDF (password: dohwAj-serkiw-4gamny). Note that this version has several typos that have been corrected in the print version.

Copyright

This material will be published by Cambridge University Press as Algorithms for Convex Optimization by Nisheeth K. Vishnoi. This pre-publication version is free to view and download for personal use only. Not for re-distribution, re-sale or use in derivative works. © Nisheeth K. Vishnoi 2020.

About the author

Nisheeth Vishnoi is the A. Bartlett Giamatti Professor of Computer Science and a co-founder of the Computation and Society Initiative at Yale University. His research focuses on foundational problems in computer science, machine learning, and optimization. He is also broadly interested in understanding and addressing some of the key questions that arise in nature and society from a computational viewpoint. Here, his current focus is on natural algorithms, the emergence of intelligence, and fairness in algorithmic decision making.

He is the recipient of the Best Paper Award at IEEE FOCS in 2005, the IBM Research Pat Goldberg Memorial Award in 2006, the Indian National Science Academy Young Scientist Award in 2011, the IIT Bombay Young Alumni Achievers Award in 2016, and the Best Technical Paper award at ACM FAT* in 2019. He is a Fellow of the ACM.

He holds a B.Tech. in Computer Science and Engineering from the Indian Institute of Technology Bombay (1995-1999) and a Ph.D. in Algorithms, Combinatorics, and Optimization from the Georgia Institute of Technology (1999-2004).