Quite often we need to make decisions. These decisions will typically depend on some information that we have obtained, either through direct observation or because it has been communicated to us by someone (or something) else. This *side information* is not usually perfect. For example, our measurements will not be without error and information communicated to us will be subject to imperfections in the communication medium.

How can we understand the effect of imperfect side information on our decisions?

More precisely, suppose that we seek to solve an optimization problem

,

where is what we want to optimize, is the side information we have, are some constraints on , and is some function that describes how good our decision is. We use the notation to distinguish between the variables that we are optimizing over and the side information that we have.

Now, what if our side information is not perfect. This means that our observations differ from the reality . The cost of this imperfect side information is then given by

,

where .

In general, it is very difficult to compute since we typically cannot write the solution in closed-form. So, what can we do?

Recently with Slava Kungurtsev, I have been looking into this problem. Our approach is based on a *perturbative framework*. That is, we evaluate the cost when differs by only a small amount from . In particular, we start with a Taylor series of the *optimal value* of

,

where is the error in the side information and is the normalized direction.

It is important to note that we are not taking a Taylor series of the function , which we typically can obtain in closed-form. Instead, we are expanding the *optimal value* of . The key consequence of this is that it is not immediately obvious how to evaluate the directional derivative .

It appears to be John Danskin in the 60s who first studied how to evaluate directional derivatives of optimal value functions. Let’s start with the simplest form of his result.

Proposition:Let the real valued function be twice differentiable on a compact convex subset of , strictly concave in for all . Let be the optimal value of on for fixed . Denote . Then, the directional derivative of is given by .

The theorem can be proved as follows. Under the hypotheses of the theorem, we can apply the implicit function theorem to write the (unique) optimal value of the variable in terms of the side information . In particular, for side information . Our optimal value function is then given by .

To obtain the theorem, we now just need to differentiate with respect to . By applying the chain rule for multivariable functions, we have

.

Since is the optimal solution, it follows that and

as required.

To illustrate how to actually apply this approach, let’s start with a simple application of Theorem 1. Recently, I have been studying -stable noise channels (see our ISIT2016 paper). A tractable lower bound of the capacity is given by

,

where and are parameters analogous to the variance for the power constrained Gaussian noise channel (for more details see our paper). As such, we can consider the achievable rate of parallel -stable noise channels and find the optimal choice of input signal parameters (closely related to the power). In particular, we have the following optimization problem

subject to

and

.

Observe that in the case (when the noise is Gaussian), we recover the standard waterfilling problem. Moreover, the optimization problem above is strictly convex but does not have a closed-form solution.

Suppose we want to know how affects the optimal rate (obtained by solving the optimization problem). As we observed just before, the Gaussian case corresponds to the waterfilling problem and forms a natural thing to compare the rate for other values of . Taking the Taylor series expansion and applying Theorem 1, we have

,

with the full equation given in our ISIT2016 paper.

The result is illustrated in the following figure (where we use the approximation obtained by ignoring the little-o term). Observe that the approximation from our perturbative approach is in good agreement with a numerical estimate for a range of . This suggests that a perturbative approach is a promising way of studying otherwise intractable value functions.

We can still go further. In the example above, we only allowed perturbations in . It is also possible to consider perturbations in or . The complex case where perturbations are in is especially important in wireless communications as the side information can be interpreted in terms of channel state information.

In fact, we have worked through the details in the case of multi-user MIMO with limited feedback and optimal power control. In a future post I will describe how we can deal with perturbations in and what we learn from the multi-user MIMO case study.