Cross Entropy Method

In the Cross Entropy Method, we wish to estimate the likelihood

l = {\mathbb P}_u ( S(X) \geq \gamma ) = {\mathbb E}_u \big[ {\mathbb I} [ S(X) \geq \gamma ] \big].

Here X is a random variable whose distribution is known and belongs to a parametrized family of densities f(\cdot ; v); its true parameter is denoted u, and \gamma is a given target level. Further, S(X) is often the value of a solution to an optimization problem.

It is assumed that \{ S(X) \geq \gamma \} is a relatively rare event, so that l is small, say of order 10^{-5}. One way to estimate l is to take N IID samples X_1,...,X_N and then average

\hat{l} = \frac{1}{N} \sum_{i=1}^N {\mathbb I} [ S(X_i) \geq \gamma ].
This gives us an unbiased estimate. However, because the event is rare, we may require a very large number of samples N in order to provide a good estimate of l. Importance sampling is a straightforward method that can mitigate this. Here we sample each X_i from a different density g(\cdot) and perform the estimate

\hat{l} = \frac{1}{N} \sum_{i=1}^N {\mathbb I} [ S(X_i) \geq \gamma ] \, W(X_i),
where W(x) = f(x ; u)/g(x). Such a change can improve the estimation of l. For instance, the choice g^*(x) = {\mathbb I} [ S(x) \geq \gamma ] f(x ; u)/l gives the correct estimate of l with certainty (zero variance). Clearly g^* cannot be used in practice: it requires knowledge of l, the very quantity which we are estimating. Nonetheless a good approximation of g^* may be possible. The point of the cross entropy method is to find a reasonable choice for g so that importance sampling can be applied.
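To make the contrast concrete, the following sketch estimates l = {\mathbb P}(X \geq \gamma) for an exponential random variable, first by crude Monte Carlo and then by importance sampling from a tilted exponential. The toy setup (exponential family, the choice of tilt w = \gamma, the specific numbers) is an illustrative assumption, not from the text above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rare event: l = P(X >= gamma) for X ~ Exponential(mean u).
# True value is exp(-gamma/u); the numbers here are illustrative choices.
u, gamma, N = 1.0, 10.0, 10_000

# Crude Monte Carlo: almost every sample misses the event entirely.
X = rng.exponential(u, N)
l_crude = np.mean(X >= gamma)

# Importance sampling: draw from the tilted density g = Exponential(mean w)
# with w = gamma, and reweight by W(x) = f(x; u) / g(x).
w = gamma
Y = rng.exponential(w, N)
W = (np.exp(-Y / u) / u) / (np.exp(-Y / w) / w)
l_is = np.mean((Y >= gamma) * W)

print(l_crude, l_is, np.exp(-gamma / u))
```

With these numbers the true value is about 4.5 \times 10^{-5}, so the crude estimate is typically 0 or a crude multiple of 1/N, while the reweighted estimate lands within a few percent.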

In particular, the cross-entropy method advocates estimating g^*(\cdot) by f(\cdot ; v), where the parameter v is chosen to minimize the relative entropy (Kullback–Leibler divergence) between g^* and f(\cdot ; v). Namely,

\min_v D( g^* \,\|\, f(\cdot ; v) ) = \min_v \int g^*(x) \log \frac{g^*(x)}{f(x ; v)} \, dx.
With a little rearrangement — the term \int g^* \log g^* does not depend on v, and g^*(x) = {\mathbb I}[ S(x) \geq \gamma ] f(x ; u)/l — we see that this is equivalent to the optimization

\max_v {\mathbb E}_u \big[ {\mathbb I} [ S(X) \geq \gamma ] \log f(X ; v) \big].
With importance sampling in mind, we apply a change of measure, from parameter u to a parameter w, to the above expectation to get

\max_v {\mathbb E}_w \big[ {\mathbb I} [ S(X) \geq \gamma ] \, W(X ; u, w) \log f(X ; v) \big],
where W(x ; u, w) = f(x ; u)/f(x ; w). The above optimization may not be explicitly solvable, so we replace it with the (importance sampled) sample-average optimization

\max_v \frac{1}{N} \sum_{i=1}^N {\mathbb I} [ S(X_i) \geq \gamma ] \, W(X_i ; u, w) \log f(X_i ; v),
where X_i are IID samples from density f(\cdot ; w).

In many examples the above optimization is concave and differentiable in v, so it is solvable either by descent methods or by differentiating and solving for the condition

\frac{1}{N} \sum_{i=1}^N {\mathbb I} [ S(X_i) \geq \gamma ] \, W(X_i ; u, w) \, \nabla_v \log f(X_i ; v) = 0.
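For instance, when f(\cdot ; v) is a Gaussian density with mean v and unit variance, the condition solves in closed form as a weighted average of the samples in the event. A minimal sketch — the Gaussian family, S(x) = x, and the numerical choices are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed setup: f(.; v) is Gaussian with mean v and unit variance,
# S(x) = x, and u = w = 0 so that W(x; u, w) = 1 throughout.
u, gamma, N = 0.0, 2.0, 100_000
X = rng.normal(u, 1.0, N)

I = (X >= gamma).astype(float)   # indicator of the event {S(X) >= gamma}
W = np.ones(N)                   # likelihood ratio, trivial since w = u

# For a Gaussian mean, the first-order condition
#   sum_i I_i W_i (X_i - v) = 0
# solves in closed form as a weighted average over the event samples.
v_new = np.sum(I * W * X) / np.sum(I * W)
print(v_new)
```

The maximizer is the (weighted) empirical mean of the samples that land in the event, here roughly {\mathbb E}[X \mid X \geq 2] \approx 2.37 for a standard normal.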
The reason for incorporating importance sampling into the estimate is that the event \{ S(X) \geq \gamma \} could be too unlikely to form an accurate estimate under the naive choice w = u. The idea behind the cross entropy method is to apply a multi-level approach where we sequentially increase \gamma and update the sampling parameter v as follows.

  • Given the current parameter choice v, we estimate the (1-\rho)-th quantile of S(X) by ordering the samples S(X_{(1)}) \leq S(X_{(2)}) \leq ... \leq S(X_{(N)}) and taking the empirical quantile, which gives the update

\gamma' = S(X_{(\lceil (1-\rho) N \rceil)}).
The parameter \rho is chosen so that the event \{ S(X) \geq \gamma' \} remains reasonably likely, e.g. \rho = 0.01. Note that, in principle, \gamma' increases at each step, so the event \{ S(X) \geq \gamma' \} is less likely under v when compared with the previous choice of \gamma.

  • Given this \gamma', we find v' as the best estimate according to our cross-entropy optimization above. This, in principle, should increase the likelihood of the event \{ S(X) \geq \gamma' \} when compared with the prior choice of v.
  • When \gamma' passes the required target level \gamma, we have found a parameter choice that gives reasonable weight to the event \{ S(X) \geq \gamma \}, and we can calculate l with the importance sampling estimator above.
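The steps above can be sketched end to end. The following is a hedged illustration for a mean-parametrized exponential family, where the cross-entropy update again has a closed form (the family, S(x) = x, and all numerical choices — \gamma = 32, \rho = 0.01 — are assumptions for illustration, echoing a classic toy example):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed setup: X ~ Exponential(mean u), S(x) = x, f(x; v) = exp(-x/v)/v.
u, gamma_target, rho, N = 1.0, 32.0, 0.01, 10_000

def W(x, u, w):
    # Likelihood ratio f(x; u) / f(x; w) for mean-parametrized exponentials.
    return (w / u) * np.exp(-x * (1.0 / u - 1.0 / w))

v = u
for _ in range(50):
    X = rng.exponential(v, N)
    # Step 1: empirical (1 - rho) quantile of S(X), capped at the target.
    gamma = min(np.quantile(X, 1.0 - rho), gamma_target)
    # Step 2: closed-form CE update (weighted mean of samples in the event).
    I = X >= gamma
    Wi = W(X[I], u, v)
    v = np.sum(Wi * X[I]) / np.sum(Wi)
    if gamma >= gamma_target:
        break

# Final importance-sampling estimate of l under f(.; v).
X = rng.exponential(v, N)
l_hat = np.mean((X >= gamma_target) * W(X, u, v))
print(l_hat, np.exp(-gamma_target / u))
```

Here the true value e^{-32} \approx 1.3 \times 10^{-14} is hopeless for crude Monte Carlo, yet a handful of multi-level iterations drives the sampling mean v toward roughly \gamma + u, after which the reweighted estimate is accurate.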
