Reformulations of Basic Problems
Last time, we briefly mentioned the issues with state augmentation to deal with reformulations to the basic problem. In this post, we'll go into a bit more detail about how to deal with such reformulations
As a general rule of thumb, when we augment state, we want to include at time \(k\) all the information known to the DDS at the time into the state variable at time \(k\).
Time Delays
Consider the situation in which the DDS is no longer Markovian, i.e., \(x_{k+1}\) depends on \(x_0,\ldots,x_k\) and \(u_0,\ldots,u_k\). This is essentially impossible to handle in an open loop basic problem without sacrificing the dynamic programming algorithm, but we can use state augmentation to reformulate the basic problem in the closed case. Let
$$\tilde{x_k}=(x_0,\ldots,x_k,u_0,\ldots,u_{k-1})$$Thus, we've reformulated the problem into a standard closed loop basic problem over states \(\tilde{x_k}\). This of course didn't come without a steep cost. The state space has increased from \(S\) to at most \(S^{2k}\).
For this reason, we often consider partially Markovian models, i.e., models in which \(x_{k+1}\) depends on \(x_{k-i},\ldots,x_k\), \(u_{k-i},\ldots,u_k\) for some manageable \(i\), so that the state space increases to at most \(S^{2i}\) instead. We call this a time delay of \(i\)
Correlated Disturbances
Consider the modification to the DDS model in which \(w_k\) now can depend on previous \(w_i\), i.e., the disturbances are correlated. There is no good way to handle this situation in general, but if we make the assumption that each \(w_k\) can be written as follows (with \(\lambda\) a constant and \(\eta_k\) independent random variables)$$w_k=\lambda w_{k-1}+\eta_k$$
We can modify the problem into a basic problem. In fact, it's as simple as letting \(\tilde x_k=(x_k,w_k)\). See if you can figure out all of the details involved in converting this setup into a basic problem. This can even be generalized to multiple dimensions with matrix coefficients.
Besides the state space growth (which in this case isn't too bad), the main issue with this approach is the assumption of perfect state knowledge. To be able to run our algorithm, we need to be able to perfectly measure the randomness and previous \(x_k\) state of the DDS in generating the state \(x_{k+1}\). But often, we can't do either, since \(w_k\) is generally an uncertainty on our measurement of \(x_k\). To deal with these sorts of situations, we'll have to wait until Chapter 4.
Forecasts
Suppose that we no longer are given the probability distribution \(P_k\), but rather are told that we now have a set of probability distributions \(Q_k\) (a finite set), and there exists a probability distribution \(\zeta_k\) on \(Q\) representing the likelihood of a particular distribution in \(Q\) being the one that determines the disturbance.
We call \(\zeta\) the forecast, since we are informed of its output before making the decision \(\mu\). (For simplicity, let \(\zeta_k^i\) refer to the probability of the \(i\)th element of \(Q\) being selected at time \(k\)).
As a real world example, \(\zeta_k\) could be a forecast of the demand for a product on year \(k\): small, medium or large. How do we convert this into a basic problem?
It's quite simple: let \(\tilde{x_k}=(x_k,\zeta_k)\) and the stochastic term be \(\tilde{w_k}=(w_k,\zeta_k)\). Can you see how to finish the setup?
Simplifying Uncontrollable Components
In the three examples above, we've seen how to take complications in the setup of the basic problem, and via state augmentation, rewrite the complications as part of the basic problem. Of course, this adds to the size of the state space, which can very easily become unmanageable.
In general, there is no good way to deal with this expansion, other than heuristics. However, there is a special case that comes up relatively often.
Suppose we've augmented our space to have state \(\tilde{x_k}=(x_k,y_k)\), where \(x_k\) is the original state variable (\(x_{k+1}=f(\tilde{x_k},u_k,P(\cdot\mid\tilde{x_k},y_k))\)) and \(y_k\) is dependent on \(x_k\) stochastically. However, the controller observes the value of \(y_k\) prior to picking \(u_k\) (unlike \(w_k\)). Define$$\tilde{V_k}(x_k)=E_{y_k}\set{V_k(\tilde{x_k}\mid x_k)}$$
Via basic arithmetic, we have $$\tilde{V_k}(x_k)=E_{y_k}\Set{\min\limits_{u_k\in C_k(\tilde{x_k})}E_{w_k}\Set{g_k(\tilde{x_k},u_k,w_k)+\tilde{V_k}(f_k(\tilde{x_k},u_k,w_k))}\mid x_k}$$
We now can use the same approach as the Dynamic Programming Algorithm. The advantage is the size of the state space. If we just used the state augmentation approach, the state space size would be \(n\cdot m\), with \(n\) the state space size of \(x_k\) and \(m\) the state space size of \(y_k\). But, this approach leaves us with a state space of size \(n\).
Can you see how to directly simplify the augmentation of forecasts?