Mathematical Issues
In this post, we take a slight detour into making rigorous some of the handwaving earlier on we did to establish the basic problem. For those of you who don't care about definitional considerations, you may want to skip this post.
What really is a basic problem?
We can summarize our setup of the basic problem into the following steps:
- The controller observes \(x_k\) and applies \(u_k=\mu_k(x_k)\)
- The disturbance is generated by \(w_k\sim P_k(\cdot\mid x_k,u_k)\)
- The cost \(g_k(x_k,u_k,w_k)\) is incurred and added to the previous costs
- The next state is generated by \(x_{k+1}=f_k(x_k,u_k,w_k)\)
Of course, if we are at the final state, we skip the last step and instead add the terminal cost \(g_n(x_n)\) to finish the process. This all seems fine, and it is, until we want to do anything interesting with the basic problem.
Minimizing Costs, rigorously
In our Dynamic Programming Algorithm, we define our "cost-to-go" \(V_k\) by evaluting the expected value of \(g_k\). This, of course, assumes \(g_k\) can be interpreted as a random variable. Let's try to do that.
In order to do so without much effort, we make some basic assumptions about basic problems (haha). In particular, we'd like to assume that \(P_k(\cdot\mid x_k,u_k)\) for any \(x_k,u_k\) are distributions over at most countable sets, and for any policy, the cost \(g_k\) at any step is finite.
In this case, we can define a new term \(\tilde{g_k}\) as follows:$$\tilde{g_k}(x_k,\mu_k(x_k))=E_{w_k\sim P_k(\cdot\mid x_k,\mu_k(x_k))}\Set{g_k(x_k,\mu_k(x_k),w_k)\mid x_k,\mu_k(x_k)}$$In other words, \(\tilde{g_k}\) is a random variable dependent on \(x_k,\mu_k(x_k)\). The expected value is well defined because of the countability of the state space of \(P_k\) and finiteness of the cost. Thus, we can let the total cost be $$V_\pi(x_0)=E_{x_1,\ldots,x_n}\Set{g_n(x_n)+\sum\limits_{k=0}^{n-1}\tilde{g_k}(x_k,\mu_k(x_k))}$$This cost can be interpreted as a random variable by letting the state space be the cartesian product of \(S_k\) for each \(k\).