Separation principle in stochastic control

The separation principle is one of the fundamental principles of stochastic control theory, which states that the problems of optimal control and state estimation can be decoupled under certain conditions. In its most basic formulation it deals with a linear stochastic system
$$\begin{align}
dx & =A(t)x(t)\,dt+B_1(t)u(t)\,dt+B_2(t)\,dw \\
dy & =C(t)x(t)\,dt +D(t)\,dw
\end{align}$$
with a state process $$x$$, an output process $$y$$ and a control $$u$$, where $$w$$ is a vector-valued Wiener process, $$x(0)$$ is a zero-mean Gaussian random vector independent of $$w$$, $$y(0)=0$$, and $$A$$, $$B_1$$, $$B_2$$, $$C$$, $$D$$ are matrix-valued functions which generally are taken to be continuous and of bounded variation. Moreover, $$DD'$$ is nonsingular on some interval $$[0,T]$$. The problem is to design an output feedback law $$\pi:\, y \mapsto u$$ which maps the observed process $$y$$ to the control input $$u$$ in a nonanticipatory manner so as to minimize the functional

$$J(u) = \mathbb{E}\left\{ \int_0^T x(t)'Q(t)x(t)\,dt+\int_0^Tu(t)'R(t)u(t)\,dt +x(T)'Sx(T)\right\},$$
where $$\mathbb{E}$$ denotes expected value, prime ($$'$$) denotes transpose, and $$Q$$ and $$R$$ are continuous matrix functions of bounded variation, $$Q(t)$$ is positive semi-definite and $$R(t)$$ is positive definite for all $$t$$. Under suitable conditions, which need to be properly stated, the optimal policy $$\pi$$ can be chosen in the form

$$u(t)=K(t)\hat x(t),$$
where $$\hat x(t)$$ is the linear least-squares estimate of the state vector $$x(t)$$ obtained from the Kalman filter

$$d\hat x=A(t)\hat x(t)\,dt+B_1(t)u(t)\,dt +L(t)(dy-C(t)\hat x(t)\,dt),\quad \hat x(0)=0,$$
where $$K$$ is the gain of the optimal linear-quadratic regulator obtained by taking $$B_2=D=0$$ and $$x(0)$$ deterministic, and where $$L$$ is the Kalman gain. There is also a non-Gaussian version of this problem (to be discussed below) where the Wiener process $$w$$ is replaced by a more general square-integrable martingale with possible jumps. In this case, the Kalman filter needs to be replaced by a nonlinear filter providing an estimate of the (strict sense) conditional mean

$$\hat{x}(t)= \operatorname{E}\{ x(t)\mid {\cal Y}_t\},$$
where

$${\cal Y}_t:=\sigma\{ y(\tau), \tau\in [0,t]\}, \quad 0\leq t\leq T,$$
is the filtration generated by the output process; i.e., the family of increasing sigma fields representing the data as it is produced.
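As a rough numerical illustration of how the two gains in the Gaussian case can be computed independently of each other, the following minimal sketch treats a scalar, time-invariant instance of the system. It assumes, beyond what is stated above, that the process noise and the observation noise are driven by independent components of $$w$$ (so that $$B_2D'=0$$), and it uses a plain Euler discretization of the two Riccati equations. The function name `lqg_gains` and all parameter names are hypothetical.

```python
import math

def lqg_gains(a, b1, b2, c, d, q, r, s, sigma0, T, n):
    """Euler sketch of the scalar regulator gain K(t) and Kalman gain L(t).

    Assumes B2 * D' = 0 (independent process and observation noise);
    this is an illustrative simplification, not the general case.
    """
    dt = T / n

    # Control Riccati equation, integrated backward from P(T) = s:
    #   -dP/dt = 2 a P + q - (b1 P)^2 / r
    P = [0.0] * (n + 1)
    P[n] = s
    for k in range(n, 0, -1):
        P[k - 1] = P[k] + dt * (2 * a * P[k] + q - (b1 * P[k]) ** 2 / r)
    K = [-b1 * p / r for p in P]  # sign chosen so that u = K * xhat

    # Filter Riccati equation, integrated forward from Sigma(0) = sigma0:
    #   dSigma/dt = 2 a Sigma + b2^2 - (c Sigma)^2 / d^2
    Sigma = [0.0] * (n + 1)
    Sigma[0] = sigma0
    for k in range(n):
        Sigma[k + 1] = Sigma[k] + dt * (
            2 * a * Sigma[k] + b2 ** 2 - (c * Sigma[k]) ** 2 / d ** 2
        )
    L = [c * sg / d ** 2 for sg in Sigma]
    return K, L, P, Sigma

K, L, P, Sigma = lqg_gains(a=1.0, b1=1.0, b2=1.0, c=1.0, d=1.0,
                           q=1.0, r=1.0, s=0.0, sigma0=1.0, T=10.0, n=10000)
print(K[0], L[-1])
```

With all parameters equal to one, both Riccati equations have the stationary solution $$1+\sqrt{2}$$, so on a long horizon `K[0]` should be close to $$-(1+\sqrt{2})$$ and `L[-1]` close to $$1+\sqrt{2}$$. Note that the computation of `K` never uses the noise parameters and the computation of `L` never uses the cost weights.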

In the early literature on the separation principle it was common to allow as admissible controls $$u$$ all processes that are adapted to the filtration $$\{{\cal Y}_t, \, 0\leq t\leq T\}$$. This is equivalent to allowing all non-anticipatory Borel functions as feedback laws, which raises the question of existence of a unique solution to the equations of the feedback loop. Moreover, one needs to exclude the possibility that a nonlinear controller extracts more information from the data than what is possible with a linear control law.

Choices of the class of admissible control laws
Linear-quadratic control problems are often solved by a completion-of-squares argument. In our present context we have

$$J(u)=\operatorname{E}\left\{ \int_0^T(u-Kx)'R(u-Kx) \, dt\right\} +\text{terms that do not depend on }u,$$
in which the first term takes the form

$$\begin{align}
\operatorname{E}\left\{ \int_0^T(u-Kx)'R(u-Kx)\,dt\right\}=\operatorname{E}\left\{\int_0^T[(u-K\hat{x})'R(u-K\hat{x})+\operatorname{tr}(K'RK\Sigma)] \, dt\right\},
\end{align}$$
where $$\Sigma$$ is the covariance matrix

$$\Sigma(t):=\operatorname{E}\{[x(t)-\hat{x}(t)][x(t)-\hat{x}(t)]'\}.$$
The separation principle would now follow immediately if $$\Sigma$$ were independent of the control. However, this needs to be established.
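The identity for the first term of $$J(u)$$ can be checked in one step. The following is a sketch under the assumptions that $$\hat{x}(t)$$ is the conditional mean $$\operatorname{E}\{x(t)\mid {\cal Y}_t\}$$, that $$u(t)$$ is $${\cal Y}_t$$-measurable, and that $$R$$ is symmetric. Writing $$\tilde{x}:=x-\hat{x}$$ for the estimation error, so that $$u-Kx=(u-K\hat{x})-K\tilde{x}$$, we have at each time $$t$$

$$\begin{align}
\operatorname{E}\{(u-Kx)'R(u-Kx)\} & =\operatorname{E}\{(u-K\hat{x})'R(u-K\hat{x})\}-2\operatorname{E}\{(u-K\hat{x})'RK\tilde{x}\}+\operatorname{E}\{\tilde{x}'K'RK\tilde{x}\} \\
& =\operatorname{E}\{(u-K\hat{x})'R(u-K\hat{x})\}+\operatorname{tr}(K'RK\Sigma).
\end{align}$$

The cross term vanishes because $$u-K\hat{x}$$ is $${\cal Y}_t$$-measurable while $$\operatorname{E}\{\tilde{x}(t)\mid {\cal Y}_t\}=0$$, and the last term follows from $$\operatorname{E}\{\tilde{x}'K'RK\tilde{x}\}=\operatorname{tr}(K'RK\operatorname{E}\{\tilde{x}\tilde{x}'\})$$.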

The state equation can be integrated to take the form

$$x(t)=x_0(t)+\int_0^t \Phi(t,s)B_1(s)u(s) \, ds,$$
where $$x_0$$ is the state process obtained by setting $$u=0$$ and $$\Phi$$ is the transition matrix function. By linearity, $$\hat{x}(t)=\operatorname{E}\{x(t)\mid {\cal Y}_t\}$$ equals

$$\hat{x}(t)=\hat{x}_0(t)+\int_0^t \Phi(t,s)B_1(s)u(s)\,ds,$$
where $$\hat{x}_0(t)=\operatorname{E}\{x_0(t)\mid {\cal Y}_t\}$$. Consequently,

$$\Sigma(t)=\mathbb{E}\{[x_0(t)-\hat{x}_0(t)][x_0(t)-\hat{x}_0(t)]'\},$$
but we need to establish that $$\hat{x}_0$$ does not depend on the control. This would be the case if

$${\cal Y}_t ={\cal Y}_t^0:=\sigma\{ y_0(\tau), \tau\in [0,t]\}, \quad 0\leq t\leq T,$$
where $$y_0$$ is the output process obtained by setting $$u=0$$. This issue was discussed in detail by Lindquist. In fact, since the control process $$u$$ is in general a nonlinear function of the data and thus non-Gaussian, so is the output process $$y$$. To avoid these problems one might begin by uncoupling the feedback loop and determining an optimal control process in the class of stochastic processes $$u$$ that are adapted to the family $$\{ {\cal Y}_t^0\}$$ of sigma fields. This problem, where one optimizes over the class of all control processes adapted to a fixed filtration, is called a stochastic open loop (SOL) problem. It is not uncommon in the literature to assume from the outset that the control is adapted to $$\{ {\mathcal Y}_t^0\}$$; see, e.g., Section 2.3 in Bensoussan, also van Handel and Willems.

In Lindquist 1973 a procedure was proposed for how to embed the class of admissible controls in various SOL classes in a problem-dependent manner, and then construct the corresponding feedback law. The largest class $$\Pi$$ of admissible feedback laws $$\pi$$ consists of the non-anticipatory functions $$u:=\pi(y)$$ such that the feedback equation has a unique solution and the corresponding control process $$u_\pi$$ is adapted to $$\{{\mathcal Y}_t^0\}$$. Next, we give a few examples of specific classes of feedback laws that belong to this general class, as well as some other strategies in the literature to overcome the problems described above.

Linear control laws
The admissible class $$\Pi$$ of control laws could be restricted to contain only certain linear ones as in Davis. More generally, the linear class

$$({\mathcal L})\quad u(t)=\bar{u}(t)+\int_0^tF(t,\tau)\,dy,$$
where $$\bar{u}$$ is a deterministic function and $$F$$ is an $$L_2$$ kernel, ensures that $$\Sigma$$ is independent of the control. In fact, the Gaussian property will then be preserved, and $$\hat{x}$$ will be generated by the Kalman filter. Then the error process $$\tilde{x}:= x-\hat{x}$$ is generated by

$$d\tilde{x}=(A-LC)\tilde{x}\,dt +(B_2-LD)\,dw, \quad \tilde{x}(0)=x(0),$$
which is clearly independent of the choice of control, and thus so is $$\Sigma$$.
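The control-independence of the error process can be observed directly in simulation. The sketch below is a minimal scalar Euler-Maruyama experiment, not a proof: it runs the plant and the Kalman filter twice on the same noise realization, once with $$u=0$$ and once with the linear feedback $$u=k\hat{x}$$, and compares the two error paths. It assumes independent process and observation noise components ($$B_2D'=0$$); the function name `simulate_error` and the parameter values are hypothetical.

```python
import math
import random

def simulate_error(k_gain, seed=0, a=0.5, b1=1.0, b2=1.0, c=1.0, d=1.0,
                   sigma0=1.0, T=2.0, n=2000):
    """Return the path of x - xhat under the linear feedback u = k_gain * xhat."""
    rng = random.Random(seed)      # same seed => same noise realization
    dt = T / n
    x = rng.gauss(0.0, math.sqrt(sigma0))
    xhat = 0.0
    Sigma = sigma0
    errors = [x - xhat]
    for _ in range(n):
        u = k_gain * xhat
        dw1 = rng.gauss(0.0, math.sqrt(dt))   # process noise increment
        dw2 = rng.gauss(0.0, math.sqrt(dt))   # observation noise increment
        L = c * Sigma / d ** 2                # Kalman gain (B2 D' = 0 assumed)
        dy = c * x * dt + d * dw2
        x_next = x + (a * x + b1 * u) * dt + b2 * dw1
        xhat = xhat + (a * xhat + b1 * u) * dt + L * (dy - c * xhat * dt)
        x = x_next
        # Filter Riccati equation; note that it does not involve u.
        Sigma += (2 * a * Sigma + b2 ** 2 - (c * Sigma) ** 2 / d ** 2) * dt
        errors.append(x - xhat)
    return errors

err_open = simulate_error(0.0)      # no control
err_closed = simulate_error(-3.0)   # stabilizing linear feedback
max_gap = max(abs(e0 - e1) for e0, e1 in zip(err_open, err_closed))
print(max_gap)
```

In this discretization the control enters the state and the estimate through the same term and cancels in the difference, so `max_gap` should be zero up to floating-point rounding even though the state paths themselves differ substantially.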

Lipschitz-continuous control laws
Wonham proved a separation theorem for controls in the class $$\pi:\, u(t)=\psi(t,\hat{x}(t))$$, even for a more general cost functional than $$J(u)$$. However, the proof is far from simple and there are many technical assumptions. For example, $$C(t)$$ must be square and have a determinant bounded away from zero, which is a serious restriction. A later proof by Fleming and Rishel is considerably simpler. They also prove the separation theorem with quadratic cost functional $$J(u)$$ for a class of Lipschitz continuous feedback laws, namely $$u(t)=\phi(t,y)$$, where $$\phi:\, [0,T]\times C^n [0,T]\to{\mathbb R}^m$$ is a non-anticipatory function of $$y$$ which is Lipschitz continuous in this argument. Kushner proposed a more restricted class $$u(t)=\psi(t,\hat{\xi}(t))$$, where the modified state process $$\hat{\xi}$$ is given by

$$\hat{\xi}(t)=\operatorname{E}\{ x_0(t)\mid {\mathcal Y}_t^0\}+ \int_0^t \Phi(t,s)B_1(s)u(s)\,ds,$$
leading to the identity $$\hat{x}=\hat{\xi}$$.

Imposing delay
If there is a delay in the processing of the observed data so that, for each $$t$$, $$u(t)$$ is a function of $$y(\tau); \, 0\leq\tau\leq t-\varepsilon$$, then $${\cal Y}_t ={\cal Y}_t^0$$, $$0\leq t\leq T$$, see Example 3 in Georgiou and Lindquist. Consequently, $$\Sigma$$ is independent of the control. Nevertheless, the control policy $$\pi$$ must be such that the feedback equations have a unique solution.

Consequently, the problem with possibly control-dependent sigma fields does not occur in the usual discrete-time formulation. However, a procedure used in several textbooks to construct the continuous-time $$\Sigma$$ as the limit of finite difference quotients of the discrete-time $$\Sigma$$, which does not depend on the control, is circular or at best incomplete; see Remark 4 in Georgiou and Lindquist.
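The reason why a delay removes the difficulty can be illustrated in discrete time. In the sketch below, which is an illustration under stated assumptions rather than a construction taken from the references, the observed path is $$y_k=y^0_k+\sum_{j<k}g_{kj}u_j$$ with $$u_j$$ a function of $$y_0,\dots,y_j$$. Because of the one-step delay, $$y$$ can be computed causally from $$y^0$$ and $$y^0$$ can be recovered causally from $$y$$, for any feedback law, including a discontinuous relay. The kernel, the relay policy and all function names are hypothetical.

```python
import math
import random

def kernel(k, j, dt):
    """Hypothetical causal kernel g_{kj}, used only for j < k."""
    return dt * math.exp(-(k - j) * dt)

def relay(j, y_past):
    """Discontinuous non-anticipatory law: depends only on y_0, ..., y_j."""
    return -1.0 if y_past[-1] > 0 else 1.0

def close_loop(y0, policy, dt):
    """Map the uncontrolled output y0 to the closed-loop output y, causally."""
    y, u = [], []
    for k in range(len(y0)):
        y.append(y0[k] + sum(kernel(k, j, dt) * u[j] for j in range(k)))
        u.append(policy(k, y))      # u_k uses y_0, ..., y_k only
    return y, u

def open_loop(y, policy, dt):
    """Recover y0 from y, again using only past and present values of y."""
    u = [policy(j, y[: j + 1]) for j in range(len(y))]
    return [y[k] - sum(kernel(k, j, dt) * u[j] for j in range(k))
            for k in range(len(y))]

rng = random.Random(1)
dt, n = 0.01, 200
y0, level = [], 0.0
for k in range(n):
    level += rng.gauss(0.0, math.sqrt(dt))
    if k == n // 2:
        level += 1.0                # a jump, to mimic a discontinuous path
    y0.append(level)

y, u = close_loop(y0, relay, dt)
y0_rec = open_loop(y, relay, dt)
print(max(abs(p - q) for p, q in zip(y0, y0_rec)))
```

Since each of the two paths is a causal function of the other, they carry the same information at every step, which is the discrete-time counterpart of $${\cal Y}_t={\cal Y}_t^0$$. The sketch says nothing about well-posedness of the same relay in continuous time without delay.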

Weak solutions
An approach introduced by Duncan and Varaiya and by Davis and Varaiya (see also Section 2.4 in Bensoussan) is based on weak solutions of the stochastic differential equation. Considering such solutions of

$$dx =A(t)x(t)\,dt+B_1(t)u(t)\,dt+B_2(t)\,dw$$
we can change the probability measure (which depends on $$u$$) via a Girsanov transformation so that

$$d\tilde{w}:= B_1(t)u(t)\,dt+B_2(t)\,dw$$
becomes a new Wiener process, which (under the new probability measure) can be assumed to be unaffected by the control. The question of how this could be implemented in an engineering system is left open.

Nonlinear filtering solutions
Although a nonlinear control law will produce a non-Gaussian state process, it can be shown, using nonlinear filtering theory (Chapter 16.1 in Liptser and Shiryaev), that the state process is conditionally Gaussian given the filtration $$\{{\mathcal Y}_t\}$$. This fact can be used to show that $$\hat{x}$$ is actually generated by a Kalman filter (see Chapters 11 and 12 in Liptser and Shiryaev). However, this requires quite a sophisticated analysis and is restricted to the case where the driving noise $$w$$ is a Wiener process.

Additional historical perspective can be found in Mitter.

Issues on feedback in linear stochastic systems
At this point it is suitable to consider a more general class of controlled linear stochastic systems that also covers systems with time delays, namely
$$\begin{align}
z(t) & =z_0(t) + \int_0^t G(t,s)u(s)\,ds \\
y(t) & = Hz(t)
\end{align}$$
with $$z_0$$ a stochastic vector process which does not depend on the control. The standard stochastic system is then obtained as a special case where $$z=[x',y']'$$, $$z_0=[x_0',y_0']'$$ and $$H=[0,I]$$. We shall use the short-hand notation

$$z=z_0+g\pi Hz$$
for the feedback system, where

$$g\;:\; (t,u) \mapsto \int_0^t G(t,\tau)u(\tau)\,d\tau$$
is a Volterra operator.

In this more general formulation the embedding procedure of Lindquist defines the class $$\Pi$$ of admissible feedback laws $$\pi$$ as the class of non-anticipatory functions $$u:=\pi(y)$$ such that the feedback equation $$z=z_0+g\pi Hz$$ has a unique solution $$z_\pi$$ and $$u=\pi(Hz_\pi)$$ is adapted to $$\{{\mathcal Y}_t^0\}$$.

In Georgiou and Lindquist a new framework for the separation principle was proposed. This approach considers stochastic systems as well-defined maps between sample paths rather than between stochastic processes and allows us to extend the separation principle to systems driven by martingales with possible jumps. The approach is motivated by engineering thinking where systems and feedback loops process signals, and not stochastic processes per se or transformations of probability measures. Hence the purpose is to create a natural class of admissible control laws that make engineering sense, including those that are nonlinear and discontinuous.

The feedback equation $$z=z_0+g\pi Hz$$ has a unique strong solution if there exists a non-anticipating function $$F$$ such that $$z=F(z_0)$$ satisfies the equation with probability one and all other solutions coincide with $$z$$ with probability one. However, in the sample-wise setting, more is required, namely that such a unique solution exists and that $$z=z_0+g\pi Hz$$ holds for all $$z_0$$, not just almost all. The resulting feedback loop is deterministically well-posed in the sense that the feedback equations admit a unique solution that causally depends on the input for each input sample path.

In this context, a signal is defined to be a sample path of a stochastic process with possible discontinuities. More precisely, signals will belong to the Skorohod space $$D$$, i.e., the space of functions which are continuous on the right and have a left limit at all points (càdlàg functions). In particular, the space $$C$$ of continuous functions is a proper subspace of $$D$$. Hence the response of a typical nonlinear operation that involves thresholding and switching can be modeled as a signal. The same goes for sample paths of counting processes and other martingales. A system is defined to be a measurable non-anticipatory map $$D\to D$$ sending sample paths to sample paths so that its output at any time $$t$$ is a measurable function of past values of the input and time. For example, stochastic differential equations with Lipschitz coefficients driven by a Wiener process induce maps between corresponding path spaces; see page 127 in Rogers and Williams and pages 126-128 in Klebaner. Also, under fairly general conditions (see, e.g., Chapter V in Protter), stochastic differential equations driven by martingales with sample paths in $$D$$ have strong solutions that are semi-martingales.

Setting $$f(z):=g\pi Hz$$, the feedback system $$z=z_0+g\pi Hz$$ can be written $$z=z_0+f(z)$$, where $$z_0$$ can be interpreted as an input.

Definition. A feedback loop $$z=z_0+f(z)$$ is deterministically well-posed if it has a unique solution $$z\in D$$ for all inputs $$z_0\in D$$ and $$(1-f)^{-1}$$ is a system.

This implies that the processes $$z$$ and $$z_0$$ define identical filtrations. Consequently, no new information is created by the loop. However, what we need is that $${\cal Y}_t ={\cal Y}_t^0$$ for $$0\leq t\leq T$$. This is ensured by the following lemma (Lemma 8 in Georgiou and Lindquist).

Key Lemma. If the feedback loop $$z=z_0+g\pi Hz$$ is deterministically well-posed, $$g\pi$$ is a system, and $$H$$ is a linear system having a right inverse $$H^{-R}$$ that is also a system, then $$(1-Hg\pi)^{-1}$$ is a system and $${\cal Y}_t ={\cal Y}_t^0$$ for $$0\leq t\leq T$$.

The condition on $$H$$ in this lemma is clearly satisfied in the standard linear stochastic system, for which $$H=[0,I]$$, and hence $$H^{-R}=H'$$. The remaining conditions are collected in the following definition.

Definition. A feedback law $$\pi$$ is deterministically well-posed for the system $$z=z_0+g\pi Hz$$ if $$g\pi$$ is a system and the feedback system $$z=z_0+g\pi Hz$$ is deterministically well-posed.

Examples of simple systems that are not deterministically well-posed are given in Remark 12 in Georgiou and Lindquist.

A separation principle for physically realizable control laws
By only considering feedback laws that are deterministically well-posed, all admissible control laws are physically realizable in the engineering sense that they induce a signal that travels through the feedback loop. The proof of the following theorem can be found in Georgiou and Lindquist 2013.

Separation theorem. Given the linear stochastic system

$$\begin{align}
dx & =A(t)x(t)\,dt+B_1(t)u(t)\,dt+B_2(t)\,dw \\
dy & =C(t)x(t)\,dt +D(t)\,dw
\end{align}$$
where $$w$$ is a vector-valued Wiener process, $$x(0)$$ is a zero-mean Gaussian random vector independent of $$w$$, consider the problem of minimizing the quadratic functional $$J(u)$$ over the class of all deterministically well-posed feedback laws $$\pi$$. Then the unique optimal control law is given by $$u(t)=K(t)\hat{x}(t)$$ where $$K$$ is defined as above and $$\hat{x}$$ is given by the Kalman filter. More generally, if $$w$$ is a square-integrable martingale and $$x(0)$$ is an arbitrary zero mean random vector, $$u(t)=K(t)\hat{x}(t)$$, where $$\hat{x}(t)=\operatorname{E}\{x(t)\mid {\cal Y}_t\}$$, is the optimal control law provided it is deterministically well-posed.

In the general non-Gaussian case, which may involve counting processes, the Kalman filter needs to be replaced by a nonlinear filter.
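For the Gaussian case, the optimality of the certainty-equivalent gain can be checked numerically within a small family of linear laws. The following sketch, for a scalar time-invariant system with independent process and observation noise ($$B_2D'=0$$, an assumption made here for simplicity), evaluates the cost of $$u=\lambda K(t)\hat{x}(t)$$ by propagating second moments. It uses $$\operatorname{E}\{x^2\}=M+\Sigma$$ with $$M:=\operatorname{E}\{\hat{x}^2\}$$ and $$dM/dt=2(a+b_1k)M+L^2d^2$$ for $$k=\lambda K$$. The function name `closed_loop_cost` and the scaling parameter are hypothetical.

```python
def closed_loop_cost(scale, a=1.0, b1=1.0, b2=1.0, c=1.0, d=1.0,
                     q=1.0, r=1.0, s=0.0, sigma0=1.0, T=10.0, n=10000):
    """Cost J(u) of the linear law u = scale * K(t) * xhat (Euler sketch)."""
    dt = T / n

    # Regulator Riccati equation, backward from P(T) = s.
    P = [0.0] * (n + 1)
    P[n] = s
    for k in range(n, 0, -1):
        P[k - 1] = P[k] + dt * (2 * a * P[k] + q - (b1 * P[k]) ** 2 / r)

    Sigma = sigma0   # error variance, independent of the control
    M = 0.0          # second moment of xhat, with xhat(0) = 0
    cost = 0.0
    for k in range(n):
        gain = scale * (-b1 * P[k] / r)
        L = c * Sigma / d ** 2
        # Running cost: q E{x^2} + r E{u^2}, using E{x^2} = M + Sigma.
        cost += dt * (q * (M + Sigma) + r * gain ** 2 * M)
        # xhat is driven by the innovations, whose intensity is d^2.
        M += dt * (2 * (a + b1 * gain) * M + (L * d) ** 2)
        Sigma += dt * (2 * a * Sigma + b2 ** 2 - (c * Sigma) ** 2 / d ** 2)
    return cost + s * (M + Sigma)

J_low = closed_loop_cost(0.8)
J_opt = closed_loop_cost(1.0)
J_high = closed_loop_cost(1.2)
print(J_low, J_opt, J_high)
```

By the completion-of-squares argument, any scaling other than one adds the nonnegative term $$\operatorname{E}\int_0^T r(\lambda-1)^2K^2\hat{x}^2\,dt$$ to the cost, so `J_opt` should be the smallest of the three values. This only compares linear laws; the theorem above covers the much larger class of deterministically well-posed laws.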

A separation principle for delay-differential systems
Stochastic control for time-delay systems was first studied in Lindquist and in Brooks, although Brooks relies on the strong assumption that the observation $$y$$ is functionally independent of the control $$u$$, thus avoiding the key question of feedback.

Consider the delay-differential system
$$\begin{align}
dx   &=\left(\int_{t-h}^t d_s\,A(t,s)x(s)\right) \,dt + B_1(t)u(t)\,dt+B_2(t)\,dw   \\
dy  & =\left(\int_{t-h}^t d_s\,C(t,s)x(s)\right) \,dt +D(t)\,dw
\end{align}$$
where $$w$$ is now a (square-integrable) Gaussian (vector) martingale, and where $$A$$ and $$C$$ are of bounded variation in the first argument and continuous on the right in the second, $$x(t)=\xi(t)$$ is deterministic for $$-h\leq t\leq 0$$, and $$y(0)=0$$. More precisely, $$A(t,s)=0$$ for $$s\geq t$$, $$A(t,s)=A(t,t-h)$$ for $$s\leq t-h$$, and the total variation of $$s\mapsto A(t,s)$$ is bounded by an integrable function in the variable $$t$$, and the same holds for $$C$$.

We want to determine a control law which minimizes

$$J(u)=\operatorname{E}\left(\int_0^T x(t)'Q(t)x(t)\,d\alpha(t)+\int_0^Tu(t)'R(t)u(t)\,dt\right),$$
where $$d\alpha$$ is a positive Stieltjes measure. The optimal control for the corresponding deterministic problem, obtained by setting $$w=0$$, is given by

$$u(t)=\int_{t-h}^t d_\tau \, K(t,\tau)x(\tau),$$
with a deterministic gain $$K$$.

The following separation principle for the delay system above can be found in Georgiou and Lindquist 2013 and generalizes the corresponding result in Lindquist 1973.

Theorem. There is a unique feedback law $$\begin{align}\pi:\, y\mapsto u\end{align}$$ in the class of deterministically well-posed control laws that minimizes $$\begin{align}J(u)\end{align}$$, and it is given by

$$u(t)=\int_{t-h}^t d_s \, K(t,s)\hat{x}(s\mid t),$$
where $$K$$ is the deterministic control gain and $$\hat{x}(s\mid t) := E\{ x(s)\mid {\cal Y}_t\}$$ is given by the linear (distributed) filter
$$\begin{align}
d\hat{x}(t\mid t) &  =\int_{t-h}^t d_s \, A(t,s)\hat{x}(s\mid t) \, dt +B_1u\,dt+ X(t,t)\,dv
\end{align}$$
where $$v$$ is the innovation process

$$dv=dy - \int_{t-h}^t d_sC(t,s)\hat{x}(s\mid t)\, dt, \quad v(0)=0,$$
and the gain $$X$$ is as defined on page 120 in Lindquist.