Scalar DT LQR

The questions below are due on Friday December 06, 2024; 05:00:00 PM.
 

Goals

In this problem, we will examine a scalar example of the Riccati Equation used in LQR optimization.

Intro

LQR is an optimization approach for computing state-feedback gains, one that is based on picking the gains to minimize a user-selected weighted sum of the squared tracking error and the squared control input, both accumulated over time. But in general, optimization-based control system design strategies all follow a common set of steps:

  1. Create a DT state-space model for the system to be controlled.
  2. Determine a controller performance metric. When using LQR, that means deciding on state and input weights, \textbf{Q} and \textbf{R} (matrices in the general vector case), based on system constraints.
  3. With the model and the metric, determine the cost to drive an initial condition to zero, as a function of the state-feedback gain matrix, \textbf{K}.
  4. With the cost as a function of gain, use an optimizer to find the cost-minimizing gain matrix, \textbf{K}_{opt}.
  5. Determine the measured outputs for the system, and an observer model for state-estimation.
  6. Decide on a state-estimation performance metric.
  7. With the model and metric, determine the cost to drive an initial state estimation error to zero, as a function of a measured-output-feedback gain matrix \textbf{L}.
  8. With the cost as a function of gain, use an optimizer to find the cost-minimizing gain matrix, \textbf{L}_{opt}.
  9. Use \textbf{K}_{opt} in a state-space controller and \textbf{L}_{opt} in an observer-based state-estimator.

From the above perspective, insisting on a weighted sum-of-squares cost function seems overly restrictive. For example, one could specify problem-specific constraints like: be within one percent of steady-state in seven seconds; limit the motor current to less than 1.5 amps; prevent any states from exceeding twice their initial values. What is wrong with optimizing the gains while satisfying such constraints? The answer is that a good formulation is hard to find; much can go wrong. Constraint sets can be impossible to resolve, cost functions can be computationally intractable to optimize, and feasibility checks and/or cost evaluations can require analyses beyond the canonical initial condition problems. And even if the optimization succeeds, the resulting \textbf{K}_{opt} could be fragile, meaning that small mismatches between model and system can decimate performance.

In this section, we will use the scalar problem to show that the LQR optimization to determine \textbf{K}_{opt} is not difficult. In the scalar case, it involves solving a quadratic; in the vector case, it requires an eigendecomposition. We also note another important idea in LQR: that the cost function can be summarized by a state-independent matrix, \textbf{P}.

A First-Order Wind Turbine Example

Consider the wind turbine generator, like the one in the figure below, with fins that help it automatically align to incoming wind direction. Imagine adding a speed-adjustable friction-coupled rotor beneath the turbine, and then controlling the rotor speed to improve the alignment between the turbine and the optimal wind direction. A good feedback controller should align the turbine faster, and minimize the extent and duration of misalignment due to cross-wind gusts.

To design a state-feedback controller for the rotor-turbine system, we first need a state-space model. Consider a simple first-order LDE model,

x[n] = a x[n-1] + b u[n-1].
where x is the misalignment angle, or more specifically, the angle between the turbine and the dominant wind direction; a is the one-period misalignment reduction factor (due to the fins and the rotor's friction coupling); u is the rotational speed of the rotor; and b = \Delta T is the sample period. If the fins always realign the turbine after a cross-wind gust, even without feedback ( u = 0 ), then the a in our model must have a magnitude less than one, |a| < 1.

One particularly effective approach to analyzing the system's performance is to model a misalignment produced by a wind gust as a non-zero initial condition, x[0] , equal to the misalignment immediately after the gust. In that case, we can solve the LDE and determine an explicit formula for x[n],

x[n] = a^nx[0].
If |a| \lt 1 , but a is very close to one, then x[n] will go to zero very slowly. As a numerical example, suppose the sample period is ten milliseconds (0.01 seconds) and a = 0.9999 . The value of a is so close to one because, in one hundredth of a second, the fins will not realign the turbine by very much. In fact, after two minutes (12000 samples),
x[12000] = 0.9999^{12000}x[0] \approx 0.3 x[0]
and the turbine is only about two-thirds of the way to realignment. On a windy day, gusts could recur fast enough that the turbine would be misaligned all the time.
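The open-loop decay rate is easy to confirm numerically. A minimal sketch, using the a and \Delta T values from the text (the variable names are ours):

```python
# Open-loop decay of a one-radian misalignment, using the model values
# from the text: a = 0.9999 per 10 ms sample period.
a = 0.9999
dt = 0.01                    # sample period, seconds
n = int(round(120 / dt))     # samples in two minutes: 12000
x0 = 1.0                     # misalignment right after the gust, radians
x = a**n * x0
print(n, x)                  # ~0.30 radians of misalignment remain
```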

We can reduce the time to realign the turbine by using state-feedback to set the rotor velocity and steer the turbine back to alignment. To see this, we set the input to u[n] = -k\cdot x[n], and then the feedback system becomes

x[n] = a x[n-1] + b u[n-1] = ax[n-1] -b \cdot k \cdot x[n-1]\\ = (a-b\cdot k)x[n-1] = (0.9999 - 0.01 k)x[n-1].
The question is, how shall we pick k? Currently, our only tool, acker, is based on moving the natural frequencies as close to zero as possible. For this simple problem, the natural frequency is
\lambda = (a-b\cdot k),
but picking k so that the natural frequency is near zero will, just like in the heat distribution problem, require behavior that is beyond reasonable equipment limits.

Misalignment after Two Minutes

Suppose the turbine is described by the above difference equation, with a = 0.9999 and the update period \Delta T = 0.01 seconds. Suppose also that gusts of wind are only able to misalign the turbine by, at most, one radian.

What is the smallest value of feedback gain, k , that will ensure the misalignment is smaller than 0.1 radian after two gust-free minutes (120 seconds)?

What is the turbine's approximate maximum rotational velocity in that case?

Deadbeat Control

From the natural frequency point of view, the fastest turbine realignment would be produced using a gain

k_{best?} = 0.9999/0.01 = 99.99,

as then the difference equation becomes

x[n] = a x[n-1] -b \cdot k \cdot x[n-1] = (a-b\cdot k)x[n-1] = (0.9999 - 0.01 \cdot 99.99) x[n-1] = 0 \cdot x[n-1],

or

x[1] = 0 \cdot x[0] = 0.

In other words, if k \approx 100, the turbine realigns in one step, or in ten milliseconds. The problem is that the maximum rotor speed would have to be

99.99 \cdot x[0] = 99.99 \cdot 1,
and the maximum turbine speed would be approximately
\left| \frac{x[1] - x[0]}{\Delta T} \right| = \left| \frac{0 - 1}{\Delta T} \right| = 100,
but 100 radians/second (approximately 1000 revolutions per minute) is way too fast!
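The one-step correction and its speed cost are easy to confirm numerically. A sketch, using the model values above (the variable names are ours):

```python
# One-step ("deadbeat") correction with k = a/b, for the scalar model
#   x[n] = a x[n-1] + b u[n-1],  u[n] = -k x[n].
a, b = 0.9999, 0.01
k = a / b                  # 99.99
x0 = 1.0                   # one-radian misalignment after a gust
u0 = -k * x0               # commanded rotor speed, about -100 rad/s
x1 = a * x0 + b * u0       # (a - b k) x0 = 0: realigned in one step
print(x1, abs(u0))
```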

We do not have to abandon the idea of exact correction in one step: we can reduce the speed requirement by increasing the sample period. And to increase the sample period, all we have to do is skip samples. For example, if we skip 99 out of every 100 samples, we can rewrite the difference equation in terms of the non-skipped samples.

We can start by defining m = floor(n/100) where floor(arg) is the greatest integer less than or equal to arg. Then the non-skipped values of the turbine angle are given by x[100m] for m = \{0,1,2,3,...\}. Since the state changes every sample (the turbine does not know we are skipping), but the control only changes when the non-skipped samples change,

x[100m+1] = a x[100m] - b \cdot k \cdot x[100m]

x[100m+2] = a x[100m+1] - b \cdot k \cdot x[100m]

x[100m+3] = a x[100m+2] - b \cdot k \cdot x[100m].\\ \vdots
or
x[100m+1] = a x[100m] - b \cdot k \cdot x[100m] \\ x[100m+2] = a^2 x[100m] - (a + 1) \cdot b \cdot k \cdot x[100m] \\ x[100m+3] = a^3 x[100m] - (a^2 + a + 1) \cdot b \cdot k \cdot x[100m]. \\ \vdots

We can summarize the equations on the non-skipped samples, denoted \tilde{x}[m] ,

\tilde{x}[m] = a^{100} \tilde{x}[m-1] - \left(\sum_{n=0}^{99}a^{n}\right) \cdot b \cdot k \cdot \tilde{x}[m-1].
Using a = 0.9999 and b = \Delta T = 0.01 , we can evaluate the above terms exactly, but the resulting equation is more interpretable if we make a few approximations. Since 0.9999^{100} \approx 0.99 , and \left(\sum_{n=0}^{99}a^{n}\right) \approx 99.5 \approx 100,

\tilde{x}[m] \approx 0.99 \tilde{x}[m-1] - k \cdot \tilde{x}[m-1]

where we have used that 100 \Delta T = 1.
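Both approximations are easy to verify numerically (a quick check, not part of the original derivation):

```python
# Numerical check of the two approximations used above, for a = 0.9999:
#   a^100 ≈ 0.99  and  sum_{n=0}^{99} a^n ≈ 99.5 ≈ 100.
a, b = 0.9999, 0.01
a100 = a**100
S = sum(a**n for n in range(100))
print(a100, S, S * b)      # ~0.990, ~99.5, and S*b ≈ 1, so S*b*k ≈ k
```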

We can now set k \approx 1 to zero the natural frequency, and a one radian misalignment will only cause a one radian/second rotational velocity, or about ten rotations per minute. Very reasonable.

Adjusting the sample period to ensure perfect correction in one step is referred to as deadbeat control, because the error is dead (zero) in one beat (one sample period).
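As a sketch of this skip-sample deadbeat idea, the following simulates the full-rate plant while holding the control constant over each 100-sample block; the gain is chosen to zero the one-block natural frequency exactly, using the exact geometric sum rather than the \approx 100 approximation (variable names are ours):

```python
# Skip-sample deadbeat control: the control is computed from x[100m] and
# held constant while the plant x[n] = a x[n-1] + b u[n-1] runs at full rate.
a, b = 0.9999, 0.01
S = sum(a**n for n in range(100))   # geometric sum from the derivation
k = a**100 / (S * b)                # zeros the one-block natural frequency

x = 1.0                             # one-radian misalignment
u = -k * x                          # control held for the whole block
for _ in range(100):
    x = a * x + b * u
print(x, k)                         # x is (essentially) zero after one block
```

The required rotor speed is now only about 1 rad/s (roughly ten rotations per minute), matching the text's estimate.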

Ignoring samples seems unlikely to produce a good controller design, but there are applications where deadbeat control is quite effective. In our case, its behavior would likely be similar to bang-bang control, a strategy that rotates the rotor forward or backward at its maximum speed, depending only on the sign of the alignment error. A strategy that more gradually decreases the rotor speed as the turbine approaches alignment would probably be less mechanically taxing, and we will focus on formulating such a strategy.

Deadbeat is NOT One Step

Consider the second order state-feedback system

\begin{bmatrix}x_1[n] \\ x_2[n]\end{bmatrix} = \begin{bmatrix}1 & \Delta T\\ 0 & 1\end{bmatrix}\begin{bmatrix}x_1[n-1] \\ x_2[n-1]\end{bmatrix} + \begin{bmatrix}0 \\ \Delta T \end{bmatrix}u[n-1]

where \Delta T = 0.01 is the sample period.

If state feedback is used, u[n] = -\textbf{K} \cdot \textbf{x}[n] , what gains will make both natural frequencies zero?

K_1 =

K_2 =

If you use feedback with the above gains, and the initial conditions are x_1[0] = 1 and x_2[0] = 0, what are \textbf{x}[1] and \textbf{x}[2] ? Notice that even though both natural frequencies are zero, the state does not go to zero in one step.

x_1[1]

x_2[1]

x_1[2]

x_2[2]

A Sum of Squares Approach

An alternative approach to determining the feedback gains is to develop a metric of interest, and then pick the gain so as to optimize that metric. The best known metric is the sum-of-squares metric used in linear-quadratic regulators (LQR).

A sum-of-squares approach to the turbine example.

To develop some intuition about the sum-of-squares metric, consider the canonical test problem for our scalar example of the turbine system with feedback. That is, assume the input is proportional to the state alone (no external input), u[n] = -k x[n] , and the system has a non-zero initial condition, x[0] . This was also a model of our turbine system after a wind-gust misalignment.

In this canonical case,

x[n] = (a-bk)^n x[0]

and

u[n] = -k x[n] = -k (a-bk)^n x[0].

Given our system has a single input and a single state, we will weight their terms in a sum-of-squares metric with r and q , respectively. With q as the state weight and r as the input weight, the sum-of-squares metric is

\sum_{n=0}^{\infty} \left( q x^2[n] + r u^2[n] \right) = \sum_{n=0}^{\infty} (q + rk^2) x^2[n] = \sum_{n=0}^{\infty} (q + rk^2) (a-bk)^{2n} x^2[0],

where the metric is smaller if x \rightarrow 0 faster, and is larger if u has large values or remains nonzero for a long time.

Assuming the gain is such that the feedback system is stable, |a-bk| \lt 1 , summing the series in the above equation yields a version of the metric in the form

\frac{(q + rk^2)}{1 - (a-bk)^2} x^2[0].

Since we are using the metric to find the best gain, k_{opt} , any gain-independent scaling terms can be factored out. That is,

k_{opt} = k \;\; \text{that minimizes} \;\; \frac{(q + rk^2)}{1 - (a-bk)^2} x^2[0] = k \;\; \text{that minimizes} \;\; \frac{(q + rk^2)}{1 - (a-bk)^2}.

Take note of the fact that we can safely ignore the actual initial condition. This remains true even when there is a vector of states.
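The minimization over k can also be done numerically. The sketch below grid-searches the cost over stabilizing gains, assuming the turbine values a = 0.9999 and b = 0.01 , and the illustrative weights q = r = 1 (our own choice):

```python
# Grid-search minimization of the scalar LQR cost
#   J(k) = (q + r k^2) / (1 - (a - b k)^2)
# over stabilizing gains, with the turbine values and q = r = 1.
a, b, q, r = 0.9999, 0.01, 1.0, 1.0

def cost(k):
    lam = a - b * k                # closed-loop natural frequency
    assert abs(lam) < 1.0, "gain must stabilize the loop"
    return (q + r * k**2) / (1.0 - lam**2)

ks = [i * 1e-4 for i in range(1, 50000)]   # gains from 0.0001 to 4.9999
k_opt = min(ks, key=cost)
print(k_opt, cost(k_opt))          # optimum near k = 1 for these weights
```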

The limiting cases of q=0, r=1 and q=1, r=0.

If we give a weight of zero to the state, q = 0, then

k_{opt} = k \;\; \text{that minimizes} \;\; \frac{rk^2}{1 - (a-bk)^2} = 0.

As long as k does not destabilize the system, that is |a - bk| \lt 1 , \frac{rk^2}{1 - (a-bk)^2} \ge 0 , so its minimum value must be zero, which is achieved when k = 0. In other words, if we want to minimize the input "energy", and we do not care about the state, then we should set the input energy to zero.

If we give a weight of zero to the input, r = 0, then

k_{opt} = k \;\; \text{that minimizes} \;\; \frac{q}{1 - (a-bk)^2} = \frac{a}{b}.

To see this, note that (a-bk)^2 \ge 0 since it is a squared real quantity, and |a-bk| \lt 1 by our assumption that the closed-loop system is stable. So \frac{1}{1 - (a-bk)^2} will be smallest when (a-bk)^2 = 0 , leading to the above result.

Using the numerical values from the turbine example, a = 0.9999 and b = 0.01 , and using a nonzero state weight and a zero input weight, q = 1 and r = 0 , the cost-minimizing gain is k = \frac{a}{b} = 99.99. With this k , the closed-loop natural frequency is zero, so the turbine will realign in one period.

Since we set r = 0 in our metric, we are ignoring the cost due to a very high input, so we might expect such a result. In fact, we have reprised deadbeat control, with its good and bad properties. That is, when a gust of wind causes a one radian misalignment, the turbine will be realigned extremely quickly (in a hundredth of a second). But, that rapid turbine alignment counts on a rotor velocity command k x \approx 100 , and the controller will try to rotate the turbine at 1000 rpm.

The general q > 0 and r > 0 case.

If we set the input weight, r, to almost any non-zero value, optimizing our cost function will lead to far more reasonable gains. For example, below are normalized plots of the cost function

\frac{(q + rk^2)}{1 - (a-bk)^2}
versus gain (k) for q = 1 and r ranging from 0.01 to 1.0. Notice that the gain associated with the minimum of each cost-function curve varies from k \approx 1.4 for the r=1 case, to k \approx 11 for the r = 0.01 case.
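A rough numerical version of this experiment (our own grid search, not the code used to produce the plots) shows the same trend: shrinking the input weight r raises the optimal gain.

```python
# Optimal gain versus input weight r, for the cost
#   J(k) = (q + r k^2) / (1 - (a - b k)^2),  q = 1,
# with the turbine values a = 0.9999, b = 0.01.
a, b, q = 0.9999, 0.01, 1.0

def argmin_cost(r, k_max=50.0, step=0.01):
    best_k, best_j = None, float("inf")
    k = step
    while k < k_max:               # all of these gains are stabilizing
        j = (q + r * k**2) / (1.0 - (a - b * k)**2)
        if j < best_j:
            best_k, best_j = k, j
        k += step
    return best_k

k_r1 = argmin_cost(1.0)
k_r001 = argmin_cost(0.01)
print(k_r1, k_r001)                # the optimal gain grows as r shrinks
```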

The LQR cost function using p

We showed that the LQR cost function for the canonical example,

x[n] = (a-bk)^n x[0],
is given by
\sum_{n=0}^{\infty} \left( q x^2[n] + r u^2[n] \right) = \sum_{n=0}^{\infty} (q + rk^2) x^2[n] = \sum_{n=0}^{\infty} (q + rk^2) (a-bk)^{2n} x^2[0] = p x^2[0]
where we are introducing the scalar, p, that relates the square of the state, x^2[0], to the LQR cost function. Note that p is not dependent on the initial state, x[0], but does depend on the weights q and r , the system parameters a and b, and the feedback gain k. That is a lot of parameters, q,r,a,b,k, but since p summarizes the relationship between the initial state and the cost to drive that state to zero, it should depend on all the parameters.

As we showed above, when picking feedback gains by minimizing the LQR metric, the initial state is irrelevant, but p is important. That is, we determine the best k by minimizing p,

k_{opt} = k \;\; \text{that minimizes} \;\; p.
For this scalar problem, k_{opt} is the value of k that zeros the derivative of p with respect to k . Or more mathematically, we must solve
\frac{dp}{dk} = \frac{d}{dk} \left(\frac{(q + rk^2)}{1 - (a-bk)^2}\right) = 0
for k. However, there is another approach for formulating the p-optimization problem that generalizes to the many state case.

For our scalar case, we were able to determine p by summing the series generated by the difference equation (assuming that the system is stable) to show that p is given by

p = \frac{(q + rk^2)}{1 - (a-bk)^2}.
Cross-multiplying by (1-(a-bk)^2) gives
p (1-(a-bk)^2) = (q + rk^2),
which can be reorganized as
p = (q+rk^2) + p(a-bk)^2.
Note that p appears on both sides of the above equation, which hints at a more fundamental idea. Since we will not be able to sum the series when analyzing the vector case, below we will show another approach for deriving the above equation, one that exposes this more fundamental property of p. But let us address the optimization first.

Since we do not know k , and we do not know p , the above equation might not seem that helpful. But it is, as we can differentiate the equation with respect to k,

\frac{dp}{dk} = 2rk + \frac{dp}{dk} (a-bk)^2 + 2p(a-bk)(-b)
which we can simplify since we are interested in the value of k that minimizes p. We can then find that value of k, k_{opt} , because we know that \frac{dp}{dk} = 0 when k = k_{opt}. Using the zero derivative,
0 = 2rk_{opt} - 2pb(a-bk_{opt})
or
k_{opt} = \frac{abp}{r+b^2p}.

When we plug the value for k_{opt} into the equation for p , we get the well-known Riccati equation for p given the p-minimizing gain. Here, in scalar form (we let the computer do this algebra),

p_{opt} = a^2p_{opt} + q - \frac{(abp_{opt} )^2}{b^2p_{opt} + r}.
You may wonder why these formulas are any better than just differentiating p. The answer is partly that these equations tell you two things: one, the optimal k, and two, the cost to drive a given state to zero. But another reason for the importance of the above form is that it generalizes to the vector case, or it will, once we find an alternative to summing the series.
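One standard way to solve the scalar Riccati equation numerically is fixed-point iteration: repeatedly substitute the current p into the right-hand side. A minimal sketch, using the turbine values and the illustrative weights q = r = 1 (our own choice):

```python
# Fixed-point iteration of the scalar Riccati equation from the text,
#   p = a^2 p + q - (a b p)^2 / (b^2 p + r),
# followed by the optimal gain k_opt = a b p / (r + b^2 p).
a, b, q, r = 0.9999, 0.01, 1.0, 1.0

p = q                              # any positive starting guess works here
for _ in range(5000):
    p = a**2 * p + q - (a * b * p)**2 / (b**2 * p + r)
k_opt = a * b * p / (r + b**2 * p)

# The converged p must satisfy p = (q + r k^2) + p (a - b k)^2 at k_opt.
residual = (q + r * k_opt**2) + p * (a - b * k_opt)**2 - p
print(p, k_opt, residual)
```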

The invariance of p.

Suppose we were to ask the question, what is the LQR cost function starting from x[1] ? That is, what is

\sum_{n=1}^{\infty} (q + rk^2) (a-bk)^{2(n-1)} x^2[1]?

Suppose we let \tilde{n} = n-1 , then

\sum_{n=1}^{\infty} (q + rk^2) (a-bk)^{2(n-1)} x^2[1] = \sum_{\tilde{n}=0}^{\infty} (q + rk^2) (a-bk)^{2(\tilde{n})} x^2[1],

but since the upper limit of the summation is infinity,

\sum_{\tilde{n}=0}^{\infty} (q + rk^2) (a-bk)^{2(\tilde{n})} x^2[1] = p x^2[1].

The LQR cost to drive x[n] to zero, p x^2[n] , DOES NOT DEPEND ON n! That should not be surprising. If we are presently in San Francisco and want to know how much gas we will need to drive to Los Angeles, it does not matter if we started driving yesterday from Seattle, or started driving four days ago from Ottawa.

The history independence of the LQR cost gives us an interesting relationship for p,

p x^2[0] = \sum_{n=0}^{\infty} (q + rk^2) x^2[n] = (q + rk^2)x^2[0] + \sum_{n=1}^{\infty} (q + rk^2) x^2[n] = (q + rk^2)x^2[0] + p x^2[1].
We can eliminate x[1] since x[1] = (a-bk) x[0], and therefore
p x^2[1] = p \left((a-bk)x[0]\right)^2 = p (a-bk)^2 x^2[0],
which leads us to
p x^2[0] = \left((q + rk^2)+ p (a-bk)^2\right) x^2[0].
Since the above equation must hold for all possible values of x[0] , we recover the equation for p from above, without summing the series,
p = (q + rk^2) + p (a-bk)^2.
Having derived the equation for p as a function of k without resorting to series summation, we can then differentiate with respect to k, set the derivative equal to zero, and determine both p_{opt} and k_{opt}. And it is all done using an approach that generalizes easily to the vector case.
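The agreement between the summed series and the fixed-point form of p is easy to check numerically for any stabilizing gain (the gain k = 2 below is just an illustrative choice, not an optimal one):

```python
# Check that summing the series for p agrees with the closed form and
# with the fixed-point equation p = (q + r k^2) + p (a - b k)^2.
a, b, q, r = 0.9999, 0.01, 1.0, 1.0
k = 2.0
lam = a - b * k                               # closed-loop natural frequency

p_series = sum((q + r * k**2) * lam**(2 * n) for n in range(2000))
p_closed = (q + r * k**2) / (1.0 - lam**2)
p_fixed = (q + r * k**2) + p_closed * lam**2  # should reproduce p_closed
print(p_series, p_closed, p_fixed)
```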