Draft:Newton's method for systems of nonlinear equations

When multiple nonlinear equations must be solved for more than one variable, Newton's method for systems of equations may be used to solve them simultaneously for the solution vector. The process is very similar to Newton's method for one variable, except that the single nonlinear equation is replaced with a system of nonlinear equations, the derivative is replaced with a Jacobian matrix of partial derivatives, and the scalar subtraction is replaced with a vector subtraction. Newton's method for systems of nonlinear equations reduces to Newton's method for a single nonlinear equation when the system contains only one equation.

Procedure
For a single equation, Newton's method repeats the following iteration until further iterations no longer produce significant changes to $$x$$,

$$x_{k+1} = x_k - \frac{f(x_k)-y}{f'(x_k)}$$

where $$y$$ is the known value for which $$f(x)=y$$ is to be solved.

For systems of equations, the same is true, but for vector $$X$$ instead of scalar $$x$$,

$$\begin{align} &X_{k+1} = X_k-C(X_k) \\ \end{align}$$

where $$C(X_k)$$ is the solution vector of the linear system $$J(X_k)C(X_k)=F(X_k) - Y$$,

$$J(X_k)$$ is the Jacobian matrix of $$F$$ evaluated at $$X_k$$, and

$$Y$$ is the vector of known values.

If the $$Y$$ vector is set to all zeros, the defining equations may be rewritten in the commonly found form below.

$$\begin{align} &X_{k+1} = X_k-J(X_k)^{-1}F(X_k) \\ \end{align}$$

or

$$\begin{align} &J(X_k)(X_{k+1}-X_k) = -F(X_k) \\ \end{align}$$
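The update above can be sketched in a few lines of NumPy (a minimal illustration; `F` and `J` are placeholders for user-supplied functions returning the system vector and its Jacobian):

```python
import numpy as np

def newton_step(F, J, x):
    """One Newton iteration for a system: solve J(x) c = F(x), return x - c."""
    # Solving the linear system is preferred over forming J(x)^-1 explicitly,
    # which is slower and less numerically stable.
    c = np.linalg.solve(J(x), F(x))
    return x - c
```

Repeating `newton_step` until the change in `x` (or the residual `F(x)`) is acceptably small implements the iteration defined above.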

Simple example
For example, the following system of equations needs to be solved for the vector $$(x_1,x_2)$$, given the vector of known values, (2,3).

$$ \begin{array}{lcr} 5x_1^2 + x_1x_2^2 + \sin^2(2x_2) &= 2 \\ e^{2x_1-x_2}+4x_2 &= 3 \end{array}$$

The function vector, $$F(X_k)$$, the Jacobian matrix, $$J(X_k)$$, for iteration $$k$$, and the vector of known values, $$Y$$, are defined below.

$$\begin{align}
&F(X_k) = \begin{bmatrix} 5x_{1}^2 + x_{1}x_{2}^2 + \sin^2(2x_{2}) \\ e^{2x_{1}-x_{2}}+4x_{2} \end{bmatrix}_k \\
&J(X_k) = \begin{bmatrix} \frac{\partial f_1}{\partial x_{1}} &\frac{\partial f_1}{\partial x_{2}} \\ \frac{\partial f_2}{\partial x_{1}} &\frac{\partial f_2}{\partial x_{2}} \end{bmatrix}_k = \begin{bmatrix} 10x_{1} + x_{2}^2 & 2x_1x_2+4\sin(2x_{2})\cos(2x_{2}) \\ 2e^{2x_{1}-x_{2}} & -e^{2x_{1}-x_{2}}+4 \end{bmatrix}_k \\
&Y = \begin{bmatrix} 2 \\ 3 \end{bmatrix}
\end{align}$$

Note that $$F(X_k)$$ could have been rewritten to absorb $$Y$$, and thus eliminate $$Y$$ from the equations. The linear system to solve at each iteration is

$$\begin{bmatrix} 10x_{1} + x_{2}^2 & 2x_1x_2+4\sin(2x_{2})\cos(2x_{2}) \\ 2e^{2x_{1}-x_{2}} & -e^{2x_{1}-x_{2}}+4 \end{bmatrix}_k \begin{bmatrix} c_{1} \\ c_{2} \end{bmatrix}_{k+1} = \begin{bmatrix} 5x_{1}^2 + x_{1}x_{2}^2 + \sin^2(2x_{2}) - 2 \\ e^{2x_{1}-x_{2}}+4x_{2} - 3 \end{bmatrix}_k$$

and

$$X_{k+1 } = X_k - C_{k+1 }$$

The iterations should be repeated until $$\sum_{i=1}^{2} |f_i(X_k)-y_i| < E$$, where $$E$$ is a tolerance small enough to meet the application's accuracy requirements.

If the initial vector $$X_0$$ is chosen to be $$\begin{bmatrix} 1 & 1\end{bmatrix}$$, that is, $$x_1=1\text{ and }x_2=1$$, and $$E$$ is chosen to be 1.e-03, then the example converges after four iterations to a value of $$X_4 = (0.567297,-0.309442)$$.

Iterations
The following iterations were made during the course of the solution.
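These iterations can be reproduced with a short script (a sketch using NumPy; the function and variable names are illustrative, and the stopping test follows the criterion given above):

```python
import numpy as np

def F(x):
    """Function vector of the example system."""
    x1, x2 = x
    return np.array([5*x1**2 + x1*x2**2 + np.sin(2*x2)**2,
                     np.exp(2*x1 - x2) + 4*x2])

def Jac(x):
    """Jacobian matrix of the example system."""
    x1, x2 = x
    return np.array([[10*x1 + x2**2, 2*x1*x2 + 4*np.sin(2*x2)*np.cos(2*x2)],
                     [2*np.exp(2*x1 - x2), -np.exp(2*x1 - x2) + 4]])

Y = np.array([2.0, 3.0])   # vector of known values
x = np.array([1.0, 1.0])   # starting vector X_0
E = 1e-3                   # convergence tolerance
for k in range(50):
    if np.sum(np.abs(F(x) - Y)) < E:
        break
    x = x - np.linalg.solve(Jac(x), F(x) - Y)
```

After convergence, `x` holds a value near the solution (0.567297, -0.309442) stated above.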

Practical considerations
Newton's method for systems of equations, especially large systems, can be finickier than for single equations. Care should be taken to ensure a solution is found within a reasonable number of iterations.

Singular matrices
The linear system that must be solved at each iteration may not have a usable solution, because the Jacobian matrix may be singular. This can occur because the system of equations has no solution, or because of a poorly chosen starting vector $$X_0$$. For example, had the initial $$X_0$$ been chosen to be (0,0), the first iteration would have produced a singular matrix and the convergence would have failed. Care should be taken to choose a starting $$X_0$$ that is known to produce a non-singular first iteration.
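For instance, in the example system above, the Jacobian evaluated at $$(0,0)$$ has an all-zero first row, so the first linear solve fails (a sketch using NumPy):

```python
import numpy as np

# Jacobian of the example system evaluated at x1 = 0, x2 = 0
J0 = np.array([[10*0 + 0**2, 2*0*0 + 4*np.sin(0)*np.cos(0)],
               [2*np.exp(0),  -np.exp(0) + 4]])
# The first row is [0, 0], so J0 is singular and the solve raises LinAlgError.
try:
    c = np.linalg.solve(J0, np.array([1.0, 1.0]))
except np.linalg.LinAlgError:
    pass  # the iteration cannot proceed from this starting point
```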

Asymptotic divergence to infinity
Even when a system of equations is known to have a solution, the iterations may asymptotically diverge to infinity. This is more likely to happen with large numbers of equations. The condition can be mitigated or prevented by selecting a good starting point $$X_0$$. In addition, the following steps may further mitigate divergence.


 * Limit the range of the linear solution, the $$C$$ vector, to a small range that is still large enough to allow convergence in a reasonable number of iterations.
 * Limit the $$X$$ vector iterates to known limits for each $$x_k$$ entry.
 * Scale down the $$C$$ vector entries slightly to slow the convergence. Slower convergence is less likely to diverge.
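The first and third mitigations above can be sketched as a clamped, damped Newton update (illustrative only; the bound `c_max` and the `damping` factor are assumptions chosen for demonstration, not part of the method):

```python
import numpy as np

def damped_newton_step(F, J, Y, x, c_max=1.0, damping=0.8):
    """Newton update with the correction vector clamped and scaled down."""
    c = np.linalg.solve(J(x), F(x) - Y)
    c = np.clip(c, -c_max, c_max)   # limit the range of the linear solution
    return x - damping * c          # scale down slightly to slow convergence
```

Clamping keeps a single bad step from throwing the iterate far from the solution, at the cost of slower (linear rather than quadratic) convergence while the clamp or damping is active.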

Slow convergence
In time-sensitive applications, convergence speed is important, since slow convergence can have detrimental effects on the application being supported. Convergence time can be minimized through the following steps.


 * Good selection of the initial starting point $$X_0$$ is very important in minimizing the number of iterations required to reach an acceptable convergence error. In the example above, had the initial $$X_0$$ been chosen to be (2,2) instead of (1,1), seven iterations would have been required instead of four.
 * Limit the $$X$$ vector iterates to known limits for each $$x_k$$ entry. The $$X$$ vector does not have to diverge to slow things down. If no limit is placed on the vector, or the limit is too large, the iterations may spend too much time recomputing large values instead of converging.
 * Scale down the C vector entries slightly to slow down the convergence. This may help prevent the iterations from jumping around and taking too long to converge.
 * Select a convergence error tolerance as large as possible while still meeting the application requirements. For example, had an error of 1.e-15 been chosen for the example above, six iterations would have been required, as opposed to only four needed to converge to an error of 1.e-03. The additional two iterations may be acceptable for high-precision applications, but would be a waste for applications that need only light precision.

Ensure a solution exists
It is much easier to determine whether a solution exists for a single equation. For example, $$X^2=4$$ has an obvious solution (2), while $$X^2=-4$$ obviously has no solution in the set of real numbers. With systems of equations, especially large systems, it is far more difficult to determine whether a solution exists. If a solution does not exist, the iterations will certainly fail, but even if a solution does exist, the iterations may still fail. Upfront work may be required to determine whether a solution exists before drawing conclusions.

Multiple solutions
It is easier to determine that multiple solutions exist for a single equation. The equation $$X^2=4$$ from the preceding paragraph, for example, has the solutions $$X=2 \text{ and }X=-2$$. For systems of equations, especially large systems, multiple solutions may not be so obvious, and even when they are, it may be difficult to ensure that convergence takes place at the desired solution. Care should be taken to start with an $$X_0$$ as near as possible to the desired solution, and to place limits on the individual $$X$$ entries that move the iterations away from undesirable solutions and toward the desired solution(s). Note that many solutions exist in the example used above.

Digital versus continuous derivatives
Derivatives should be calculated from continuous functions whenever possible, to maximize accuracy and minimize convergence problems. If the function is unknown and continuous derivatives cannot be calculated, digital derivatives may be used, but care should be taken to maximize their accuracy. Use pairs of samples close together, if possible. If that is not possible, such as with a fixed string of data, cubic interpolations are preferred because cubic interpolation retains a defined first derivative at the data points. If accuracy is not an issue, linear interpolations may be used, while keeping in mind that the first derivatives are not defined at the data points, requiring that the next or prior linear segment be used to estimate the derivative.
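The pairs-of-samples-close-together approach corresponds to a central-difference Jacobian, sketched below (illustrative; the step size `h` is an assumption and should be tuned to the function's scale and noise level):

```python
import numpy as np

def numerical_jacobian(F, x, h=1e-6):
    """Central-difference Jacobian: two closely spaced samples per variable."""
    x = np.asarray(x, dtype=float)
    Fx = np.asarray(F(x))
    J = np.empty((len(Fx), len(x)))
    for j in range(len(x)):
        step = np.zeros(len(x))
        step[j] = h
        # symmetric samples around x give an O(h^2) accurate derivative
        J[:, j] = (np.asarray(F(x + step)) - np.asarray(F(x - step))) / (2*h)
    return J
```

This Jacobian can be substituted for the analytic $$J(X_k)$$ in the iterations above when only sampled function values are available.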

Applications
Applications include shaping the frequency response in filter design, such as constraining a Chebyshev pass band ripple to a percentage of the pass band.