Non-Linear Least Squares (NLS)

While linear least squares provides a closed-form solution for fitting linear models, many real-world problems involve non-linear models. A common example is fitting a Gaussian function to a set of observations.

Given a set of $m$ observations $(x_i, y_i)$, we want to find the parameters $\beta = [a, \mu, \sigma]^T$ that best fit the non-linear model:

$$y_i = f(x_i, \beta) = a \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$

Since this function is non-linear in its parameters, the least-squares problem has no closed-form solution. Instead, we must use an iterative optimization algorithm such as the Gauss-Newton method.
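To make the setup concrete, here is a minimal NumPy sketch of the Gaussian model and its residual vector; the function and variable names are illustrative choices, not part of any particular library.

```python
import numpy as np

def gaussian(x, a, mu, sigma):
    """Gaussian model f(x, beta) with parameters beta = [a, mu, sigma]."""
    return a * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def residuals(beta, x, y):
    """Residuals e_i = y_i - f(x_i, beta); their sum of squares is what we minimize."""
    a, mu, sigma = beta
    return y - gaussian(x, a, mu, sigma)
```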

The Gauss-Newton Algorithm

The core idea is to start with an initial guess for the parameters and iteratively refine it by solving a sequence of linear least-squares problems.

  1. Start with an initial estimate for the parameters, $\beta_0 = [a_0, \mu_0, \sigma_0]^T$.

  2. At each iteration, we linearize the non-linear function $f(x, \beta)$ around the current estimate $\beta_t$ using a first-order Taylor series expansion:

$$f(x, \beta) \approx f(x, \beta_t) + J(x, \beta_t)\, \delta\beta$$

where $\delta\beta = \beta - \beta_t$ is the update step we want to find, and $J$ is the Jacobian matrix of $f$ with respect to the parameters $\beta$:

$$J = \begin{bmatrix} \dfrac{\partial f}{\partial a} & \dfrac{\partial f}{\partial \mu} & \dfrac{\partial f}{\partial \sigma} \end{bmatrix}$$

  3. We can now write the error (or residual) for each observation $i$ as:

$$e_i = y_i - f(x_i, \beta) \approx y_i - \left(f(x_i, \beta_t) + J_i\, \delta\beta\right) = \left(y_i - f(x_i, \beta_t)\right) - J_i\, \delta\beta$$

This is a linear system of the form $y_{\text{new}} = J\, \delta\beta$, where $y_{\text{new}}$ is the vector of residuals. We can stack the equations for all $m$ observations:

$$\underbrace{\begin{bmatrix} y_1 - f(x_1, \beta_t) \\ \vdots \\ y_m - f(x_m, \beta_t) \end{bmatrix}}_{m \times 1} = \underbrace{\begin{bmatrix} J_1 \\ \vdots \\ J_m \end{bmatrix}}_{m \times 3} \underbrace{\begin{bmatrix} \delta a \\ \delta\mu \\ \delta\sigma \end{bmatrix}}_{3 \times 1}$$

  4. We solve this linear least-squares problem for $\delta\beta$ using the normal equations:

$$\delta\beta = (J^T J)^{-1} J^T y_{\text{new}}$$
  5. Update the parameter estimate for the next iteration:

$$\beta_{t+1} = \beta_t + \delta\beta$$
  6. Repeat steps 2-5, linearizing around the new estimate $\beta_{t+1}$, until the change in parameters $\|\delta\beta\|$ falls below a small threshold $\epsilon$, or a maximum number of iterations is reached. A minimal code sketch of the full loop follows this list.
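The following is a minimal sketch of the Gauss-Newton loop for the Gaussian model, assuming NumPy. The Jacobian columns are the partial derivatives $\partial f/\partial a$, $\partial f/\partial \mu$, $\partial f/\partial \sigma$ worked out analytically for the Gaussian; the function name and stopping constants are illustrative, not from a particular library.

```python
import numpy as np

def gauss_newton_gaussian(x, y, beta0, eps=1e-8, max_iter=100):
    """Fit y ~ a*exp(-(x-mu)^2 / (2*sigma^2)) by Gauss-Newton iteration (sketch)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        a, mu, sigma = beta
        g = np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))  # shared exponential term
        f = a * g                                          # model values f(x_i, beta_t)
        # Jacobian columns: df/da, df/dmu, df/dsigma  (shape m x 3)
        J = np.column_stack([
            g,
            a * g * (x - mu) / sigma ** 2,
            a * g * (x - mu) ** 2 / sigma ** 3,
        ])
        r = y - f                                          # residual vector y_new
        # Solve J * delta ~ r in the least-squares sense (equivalent to the
        # normal equations, but numerically safer than forming J^T J explicitly).
        delta, *_ = np.linalg.lstsq(J, r, rcond=None)
        beta = beta + delta                                # beta_{t+1} = beta_t + delta
        if np.linalg.norm(delta) < eps:                    # ||delta|| below threshold
            break
    return beta
```

With noisy samples of a Gaussian and a reasonable starting guess (for example `beta0 = [y.max(), x[np.argmax(y)], 1.0]`), this typically converges in a handful of iterations; a poor initial estimate of $\mu$ or $\sigma$ can still cause divergence, which motivates the damping discussed next.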

Levenberg-Marquardt (LM) Algorithm

The Gauss-Newton algorithm can be unstable if the Jacobian $J$ is ill-conditioned. The Levenberg-Marquardt (LM) algorithm is a more robust alternative that adds a damping factor $\lambda$ to the normal equations. This helps to control the step size and direction, making convergence more reliable.

$$(J^T J + \lambda I)\, \delta\beta = J^T y_{\text{new}}$$
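As a rough sketch of a single damped step (again assuming NumPy, with an illustrative function name): as $\lambda \to 0$ the step approaches the Gauss-Newton step, while a large $\lambda$ yields a short step close to the gradient-descent direction.

```python
import numpy as np

def lm_step(J, r, lam):
    """One Levenberg-Marquardt step: solve (J^T J + lambda*I) delta = J^T r."""
    n = J.shape[1]
    A = J.T @ J + lam * np.eye(n)   # damped normal-equation matrix
    return np.linalg.solve(A, J.T @ r)
```

In a full LM implementation, $\lambda$ is adapted between iterations: decreased when a step reduces the sum of squared residuals and increased when it does not. In practice, library routines such as SciPy's `scipy.optimize.curve_fit` (which uses a Levenberg-Marquardt solver for unconstrained problems) handle this schedule automatically.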