Non-Linear Least Squares (NLS)

While linear least squares provides a closed-form solution for fitting linear models, many real-world problems involve non-linear models. A common example is fitting a Gaussian function to a set of observations.

Given a set of $m$ observations $(x_i, y_i)$, we want to find the parameters $\beta = [a, \mu, \sigma]^T$ that best fit the non-linear model:

$$y_i = f(x_i, \beta) = a \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$

Since this function is non-linear in its parameters, the least-squares problem has no closed-form solution. Instead, we must use an iterative optimization algorithm such as the Gauss-Newton method.
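To make the setup concrete, here is a minimal NumPy sketch of the Gaussian model and its residual vector; the function and variable names are illustrative choices, not part of any particular library.

```python
import numpy as np

def gaussian(x, a, mu, sigma):
    """Gaussian model f(x, beta) with parameters beta = [a, mu, sigma]."""
    return a * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def residuals(beta, x, y):
    """Residuals e_i = y_i - f(x_i, beta); their sum of squares is what we minimize."""
    a, mu, sigma = beta
    return y - gaussian(x, a, mu, sigma)
```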

The Gauss-Newton Algorithm

The core idea is to start with an initial guess for the parameters and iteratively refine it by solving a sequence of linear least-squares problems.

  1. Start with an initial estimate for the parameters, $\beta_0 = [a_0, \mu_0, \sigma_0]^T$.

  2. At each iteration, we linearize the non-linear function $f(x, \beta)$ around the current estimate $\beta_t$ using a first-order Taylor series expansion:

$$f(x, \beta) \approx f(x, \beta_t) + J(x, \beta_t)\, \delta\beta$$

where $\delta\beta = \beta - \beta_t$ is the update step we want to find, and $J$ is the Jacobian matrix of $f$ with respect to the parameters $\beta$:

$$J = \begin{bmatrix} \dfrac{\partial f}{\partial a} & \dfrac{\partial f}{\partial \mu} & \dfrac{\partial f}{\partial \sigma} \end{bmatrix}$$

  3. We can now write the error (or residual) for each observation $i$ as:

$$e_i = y_i - f(x_i, \beta) \approx y_i - \left(f(x_i, \beta_t) + J_i\, \delta\beta\right) = \left(y_i - f(x_i, \beta_t)\right) - J_i\, \delta\beta$$

This is a linear system of the form $y_{\text{new}} = J\, \delta\beta$, where $y_{\text{new}}$ is the vector of residuals. We can stack the equations for all $m$ observations:

$$\underbrace{\begin{bmatrix} y_1 - f(x_1, \beta_t) \\ \vdots \\ y_m - f(x_m, \beta_t) \end{bmatrix}}_{m \times 1} = \underbrace{\begin{bmatrix} J_1 \\ \vdots \\ J_m \end{bmatrix}}_{m \times 3} \underbrace{\begin{bmatrix} \delta a \\ \delta\mu \\ \delta\sigma \end{bmatrix}}_{3 \times 1}$$

  4. We solve this linear least-squares problem for $\delta\beta$ using the normal equations:

$$\delta\beta = (J^T J)^{-1} J^T y_{\text{new}}$$
  5. Update the parameter estimate for the next iteration:

$$\beta_{t+1} = \beta_t + \delta\beta$$
  6. Repeat steps 2-5, linearizing around the new estimate $\beta_{t+1}$, until the change in parameters $\|\delta\beta\|$ falls below a small threshold $\epsilon$, or a maximum number of iterations is reached. A minimal code sketch of the full loop follows this list.
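The following is a minimal sketch of the Gauss-Newton loop for the Gaussian model, assuming NumPy. The Jacobian columns are the partial derivatives $\partial f/\partial a$, $\partial f/\partial \mu$, $\partial f/\partial \sigma$ worked out analytically for the Gaussian; the function name and stopping constants are illustrative, not from a particular library.

```python
import numpy as np

def gauss_newton_gaussian(x, y, beta0, eps=1e-8, max_iter=100):
    """Fit y ~ a*exp(-(x-mu)^2 / (2*sigma^2)) by Gauss-Newton iteration (sketch)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        a, mu, sigma = beta
        g = np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))  # shared exponential term
        f = a * g                                          # model values f(x_i, beta_t)
        # Jacobian columns: df/da, df/dmu, df/dsigma  (shape m x 3)
        J = np.column_stack([
            g,
            a * g * (x - mu) / sigma ** 2,
            a * g * (x - mu) ** 2 / sigma ** 3,
        ])
        r = y - f                                          # residual vector y_new
        # Solve J * delta ~ r in the least-squares sense (equivalent to the
        # normal equations, but numerically safer than forming J^T J explicitly).
        delta, *_ = np.linalg.lstsq(J, r, rcond=None)
        beta = beta + delta                                # beta_{t+1} = beta_t + delta
        if np.linalg.norm(delta) < eps:                    # ||delta|| below threshold
            break
    return beta
```

With noisy samples of a Gaussian and a reasonable starting guess (for example `beta0 = [y.max(), x[np.argmax(y)], 1.0]`), this typically converges in a handful of iterations; a poor initial estimate of $\mu$ or $\sigma$ can still cause divergence, which motivates the damping discussed next.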

Levenberg-Marquardt (LM) Algorithm

The Gauss-Newton algorithm can be unstable if the Jacobian $J$ is ill-conditioned. The Levenberg-Marquardt (LM) algorithm is a more robust alternative that adds a damping factor $\lambda$ to the normal equations. This helps to control the step size and direction, making convergence more reliable.

$$(J^T J + \lambda I)\, \delta\beta = J^T y_{\text{new}}$$
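As a rough sketch of a single damped step (again assuming NumPy, with an illustrative function name): as $\lambda \to 0$ the step approaches the Gauss-Newton step, while a large $\lambda$ yields a short step close to the gradient-descent direction.

```python
import numpy as np

def lm_step(J, r, lam):
    """One Levenberg-Marquardt step: solve (J^T J + lambda*I) delta = J^T r."""
    n = J.shape[1]
    A = J.T @ J + lam * np.eye(n)   # damped normal-equation matrix
    return np.linalg.solve(A, J.T @ r)
```

In a full LM implementation, $\lambda$ is adapted between iterations: decreased when a step reduces the sum of squared residuals and increased when it does not. In practice, library routines such as SciPy's `scipy.optimize.curve_fit` (which uses a Levenberg-Marquardt solver for unconstrained problems) handle this schedule automatically.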