In statistics, a mixture model is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data set should identify the sub-population to which an individual observation belongs. - Wikipedia

Gaussian Mixture Models (GMMs)

GMM: A mixture of Gaussians is a superposition of K Gaussian densities of the following form.

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

Mixture coefficients: $\pi_k$ (the prior probability that a point belongs to cluster $k$).
We get the relation below by integrating both sides of the above equation, since each Gaussian density integrates to 1.

$$\sum_{k=1}^{K} \pi_k = 1$$
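As a sketch of the definition above (univariate case, so each $\Sigma_k$ reduces to a scalar variance; the parameter values are purely illustrative):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    # Univariate Gaussian density N(x | mu, sigma^2)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Illustrative two-component mixture (K = 2)
pis = np.array([0.3, 0.7])      # mixture coefficients pi_k, sum to 1
mus = np.array([-1.0, 2.0])     # component means mu_k
sigmas = np.array([0.5, 1.0])   # component standard deviations

def gmm_pdf(x):
    # p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2)
    return sum(pi * gaussian_pdf(x, mu, s) for pi, mu, s in zip(pis, mus, sigmas))
```

Because the $\pi_k$ sum to 1 and each component integrates to 1, `gmm_pdf` integrates to 1 as well, i.e. it is itself a valid density.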

Likelihood Function:

$$L(\theta \mid x) = \prod_{i=1}^{n} p(x_i) = \prod_{i=1}^{n} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$$

where $\theta = \{\pi_1, \dots, \pi_K, \mu_1, \dots, \mu_K, \Sigma_1, \dots, \Sigma_K\}$.

Log Likelihood:

$$\log L(\theta \mid x) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$$
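The log likelihood above can be evaluated directly; a minimal univariate sketch (the log-sum-exp trick is added for numerical stability, since products of small densities underflow quickly):

```python
import numpy as np

def gmm_log_likelihood(X, pis, mus, sigmas):
    # log L = sum_i log sum_k pi_k N(x_i | mu_k, sigma_k^2)
    X = X[:, None]                                          # shape (n, 1)
    # Per-point, per-component log densities, shape (n, K)
    log_pdf = (-0.5 * ((X - mus) / sigmas) ** 2
               - np.log(sigmas) - 0.5 * np.log(2 * np.pi))
    log_weighted = np.log(pis) + log_pdf                    # log(pi_k N(...))
    # log-sum-exp over components, then sum over data points
    m = log_weighted.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(log_weighted - m).sum(axis=1))).sum())
```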

Maximizing this is a hard problem: the sum inside the logarithm prevents a closed-form solution. The standard approach is the Expectation-Maximization (EM) algorithm, which alternates the two steps below until convergence.

E-step
$$\gamma_{ik} = p(k \mid x_i) = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}$$
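The responsibilities $\gamma_{ik}$ can be computed in one vectorized pass; a univariate sketch (function name `e_step` is my own):

```python
import numpy as np

def e_step(X, pis, mus, sigmas):
    # gamma_ik = pi_k N(x_i | mu_k) / sum_j pi_j N(x_i | mu_j)
    X = X[:, None]                                          # shape (n, 1)
    pdf = (np.exp(-0.5 * ((X - mus) / sigmas) ** 2)
           / (sigmas * np.sqrt(2 * np.pi)))                 # shape (n, K)
    weighted = pis * pdf                                    # pi_k N(x_i | mu_k)
    # Normalize over components so each row sums to 1
    return weighted / weighted.sum(axis=1, keepdims=True)
```

Each row of the result is a probability distribution over the K clusters for one data point.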

M-step

$$N_k = \sum_{i=1}^{n} \gamma_{ik}$$

$$\mu_k^{\text{new}} = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik} x_i$$

$$\Sigma_k^{\text{new}} = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik} (x_i - \mu_k^{\text{new}})(x_i - \mu_k^{\text{new}})^T$$

$$\pi_k^{\text{new}} = \frac{N_k}{n}$$
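The M-step updates above can be sketched in the univariate case (where $\Sigma_k$ reduces to a scalar variance; function name `m_step` is my own):

```python
import numpy as np

def m_step(X, gamma):
    # gamma has shape (n, K): responsibilities from the E-step
    Nk = gamma.sum(axis=0)                        # N_k, effective count per cluster
    mus = (gamma * X[:, None]).sum(axis=0) / Nk   # mu_k^new, responsibility-weighted means
    # sigma_k^2 new: responsibility-weighted variances around the new means
    variances = (gamma * (X[:, None] - mus) ** 2).sum(axis=0) / Nk
    pis = Nk / len(X)                             # pi_k^new = N_k / n
    return pis, mus, np.sqrt(variances)
```

With hard 0/1 responsibilities this reduces to ordinary per-cluster sample means and variances, which is a useful sanity check.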

Advantages

How to choose K?