Expectation Maximization

A peek into generative algorithms

K-means clustering is one of the most simple algorithms out there and has been quite robust as well. If you are looking to perform machine learning on some unlabeled data, you should consider K-means clustering as your first algorithm of choice. Having said that, there are some fundamental problems with K-means clustering. To start with, K-means clustering does not have any mathematical base as is the norm in general machine learning algorithms. Also, in the real world, it is seen that K-means tends to cluster to local minima. Hence, we will take a look at Expectation Maximization which is kind of a generalization of K-means clustering.

Expectation Maximization comes under Gaussian models which are more of a way of thinking and modeling rather than a particular algorithm. Clusters are modeled as Gaussian distributions and not by their means. What happens in such cases is that there is a correspondence between all data points and all clusters rather than correspondence between each data point to its own cluster, as is the case in K-means clustering. Thus, we also cover cases where there is overlapping between different clusters.

Expectation Maximization works the same way as K-means except that the data is assigned to each cluster with the weights being soft probabilities instead of distances. The advantage is that the model becomes generative as we define the probability distribution for each model.

Typically, we encounter a lot of dimensions in our features and hence, we will use a multivariate Gaussian, which is shown in the below form

Σ is the covariance matrix
μ is the mean vector

Now, to get the values of nu and sigma, you just need to get the maximum likelihood estimates which can be easily computed below

The algorithm for EM-learning consists of the below steps.

1.E-step: An initial guess is made for the parameters of the model and a probability distribution is created.
2.Newly observed data is fed into the model.
3.M-step: The probability distribution of the E-step is changed and modified to include the new data.
4.The above process is repeated till there is no change between the E-step and the M-step.
5.This algorithm is proven to converge.

This algorithm is proven to converge.


1.Since the resultant probability distribution can be considered to be the joined probability distribution of multiple Gaussians hence, this can be used to disentangle mixed signals.
2.EM algorithm is an algorithm for maximum likelihood estimation and hence, can be used for estimating Hidden Markov Models and many other uses.

On a final note, as EM algorithm is used for finding a probability distribution, it can be very slow even on the fastest computer. In such cases, depending on the task it may make sense to use a simpler algorithm.

Joydeep Bhattacharjee