Pattern Recognition

Maximum Likelihood Estimation (MLE) vs K-Nearest-Neighbor Estimation (KNN)

Maximum Likelihood Estimation (MLE) is a parametric method, while K-Nearest-Neighbor Estimation (KNN) is a non-parametric one. In the former, we know the form of the underlying probability density function (PDF), e.g. Gaussian, Exponential, or Poisson, and only its parameters need to be estimated. No such information is available for the latter, non-parametric methods.

Among the state-of-the-art techniques for parametric estimation are MLE and Bayesian Estimation. The main conceptual difference is that MLE treats the parameters as fixed but unknown values, which are obtained by solving a maximization problem; this amounts to approximating the parameter posterior by a delta function centered at the ML estimate. Bayesian Estimation, in contrast, treats the parameters as random variables and applies Bayesian learning to obtain their posterior distribution.
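As a minimal sketch of the MLE idea, assuming a univariate Gaussian model (the toy data and variable names below are my own, not from the post), the likelihood is maximized in closed form by the sample mean and the biased sample variance:

```python
import numpy as np

# Hypothetical example: MLE for a univariate Gaussian N(mu, sigma^2).
# For this model the log-likelihood is maximized in closed form by the
# sample mean and the (biased) sample variance.
rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.5, size=1000)  # toy training data

mu_mle = samples.mean()                      # argmax of the log-likelihood over mu
var_mle = ((samples - mu_mle) ** 2).mean()   # note: divides by N, not N-1

print(f"MLE mean     = {mu_mle:.3f}")
print(f"MLE variance = {var_mle:.3f}")
```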

On the other hand, we have Histogram, Parzen Window, KNN, and Nearest-Neighbor Estimation as examples of non-parametric methods. The given feature space is divided into cells / bins, and the probability estimate for each cell depends on the number of training samples falling into it relative to the total number of training samples. In Histogram and Parzen Window estimation we assume a cell of fixed volume, but the resulting probability estimate varies dramatically with the chosen volume. KNN avoids this by letting the cell volume vary: the cell around a given test feature vector grows until it encloses its k nearest neighbors. Hence the window grows or shrinks with the density of the training set; we expect a large window where the density is low and a small window of good resolution otherwise.
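As an illustrative sketch in one dimension (the function name, the choice of k, and the toy data are assumptions, not from the post), the KNN density estimate at a point x is k divided by n times the volume of the smallest window around x that contains k training samples:

```python
import numpy as np

def knn_density(x, train, k):
    """k-NN density estimate p(x) ~ k / (n * V_k(x)) for 1-D data, where
    V_k(x) is the length of the smallest interval around x containing the
    k nearest training samples."""
    dists = np.sort(np.abs(train - x))   # distances to all training points
    radius = dists[k - 1]                # distance to the k-th nearest neighbor
    volume = 2 * radius                  # 1-D "volume" of the window
    return k / (len(train) * volume)

rng = np.random.default_rng(1)
train = rng.normal(0.0, 1.0, size=500)   # toy 1-D training set

# The window is small where data is dense (x = 0) and large in the tail (x = 3).
for x in (0.0, 1.0, 3.0):
    print(f"p({x:+.1f}) ~ {knn_density(x, train, k=10):.3f}")
```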

Since we have to compute the distance between a given feature vector and every sample in our training set each time we perform a classification, KNN suffers from high complexity in terms of both time and space. A more reasonable approach in terms of complexity is Nearest-Neighbor estimation. It uses a Voronoi tessellation to divide the feature space into cells: each cell contains the points closer to one training sample than to any other and is labelled with that sample's category. This saves us from recomputing all the distances for every test sample, but sacrifices some accuracy in return.
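To make the per-query cost concrete, here is a minimal brute-force k-NN classifier (the toy 2-D data, the choice of k, and the helper name are assumptions, not from the post); every query computes a distance to all n training samples:

```python
import numpy as np
from collections import Counter

def knn_classify(x, train_X, train_y, k=3):
    """Brute-force k-NN: O(n * d) distance computations per query, plus the
    memory needed to keep the entire training set around."""
    dists = np.linalg.norm(train_X - x, axis=1)   # distance to every sample
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    votes = Counter(train_y[i] for i in nearest)  # majority vote among them
    return votes.most_common(1)[0][0]

# Toy 2-D training set with two classes.
rng = np.random.default_rng(2)
class0 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))
class1 = rng.normal([3.0, 3.0], 1.0, size=(50, 2))
train_X = np.vstack([class0, class1])
train_y = np.array([0] * 50 + [1] * 50)

print(knn_classify(np.array([0.5, 0.2]), train_X, train_y, k=5))  # likely class 0
print(knn_classify(np.array([2.8, 3.1]), train_X, train_y, k=5))  # likely class 1
```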
