Pattern Recognition

A Multi Independent Feature Bayesian Classifier

In a previous post, a single feature Bayesian classifier was implemented. In order to generalize our classifier to use more than one feature, let us introduce the concept of Bayes Risk.

Allowing the use of more than one feature merely requires replacing the scalar x by the feature vector X, where X is in a d-dimensional Euclidean space R^d, called the feature space.

Allowing actions other than classification primarily allows the possibility of rejection, i.e., refusing to make a decision in close cases; this is a useful option if being indecisive is not too costly.

Formally, the loss function states exactly how costly each action is, and is used to convert a probability determination into a decision.  Some decision mistakes are more costly than others, but for the sake of simplicity, let’s assume that all errors are equally costly.

Let {ω1, …, ωc} be the finite set of c states of nature (“categories”) and let {α1, …, αa} be the finite set of a possible actions. The loss function λ(αi, ωj) describes the loss incurred for taking the action αi when the state of nature is ωj.

Let the feature vector X be a d-component vector-valued random variable and let p(X|ωj) be the state-conditional probability density function for X, i.e. the probability density function for X conditioned on ωj being the true state of nature.

As before, P(ωj) describes the prior probability that nature is in state ωj. Then, the posterior probability P(ωj|X) can be computed from p(X|ωj) by Bayes' formula:

P(ωj|X) = p(X|ωj) P(ωj) / p(X)

where the evidence is now

p(X) = ∑ p(X|ωj) P(ωj), summed over j = 1, 2, 3, …, c
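
As a minimal sketch of the two formulas above, the following C# helper turns the likelihood values p(X|ωj) and the priors P(ωj) into posteriors by dividing by the evidence. The class name BayesPosterior and the method name Compute are hypothetical and are not part of the lab code.

```csharp
using System;

// Hypothetical helper, not part of the original lab code.
public static class BayesPosterior
{
    // likelihoods[j] = p(X|wj), priors[j] = P(wj); returns P(wj|X) for every j.
    public static double[] Compute(double[] likelihoods, double[] priors)
    {
        if (likelihoods.Length != priors.Length)
            throw new ArgumentException("One likelihood and one prior per state of nature are required.");

        // evidence p(X) = sum over j of p(X|wj) * P(wj)
        double evidence = 0;
        for (int j = 0; j < priors.Length; j++)
            evidence += likelihoods[j] * priors[j];
        if (evidence <= 0)
            throw new ArgumentException("The evidence must be positive.");

        // Bayes' formula for each state of nature
        double[] posteriors = new double[priors.Length];
        for (int j = 0; j < priors.Length; j++)
            posteriors[j] = likelihoods[j] * priors[j] / evidence;
        return posteriors;
    }
}
```

For instance, with likelihoods {0.02, 0.01} and equal priors {0.5, 0.5}, the evidence is 0.015 and the posteriors come out at roughly 2/3 and 1/3, which sum to one as expected.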

Suppose that we observe a particular X and that we contemplate taking action αi. If the true state of nature is ωj, by definition we will incur the loss λ(αi, ωj). Because P(ωj|X) is the probability that the true state of nature is ωj, the expected loss associated with taking action αi is merely

R(αi|X) = ∑ λ(αi, ωj) P(ωj|X), summed over j = 1, 2, 3, …, c

In decision-theoretic terminology, an expected loss is called a risk, and R(αi|X) is called the conditional risk. Since we assumed that all errors are equally costly, the loss function of interest for this case is the so-called symmetrical or zero-one loss function,

λ(αi, ωj) = { 0    if i = j
            { 1    if i ≠ j        for i, j = 1, …, c

This loss function assigns no loss to a correct decision, and assigns a unit loss to any error; thus, all errors are equally costly.

Whenever we encounter a particular observation X, we can minimize our expected loss by selecting the action that minimizes the conditional risk. This leads to the following statement of the Bayes decision rule: To minimize the overall risk, compute the conditional risk R(αi|X) for i = 1, …, a and then select the action αi for which R(αi|X) is minimum, or stated formally:

α* = arg min R(αi|X), minimizing over i = 1, …, a.

The resulting minimum overall risk is called Bayes risk, denoted R*, and is the best performance that can be achieved.
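
The decision rule can be sketched in a few lines of C#. This is only an illustration under stated assumptions: the class BayesDecision and its methods are hypothetical names, the loss matrix is passed as lambda[i, j] = λ(αi, ωj), and the posteriors are assumed to be already computed.

```csharp
using System;

// Hypothetical helper, not part of the original lab code.
public static class BayesDecision
{
    // lambda[i, j] = loss of action i when the true state is wj;
    // posteriors[j] = P(wj|X). Returns R(ai|X) for every action i.
    public static double[] ConditionalRisks(double[,] lambda, double[] posteriors)
    {
        int actions = lambda.GetLength(0);
        int states = lambda.GetLength(1);
        if (states != posteriors.Length)
            throw new ArgumentException("lambda needs one column per state of nature.");

        double[] risks = new double[actions];
        for (int i = 0; i < actions; i++)
            for (int j = 0; j < states; j++)
                risks[i] += lambda[i, j] * posteriors[j];
        return risks;
    }

    // Index of the action with minimum conditional risk (the first one wins ties).
    public static int SelectAction(double[,] lambda, double[] posteriors)
    {
        double[] risks = ConditionalRisks(lambda, posteriors);
        if (risks.Length == 0)
            throw new ArgumentException("lambda needs at least one action row.");

        int best = 0;
        for (int i = 1; i < risks.Length; i++)
            if (risks[i] < risks[best])
                best = i;
        return best;
    }
}
```

With the zero-one loss for three classes and posteriors {0.7, 0.2, 0.1}, the conditional risks are approximately {0.3, 0.8, 0.9}, i.e. one minus each posterior, so the selected action is the class with the largest posterior.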

In order to make sense of the abstract theory, let’s borrow the experiment from my university’s Pattern Recognition Lab. tutorial and walk through the implementation.

The assumption that the likelihood follows a Gaussian (normal) distribution still holds. The likelihood is characterized by its mean (μ) and standard deviation (σ), calculated for each feature of each state of nature (category). This time the user specifies sample points with mouse clicks, which are then used to calculate the mean (μ) and standard deviation (σ) as follows:

μi = (∑ xij) / n, for i = 1, 2, 3, …, d, summed over j = 1, 2, 3, …, n

where n is the number of samples. In this example, the observed features are the red, the green and the blue channels of the image, hence d = 3. Similarly,

σ²i = ∑ (xij − μi)² / (n − 1), for i = 1, 2, 3, …, d, summed over j = 1, 2, 3, …, n

σi = √σ²i
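
To make the two estimates concrete, here is a small stand-alone C# sketch for a single feature (one color channel). The class ChannelStatistics and its method names are hypothetical, and the input is assumed to be the channel values of the clicked samples.

```csharp
using System;

// Hypothetical helper, not part of the original lab code.
public static class ChannelStatistics
{
    // Mean of one feature (e.g. the red channel) over n samples.
    public static double Mean(double[] values)
    {
        if (values.Length == 0)
            throw new ArgumentException("At least one sample is required.");
        double sum = 0;
        foreach (double v in values)
            sum += v;
        return sum / values.Length;
    }

    // Sample standard deviation, dividing by (n - 1) as in the formula above.
    public static double StandardDeviation(double[] values)
    {
        if (values.Length < 2)
            throw new ArgumentException("At least two samples are required.");
        double mean = Mean(values);
        double sumOfSquares = 0;
        foreach (double v in values)
            sumOfSquares += (v - mean) * (v - mean);
        return Math.Sqrt(sumOfSquares / (values.Length - 1));
    }
}
```

For the red-channel values {200, 210, 220}, the mean is 210, the squared deviations sum to 200, and dividing by n − 1 = 2 gives a standard deviation of 10.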

As we capture mouse clicks over the image, we use the pixel values to fill a DataGridView for the samples:

        public void handle_input_image_mouse_click(int x, int y, DataGridView dgrdview_samples, PictureBox pictureBox_classified)
        {
            // ignore clicks when no image is loaded or the point lies outside the image
            if (null == input_image || x < 0 || x >= input_image.Width || y < 0 || y >= input_image.Height)
                return;
            // record the clicked pixel's channels as one sample row
            Color c = input_image.GetPixel(x, y);
            dgrdview_samples.Rows.Add(c.R, c.G, c.B);
            MessageBox.Show(String.Format("Click at Point := ({0}, {1}) \n\n Color (R,G,B) := {2}, {3}, {4}", x, y, c.R, c.G, c.B));
            pictureBox_classified.Image = null;
        }

Once the user has finished specifying the samples for a certain state of nature (category), the values are used to populate the samples array property of an object of a class named StateOfNature, which encapsulates the relevant attributes:

        public void handle_create_classes_from_samples_click(DataGridView dgrdview_samples, DataGridView dgrview_meu_sigma, DataGridView dgrdview_loss_function)
        {
            StateOfNature state_of_nature = new StateOfNature();
            for (int i = 0; i < dgrdview_samples.Rows.Count; i++)
            {
                // skip rows with a missing channel value
                if (dgrdview_samples.Rows[i].Cells[0].Value == null || dgrdview_samples.Rows[i].Cells[1].Value == null || dgrdview_samples.Rows[i].Cells[2].Value == null)
                    continue; // assumed: this statement is missing from the original listing
                int r = int.Parse(dgrdview_samples.Rows[i].Cells[0].Value.ToString());
                int g = int.Parse(dgrdview_samples.Rows[i].Cells[1].Value.ToString());
                int b = int.Parse(dgrdview_samples.Rows[i].Cells[2].Value.ToString());
                Color c = Color.FromArgb(r, g, b);
                state_of_nature.samples.Add(c); // assumed: this statement is missing from the original listing
            }
            state_of_nature.calculate_meus_and_sigmas_from_samples(); // assumed: missing from the original listing
            dgrview_meu_sigma.Rows.Add(state_of_nature.meu_red, state_of_nature.sigma_red, state_of_nature.meu_green, state_of_nature.sigma_green, state_of_nature.meu_blue, state_of_nature.sigma_blue);
            state_of_nature.color = obj_unique_random_colors_array.get_unique_random_color();
            // add a column for the new class wi in lambda matrix
            DataGridViewColumn dgrdview_col = new DataGridViewColumn();
            dgrdview_col.Width = column_width;
            dgrdview_col.Name = "w_" + class_regions_array.Count;
            dgrdview_col.HeaderText = "w" + class_regions_array.Count;
            dgrdview_col.CellTemplate = dgrdview_samples.Rows[0].Cells[0];
            // (the statements that add the column to the grid and register the new class are missing from the listing)
            // clear samples table
            // (the statement that clears dgrdview_samples is missing from the listing)
        }

Using the above formulas for the mean (μ) and the standard deviation (σ), with one difference: the code divides by n rather than n − 1, since the "- 1" is commented out:

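        public void calculate_meus_and_sigmas_from_samples()
        {
            // assumed: the body is missing from the original listing; it presumably runs the three helpers in order
            calculate_sums_from_samples();
            calculate_meus_from_samples();
            calculate_sigmas_from_samples();
        }

        public void calculate_sums_from_samples()
        {
            if (samples.Count <= 0)
                return;
            // accumulate each channel over all samples
            for (int i = 0; i < samples.Count; i++)
            {
                sum_reds += samples[i].R;
                sum_greens += samples[i].G;
                sum_blues += samples[i].B;
            }
        }

        public void calculate_meus_from_samples()
        {
            if (samples.Count <= 0)
                return;
            meu_red = sum_reds / samples.Count;
            meu_green = sum_greens / samples.Count;
            meu_blue = sum_blues / samples.Count;
        }

        public void calculate_sigmas_from_samples()
        {
            if (samples.Count <= 0)
                return;
            // accumulate the squared deviations from the mean of each channel
            for (int i = 0; i < samples.Count; i++)
            {
                sigma_red += Math.Pow(samples[i].R - meu_red, 2);
                sigma_green += Math.Pow(samples[i].G - meu_green, 2);
                sigma_blue += Math.Pow(samples[i].B - meu_blue, 2);
            }
            // divides by n; restore the commented-out "- 1" to divide by n - 1 instead
            sigma_red = Math.Sqrt(sigma_red / (samples.Count /*- 1*/));
            sigma_green = Math.Sqrt(sigma_green / (samples.Count /*- 1*/));
            sigma_blue = Math.Sqrt(sigma_blue / (samples.Count /*- 1*/));
        }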

With the following code, the loss function (λ) is filled as a 2D array:

        public void propagate_loss_function_lambda_matrix(DataGridView dgrdv_lambda)
        {
            // correct num actions: count the non-empty rows of the grid
            int correct_num_actions = 0;
            for (int i = 0; i < dgrdv_lambda.Rows.Count; i++)
            {
                if (new DataGridView_Helpers().is_grid_row_empty(dgrdv_lambda.Rows[i]))
                    break; // assumed: this statement is missing from the original listing
                correct_num_actions++; // assumed: this statement is missing from the original listing
            }
            // end correction
            lambda = new double[correct_num_actions, class_regions_array.Count];
            confusion_matrix = new int[correct_num_actions, class_regions_array.Count];
            for (int i = 0; i < correct_num_actions; i++) // loop actions
            {
                for (int j = 0; j < class_regions_array.Count; j++) // loop states of nature (categories)
                {
                    // grid column 0 is skipped: the losses are read from cell j + 1
                    if (dgrdv_lambda.Rows[i].Cells[j + 1].Value != null)
                        lambda[i, j] = double.Parse(dgrdv_lambda.Rows[i].Cells[j + 1].Value.ToString());
                }
            }
        }
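
The classify method shown later loops over classes.Count + 1 actions, and the rendering code paints a pixel black when the extra action wins, so the grid appears to need one more row than there are classes. As a hedged sketch, such a matrix could also be built in code: the zero-one loss for the classes plus a final "reject" row with a constant loss. The class LossMatrix, the method ZeroOneWithReject, and the parameter rejectLoss are hypothetical names, not part of the lab code.

```csharp
using System;

// Hypothetical helper, not part of the original lab code.
public static class LossMatrix
{
    // Builds a (classCount + 1) x classCount matrix: rows 0..classCount-1 hold the
    // zero-one loss, and the last row is a "reject" action with a constant loss.
    public static double[,] ZeroOneWithReject(int classCount, double rejectLoss)
    {
        if (classCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(classCount));

        double[,] lambda = new double[classCount + 1, classCount];
        for (int i = 0; i < classCount; i++)
            for (int j = 0; j < classCount; j++)
                lambda[i, j] = (i == j) ? 0.0 : 1.0;

        // the extra action costs the same whatever the true state of nature is
        for (int j = 0; j < classCount; j++)
            lambda[classCount, j] = rejectLoss;
        return lambda;
    }
}
```

A reject loss below the cost of an error (for example 0.25) makes rejection attractive only in close cases; a reject loss of 1 or more means the reject row never beats the best class.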

Finally, let’s apply the Bayesian inference to each pixel of the input image as follows:

        public Bitmap handle_render_image_click(DataGridView dgrdview_loss_function, bool is_confused = false)
        {
            Bitmap ret = new Bitmap(input_image.Width, input_image.Height);
            if (class_regions_array == null)
                return ret;
            for (int x = 0; x < input_image.Width; x++)
            {
                for (int y = 0; y < input_image.Height; y++)
                {
                    // the pixel's R, G and B channels form the observed feature vector X
                    Color original_color = input_image.GetPixel(x, y);
                    int[] observed_features_vector_x = new int[] { original_color.R, original_color.G, original_color.B };
                    int class_index = new BayesianInferenceEngine().classify(class_regions_array, observed_features_vector_x, lambda);
                    // no decision, or the extra action beyond the classes: paint the pixel black
                    if (class_index == -1 || class_index == class_regions_array.Count)
                        ret.SetPixel(x, y, Color.Black);
                    else
                        ret.SetPixel(x, y, class_regions_array[class_index].color);

                    if (is_confused)
                        update_confusion_matrix(x, y, class_index);
                }
            }
            return ret;
        }

As explained above, the Bayes decision rule is applied: the conditional risk R(αi|X) is computed for every action, and the action with the minimum conditional risk is chosen as the classification:

        // multi feature
        public int classify(List<StateOfNature> classes, int[] x, double[,] loss_function_lambda)
        {
            if (classes == null || loss_function_lambda == null || x == null)
                return -1;
            double minimum_conditional_risk = double.PositiveInfinity;
            double conditional_risk, posteriori;
            int class_index = -1;
            for (int i = 0; i < classes.Count+1; i++) // loop all actions = # of classes + 1 @todo remove hard coded actions
            {
                // conditional risk of action i: sum over j of lambda(ai, wj) * P(wj|X), up to the evidence scale factor
                conditional_risk = 0;
                for (int j = 0; j < classes.Count; j++)
                {
                    posteriori = calculate_posteriori(classes[j], x);
                    conditional_risk += loss_function_lambda[i, j] * posteriori;
                }
                // keep the action with the smallest conditional risk so far
                if (conditional_risk < minimum_conditional_risk)
                {
                    class_index = i;
                    minimum_conditional_risk = conditional_risk;
                }
            }
            return class_index;
        }

In this example, the posteriori calculations are based on the assumption that the observed features are mutually independent given the state of nature, hence their joint likelihood is the product of the per-feature likelihoods, and the posterior is proportional to that product times the prior:

p(X|ωj) = ∏ p(xi|ωj), taking the product over i = 1, 2, 3, …, d, for each j = 1, 2, 3, …, c

P(ωj|X) ∝ P(ωj) ∏ p(xi|ωj)

This code snippet calculates the posteriori, ignoring the evidence in the denominator, which acts merely as a scaling factor:

        public double calculate_posteriori(StateOfNature state_of_nature, int[] observed_features_x_vector)
        {
            // start from the prior so that it enters the product exactly once
            double posteriori = state_of_nature.prior;
            double[] meu_vector = state_of_nature.get_meus_vector();
            double[] sigma_vector = state_of_nature.get_sigmas_vector();
            for (int i = 0; i < observed_features_x_vector.Length; i++)
            {
                double likelihood = composition_object_normal_distribution.my_normal_function(observed_features_x_vector[i], meu_vector[i], sigma_vector[i]);
                posteriori *= likelihood; // joint likelihood of independent features (random variables)
            }
            return posteriori;
        }

In a future blog post, we’ll see how to use a more general formula which takes mutual dependency between the observed features into account.

For the sake of completeness, let’s include the function to calculate the normal (Gaussian) distribution using the formula:

p(x|μ,σ) = (1 / (σ√(2π))) · e^(−(x−μ)² / (2σ²))

        public double my_normal_function(double x, double meu, double sigma)
        {
            return (1 / (Math.Sqrt(2 * Math.PI) * sigma)) * Math.Exp(-1 * Math.Pow(x - meu, 2) / (2 * Math.Pow(sigma, 2)));
        }
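
To see the independence assumption and the Gaussian likelihood working together outside the WinForms code, here is a small stand-alone sketch. The class NaiveGaussianScore and its methods are hypothetical names; the class means, standard deviations and priors used in the example afterwards are made-up values, not measurements from the lab images.

```csharp
using System;

// Hypothetical stand-alone sketch, not part of the original lab code.
public static class NaiveGaussianScore
{
    // Univariate normal density p(x|mu, sigma); sigma must be positive.
    public static double Normal(double x, double mu, double sigma)
    {
        if (sigma <= 0)
            throw new ArgumentOutOfRangeException(nameof(sigma));
        double z = (x - mu) / sigma;
        return Math.Exp(-0.5 * z * z) / (sigma * Math.Sqrt(2 * Math.PI));
    }

    // Unnormalized posterior: P(w) times the product of the per-feature likelihoods.
    public static double Score(double[] x, double[] mus, double[] sigmas, double prior)
    {
        if (x.Length != mus.Length || x.Length != sigmas.Length)
            throw new ArgumentException("x, mus and sigmas must have the same length.");
        double score = prior;
        for (int i = 0; i < x.Length; i++)
            score *= Normal(x[i], mus[i], sigmas[i]);
        return score;
    }
}
```

For a pixel (200, 30, 40), a "reddish" class with means (210, 40, 40) scores far higher than a "bluish" class with means (40, 40, 200) when both use σ = 20 per channel and a prior of 0.5, so under the zero-one loss the pixel would be assigned to the reddish class.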

Comparing the results to the ones obtained using the single feature Bayesian classifier in the previous blog post, we can see a favorable increase in the accuracy of our classifier.

[Figure: a multi independent feature Bayesian classifier]