A Multi Independent Feature Bayesian Classifier

In a previous post, a single feature Bayesian classifier was implemented. In order to generalize our classifier to use more than one feature, let us introduce the concept of Bayes Risk.

Allowing the use of more than one feature merely requires replacing the scalar x by the feature vector X, where X is in a d-dimensional Euclidean space R^d, called the feature space.

Allowing actions other than classification primarily allows the possibility of rejection, i.e., of refusing to make a decision in close cases; this is a useful option if being indecisive is not too costly.

Formally, the loss function states exactly how costly each action is, and is used to convert a probability determination into a decision. Some decision mistakes are more costly than others, but for the sake of simplicity, let’s assume that all errors are equally costly.

Let {ω₁, …, ω_c} be the finite set of c states of nature (“categories”) and let {α₁, …, α_a} be the finite set of a possible actions. The loss function λ(α_i, ω_j) describes the loss incurred for taking the action α_i when the state of nature is ω_j.

Let the feature vector X be a d-component vector-valued random variable and let p(X|ω_j) be the state-conditional probability density function for X, i.e., the probability density function for X conditioned on ω_j being the true state of nature.

As before, P(ω_j) describes the prior probability that nature is in state ω_j. Then, the posterior probability P(ω_j|X) can be computed from p(X|ω_j) by Bayes' formula:

P(ω_j|X) = p(X|ω_j)P(ω_j) / p(X)

where the evidence is now

p(X) = ∑_{j=1}^{c} p(X|ω_j) P(ω_j)
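
As a minimal sketch of Bayes' formula and the evidence term, the following helper turns per-class likelihoods and priors into posteriors. The class name `BayesPosteriorSketch` and the method `Posteriors` are hypothetical names chosen for illustration; they are not part of the project's source code.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class BayesPosteriorSketch
{
    // Returns P(w_j | X) for every class j, given p(X | w_j) and P(w_j).
    public static double[] Posteriors(double[] likelihoods, double[] priors)
    {
        if (likelihoods.Length != priors.Length)
            throw new ArgumentException("likelihoods and priors must have the same length");

        // Numerators of Bayes' formula: p(X | w_j) * P(w_j)
        double[] joint = likelihoods.Zip(priors, (l, p) => l * p).ToArray();

        // Evidence p(X): the sum of the numerators over all classes
        double evidence = joint.Sum();
        if (evidence <= 0)
            throw new ArgumentException("evidence must be positive");

        // Dividing by the evidence makes the posteriors sum to one
        return joint.Select(v => v / evidence).ToArray();
    }
}
```

For example, with likelihoods 0.6 and 0.2 and equal priors of 0.5, the evidence is 0.4 and the posteriors are 0.75 and 0.25.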

Suppose that we observe a particular X and that we contemplate taking action α_i. If the true state of nature is ω_j, by definition we will incur the loss λ(α_i, ω_j). Because P(ω_j|X) is the probability that the true state of nature is ω_j, the expected loss associated with taking action α_i is merely

R(α_i, X) = ∑_{j=1}^{c} λ(α_i, ω_j) P(ω_j|X)

In decision-theoretic terminology, an expected loss is called a risk, and R(α_i, X) is called the conditional risk. Since we assume that all errors are equally costly, the loss function of interest here is the so-called symmetrical or zero-one loss function,

λ(α_i, ω_j) = { 0 if i = j
              { 1 if i ≠ j      for i, j = 1, …, c

This loss function assigns no loss to a correct decision, and assigns a unit loss to any error; thus, all errors are equally costly.
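
A short derivation may help to see what the zero-one loss implies. Substituting it into the conditional risk, and using the fact that the posteriors sum to one, gives:

```latex
\begin{align*}
R(\alpha_i, X) &= \sum_{j=1}^{c} \lambda(\alpha_i, \omega_j)\, P(\omega_j \mid X) \\
               &= \sum_{j \neq i} P(\omega_j \mid X) \\
               &= 1 - P(\omega_i \mid X)
\end{align*}
```

So, under the zero-one loss and without a rejection action, minimizing the conditional risk is the same as choosing the class with the largest posterior probability.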

Whenever we encounter a particular observation X, we can minimize our expected loss by selecting the action that minimizes the conditional risk. This leads to the following statement of the Bayes decision rule: To minimize the overall risk, compute the conditional risk R(α_i, X) for i = 1, …, a and then select the action α_i for which R(α_i, X) is minimum, or, stated formally:

α^* = arg min_i R(α_i, X).

The resulting minimum overall risk is called Bayes risk, denoted R^*, and is the best performance that can be achieved.
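
The decision rule can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: `BayesDecisionSketch`, `ConditionalRisk` and `MinimumRiskAction` are hypothetical names, and the loss matrix is assumed to hold one row per action and one column per state of nature.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class BayesDecisionSketch
{
    // Conditional risk R(a_i, X) = sum_j lambda[i, j] * P(w_j | X).
    public static double ConditionalRisk(double[,] lambda, double[] posteriors, int action)
    {
        double risk = 0;
        for (int j = 0; j < posteriors.Length; j++)
            risk += lambda[action, j] * posteriors[j];
        return risk;
    }

    // Index of the action with the minimum conditional risk.
    public static int MinimumRiskAction(double[,] lambda, double[] posteriors)
    {
        if (lambda.GetLength(1) != posteriors.Length)
            throw new ArgumentException("lambda needs one column per class");

        int best = -1;
        double bestRisk = double.PositiveInfinity;
        for (int i = 0; i < lambda.GetLength(0); i++) // loop over all actions
        {
            double risk = ConditionalRisk(lambda, posteriors, i);
            if (risk < bestRisk)
            {
                bestRisk = risk;
                best = i;
            }
        }
        return best;
    }
}
```

As a usage example, take two classes with a zero-one loss and a third "reject" action that costs 0.3 whatever the true class, i.e. the rows (0, 1), (1, 0) and (0.3, 0.3). For posteriors (0.9, 0.1) the risks are 0.1, 0.9 and 0.3, so action 0 is chosen; for the close case (0.55, 0.45) the risks are 0.45, 0.55 and 0.3, so the classifier rejects.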

In order to make sense of the abstract theory, let’s borrow the experiment from my university’s Pattern Recognition Lab tutorial and walk through the implementation.

The assumption that the likelihood follows a Gaussian (normal) distribution still holds. The likelihood is characterized by its mean (μ) and standard deviation (σ), calculated for each state of nature (category). This time the user specifies sample points with mouse clicks, which are then used to calculate the mean (μ) and standard deviation (σ) as follows:

μ_i = (1/n) ∑_{j=1}^{n} x_ij,  i = 1, …, d

where n is the number of samples. In this example, the observed features are the red, the green and the blue channels of the image, hence d = 3. Similarly,

σ²_i = ∑_{j=1}^{n} (x_ij − μ_i)² / (n − 1),  i = 1, …, d

σ_i = √σ²_i
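
As a minimal sketch of these two estimates for a single feature (for instance one color channel), assuming the samples are already available as an array of doubles. The names `SampleStatisticsSketch`, `Mean` and `StandardDeviation` are hypothetical and chosen only for this illustration.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class SampleStatisticsSketch
{
    // Mean of one feature over n samples.
    public static double Mean(double[] values)
    {
        if (values.Length == 0)
            throw new ArgumentException("at least one sample is required");
        double sum = 0;
        foreach (double v in values)
            sum += v;
        return sum / values.Length;
    }

    // Sample standard deviation: squared deviations divided by n - 1.
    public static double StandardDeviation(double[] values)
    {
        if (values.Length < 2)
            throw new ArgumentException("at least two samples are required");
        double mean = Mean(values);
        double sumSquares = 0;
        foreach (double v in values)
            sumSquares += (v - mean) * (v - mean);
        return Math.Sqrt(sumSquares / (values.Length - 1));
    }
}
```

For the samples 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the squared deviations sum to 32, so the sample standard deviation is √(32/7). Note that the n − 1 divisor needs at least two samples, which is why the sketch rejects shorter inputs.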

As we capture mouse clicks over the image, we use the pixel values to fill a DataGridView for the samples:


        public void handle_input_image_mouse_click(int x, int y, DataGridView dgrdview_samples, PictureBox pictureBox_classified)
        {
            if (null == input_image || x < 0 || x >= input_image.Width || y < 0 || y >= input_image.Height)
                return;
            Color c = input_image.GetPixel(x, y);
            dgrdview_samples.Rows.Add(c.R, c.G, c.B);
            MessageBox.Show(String.Format("Click at Point := ({0}, {1}) \n\n Color (R,G,B) := {2}, {3}, {4}", x, y, c.R, c.G, c.B));
            pictureBox_classified.Image = null;
        }

Once the user has finished specifying the samples for a certain state of nature (category), the values are used to populate the samples array property of an object of a class named StateOfNature, which encapsulates the relevant attributes:


        public void handle_create_classes_from_samples_click(DataGridView dgrdview_samples, DataGridView dgrview_meu_sigma, DataGridView dgrdview_loss_function)
        {
            StateOfNature state_of_nature = new StateOfNature();
            for (int i = 0; i < dgrdview_samples.Rows.Count; i++)
            {
                if (dgrdview_samples.Rows[i].Cells[0].Value == null || dgrdview_samples.Rows[i].Cells[1].Value == null || dgrdview_samples.Rows[i].Cells[2].Value == null)
                    continue;
                int r = int.Parse(dgrdview_samples.Rows[i].Cells[0].Value.ToString());
                int g = int.Parse(dgrdview_samples.Rows[i].Cells[1].Value.ToString());
                int b = int.Parse(dgrdview_samples.Rows[i].Cells[2].Value.ToString());
                Color c = Color.FromArgb(r, g, b);
                state_of_nature.samples.Add(c);
            }
            state_of_nature.calculate_meus_and_sigmas_from_samples();
            dgrview_meu_sigma.Rows.Add(state_of_nature.meu_red, state_of_nature.sigma_red, state_of_nature.meu_green, state_of_nature.sigma_green, state_of_nature.meu_blue, state_of_nature.sigma_blue);
            state_of_nature.color = obj_unique_random_colors_array.get_unique_random_color();
            class_regions_array.Add(state_of_nature);
            // add a column for the new class wi in lambda matrix
            DataGridViewColumn dgrdview_col = new DataGridViewColumn();
            dgrdview_col.Width = column_width;
            dgrdview_col.Name = "w_" + class_regions_array.Count;
            dgrdview_col.HeaderText = "w" + class_regions_array.Count;
            dgrdview_col.CellTemplate = dgrdview_samples.Rows[0].Cells[0];
            dgrdview_loss_function.Columns.Add(dgrdview_col);
            // clear samples table
            dgrdview_samples.Rows.Clear();
        }

The mean (μ) and the standard deviation (σ) are then computed with the formulas above (note that the code divides by n rather than n − 1):


        public void calculate_meus_and_sigmas_from_samples()
        {
            calculate_sums_from_samples();
            calculate_meus_from_samples();
            calculate_sigmas_from_samples();
        }

        public void calculate_sums_from_samples()
        {
            if (samples.Count <= 0)
                return;
            // reset the accumulators so that a repeated call does not double count
            sum_reds = sum_greens = sum_blues = 0;
            for (int i = 0; i < samples.Count; i++)
            {
                sum_reds += samples[i].R;
                sum_greens += samples[i].G;
                sum_blues += samples[i].B;
            }
        }

        public void calculate_meus_from_samples()
        {
            if (samples.Count <= 0)
                return;
            meu_red = sum_reds / samples.Count;
            meu_green = sum_greens / samples.Count;
            meu_blue = sum_blues / samples.Count;
        }

        public void calculate_sigmas_from_samples()
        {
            if (samples.Count <= 0)
                return;
            // reset the accumulators so that a repeated call does not double count
            sigma_red = sigma_green = sigma_blue = 0;
            // accumulate the squared deviations from the mean of each channel
            for (int i = 0; i < samples.Count; i++)
            {
                sigma_red += Math.Pow(samples[i].R - meu_red, 2);
                sigma_green += Math.Pow(samples[i].G - meu_green, 2);
                sigma_blue += Math.Pow(samples[i].B - meu_blue, 2);
            }
            // divide by n (the n - 1 variant is left commented out) and take the square root
            sigma_red = Math.Sqrt(sigma_red / (samples.Count /*- 1*/));
            sigma_green = Math.Sqrt(sigma_green / (samples.Count /*- 1*/));
            sigma_blue = Math.Sqrt(sigma_blue / (samples.Count /*- 1*/));
        }

With the following code, the loss function (λ) is filled as a 2D array:


        public void propagate_loss_function_lambda_matrix(DataGridView dgrdv_lambda)
        {
            // correct num actions
            int correct_num_actions = 0;
            for (int i = 0; i < dgrdv_lambda.Rows.Count; i++)
            {
                if (new DataGridView_Helpers().is_grid_row_empty(dgrdv_lambda.Rows[i]))
                    continue;
                correct_num_actions++;
            }
            // end correction
            lambda = new double[correct_num_actions, class_regions_array.Count];
            confusion_matrix = new int[correct_num_actions, class_regions_array.Count];
            for (int i = 0; i < correct_num_actions; i++) // loop actions
            {
                for (int j = 0; j < class_regions_array.Count; j++) // loop states of nature (categories)
                {
                    if (dgrdv_lambda.Rows[i].Cells[j + 1].Value != null)
                        lambda[i, j] = double.Parse(dgrdv_lambda.Rows[i].Cells[j + 1].Value.ToString());
                }
            }
        }

Finally, let’s apply the Bayesian inference to each pixel of the input image as follows:


        public Bitmap handle_render_image_click(DataGridView dgrdview_loss_function, bool is_confused = false)
        {
            propagate_loss_function_lambda_matrix(dgrdview_loss_function);
            Bitmap ret = new Bitmap(input_image.Width, input_image.Height);
            if (class_regions_array == null)
                return ret;
            for (int x = 0; x < input_image.Width; x++)
            {
                for (int y = 0; y < input_image.Height; y++)
                {
                    Color original_color = input_image.GetPixel(x, y);
                    int[] observed_features_vector_x = new int[] { original_color.R, original_color.G, original_color.B };
                    int class_index = new BayesianInferenceEngine().classify(class_regions_array, observed_features_vector_x, lambda);
                    if (class_index == -1 || class_index == class_regions_array.Count)
                        ret.SetPixel(x, y, Color.Black);
                    else
                        ret.SetPixel(x, y, class_regions_array[class_index].color);

                    if (is_confused)
                        update_confusion_matrix(x, y, class_index);
                }
            }
            return ret;
        }

As explained above, the Bayes decision rule is applied: the conditional risk R(α_i, X) is computed for every action, and the action with the minimum conditional risk is chosen as the classification:

        // multi feature
        public int classify(List<StateOfNature> classes, int[] x, double[,] loss_function_lambda)
        {
            if (classes == null || loss_function_lambda == null || x == null)
                return -1;
            double minimum_conditional_risk = double.PositiveInfinity;
            double conditional_risk, posteriori;
            int class_index = -1;
            for (int i = 0; i < classes.Count+1; i++) // loop all actions = # of classes + 1 @todo remove hard coded actions
            {
                conditional_risk = 0;
                for (int j = 0; j < classes.Count; j++)
                {
                    posteriori = calculate_posteriori(classes[j], x);
                    conditional_risk += loss_function_lambda[i, j] * posteriori;
                }
                if (conditional_risk < minimum_conditional_risk)
                {
                    class_index = i;
                    minimum_conditional_risk = conditional_risk;
                }
            }
            return class_index;
        }

In this example, the posterior calculations are based on the assumption that the observed features are mutually independent given the state of nature, hence the class-conditional likelihood factorizes into a product over the features and the posterior is given, up to the evidence, by the following formula:

P(ω_j|X) ∝ P(ω_j) ∏_{i=1}^{d} p(X_i|ω_j),  j = 1, …, c

This code snippet calculates the posterior, ignoring the evidence in the denominator, which is the same scaling factor for every class and so does not change which action has the minimum conditional risk:


        public double calculate_posteriori(StateOfNature state_of_nature, int[] observed_features_x_vector)
        {
            // start from the prior, which enters the product exactly once
            double posteriori = state_of_nature.prior;
            double[] meu_vector = state_of_nature.get_meus_vector();
            double[] sigma_vector = state_of_nature.get_sigmas_vector();
            for (int i = 0; i < observed_features_x_vector.Length; i++)
            {
                double likelihood = composition_object_normal_distribution.my_normal_function(observed_features_x_vector[i], meu_vector[i], sigma_vector[i]);
                posteriori *= likelihood; // joint likelihood of independent features (random variables)
            }
            return posteriori;
        }
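
With only three features a plain product of densities is fine, but with many features such a product can underflow to zero. A common workaround, sketched here under the same independence and Gaussian assumptions, is to sum logarithms instead. The names `LogNaiveBayesSketch`, `LogGaussian` and `LogScore` are hypothetical and not part of the project's source code.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class LogNaiveBayesSketch
{
    // Natural log of the univariate Gaussian density.
    public static double LogGaussian(double x, double mean, double sigma)
    {
        double z = (x - mean) / sigma;
        return -0.5 * z * z - Math.Log(sigma) - 0.5 * Math.Log(2 * Math.PI);
    }

    // log P(w_j) + sum_i log p(x_i | w_j): the log of the unnormalized posterior.
    public static double LogScore(double[] x, double[] means, double[] sigmas, double prior)
    {
        double score = Math.Log(prior);
        for (int i = 0; i < x.Length; i++)
            score += LogGaussian(x[i], means[i], sigmas[i]);
        return score;
    }
}
```

Because the logarithm is monotonic, the class with the largest log score is also the class with the largest posterior, which is all that the zero-one loss needs. For a general loss matrix the scores would have to be converted back into posteriors before the conditional risks are computed.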

In a future blog post, we’ll see how to use a more general formula which takes mutual dependency between the observed features into account.

For the sake of completeness, let’s include the function to calculate the normal (Gaussian) distribution using the formula:

p(x|μ,σ) = (1 / (σ√(2π))) e^(−(x−μ)² / (2σ²))

        
        public double my_normal_function(double x, double meu, double sigma)
        {
            return (1 / (Math.Sqrt(2 * Math.PI) * sigma)) * Math.Exp(-1 * Math.Pow(x - meu, 2) / (2 * Math.Pow(sigma, 2)));
        }
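
One edge case is worth keeping in mind: if every sample of a class has the same value in some channel, the estimated σ is 0, and in C# floating-point arithmetic the expression above then evaluates to NaN rather than a usable density. A possible guard, sketched here as an assumption rather than as part of the original code, is to put a lower bound on σ. `GuardedGaussianSketch`, `Density` and the default `minSigma` of 1.0 (one intensity level of an 8-bit channel) are hypothetical choices.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class GuardedGaussianSketch
{
    // Gaussian density with a lower bound on sigma.
    // minSigma is an assumed, tunable value and not taken from the original project.
    public static double Density(double x, double mean, double sigma, double minSigma = 1.0)
    {
        double s = Math.Max(sigma, minSigma); // avoid dividing by zero
        double z = (x - mean) / s;
        return Math.Exp(-0.5 * z * z) / (Math.Sqrt(2 * Math.PI) * s);
    }
}
```

With this guard, a class whose samples are all identical in one channel still yields a finite density, at the cost of slightly widening very narrow distributions.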

Comparing the results to those obtained with the single feature Bayesian classifier in the previous blog post, we can see a favorable increase in the accuracy of our classifier.

References:

Pattern Classification – Richard O. Duda, Peter E. Hart, David G. Stork – John Wiley & Sons, Nov 9, 2012.
The complete source code for this example : https://github.com/thecortex/CS-Tasks

Qamar-ud-Din

Machine Learning Engineer @ mQuBits
