The Naive Bayes classifier is a simple but effective probabilistic learning algorithm based on Bayes' theorem with strong independence assumptions among the features. Despite the "naive" in its name, Naive Bayes has proved to be a strong classifier, not only for text-related tasks such as spam filtering and sports-event classification, but also for predicting medical diagnoses.
- Naïve Bayes is a classification algorithm for categorical variables, based on the well-known Bayes theorem (stated below).
- It is used mostly in high-dimensional text classification.
- The Naïve Bayes classifier is a simple probabilistic classifier with very few parameters, so its models can be trained and can make predictions faster than many other classification algorithms.
- It is a probabilistic classifier, i.e., it predicts based on the probability that an object belongs to each class.
- Naïve Bayes is used in spam filtering, sentiment analysis, article classification, and many other applications.
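All of this rests on Bayes' theorem combined with the "naive" conditional-independence assumption: for a class y and feature values x1, …, xn,

P(y | x1, …, xn) ∝ P(y) × P(x1 | y) × … × P(xn | y),

and the classifier predicts the class y that maximizes this product.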
The most popular types differ in the distribution assumed for the feature values; a short scikit-learn sketch follows this list. Some of these include:
- Gaussian Naïve Bayes (GaussianNB): This is a variant of the Naïve Bayes classifier used with Gaussian (i.e. normal) distributions and continuous variables. The model is fitted by estimating the mean and standard deviation of each feature within each class.
- Multinomial Naïve Bayes (MultinomialNB): This type of Naïve Bayes classifier assumes that the features are from multinomial distributions. This variant is useful when using discrete data, such as frequency counts, and it is typically applied within natural language processing use cases, like spam classification.
- Bernoulli Naïve Bayes (BernoulliNB): This is another variant of the Naïve Bayes classifier, which is used with Boolean variables—that is, variables with two values, such as True and False or 1 and 0.
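The three variants above correspond to the scikit-learn classes of the same names. The snippet below is a minimal sketch of how each one is typically used; the toy feature arrays are invented purely for illustration and assume scikit-learn is installed.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])  # two toy classes

# Continuous features -> GaussianNB (fits a mean/std per feature per class)
X_cont = np.array([[1.2, 3.4], [0.9, 2.8], [4.5, 7.1], [5.0, 6.8]])
print(GaussianNB().fit(X_cont, y).predict([[1.0, 3.0]]))

# Count features (e.g. word counts) -> MultinomialNB
X_counts = np.array([[2, 0, 1], [3, 1, 0], [0, 4, 2], [1, 3, 3]])
print(MultinomialNB().fit(X_counts, y).predict([[2, 1, 0]]))

# Binary (Boolean) features -> BernoulliNB
X_bin = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])
print(BernoulliNB().fit(X_bin, y).predict([[1, 0, 0]]))
```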
Numerical Example
We have the following 4 records (each with 6 attributes plus the output):

| Row | Sky   | Temperature | Humid  | Wind   | Water | Forest | Output |
|-----|-------|-------------|--------|--------|-------|--------|--------|
| 1   | sunny | warm        | normal | strong | warm  | same   | yes    |
| 2   | sunny | warm        | high   | strong | warm  | same   | yes    |
| 3   | rainy | cold        | high   | strong | warm  | change | no     |
| 4   | sunny | warm        | high   | strong | cool  | change | yes    |
We want to build a Naïve Bayes classifier that predicts Output given the attributes.
Prior Probabilities
Let:
- P(yes) = P(Output = yes) and P(no) = P(Output = no)
Count how many yes vs. no in the dataset:
- yes: 3 records (Rows 1, 2, 4)
- no: 1 record (Row 3)
Hence:
- P(yes) = 3/4 = 0.75
- P(no) = 1/4 = 0.25
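These priors can be checked quickly in Python; the sketch below just counts the labels in the Output column of the table above.

```python
from collections import Counter

labels = ["yes", "yes", "no", "yes"]  # Output column of the 4 records
counts = Counter(labels)
priors = {c: counts[c] / len(labels) for c in counts}
print(priors)  # {'yes': 0.75, 'no': 0.25}
```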
Likelihoods (Conditional Probabilities)
Naïve Bayes requires computing the conditional probabilities P(attribute = value | class). We look at each attribute-value pair for each class.
Sky
- Sky = sunny or rainy.
Class = yes
There are 3 “yes” examples:
- (sunny, warm, normal, strong, warm, same)
- (sunny, warm, high, strong, warm, same)
- (sunny, warm, high, strong, cool, change)
All 3 have Sky = sunny, so:
- P(Sky = sunny | yes) = 3/3 = 1
- P(Sky = rainy | yes) = 0/3 = 0
Class = no
There is 1 “no” example:
- (rainy, cold, high, strong, warm, change)
That single example has Sky = rainy, so:
- P(Sky = rainy | no) = 1/1 = 1
- P(Sky = sunny | no) = 0/1 = 0
The other attributes are handled by exactly the same counting.

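The same counting can be written as a few lines of Python; the `sky_likelihood` helper below is purely illustrative.

```python
# (Sky, Output) pairs taken from the table above
rows = [("sunny", "yes"), ("sunny", "yes"), ("rainy", "no"), ("sunny", "yes")]

def sky_likelihood(value, label):
    # P(Sky = value | Output = label) = count(Sky = value and Output = label) / count(Output = label)
    in_class = [sky for sky, out in rows if out == label]
    return in_class.count(value) / len(in_class)

print(sky_likelihood("sunny", "yes"))  # 3/3 = 1.0
print(sky_likelihood("rainy", "no"))   # 1/1 = 1.0
print(sky_likelihood("sunny", "no"))   # 0/1 = 0.0
```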
Checking the Rows
- Row 1 is yes. Multiplying P(yes) by the Sky likelihood and the likelihoods of the remaining attributes gives a nonzero value for yes, while P(Sky = sunny | no) = 0 makes the product for no equal to 0, so we pick yes.
- Row 2 is also yes in the dataset. A similar calculation yields a nonzero value for yes and 0 for no.
- Row 3 is no. If you compute with “yes,” some attribute probability is 0 (e.g., P(Sky = rainy | yes) = 0 or P(Temperature = cold | yes) = 0), which drives the product for yes to 0. For no, the product is nonzero, so we pick no (see the sketch after this list).
- Row 4 ends up with a nonzero value for yes and 0 for no.
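To make the zero-product argument concrete, here is the hand computation for Row 3, with the likelihood values obtained by the same counting used for Sky.

```python
# Row 3: (rainy, cold, high, strong, warm, change)
# Unnormalized posterior for "yes": P(yes) * product of the six attribute likelihoods
p_yes = 0.75 * 0 * 0 * (2/3) * 1 * (2/3) * (1/3)  # P(Sky=rainy|yes)=0 and P(Temp=cold|yes)=0 kill the product
# Unnormalized posterior for "no": every attribute of Row 3 matches the single "no" record
p_no = 0.25 * 1 * 1 * 1 * 1 * 1 * 1
print(p_yes, p_no)  # 0.0 0.25 -> predict "no"
```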
Hence, all 4 records are classified correctly by these computations.
The Python code implementing the above example is as follows:
```python
from collections import defaultdict

# Given dataset (entered manually, matching the table above)
data = {
    "Sky": ["sunny", "sunny", "rainy", "sunny"],
    "Temperature": ["warm", "warm", "cold", "warm"],
    "Humid": ["normal", "high", "high", "high"],
    "Wind": ["strong", "strong", "strong", "strong"],
    "Water": ["warm", "warm", "warm", "cool"],
    "Forest": ["same", "same", "change", "change"],
    "Output": ["yes", "yes", "no", "yes"]
}

# Unique class labels
classes = ["yes", "no"]

# Encode categorical variables as integer codes
encoder = defaultdict(dict)
for column in data.keys():
    unique_vals = list(set(data[column]))
    for i, val in enumerate(unique_vals):
        encoder[column][val] = i

# Encode the dataset with those codes
encoded_data = {col: [encoder[col][val] for val in values] for col, values in data.items()}

def compute_prior(y):
    # P(class) = count(class) / total number of records
    priors = {}
    total = len(y)
    for c in classes:
        priors[c] = y.count(c) / total
    return priors

def compute_likelihoods(X, y):
    # P(feature = value | class) = count(value within class) / count(class)
    likelihoods = {}
    for feature in X.keys():
        likelihoods[feature] = {}
        for c in classes:
            likelihoods[feature][c] = {}
            class_count = y.count(c)
            for value in set(X[feature]):
                count = sum(1 for i in range(len(y)) if X[feature][i] == value and y[i] == c)
                likelihoods[feature][c][value] = count / class_count if class_count > 0 else 0
    return likelihoods

# Compute priors and likelihoods
y = encoded_data["Output"]
X = {key: val for key, val in encoded_data.items() if key != "Output"}
priors = compute_prior(data["Output"])
likelihoods = compute_likelihoods(X, data["Output"])

def predict(sample):
    # Unnormalized posterior = prior * product of likelihoods; return the class with the larger value
    posterior_probs = {}
    for c in classes:
        posterior_probs[c] = priors[c]
        for feature, value in sample.items():
            if value in likelihoods[feature][c]:
                posterior_probs[c] *= likelihoods[feature][c][value]
            else:
                posterior_probs[c] *= 0  # Unseen feature value: the whole product becomes zero
    return max(posterior_probs, key=posterior_probs.get)

# Example test case (Row 1 of the table)
sample = {"Sky": "sunny", "Temperature": "warm", "Humid": "normal",
          "Wind": "strong", "Water": "warm", "Forest": "same"}
encoded_sample = {key: encoder[key][value] for key, value in sample.items()}
prediction = predict(encoded_sample)
print(f"Predicted Output for {sample}: {prediction}")
```
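For comparison, the same example can be reproduced with scikit-learn's CategoricalNB; this is a sketch assuming scikit-learn is available, and because CategoricalNB applies Laplace smoothing by default, its probabilities differ slightly from the hand counts above (the predicted class for the test row is still the same).

```python
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

X_raw = [
    ["sunny", "warm", "normal", "strong", "warm", "same"],
    ["sunny", "warm", "high",   "strong", "warm", "same"],
    ["rainy", "cold", "high",   "strong", "warm", "change"],
    ["sunny", "warm", "high",   "strong", "cool", "change"],
]
y_raw = ["yes", "yes", "no", "yes"]

enc = OrdinalEncoder()                 # turn the category strings into integer codes
X_enc = enc.fit_transform(X_raw)

clf = CategoricalNB()                  # default alpha=1.0 (Laplace smoothing)
clf.fit(X_enc, y_raw)

test = enc.transform([["sunny", "warm", "normal", "strong", "warm", "same"]])
print(clf.predict(test))               # expected: ['yes']
```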