The Naive Bayes Classifier is a probabilistic classifier based on Bayes' theorem.
In Machine Learning, a classification problem represents the selection of the best hypothesis given the data.
Given a new data point, we try to determine which class label this new data instance belongs to. Prior knowledge about the past data helps us classify the new data point.
Bayes' Theorem

Bayes' theorem gives us the probability of event A happening given that event B has occurred. For example: what is the probability that it will rain given that the weather is cloudy? The probability of rain is our hypothesis, and the event representing cloudy weather is our evidence. The theorem is written as:
P(A|B) = P(B|A) * P(A) / P(B)
where:
- P(A|B) – the posterior probability of A given B
- P(B|A) – the conditional probability (likelihood) of B given A
- P(A) – the prior probability of event A
- P(B) – the probability of event B occurring, regardless of the hypothesis
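To make this concrete, here is a minimal numeric sketch of Bayes' theorem for the rain example; all probability values below are assumed purely for illustration:
# A minimal numeric sketch of Bayes' theorem (values assumed for illustration)
p_rain = 0.2                # P(A): prior probability of rain
p_cloudy_given_rain = 0.85  # P(B|A): probability of cloudy weather when it rains
p_cloudy = 0.4              # P(B): overall probability of cloudy weather

# P(A|B) = P(B|A) * P(A) / P(B)
p_rain_given_cloudy = p_cloudy_given_rain * p_rain / p_cloudy
print(p_rain_given_cloudy)  # 0.425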
Now that we have some idea about Bayes' theorem, let's see how Naive Bayes works.
How does the Naive Bayes Classifier work?
To demonstrate how the Naive Bayes classifier works, we will consider an email spam classification problem that classifies whether an email is Spam or Not Spam.
Let's say we have a total of 12 emails, 8 of which are Not Spam and the remaining 4 are Spam.
- Number of Not-Spam emails – 8
- Number of Spam emails – 4
- Total emails – 12
- Therefore, P(Not-Spam) = 8/12 = 0.666 and P(Spam) = 4/12 = 0.333
Suppose the entire corpus comprises just four words: [Friend, Offer, Money, Wonderful]. The word counts in each category are: Not-Spam – Friend 8, Offer 1, Money 2, Wonderful 6 (17 words in total); Spam – Friend 3, Offer 8, Money 8, Wonderful 1 (20 words in total).
We will now calculate the conditional probability of each word given the class.
The formula below gives the probability of the word Friend occurring given that the email is Not-Spam:
P(Friend | Not-Spam) = (count of the word Friend in the Not-Spam corpus) / (total words in the Not-Spam corpus)
Calculating these probabilities for the whole corpus:
P(Friend | Not-Spam) = 8/17 = 0.47      |  P(Friend | Spam) = 3/20 = 0.15
P(Offer | Not-Spam) = 1/17 = 0.058      |  P(Offer | Spam) = 8/20 = 0.40
P(Money | Not-Spam) = 2/17 = 0.11       |  P(Money | Spam) = 8/20 = 0.40
P(Wonderful | Not-Spam) = 6/17 = 0.35   |  P(Wonderful | Spam) = 1/20 = 0.05
P(Not-Spam) = 8/12 = 0.666              |  P(Spam) = 4/12 = 0.333
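The same numbers can be reproduced with a few lines of Python; this is a minimal sketch using the word counts listed above:
# Word counts per class (from the counts listed above)
not_spam_counts = {"Friend": 8, "Offer": 1, "Money": 2, "Wonderful": 6}
spam_counts = {"Friend": 3, "Offer": 8, "Money": 8, "Wonderful": 1}

total_not_spam = sum(not_spam_counts.values())  # 17
total_spam = sum(spam_counts.values())          # 20

# Conditional probability of each word given the class
p_word_given_not_spam = {w: c / total_not_spam for w, c in not_spam_counts.items()}
p_word_given_spam = {w: c / total_spam for w, c in spam_counts.items()}

# Class priors
p_not_spam, p_spam = 8 / 12, 4 / 12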
Now that we have all the prior and conditional probabilities, we can apply Bayes' theorem.
Suppose we receive an email that reads "Offer Money"; based on the probabilities calculated above, we need to classify it as Spam or Not-Spam. Naive Bayes assumes the words are conditionally independent given the class, so the posterior for each class is proportional to the class prior multiplied by the word likelihoods (the common denominator P(Offer, Money) can be dropped because it is the same for both classes):
P(Not-Spam | Offer, Money) ∝ P(Not-Spam) * P(Offer | Not-Spam) * P(Money | Not-Spam)
P(Not-Spam | Offer, Money) ∝ 0.666 * 0.058 * 0.11 ≈ 0.00425
P(Spam | Offer, Money) ∝ P(Spam) * P(Offer | Spam) * P(Money | Spam)
P(Spam | Offer, Money) ∝ 0.333 * 0.40 * 0.40
P(Spam | Offer, Money) ≈ 0.0533
The probability of the email being Spam given the words Offer and Money is greater than the probability of it being Not-Spam (0.0533 > 0.00425), so our classifier labels this email as Spam. In summary, we have just compared the posterior probabilities (up to a common normalizing constant) as given by Bayes' theorem.
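Putting this scoring step into code, continuing the sketch above (the small numerical difference from the hand calculation comes from rounding):
# Score each class for the email "Offer Money", continuing the sketch above
email = ["Offer", "Money"]

score_not_spam = p_not_spam
score_spam = p_spam
for word in email:
    score_not_spam *= p_word_given_not_spam[word]
    score_spam *= p_word_given_spam[word]

print(score_not_spam)  # ≈ 0.0046 (the hand calculation above used rounded probabilities)
print(score_spam)      # ≈ 0.0533
prediction = "Spam" if score_spam > score_not_spam else "Not-Spam"
print(prediction)      # Spam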
If we come across a word that is not present in one of the categories, its count in that category is 0 (zero), the corresponding conditional probability becomes 0, and the whole product collapses to zero, so we are unable to make a sensible prediction.
This problem is known as the "zero frequency" problem. To avoid it, we use smoothing methods such as Laplace (add-one) smoothing, which adds a small count to every word so that no conditional probability is ever exactly zero.
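Here is a minimal sketch of Laplace (add-one) smoothing applied to the same counts, assuming a smoothing constant of 1:
# Laplace (add-one) smoothing: add 1 to every count so that no conditional
# probability is ever exactly zero
vocabulary_size = 4  # Friend, Offer, Money, Wonderful
alpha = 1            # smoothing constant (add-one smoothing assumed)

p_word_given_spam_smoothed = {
    w: (count + alpha) / (total_spam + alpha * vocabulary_size)
    for w, count in spam_counts.items()
}
# A word that never appeared in Spam would now get (0 + 1) / (20 + 4) instead of 0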
Types of Naive Bayes Classifier
- Multinomial – used for discrete counts. The spam example described above is a Multinomial Naive Bayes model.
- Gaussian – this type of Naive Bayes classifier assumes the features follow a normal (Gaussian) distribution.
- Bernoulli – this type of classifier is useful when our feature vectors are binary (see the scikit-learn sketch below).
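In scikit-learn these three variants correspond to the MultinomialNB, GaussianNB and BernoulliNB classes; a quick sketch of how each is instantiated:
from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB

multinomial_nb = MultinomialNB()  # discrete counts, e.g. word counts
gaussian_nb = GaussianNB()        # continuous features assumed to be normally distributed
bernoulli_nb = BernoulliNB()      # binary feature vectors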
Python Implementation
We will make use of the Breast Cancer Wisconsin dataset. You can learn more about the dataset here.
#Loading the Dataset
from sklearn.datasets import load_breast_cancer
data_loaded = load_breast_cancer()
X = data_loaded.data
y = data_loaded.target
The dataset has 30 features from which the prediction needs to be made. We can access the feature matrix through the data attribute and the labels through the target attribute.
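As a quick check (outputs shown in comments), the shape of the feature matrix and the label names can be inspected directly from the loaded object:
# Inspecting the loaded data
print(X.shape)                        # (569, 30) – 569 samples, 30 features
print(data_loaded.feature_names[:3])  # first three feature names
print(data_loaded.target_names)       # ['malignant' 'benign']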
#Splitting the dataset into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=20)
#keeping 80% as training data and 20% as testing data
Now we import the Gaussian Naive Bayes class and fit the training data to it.
from sklearn.naive_bayes import GaussianNB
#Calling the Class
naive_bayes = GaussianNB()
#Fitting the data to the classifier
naive_bayes.fit(X_train , y_train)
#Predict on test data
y_predicted = naive_bayes.predict(X_test)
The fit method of the GaussianNB class takes the feature data (X_train) and the target labels (y_train) as input.
So, let's calculate how accurate our model is.
#Import the metrics module from sklearn
from sklearn import metrics
metrics.accuracy_score(y_test, y_predicted)
Accuracy = 0.956140350877193