Naive Bayes
Introduction
The Naive Bayes model uses probabilities to predict an outcome. It is a supervised machine learning technique, i.e. it requires labelled data for training. It is used for classification and is based on Bayes' Theorem. The basic assumption of this model is independence among the features, i.e. each feature is unaffected by any other feature.
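Formally, for a class r and features x_1, \ldots, x_n, this independence assumption lets the likelihood factor into one term per feature, which is what makes the model cheap to train and evaluate:
P(r|x_1, \ldots, x_n) \propto P(r) \prod_{i=1}^{n} P(x_i|r)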
Bayes' Theorem
Bayes' theorem is given by:
P(a|b) = \frac{P(b|a) \cdot P(a)}{P(b)}
where:
- P(a|b) is the posterior probability, i.e. the probability of 'a' given that 'b' is true.
- P(b|a) is the likelihood probability, i.e. the probability of 'b' given that 'a' is true.
- P(a) and P(b) are the probabilities of 'a' and 'b' respectively, independent of each other.
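As a quick numeric illustration (the probability values here are made up, not taken from the example below), Bayes' theorem is a one-line computation:

```python
def posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Bayes' theorem: P(a|b) = P(b|a) * P(a) / P(b)."""
    return likelihood * prior / evidence

# Illustrative values: P(b|a) = 0.9, P(a) = 0.3, P(b) = 0.45
print(posterior(0.9, 0.3, 0.45))  # 0.6
```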
Applications
The Naive Bayes classifier has numerous applications, including:
- Text classification.
- Sentiment analysis.
- Spam filtering.
- Multiclass classification (e.g. weather prediction).
- Recommendation Systems.
- Healthcare sector.
- Document categorization.
Advantages
- Easy to implement.
- Useful even when the training dataset is small (a situation where a decision tree would not be recommended).
- Supports multiclass classification out of the box, which algorithms such as SVM and logistic regression only handle through extensions (e.g. one-vs-rest).
- Scalable, fast and efficient.
Disadvantages
- Assumes features to be independent, which may not be true in certain scenarios.
- Zero probability error.
- Sensitive to noise.
Zero Probability Error
A zero probability error occurs when the count of occurrences of an event given another event is zero, which drives the entire product of likelihoods to zero. To handle it, Laplace's correction (Laplace smoothing) is applied: a small constant (typically 1) is added to every count.
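A minimal sketch of Laplace's correction, assuming raw per-class counts (the function name and the alpha parameter are illustrative):

```python
def smoothed_likelihood(count: int, class_total: int, n_values: int,
                        alpha: float = 1.0) -> float:
    """Laplace-smoothed P(feature=value | class): add alpha to every count."""
    return (count + alpha) / (class_total + alpha * n_values)

# In the example below, Outlook=Overcast never occurs with PlayTennis=No
# (0 of 4 'No' rows, with Outlook taking 2 possible values):
print(smoothed_likelihood(0, 4, 2))  # 1/6 ~ 0.167 instead of 0
```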
Example:
Given the data below, determine whether tennis can be played if (Outlook = Overcast, Wind = Weak).
Data
SNo | Outlook (A) | Wind (B) | PlayTennis (R) |
---|---|---|---|
1 | Rain | Weak | No |
2 | Rain | Strong | No |
3 | Overcast | Weak | Yes |
4 | Rain | Weak | Yes |
5 | Overcast | Weak | Yes |
6 | Rain | Strong | No |
7 | Overcast | Strong | Yes |
8 | Rain | Weak | No |
9 | Overcast | Weak | Yes |
10 | Rain | Weak | Yes |
- Calculate prior probabilities
P(Yes) = \frac{6}{10} = 0.6
P(No) = \frac{4}{10} = 0.4
- Calculate likelihoods
1. Outlook (A):

A\R | Yes | No |
---|---|---|
Overcast | 4/6 | 0/4 |
Rain | 2/6 | 4/4 |

Note that P(Overcast|No) = 0/4 is exactly the zero probability case discussed above, so Laplace's correction applies here.
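To make the worked example reproducible, here is a short Python sketch (plain standard library; the variable and function names are illustrative) that derives the priors above from the data table and scores the query (Outlook = Overcast, Wind = Weak). It applies Laplace smoothing, so the smoothed likelihoods differ slightly from the raw fractions in the table:

```python
from collections import Counter

# The ten training rows from the data table: (Outlook, Wind, PlayTennis).
data = [
    ("Rain", "Weak", "No"), ("Rain", "Strong", "No"), ("Overcast", "Weak", "Yes"),
    ("Rain", "Weak", "Yes"), ("Overcast", "Weak", "Yes"), ("Rain", "Strong", "No"),
    ("Overcast", "Strong", "Yes"), ("Rain", "Weak", "No"), ("Overcast", "Weak", "Yes"),
    ("Rain", "Weak", "Yes"),
]

labels = Counter(r for _, _, r in data)                 # {'Yes': 6, 'No': 4}
priors = {r: n / len(data) for r, n in labels.items()}  # P(Yes)=0.6, P(No)=0.4

def likelihood(feature_idx, value, label, alpha=1.0, n_values=2):
    """Laplace-smoothed P(feature=value | label), counted from the data."""
    count = sum(1 for row in data if row[feature_idx] == value and row[2] == label)
    return (count + alpha) / (labels[label] + alpha * n_values)

# Score each class for the query (Outlook=Overcast, Wind=Weak); the evidence
# P(b) is identical for both classes, so it can be dropped from the comparison.
scores = {
    r: priors[r] * likelihood(0, "Overcast", r) * likelihood(1, "Weak", r)
    for r in priors
}
print(scores)                       # 'Yes' clearly outscores 'No'
print(max(scores, key=scores.get))  # 'Yes'
```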