
Naive Bayes

Introduction

The Naive Bayes model uses probabilities to predict an outcome. It is a supervised machine learning technique, i.e. it requires labelled data for training. It is used for classification and is based on Bayes' theorem. The model's basic assumption is independence among the features, i.e. the value of one feature is unaffected by any other feature.
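As a quick taste of what this looks like in practice, here is a minimal sketch using scikit-learn's GaussianNB (assuming scikit-learn is installed; the toy features and labels are made up for illustration):

```python
# A tiny Naive Bayes classifier on made-up, labelled training data.
from sklearn.naive_bayes import GaussianNB

X_train = [[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.9]]  # feature rows
y_train = [0, 0, 1, 1]                                      # class labels

model = GaussianNB()
model.fit(X_train, y_train)  # learn class priors and per-feature likelihoods

print(model.predict([[1.1, 2.0]]))        # -> [0]
print(model.predict_proba([[1.1, 2.0]]))  # posterior probability of each class
```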

Bayes' Theorem

Bayes' theorem is given by:


P(a|b) = \frac{P(b|a) \cdot P(a)}{P(b)}

where:

  • P(a|b) is the posterior probability, i.e. the probability of 'a' given that 'b' is true,
  • P(b|a) is the likelihood, i.e. the probability of 'b' given that 'a' is true,
  • P(a) and P(b) are the prior probabilities of 'a' and 'b' respectively, each evaluated independently of the other.
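
To make the formula concrete, here is a short numerical sketch (the spam-filter numbers are made up for illustration):

```python
# Bayes' theorem with made-up numbers: the probability that an email is
# spam (a) given that it contains the word "offer" (b).
p_spam = 0.3               # P(a): prior probability of spam
p_offer_given_spam = 0.8   # P(b|a): likelihood of "offer" appearing in spam
p_offer = 0.35             # P(b): overall probability of "offer"

# P(a|b) = P(b|a) * P(a) / P(b)
p_spam_given_offer = p_offer_given_spam * p_spam / p_offer
print(round(p_spam_given_offer, 4))  # -> 0.6857
```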

Applications

The Naive Bayes classifier has numerous applications, including:

  1. Text classification.
  2. Sentiment analysis.
  3. Spam filtering.
  4. Multiclass classification (e.g. weather prediction).
  5. Recommendation Systems.
  6. Healthcare sector.
  7. Document categorization.

Advantages

  1. Easy to implement.
  2. Useful even when the training dataset is small (a setting where a decision tree would not be recommended).
  3. Naturally handles multiclass classification, which algorithms such as SVM and logistic regression only support through extensions like one-vs-rest.
  4. Scalable, fast and efficient.

Disadvantages

  1. Assumes the features are independent, which may not hold in real data.
  2. Suffers from the zero probability error (explained below).
  3. Sensitive to noise.

Zero Probability Error

The zero probability error occurs when, for some class, the count of a feature value given that class is zero, which forces the entire product of likelihoods to zero. To handle it, Laplace correction (additive smoothing) is applied: a small constant (typically 1) is added to every count.
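
To see the correction in numbers, here is a minimal sketch of additive (Laplace) smoothing; the counts and the constant alpha = 1 are illustrative:

```python
# Laplace (additive) smoothing: add a constant alpha to every count so
# that no conditional probability is ever exactly zero.
def smoothed_probability(count, total, num_values, alpha=1):
    """P(value | class): 'count' occurrences out of 'total' class samples,
    where the feature can take 'num_values' distinct values."""
    return (count + alpha) / (total + alpha * num_values)

print(0 / 4)                          # unsmoothed: 0.0 zeroes the whole product
print(smoothed_probability(0, 4, 2))  # -> (0+1)/(4+2) = 0.1667 (approx.)
print(smoothed_probability(4, 4, 2))  # -> (4+1)/(4+2) = 0.8333 (approx.)
```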

Example:

Given the data below, find whether tennis can be played if (outlook = overcast, wind = weak).

Data


| SNo | Outlook (A) | Wind (B) | PlayTennis (R) |
|-----|-------------|----------|----------------|
| 1   | Rain        | Weak     | No             |
| 2   | Rain        | Strong   | No             |
| 3   | Overcast    | Weak     | Yes            |
| 4   | Rain        | Weak     | Yes            |
| 5   | Overcast    | Weak     | Yes            |
| 6   | Rain        | Strong   | No             |
| 7   | Overcast    | Strong   | Yes            |
| 8   | Rain        | Weak     | No             |
| 9   | Overcast    | Weak     | Yes            |
| 10  | Rain        | Weak     | Yes            |

  • Calculate the prior probabilities:

    P(Yes) = \frac{6}{10} = 0.6

    P(No) = \frac{4}{10} = 0.4
  • Calculate the likelihoods:

    1. Outlook (A):

       | A \ R    | Yes | No |
       |----------|-----|----|
       | Rain     | 2   | 4  |
       | Overcast | 4   | 0  |

       P(Rain|Yes) = \frac{2}{6}, \quad P(Overcast|Yes) = \frac{4}{6}

       P(Rain|No) = \frac{4}{4}, \quad P(Overcast|No) = \frac{0}{4}

    2. Wind (B):

       | B \ R  | Yes | No |
       |--------|-----|----|
       | Weak   | 5   | 2  |
       | Strong | 1   | 2  |

       P(Weak|Yes) = \frac{5}{6}, \quad P(Strong|Yes) = \frac{1}{6}

       P(Weak|No) = \frac{2}{4}, \quad P(Strong|No) = \frac{2}{4}

  • Calculate the posterior for each class (up to the common factor P(overcast, weak)):

    P(Yes|overcast, weak) \propto P(overcast|Yes) \cdot P(weak|Yes) \cdot P(Yes) = \frac{4}{6} \cdot \frac{5}{6} \cdot 0.6 \approx 0.33

    P(No|overcast, weak) \propto P(overcast|No) \cdot P(weak|No) \cdot P(No) = \frac{0}{4} \cdot \frac{2}{4} \cdot 0.4 = 0

Since the score for Yes is larger, the model predicts that tennis can be played. Note that P(overcast|No) = 0 is exactly the zero probability error described above; with Laplace correction it would become \frac{0 + 1}{4 + 2} instead of 0.
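
The same computation can be reproduced with a few lines of Python; this is an illustrative sketch that simply counts over the table above:

```python
# Naive Bayes by hand on the PlayTennis data above.
data = [
    ("Rain", "Weak", "No"),        ("Rain", "Strong", "No"),
    ("Overcast", "Weak", "Yes"),   ("Rain", "Weak", "Yes"),
    ("Overcast", "Weak", "Yes"),   ("Rain", "Strong", "No"),
    ("Overcast", "Strong", "Yes"), ("Rain", "Weak", "No"),
    ("Overcast", "Weak", "Yes"),   ("Rain", "Weak", "Yes"),
]

def score(outlook, wind, label):
    """Proportional to P(label | outlook, wind): likelihoods times prior."""
    rows = [r for r in data if r[2] == label]
    prior = len(rows) / len(data)
    p_outlook = sum(r[0] == outlook for r in rows) / len(rows)
    p_wind = sum(r[1] == wind for r in rows) / len(rows)
    return p_outlook * p_wind * prior

for label in ("Yes", "No"):
    print(label, round(score("Overcast", "Weak", label), 4))
# Yes 0.3333
# No 0.0   <- the zero probability error in action
```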