Naive Bayes

Introduction

Naive Bayes is a classification technique based on Bayes' Theorem with an independence assumption among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
The Naïve Bayes classifier is a popular supervised machine learning algorithm used for classification tasks such as text classification. It belongs to the family of generative learning algorithms, which means that it models the distribution of inputs for a given class or category. This approach is based on the assumption that the features of the input data are conditionally independent given the class, allowing the algorithm to make predictions quickly and accurately.
Bayes' theorem provides a way of computing the posterior probability P(c|x) from P(c), P(x), and P(x|c). Look at the equation below:

P(c|x) = P(x|c) · P(c) / P(x)

where P(c|x) is the posterior probability of class c given predictor x, P(c) is the prior probability of the class, P(x|c) is the likelihood of the predictor given the class, and P(x) is the prior probability of the predictor.
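The posterior computation can be worked through numerically. The sketch below uses illustrative numbers chosen for the example (not from any real dataset) to estimate the probability that an email is spam given that it contains a particular word:

```python
# Worked example of Bayes' theorem with illustrative (assumed) numbers.
# Suppose 20% of emails are spam, the word "offer" appears in 60% of spam
# emails and in 5% of non-spam emails. What is P(spam | "offer")?

p_spam = 0.20              # prior P(c)
p_word_given_spam = 0.60   # likelihood P(x|c)
p_word_given_ham = 0.05

# Evidence P(x) via the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior P(c|x) = P(x|c) * P(c) / P(x)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.75
```

Even with a low prior (20% spam), the strong likelihood ratio pushes the posterior to 75%.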
Naive Bayes is a machine learning algorithm for classification problems. It is based on Bayes' theorem and helps us estimate the probability that an item belongs to a particular class.
The algorithm is called "naive" because it makes a simplifying assumption: all features are independent of one another. This means that when making a prediction, Naive Bayes treats each feature's contribution to the final classification as independent, even though this may not hold in real life.
Naive Bayes works as follows:
  1. First, it learns the feature distribution of each class from the known data, that is, the probability distribution of each feature within each class.
  2. When a new example needs to be classified, Naive Bayes computes the probability of each class and selects the class with the highest probability as the prediction.
Naive Bayes is commonly used for tasks such as text classification, spam filtering, and sentiment analysis. Its strengths are that it is simple to understand and fast to compute, making it especially well suited to large datasets. Because of its independence assumption, however, it may perform poorly when features have complex dependencies. Even so, it remains a powerful tool for many machine learning problems.
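The two steps above can be sketched from scratch. This is a minimal illustration using a tiny hypothetical dataset (a single categorical weather feature and a play/no-play label, both invented for the example):

```python
from collections import Counter, defaultdict

# Tiny hypothetical dataset: (feature, class label)
data = [("sunny", "no"), ("sunny", "no"), ("rainy", "yes"),
        ("overcast", "yes"), ("rainy", "yes"), ("sunny", "yes")]

# Step 1: learn class priors and per-class feature frequencies
class_counts = Counter(label for _, label in data)
feature_counts = defaultdict(Counter)
for feature, label in data:
    feature_counts[label][feature] += 1

def predict(feature):
    # Step 2: score each class by prior * likelihood, pick the largest
    scores = {}
    for label, n in class_counts.items():
        prior = n / len(data)
        likelihood = feature_counts[label][feature] / n
        scores[label] = prior * likelihood
    return max(scores, key=scores.get)

print(predict("sunny"))  # 'no': P(no)*P(sunny|no) = (2/6)*(2/2) beats (4/6)*(1/4)
```

With several features, step 2 would multiply one likelihood per feature, which is exactly where the independence assumption enters.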

Types of Naïve Bayes classifiers

There isn’t just one type of Naïve Bayes classifier. The most popular types differ based on the distributions of the feature values. Some of these include:
  • Gaussian Naïve Bayes (GaussianNB): This is a variant of the Naïve Bayes classifier, which is used with Gaussian distributions—i.e. normal distributions—and continuous variables. This model is fitted by finding the mean and standard deviation of each class.
  • Multinomial Naïve Bayes (MultinomialNB): This type of Naïve Bayes classifier assumes that the features are from multinomial distributions. This variant is useful when using discrete data, such as frequency counts, and it is typically applied within natural language processing use cases, like spam classification.
  • Bernoulli Naïve Bayes (BernoulliNB): This is another variant of the Naïve Bayes classifier, which is used with Boolean variables—that is, variables with two values, such as True and False or 1 and 0.
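The three variants map directly onto scikit-learn classes. The toy arrays below are made up solely to show which feature type fits which variant:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])  # two classes, two illustrative samples each

# Continuous features -> GaussianNB
X_cont = np.array([[1.0], [1.2], [8.0], [8.3]])
g_pred = GaussianNB().fit(X_cont, y).predict([[1.1], [8.1]])

# Count features (e.g. word frequencies) -> MultinomialNB
X_counts = np.array([[3, 0], [4, 1], [0, 5], [1, 4]])
m_pred = MultinomialNB().fit(X_counts, y).predict([[5, 0], [0, 6]])

# Binary features -> BernoulliNB
X_bin = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
b_pred = BernoulliNB().fit(X_bin, y).predict([[1, 0], [0, 1]])

print(g_pred, m_pred, b_pred)  # each predicts [0 1]
```

All three share the same fit/predict interface; only the per-feature likelihood model differs.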
 

What Are the Assumptions Made by the Naive Bayes Algorithm?

There are several variants of Naive Bayes, such as Gaussian Naive Bayes, Multinomial Naive Bayes, and Bernoulli Naive Bayes. Each variant has its own assumptions and is suited to different types of data. Here are the assumptions that the Naive Bayes algorithm makes:
  1. The main assumption is that the features are conditionally independent of each other given the class.
  2. Each feature carries equal weight and importance.
  3. Gaussian Naive Bayes additionally assumes that continuous features follow a normal distribution within each class.
  4. The algorithm also assumes that there is little or no correlation among features.
 

Pros and Cons

Pros

  • Speed: the assumption of feature independence allows the algorithm to be very fast. If this assumption holds true, it performs exceptionally well.
  • Performs well with multi-class prediction
  • Works well in high dimensions and on problems such as text classification (for example, spam detection)

Cons

  • Assumes all features are independent, which is rarely true in real life.
  • Zero Frequency: if a categorical variable has a category in the test data set that was not observed in the training data set, the model assigns a zero probability to that category and fails to make a prediction. Use smoothing to deal with this issue.
    • Smoothing is a technique for detecting trends in noisy data when the shape of the trend is unknown. Laplace smoothing is common with Naive Bayes; it is used with categorical data and is meant to alleviate the zero-probability problem. An additional link about smoothing in relation to Naive Bayes is in the bullet below.
    • https://towardsdatascience.com/introduction-to-naïve-bayes-classifier-fa59e3e24aaf
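The zero-frequency problem and the Laplace fix can be shown in a few lines. The word counts below are hypothetical, standing in for one class's training data:

```python
# Hypothetical word counts for the "spam" class; "meeting" was never
# observed in spam during training.
counts = {"win": 4, "free": 3, "meeting": 0}
total = sum(counts.values())   # 7
vocab = len(counts)            # 3

# Unsmoothed estimate: a single zero collapses the whole product of
# likelihoods, so no email containing "meeting" could ever be spam.
unsmoothed = counts["meeting"] / total                # 0.0

# Laplace (add-one) smoothing: pretend every word was seen once more.
smoothed = (counts["meeting"] + 1) / (total + vocab)  # (0+1)/(7+3) = 0.1

print(unsmoothed, smoothed)
```

In scikit-learn, MultinomialNB and BernoulliNB apply this by default via the alpha parameter (alpha=1.0 is Laplace smoothing).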
 
 

Applications of the Naïve Bayes classifier

Along with a number of other algorithms, Naïve Bayes belongs to a family of data mining algorithms which turn large volumes of data into useful information. Some applications of Naïve Bayes include:
  • Spam filtering: Spam classification is one of the most popular applications of Naïve Bayes cited in the literature. For a deeper read on this use case, check out this chapter from O'Reilly (link resides outside ibm.com).
  • Document classification: Document and text classification go hand in hand. Another popular use case of Naïve Bayes is content classification. Imagine the content categories of a news media website: every article on the site can be classified under a topic taxonomy. Frederick Mosteller and David Wallace are credited with the first application of Bayesian inference in their 1963 paper (link resides outside ibm.com).
  • Sentiment analysis: While this is another form of text classification, sentiment analysis is commonly leveraged within marketing to better understand and quantify opinions and attitudes around specific products and brands.
  • Mental state predictions: Using fMRI data, naïve bayes has been leveraged to predict different cognitive states among humans. The goal of this research (link resides outside ibm.com) was to assist in better understanding hidden cognitive states, particularly among brain injury patients.
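A text-classification pipeline like the spam-filtering application above can be sketched in a few lines with scikit-learn. The four-document corpus and its labels are invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus; texts and labels are illustrative only.
texts = ["win free money now", "free prize claim now",
         "meeting agenda for monday", "project status report"]
labels = ["spam", "spam", "ham", "ham"]

vec = CountVectorizer()
X = vec.fit_transform(texts)          # word-count features
clf = MultinomialNB().fit(X, labels)  # multinomial variant suits count data

pred = clf.predict(vec.transform(["claim your free money"]))
print(pred)  # ['spam']
```

Words unseen during training (like "your") are simply dropped by the vectorizer, while smoothing in MultinomialNB handles vocabulary words absent from a given class.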

Python

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the Naive Bayes model on the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
ac = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)