From edd8aec3a2b1f9855e7456d0065520d9a43243e2 Mon Sep 17 00:00:00 2001
From: rohit
Date: Tue, 4 Jun 2024 00:28:41 +0530
Subject: [PATCH 1/3] Added K-Nearest Neighbors (KNN).md file

---
 .../K-nearest neighbor (KNN).md   | 122 ++++++++++++++++++
 contrib/machine-learning/index.md |   1 +
 2 files changed, 123 insertions(+)
 create mode 100644 contrib/machine-learning/K-nearest neighbor (KNN).md

diff --git a/contrib/machine-learning/K-nearest neighbor (KNN).md b/contrib/machine-learning/K-nearest neighbor (KNN).md
new file mode 100644
index 0000000..748f808
--- /dev/null
+++ b/contrib/machine-learning/K-nearest neighbor (KNN).md
@@ -0,0 +1,122 @@
+# K-Nearest Neighbors (KNN) Machine Learning Algorithm in Python
+
+## Introduction
+K-Nearest Neighbors (KNN) is a simple yet powerful supervised machine learning algorithm used for both classification and regression tasks. Its core assumption is that similar data points lie close to each other in feature space.
+
+## How KNN Works
+KNN makes a prediction by computing the distance between a query point and every example in the training data, selecting the K examples closest to the query, and then taking a majority vote over their labels (classification) or averaging them (regression).
+
+### Steps:
+1. **Choose the number K of neighbors**
+2. **Calculate the distance** between the query instance and all the training samples
+3. **Sort the distances** and take the K samples with the smallest distances as the nearest neighbors
+4. **Gather the labels** of those neighbors
+5. **Vote for the most frequent label** (classification) or **average the labels** (regression)
+
+A from-scratch sketch of these steps appears just before the library walkthrough below.
+
+## When to Use KNN
+### Advantages:
+- **Simple and easy to understand:** KNN is intuitive and easy to implement.
+- **No training phase:** KNN is a lazy learner; there is no explicit model-fitting step.
+- **Effective on small datasets:** KNN performs well when the training set is small and the number of input features is modest.
+
+### Disadvantages:
+- **Computationally expensive:** Prediction becomes significantly slower as the number of examples and/or features increases.
+- **Sensitive to irrelevant features:** Every feature contributes equally to the distance, so irrelevant features can dominate it.
+- **Memory-intensive:** The entire training set must be stored.
+
+### Use Cases:
+- **Recommender Systems:** Suggest items similar to a user's known preferences.
+- **Image Recognition:** Classify images by comparing them to labeled training examples.
+- **Finance:** Predict credit risk or detect fraud from historical data.
+
+## KNN in Python
+
+### Required Libraries
+To implement KNN, we need the following Python libraries:
+- `numpy`
+- `pandas`
+- `scikit-learn`
+- `matplotlib` (for visualization)
+
+### Installation
+```bash
+pip install numpy pandas scikit-learn matplotlib
+```
+
+### Example Code
+Let's implement a simple KNN classifier using the Iris dataset.
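+
+First, to make the steps above concrete, here is a minimal from-scratch sketch, assuming Euclidean distance and a simple majority vote (the helper `knn_predict` is purely illustrative, not a library function):
+
+```python
+import numpy as np
+from collections import Counter
+
+def knn_predict(X_train, y_train, query, k=3):
+    # Step 2: Euclidean distance from the query to every training sample
+    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
+    # Step 3: indices of the k smallest distances
+    nearest = np.argsort(distances)[:k]
+    # Steps 4-5: gather the neighbor labels and take a majority vote
+    return Counter(y_train[nearest]).most_common(1)[0][0]
+```
+
+scikit-learn's `KNeighborsClassifier`, used below, implements the same idea with optimized neighbor searches.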
+
+#### Step 1: Import Libraries
+```python
+import numpy as np
+import pandas as pd
+from sklearn.model_selection import train_test_split
+from sklearn.neighbors import KNeighborsClassifier
+from sklearn.metrics import accuracy_score
+import matplotlib.pyplot as plt
+```
+
+#### Step 2: Load Dataset
+```python
+from sklearn.datasets import load_iris
+iris = load_iris()
+X = iris.data      # four features per sample
+y = iris.target    # three classes
+```
+
+#### Step 3: Split Dataset
+```python
+# Hold out 30% of the data for testing; fix the seed for reproducibility
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
+```
+
+#### Step 4: Train KNN Model
+```python
+knn = KNeighborsClassifier(n_neighbors=3)
+knn.fit(X_train, y_train)
+```
+
+#### Step 5: Make Predictions
+```python
+y_pred = knn.predict(X_test)
+```
+
+#### Step 6: Evaluate the Model
+```python
+accuracy = accuracy_score(y_test, y_pred)
+print(f'Accuracy: {accuracy:.2f}')
+```
+
+### Visualization (Optional)
+```python
+# Plot the decision boundary for 2D data. The model above was trained on all
+# four features, so we fit a separate classifier on the first two features
+# purely for plotting.
+from matplotlib.colors import ListedColormap
+
+h = 0.02  # step size in the mesh
+cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])  # region colors
+cmap_bold = ListedColormap(['#FF0000', '#00AA00', '#0000FF'])   # point colors
+
+X_plot = X[:, :2]  # sepal length and sepal width only
+knn_2d = KNeighborsClassifier(n_neighbors=3)
+knn_2d.fit(X_plot, y)
+
+x_min, x_max = X_plot[:, 0].min() - 1, X_plot[:, 0].max() + 1
+y_min, y_max = X_plot[:, 1].min() - 1, X_plot[:, 1].max() + 1
+xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
+                     np.arange(y_min, y_max, h))
+
+# Classify every point in the mesh to color the decision regions
+Z = knn_2d.predict(np.c_[xx.ravel(), yy.ravel()])
+Z = Z.reshape(xx.shape)
+plt.figure()
+plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
+
+# Overlay the training points
+plt.scatter(X_plot[:, 0], X_plot[:, 1], c=y, edgecolor='k', cmap=cmap_bold)
+plt.xlim(xx.min(), xx.max())
+plt.ylim(yy.min(), yy.max())
+plt.title("3-Class classification (k = 3)")
+plt.show()
+```
+
+## Generalization and Considerations
+- **Choosing K:** The choice of K is critical. Small values of K produce noisy, high-variance predictions, while large values add computation and smooth the decision boundary to the point of oversimplifying (underfitting) the model.
+- **Feature Scaling:** Since KNN relies on distance calculations, features should be scaled (standardized or normalized) so that all features contribute equally to the distance computation; see the sketch after the conclusion.
+- **Distance Metrics:** The choice of distance metric (Euclidean, Manhattan, etc.) can affect the performance of the algorithm.
+
+In conclusion, KNN is a versatile and easy-to-implement algorithm suitable for various classification and regression tasks, particularly when working with small datasets and well-defined features. However, careful consideration should be given to the choice of K, feature scaling, and distance metrics to optimize its performance.
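+
+As a closing illustration of the feature-scaling point, here is a minimal sketch (our own addition, reusing the train/test split from the walkthrough above) that standardizes the features before the distance computation by chaining a `StandardScaler` and the classifier in a scikit-learn pipeline:
+
+```python
+from sklearn.pipeline import make_pipeline
+from sklearn.preprocessing import StandardScaler
+
+# Standardize features to zero mean and unit variance before fitting KNN
+scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
+scaled_knn.fit(X_train, y_train)
+print(f'Accuracy with scaling: {scaled_knn.score(X_test, y_test):.2f}')
+```
+
+On Iris the features share similar scales, so the effect is small, but on datasets with mixed units scaling can change the chosen neighbors substantially.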
\ No newline at end of file
diff --git a/contrib/machine-learning/index.md b/contrib/machine-learning/index.md
index b6945cd..e5f5371 100644
--- a/contrib/machine-learning/index.md
+++ b/contrib/machine-learning/index.md
@@ -16,3 +16,4 @@
 - [Types_of_Cost_Functions](cost-functions.md)
 - [Clustering](clustering.md)
 - [Grid Search](grid-search.md)
+- [K-nearest neighbor (KNN)](K-nearest neighbor (KNN).md)
\ No newline at end of file

From 8bafaaa091b06d8cbcd49b607633276ea70d3b01 Mon Sep 17 00:00:00 2001
From: Ashita Prasad
Date: Sat, 8 Jun 2024 10:20:05 +0530
Subject: [PATCH 2/3] Rename K-nearest neighbor (KNN).md to knn.md

---
 .../machine-learning/{K-nearest neighbor (KNN).md => knn.md} | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename contrib/machine-learning/{K-nearest neighbor (KNN).md => knn.md} (99%)

diff --git a/contrib/machine-learning/K-nearest neighbor (KNN).md b/contrib/machine-learning/knn.md
similarity index 99%
rename from contrib/machine-learning/K-nearest neighbor (KNN).md
rename to contrib/machine-learning/knn.md
index 748f808..85578f3 100644
--- a/contrib/machine-learning/K-nearest neighbor (KNN).md
+++ b/contrib/machine-learning/knn.md
@@ -119,4 +119,4 @@ plt.show()
 - **Feature Scaling:** Since KNN relies on distance calculations, features should be scaled (standardized or normalized) so that all features contribute equally to the distance computation; see the sketch after the conclusion.
 - **Distance Metrics:** The choice of distance metric (Euclidean, Manhattan, etc.) can affect the performance of the algorithm.
 
-In conclusion, KNN is a versatile and easy-to-implement algorithm suitable for various classification and regression tasks, particularly when working with small datasets and well-defined features. However, careful consideration should be given to the choice of K, feature scaling, and distance metrics to optimize its performance.
\ No newline at end of file
+In conclusion, KNN is a versatile and easy-to-implement algorithm suitable for various classification and regression tasks, particularly when working with small datasets and well-defined features. However, careful consideration should be given to the choice of K, feature scaling, and distance metrics to optimize its performance.

From 906333954390d915e4c85b92e6ccc95919a878d8 Mon Sep 17 00:00:00 2001
From: Ashita Prasad
Date: Sat, 8 Jun 2024 10:20:27 +0530
Subject: [PATCH 3/3] Update index.md

---
 contrib/machine-learning/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/machine-learning/index.md b/contrib/machine-learning/index.md
index e5f5371..deafdda 100644
--- a/contrib/machine-learning/index.md
+++ b/contrib/machine-learning/index.md
@@ -16,4 +16,4 @@
 - [Types_of_Cost_Functions](cost-functions.md)
 - [Clustering](clustering.md)
 - [Grid Search](grid-search.md)
-- [K-nearest neighbor (KNN)](K-nearest neighbor (KNN).md)
\ No newline at end of file
+- [K-nearest neighbor (KNN)](knn.md)