From 21612e4b81468a127feea94ee419d22efcb1133c Mon Sep 17 00:00:00 2001 From: Vrisha Shah <74671946+Vrisha213@users.noreply.github.com> Date: Fri, 31 May 2024 07:58:39 +0530 Subject: [PATCH 01/13] Update index.md --- contrib/machine-learning/index.md | 1 + 1 file changed, 1 insertion(+) diff --git a/contrib/machine-learning/index.md b/contrib/machine-learning/index.md index 46100df..e3a8f0b 100644 --- a/contrib/machine-learning/index.md +++ b/contrib/machine-learning/index.md @@ -10,3 +10,4 @@ - [PyTorch.md](pytorch.md) - [Types of optimizers](Types_of_optimizers.md) - [Logistic Regression](logistic-regression.md) +- [Grid Search](grid-search.md) From a953122b2681a430e0805de2031f5c20af2ee8cc Mon Sep 17 00:00:00 2001 From: Vrisha Shah <74671946+Vrisha213@users.noreply.github.com> Date: Fri, 31 May 2024 07:59:40 +0530 Subject: [PATCH 02/13] Create grid-search.md --- contrib/machine-learning/grid-search.md | 68 +++++++++++++++++++++++++ 1 file changed, 68 insertions(+) create mode 100644 contrib/machine-learning/grid-search.md diff --git a/contrib/machine-learning/grid-search.md b/contrib/machine-learning/grid-search.md new file mode 100644 index 0000000..3bf53ff --- /dev/null +++ b/contrib/machine-learning/grid-search.md @@ -0,0 +1,68 @@ +# Grid Search + +Grid Search is a hyperparameter tuning technique in Machine Learning that helps to find the best combination of hyperparameters for a given model. It works by defining a grid of hyperparameters and then training the model with all the possible combinations of hyperparameters to find the best performing set. +The Grid Search Method considers some hyperparameter combinations and selects the one returning a lower error score. This method is specifically useful when there are only some hyperparameters in order to optimize. However, it is outperformed by other weighted-random search methods when the Machine Learning model grows in complexity. 
+ +## Implementation + +Before applying Grid Searching on any algorithm, Data is used to divided into training and validation set, a validation set is used to validate the models. A model with all possible combinations of hyperparameters is tested on the validation set to choose the best combination. +Grid Searching can be applied to any hyperparameters algorithm whose performance can be improved by tuning hyperparameter. For example, we can apply grid searching on K-Nearest Neighbors by validating its performance on a set of values of K in it. Same thing we can do with Logistic Regression by using a set of values of learning rate to find the best learning rate at which Logistic Regression achieves the best accurac +Let us consider that the model accepts the below three parameters in the form of input: +1. Number of hidden layers [2, 4] +2. Number of neurons in every layer [5, 10] +3. Number of epochs [10, 50] + +If we want to try out two options for every parameter input (as specified in square brackets above), it estimates different combinations. For instance, one possible combination can be [2, 5, 10]. Finding such combinations manually would be a headache. +Now, suppose that we had ten different parameters as input, and we would like to try out five possible values for each and every parameter. It would need manual input from the programmer's end every time we like to alter the value of a parameter, re-execute the code, and keep a record of the outputs for every combination of the parameters. +Grid Search automates that process, as it accepts the possible value for every parameter and executes the code in order to try out each and every possible combination outputs the result for the combinations and outputs the combination having the best accuracy. +Higher values of C tell the model, the training data resembles real world information, place a greater weight on the training data. While lower values of C do the opposite. 
+ +## Explaination of the Code + +The code provided performs hyperparameter tuning for a Logistic Regression model using a manual grid search approach. It evaluates the model's performance for different values of the regularization strength hyperparameter C on the Iris dataset. +1. datasets from sklearn is imported to load the Iris dataset. +2. LogisticRegression from sklearn.linear_model is imported to create and fit the logistic regression model. +3. The Iris dataset is loaded, with X containing the features and y containing the target labels. +4. A LogisticRegression model is instantiated with max_iter=10000 to ensure convergence during the fitting process, as the default maximum iterations (100) might not be sufficient. +5. A list of different values for the regularization strength C is defined. The hyperparameter C controls the regularization strength, with smaller values specifying stronger regularization. +6. An empty list scores is initialized to store the model's performance scores for different values of C. +7. A for loop iterates over each value in the C list: +8. logit.set_params(C=choice) sets the C parameter of the logistic regression model to the current value in the loop. +9. logit.fit(X, y) fits the logistic regression model to the entire Iris dataset (this is typically done on training data in a real scenario, not the entire dataset). +10. logit.score(X, y) calculates the accuracy of the fitted model on the dataset and appends this score to the scores list. +11. After the loop, the scores list is printed, showing the accuracy for each value of C. 
+ +## Python Code + +from sklearn import datasets + +from sklearn.linear_model import LogisticRegression + +iris = datasets.load_iris() + +X = iris['data'] + +y = iris['target'] + +logit = LogisticRegression(max_iter = 10000) + +C = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2] + +scores = [] + +for choice in C: + + logit.set_params(C=choice) + + logit.fit(X, y) + + scores.append(logit.score(X, y)) + +print(scores) + +## Results + +[0.9666666666666667, 0.9666666666666667, 0.9733333333333334, 0.9733333333333334, 0.98, 0.98, 0.9866666666666667, 0.9866666666666667] + +We can see that the lower values of C performed worse than the base parameter of 1. However, as we increased the value of C to 1.75 the model experienced increased accuracy. +It seems that increasing C beyond this amount does not help increase model accuracy. From 50d2b654e03d777ef20c20a210732a9b1a44c54e Mon Sep 17 00:00:00 2001 From: Vrisha Shah <74671946+Vrisha213@users.noreply.github.com> Date: Fri, 31 May 2024 08:05:09 +0530 Subject: [PATCH 03/13] Update index.md --- contrib/plotting-visualization/index.md | 1 + 1 file changed, 1 insertion(+) diff --git a/contrib/plotting-visualization/index.md b/contrib/plotting-visualization/index.md index 32261d6..a8416a8 100644 --- a/contrib/plotting-visualization/index.md +++ b/contrib/plotting-visualization/index.md @@ -3,3 +3,4 @@ - [Installing Matplotlib](matplotlib-installation.md) - [Bar Plots in Matplotlib](matplotlib-bar-plots.md) - [Pie Charts in Matplotlib](matplotlib-pie-charts.md) +- [Box Plots in Matplotlib](matplotlib-box-plots.md) From c54f6128e9cdee0834d265c7b6ca7a42a105c540 Mon Sep 17 00:00:00 2001 From: Vrisha Shah <74671946+Vrisha213@users.noreply.github.com> Date: Fri, 31 May 2024 08:06:46 +0530 Subject: [PATCH 04/13] Create matplotlib-box-plots.md --- .../matplotlib-box-plots.md | 104 ++++++++++++++++++ 1 file changed, 104 insertions(+) create mode 100644 contrib/plotting-visualization/matplotlib-box-plots.md diff --git 
a/contrib/plotting-visualization/matplotlib-box-plots.md b/contrib/plotting-visualization/matplotlib-box-plots.md new file mode 100644 index 0000000..323b5a4 --- /dev/null +++ b/contrib/plotting-visualization/matplotlib-box-plots.md @@ -0,0 +1,104 @@ +# Box Plot + +A box plot represents the distribution of a dataset in a graph. It displays the summary statistics of a dataset, including the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. The box represents the interquartile range (IQR) between the first and third quartiles, while whiskers extend from the box to the minimum and maximum values. Outliers, if present, may be displayed as individual points beyond the whiskers. + +For example - Imagine you have the exam scores of students from three classes. A box plot is a way to show how these scores are spread out. + +## Key Ranges in Data Distribution + +The data can be distributed between five key ranges, which are as follows - +1. Minimum: Q1-1.5*IQR +2. 1st quartile (Q1): 25th percentile +3. Median: 50th percentile +4. 3rd quartile(Q3): 75th percentile +5. Maximum: Q3+1.5*IQR + +## Purpose of Box Plots + +We can create the box plot of the data to determine the following- +1. The number of outliers in a dataset +2. Is the data skewed or not (skewness is a measure of asymmetry of the distribution) +3. The range of the data + +## Creating Box Plots using Matplotlib + +By using inbuilt funtion boxplot() of pyplot module of matplotlib - + +Syntax - matplotlib.pyplot.boxplot(data,notch=none,vert=none,patch_artist,widths=none) + +1. data: The data should be an array or sequence of arrays which will be plotted. +2. notch: This parameter accepts only Boolean values, either true or false. +3. vert: This attribute accepts a Boolean value. If it is set to true, then the graph will be vertical. Otherwise, it will be horizontal. +4. position: It accepts the array of integers which defines the position of the box. +5. 
widths: It accepts the array of integers which defines the width of the box. +6. patch_artist: this parameter accepts Boolean values, either true or false, and this is an optional parameter. +7. labels: This accepts the strings which define the labels for each data point +8. meanline: It accepts a boolean value, and it is optional. +9. order: It sets the order of the boxplot. +10. bootstrap: It accepts the integer value, which specifies the range of the notched boxplot. + +## Implementation of Box Plot in Python + +### Import libraries +import matplotlib.pyplot as plt +import numpy as np + +### Creating dataset +np.random.seed(10) +data = np.random.normal(100, 20, 200) +fig = plt.figure(figsize =(10, 7)) + +### Creating plot +plt.boxplot(data) + +### show plot +plt.show() + +### Implementation of Multiple Box Plot in Python +import matplotlib.pyplot as plt +import numpy as np +np.random.seed(10) +dataSet1 = np.random.normal(100, 10, 220) +dataSet2 = np.random.normal(80, 20, 200) +dataSet3 = np.random.normal(60, 35, 220) +dataSet4 = np.random.normal(50, 40, 200) +dataSet = [dataSet1, dataSet2, dataSet3, dataSet4] +figure = plt.figure(figsize =(10, 7)) +ax = figure.add_axes([0, 0, 1, 1]) +bp = ax.boxplot(dataSet) +plt.show() + +### Implementation of Box Plot with Outliers (visual representation of the sales distribution for each product, and the outliers highlight months with exceptionally high or low sales) +import matplotlib.pyplot as plt +import numpy as np + +### Data for monthly sales +product_A_sales = [100, 110, 95, 105, 115, 90, 120, 130, 80, 125, 150, 200] +product_B_sales = [90, 105, 100, 98, 102, 105, 110, 95, 112, 88, 115, 250] +product_C_sales = [80, 85, 90, 78, 82, 85, 88, 92, 75, 85, 200, 95] + +### Introducing outliers +product_A_sales.extend([300, 80]) +product_B_sales.extend([50, 300]) +product_C_sales.extend([70, 250]) + +### Creating a box plot with outliers +plt.boxplot([product_A_sales, product_B_sales, product_C_sales], sym='o') 
+plt.title('Monthly Sales Performance by Product with Outliers') +plt.xlabel('Products') +plt.ylabel('Sales') +plt.show() + +### Implementation of Grouped Box Plot (to compare the exam scores of students from three different classes (A, B, and C)) +import matplotlib.pyplot as plt +import numpy as np +class_A_scores = [75, 80, 85, 90, 95] +class_B_scores = [70, 75, 80, 85, 90] +class_C_scores = [65, 70, 75, 80, 85] + +### Creating a grouped box plot +plt.boxplot([class_A_scores, class_B_scores, class_C_scores], labels=['Class A', 'Class B', 'Class C']) +plt.title('Exam Scores by Class') +plt.xlabel('Classes') +plt.ylabel('Scores') +plt.show() From c3f715e98321ff5afe0fb50d5dd80e7ec7366a59 Mon Sep 17 00:00:00 2001 From: Vrisha Shah <74671946+Vrisha213@users.noreply.github.com> Date: Fri, 31 May 2024 08:09:21 +0530 Subject: [PATCH 05/13] Update matplotlib-box-plots.md --- .../matplotlib-box-plots.md | 39 ++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-) diff --git a/contrib/plotting-visualization/matplotlib-box-plots.md b/contrib/plotting-visualization/matplotlib-box-plots.md index 323b5a4..4d76243 100644 --- a/contrib/plotting-visualization/matplotlib-box-plots.md +++ b/contrib/plotting-visualization/matplotlib-box-plots.md @@ -40,60 +40,97 @@ Syntax - matplotlib.pyplot.boxplot(data,notch=none,vert=none,patch_artist,widths ## Implementation of Box Plot in Python ### Import libraries + import matplotlib.pyplot as plt + import numpy as np ### Creating dataset + np.random.seed(10) + data = np.random.normal(100, 20, 200) + fig = plt.figure(figsize =(10, 7)) ### Creating plot + plt.boxplot(data) ### show plot + plt.show() ### Implementation of Multiple Box Plot in Python + import matplotlib.pyplot as plt + import numpy as np + np.random.seed(10) + dataSet1 = np.random.normal(100, 10, 220) -dataSet2 = np.random.normal(80, 20, 200) + +dataSet2 = np.random.normal(80, 20, 200) + dataSet3 = np.random.normal(60, 35, 220) + dataSet4 = np.random.normal(50, 40, 
200) + dataSet = [dataSet1, dataSet2, dataSet3, dataSet4] + figure = plt.figure(figsize =(10, 7)) + ax = figure.add_axes([0, 0, 1, 1]) + bp = ax.boxplot(dataSet) + plt.show() ### Implementation of Box Plot with Outliers (visual representation of the sales distribution for each product, and the outliers highlight months with exceptionally high or low sales) + import matplotlib.pyplot as plt + import numpy as np ### Data for monthly sales + product_A_sales = [100, 110, 95, 105, 115, 90, 120, 130, 80, 125, 150, 200] + product_B_sales = [90, 105, 100, 98, 102, 105, 110, 95, 112, 88, 115, 250] + product_C_sales = [80, 85, 90, 78, 82, 85, 88, 92, 75, 85, 200, 95] ### Introducing outliers + product_A_sales.extend([300, 80]) + product_B_sales.extend([50, 300]) + product_C_sales.extend([70, 250]) ### Creating a box plot with outliers + plt.boxplot([product_A_sales, product_B_sales, product_C_sales], sym='o') + plt.title('Monthly Sales Performance by Product with Outliers') + plt.xlabel('Products') + plt.ylabel('Sales') + plt.show() ### Implementation of Grouped Box Plot (to compare the exam scores of students from three different classes (A, B, and C)) + import matplotlib.pyplot as plt + import numpy as np + class_A_scores = [75, 80, 85, 90, 95] + class_B_scores = [70, 75, 80, 85, 90] + class_C_scores = [65, 70, 75, 80, 85] ### Creating a grouped box plot From d552e04ded85206b731474e0e272f1701641c13a Mon Sep 17 00:00:00 2001 From: Vrisha Shah <74671946+Vrisha213@users.noreply.github.com> Date: Fri, 31 May 2024 08:09:57 +0530 Subject: [PATCH 06/13] Update matplotlib-box-plots.md --- contrib/plotting-visualization/matplotlib-box-plots.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/contrib/plotting-visualization/matplotlib-box-plots.md b/contrib/plotting-visualization/matplotlib-box-plots.md index 4d76243..1780cab 100644 --- a/contrib/plotting-visualization/matplotlib-box-plots.md +++ b/contrib/plotting-visualization/matplotlib-box-plots.md @@ -134,8 +134,13 @@ 
class_B_scores = [70, 75, 80, 85, 90] class_C_scores = [65, 70, 75, 80, 85] ### Creating a grouped box plot + plt.boxplot([class_A_scores, class_B_scores, class_C_scores], labels=['Class A', 'Class B', 'Class C']) + plt.title('Exam Scores by Class') + plt.xlabel('Classes') + plt.ylabel('Scores') + plt.show() From 208ff1ea06192dc51ebc45f1e32b29a185d4b561 Mon Sep 17 00:00:00 2001 From: Vrisha Shah <74671946+Vrisha213@users.noreply.github.com> Date: Fri, 31 May 2024 08:13:42 +0530 Subject: [PATCH 07/13] Delete contrib/plotting-visualization/matplotlib-box-plots.md --- .../matplotlib-box-plots.md | 146 ------------------ 1 file changed, 146 deletions(-) delete mode 100644 contrib/plotting-visualization/matplotlib-box-plots.md diff --git a/contrib/plotting-visualization/matplotlib-box-plots.md b/contrib/plotting-visualization/matplotlib-box-plots.md deleted file mode 100644 index 1780cab..0000000 --- a/contrib/plotting-visualization/matplotlib-box-plots.md +++ /dev/null @@ -1,146 +0,0 @@ -# Box Plot - -A box plot represents the distribution of a dataset in a graph. It displays the summary statistics of a dataset, including the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. The box represents the interquartile range (IQR) between the first and third quartiles, while whiskers extend from the box to the minimum and maximum values. Outliers, if present, may be displayed as individual points beyond the whiskers. - -For example - Imagine you have the exam scores of students from three classes. A box plot is a way to show how these scores are spread out. - -## Key Ranges in Data Distribution - -The data can be distributed between five key ranges, which are as follows - -1. Minimum: Q1-1.5*IQR -2. 1st quartile (Q1): 25th percentile -3. Median: 50th percentile -4. 3rd quartile(Q3): 75th percentile -5. Maximum: Q3+1.5*IQR - -## Purpose of Box Plots - -We can create the box plot of the data to determine the following- -1. 
The number of outliers in a dataset -2. Is the data skewed or not (skewness is a measure of asymmetry of the distribution) -3. The range of the data - -## Creating Box Plots using Matplotlib - -By using inbuilt funtion boxplot() of pyplot module of matplotlib - - -Syntax - matplotlib.pyplot.boxplot(data,notch=none,vert=none,patch_artist,widths=none) - -1. data: The data should be an array or sequence of arrays which will be plotted. -2. notch: This parameter accepts only Boolean values, either true or false. -3. vert: This attribute accepts a Boolean value. If it is set to true, then the graph will be vertical. Otherwise, it will be horizontal. -4. position: It accepts the array of integers which defines the position of the box. -5. widths: It accepts the array of integers which defines the width of the box. -6. patch_artist: this parameter accepts Boolean values, either true or false, and this is an optional parameter. -7. labels: This accepts the strings which define the labels for each data point -8. meanline: It accepts a boolean value, and it is optional. -9. order: It sets the order of the boxplot. -10. bootstrap: It accepts the integer value, which specifies the range of the notched boxplot. 
- -## Implementation of Box Plot in Python - -### Import libraries - -import matplotlib.pyplot as plt - -import numpy as np - -### Creating dataset - -np.random.seed(10) - -data = np.random.normal(100, 20, 200) - -fig = plt.figure(figsize =(10, 7)) - -### Creating plot - -plt.boxplot(data) - -### show plot - -plt.show() - -### Implementation of Multiple Box Plot in Python - -import matplotlib.pyplot as plt - -import numpy as np - -np.random.seed(10) - -dataSet1 = np.random.normal(100, 10, 220) - -dataSet2 = np.random.normal(80, 20, 200) - -dataSet3 = np.random.normal(60, 35, 220) - -dataSet4 = np.random.normal(50, 40, 200) - -dataSet = [dataSet1, dataSet2, dataSet3, dataSet4] - -figure = plt.figure(figsize =(10, 7)) - -ax = figure.add_axes([0, 0, 1, 1]) - -bp = ax.boxplot(dataSet) - -plt.show() - -### Implementation of Box Plot with Outliers (visual representation of the sales distribution for each product, and the outliers highlight months with exceptionally high or low sales) - -import matplotlib.pyplot as plt - -import numpy as np - -### Data for monthly sales - -product_A_sales = [100, 110, 95, 105, 115, 90, 120, 130, 80, 125, 150, 200] - -product_B_sales = [90, 105, 100, 98, 102, 105, 110, 95, 112, 88, 115, 250] - -product_C_sales = [80, 85, 90, 78, 82, 85, 88, 92, 75, 85, 200, 95] - -### Introducing outliers - -product_A_sales.extend([300, 80]) - -product_B_sales.extend([50, 300]) - -product_C_sales.extend([70, 250]) - -### Creating a box plot with outliers - -plt.boxplot([product_A_sales, product_B_sales, product_C_sales], sym='o') - -plt.title('Monthly Sales Performance by Product with Outliers') - -plt.xlabel('Products') - -plt.ylabel('Sales') - -plt.show() - -### Implementation of Grouped Box Plot (to compare the exam scores of students from three different classes (A, B, and C)) - -import matplotlib.pyplot as plt - -import numpy as np - -class_A_scores = [75, 80, 85, 90, 95] - -class_B_scores = [70, 75, 80, 85, 90] - -class_C_scores = [65, 70, 75, 80, 
85] - -### Creating a grouped box plot - -plt.boxplot([class_A_scores, class_B_scores, class_C_scores], labels=['Class A', 'Class B', 'Class C']) - -plt.title('Exam Scores by Class') - -plt.xlabel('Classes') - -plt.ylabel('Scores') - -plt.show() From dd24cd3a2e82ef40f55ead219868947afdfc12c6 Mon Sep 17 00:00:00 2001 From: Vrisha Shah <74671946+Vrisha213@users.noreply.github.com> Date: Fri, 31 May 2024 08:14:00 +0530 Subject: [PATCH 08/13] Update index.md --- contrib/plotting-visualization/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/contrib/plotting-visualization/index.md b/contrib/plotting-visualization/index.md index a8416a8..92ec106 100644 --- a/contrib/plotting-visualization/index.md +++ b/contrib/plotting-visualization/index.md @@ -3,4 +3,4 @@ - [Installing Matplotlib](matplotlib-installation.md) - [Bar Plots in Matplotlib](matplotlib-bar-plots.md) - [Pie Charts in Matplotlib](matplotlib-pie-charts.md) -- [Box Plots in Matplotlib](matplotlib-box-plots.md) + From ef42e881851eb20f9a9c8c28a9afdcff89c7fb67 Mon Sep 17 00:00:00 2001 From: Ankit Mahato Date: Fri, 31 May 2024 20:49:03 +0530 Subject: [PATCH 09/13] Update index.md --- contrib/plotting-visualization/index.md | 1 - 1 file changed, 1 deletion(-) diff --git a/contrib/plotting-visualization/index.md b/contrib/plotting-visualization/index.md index 92ec106..32261d6 100644 --- a/contrib/plotting-visualization/index.md +++ b/contrib/plotting-visualization/index.md @@ -3,4 +3,3 @@ - [Installing Matplotlib](matplotlib-installation.md) - [Bar Plots in Matplotlib](matplotlib-bar-plots.md) - [Pie Charts in Matplotlib](matplotlib-pie-charts.md) - From f26f8da60ec1814d6165a6cb908aa2a0ea7a1b2c Mon Sep 17 00:00:00 2001 From: Ankit Mahato Date: Fri, 31 May 2024 20:53:21 +0530 Subject: [PATCH 10/13] Update grid-search.md --- contrib/machine-learning/grid-search.md | 41 +++++++++++++------------ 1 file changed, 22 insertions(+), 19 deletions(-) diff --git 
a/contrib/machine-learning/grid-search.md b/contrib/machine-learning/grid-search.md
index 3bf53ff..ae44412 100644
--- a/contrib/machine-learning/grid-search.md
+++ b/contrib/machine-learning/grid-search.md
@@ -1,20 +1,26 @@
 # Grid Search
 
 Grid Search is a hyperparameter tuning technique in Machine Learning that helps to find the best combination of hyperparameters for a given model. It works by defining a grid of hyperparameters and then training the model with all the possible combinations of hyperparameters to find the best performing set.
+
 The Grid Search Method considers some hyperparameter combinations and selects the one returning a lower error score. This method is specifically useful when there are only some hyperparameters in order to optimize. However, it is outperformed by other weighted-random search methods when the Machine Learning model grows in complexity.
 
 ## Implementation
 
-Before applying Grid Searching on any algorithm, Data is used to divided into training and validation set, a validation set is used to validate the models. A model with all possible combinations of hyperparameters is tested on the validation set to choose the best combination.
-Grid Searching can be applied to any hyperparameters algorithm whose performance can be improved by tuning hyperparameter. For example, we can apply grid searching on K-Nearest Neighbors by validating its performance on a set of values of K in it. Same thing we can do with Logistic Regression by using a set of values of learning rate to find the best learning rate at which Logistic Regression achieves the best accurac
-Let us consider that the model accepts the below three parameters in the form of input:
-1. Number of hidden layers [2, 4]
-2. Number of neurons in every layer [5, 10]
-3. Number of epochs [10, 50]
+Before applying Grid Search to any algorithm, the data is divided into a training set and a validation set; the validation set is used to evaluate the models. A model is trained for every combination of hyperparameters and tested on the validation set to choose the best combination.
+
+Grid Search can be applied to any algorithm whose performance can be improved by tuning its hyperparameters. For example, we can apply Grid Search to K-Nearest Neighbors by validating its performance on a set of values of K. We can do the same with Logistic Regression by trying a set of learning rates to find the one at which it achieves the best accuracy.
+
+Let us consider a model that accepts the following three parameters as input:
+1. Number of hidden layers `[2, 4]`
+2. Number of neurons in every layer `[5, 10]`
+3. Number of epochs `[10, 50]`
+
+If we want to try out two options for every parameter (as specified in square brackets above), there are 2 × 2 × 2 = 8 different combinations. For instance, one possible combination is `[2, 5, 10]`. Working through such combinations manually would be a headache.
 
-If we want to try out two options for every parameter input (as specified in square brackets above), it estimates different combinations. For instance, one possible combination can be [2, 5, 10]. Finding such combinations manually would be a headache.
 Now, suppose that we had ten different parameters as input, and we would like to try out five possible values for each and every parameter. It would need manual input from the programmer's end every time we like to alter the value of a parameter, re-execute the code, and keep a record of the outputs for every combination of the parameters.
+
 Grid Search automates that process, as it accepts the possible value for every parameter and executes the code in order to try out each and every possible combination outputs the result for the combinations and outputs the combination having the best accuracy.
+
+Higher values of C tell the model, the training data resembles real world information, place a greater weight on the training data. While lower values of C do the opposite.
 
 ## Explaination of the Code
@@ -32,16 +38,14 @@ The code provided performs hyperparameter tuning for a Logistic Regression model
 10. logit.score(X, y) calculates the accuracy of the fitted model on the dataset and appends this score to the scores list.
 11. After the loop, the scores list is printed, showing the accuracy for each value of C.
 
-## Python Code
+### Python Code
 
+```python
 from sklearn import datasets
-
 from sklearn.linear_model import LogisticRegression
 
 iris = datasets.load_iris()
-
 X = iris['data']
-
 y = iris['target']
 
 logit = LogisticRegression(max_iter = 10000)
@@ -49,20 +53,19 @@ logit = LogisticRegression(max_iter = 10000)
 C = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]
 
 scores = []
-
 for choice in C:
-
     logit.set_params(C=choice)
-
     logit.fit(X, y)
-
     scores.append(logit.score(X, y))
-
 print(scores)
+```
 
-## Results
+#### Results
 
+```
 [0.9666666666666667, 0.9666666666666667, 0.9733333333333334, 0.9733333333333334, 0.98, 0.98, 0.9866666666666667, 0.9866666666666667]
+```
 
-We can see that the lower values of C performed worse than the base parameter of 1. However, as we increased the value of C to 1.75 the model experienced increased accuracy.
-It seems that increasing C beyond this amount does not help increase model accuracy.
+We can see that the lower values of `C` performed worse than the base parameter of `1`. However, as we increased the value of `C` to `1.75` the model experienced increased accuracy.
+
+It seems that increasing `C` beyond this amount does not help increase model accuracy.
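
Review note on the Implementation section of `grid-search.md` (not part of the patch): the three two-valued parameters listed there give 2 × 2 × 2 = 8 combinations, and ten parameters with five values each give 5^10. A minimal sketch of how a grid is expanded, using only the standard library; the names `param_grid`, `expand_grid`, and `total_for_ten_params` are hypothetical and introduced here purely for illustration:

```python
from itertools import product

# Candidate values copied from the Implementation section of grid-search.md.
# The dictionary keys are illustrative names, not taken from the patch.
param_grid = {
    "hidden_layers": [2, 4],
    "neurons_per_layer": [5, 10],
    "epochs": [10, 50],
}


def expand_grid(grid):
    """Return every combination of the candidate values as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


combinations = expand_grid(param_grid)
print(len(combinations))   # 8
print(combinations[0])     # {'hidden_layers': 2, 'neurons_per_layer': 5, 'epochs': 10}

# Ten parameters with five candidate values each:
total_for_ten_params = 5 ** 10
print(total_for_ten_params)  # 9765625
```

Each dictionary in `combinations` corresponds to one model that a grid search would train and score, which is why the cost grows so quickly as parameters are added.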
From d83b6e1deaa84b2d0d2411c7a5eb8b4284ab5351 Mon Sep 17 00:00:00 2001
From: Ankit Mahato
Date: Sat, 1 Jun 2024 20:56:34 +0530
Subject: [PATCH 11/13] Update CONTRIBUTING.md

---
 CONTRIBUTING.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 4a366da..8688009 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -24,8 +24,8 @@ The list of topics for which we are looking for content are provided below along
 - Web Scrapping - [Link](https://github.com/animator/learn-python/tree/main/contrib/web-scrapping)
 - API Development - [Link](https://github.com/animator/learn-python/tree/main/contrib/api-development)
 - Data Structures & Algorithms - [Link](https://github.com/animator/learn-python/tree/main/contrib/ds-algorithms)
-- Python Mini Projects - [Link](https://github.com/animator/learn-python/tree/main/contrib/mini-projects)
-- Python Question Bank - [Link](https://github.com/animator/learn-python/tree/main/contrib/question-bank)
+- Python Mini Projects - [Link](https://github.com/animator/learn-python/tree/main/contrib/mini-projects) **(Not accepting)**
+- Python Question Bank - [Link](https://github.com/animator/learn-python/tree/main/contrib/question-bank) **(Not accepting)**
 
 You can check out some content ideas below.
 
From d372975f6c7accb7dd3e7a7852ba61d3e24ea724 Mon Sep 17 00:00:00 2001 From: Mohammed Ahmed Majid <109688855+PilotAxis@users.noreply.github.com> Date: Sat, 1 Jun 2024 21:45:43 +0530 Subject: [PATCH 12/13] Added file exception-handling.md --- contrib/advanced-python/exception-handling.md | 192 ++++++++++++++++++ 1 file changed, 192 insertions(+) create mode 100644 contrib/advanced-python/exception-handling.md diff --git a/contrib/advanced-python/exception-handling.md b/contrib/advanced-python/exception-handling.md new file mode 100644 index 0000000..bddcac4 --- /dev/null +++ b/contrib/advanced-python/exception-handling.md @@ -0,0 +1,192 @@ +# Exception Handling in Python + +Exception handling is a way of managing errors that may occur during program execution, through which you can handle exceptions gracefully. Python's exception handling mechanism has been designed to avoid unexpected termination of the program and offer a means to either regain control after an error or display meaningful messages to the user. + +- **Error** - An error is a mistake or an incorrect result produced by a program. It can be a syntax error, a logical error, or a runtime error. Errors are typically fatal, meaning they prevent the program from continuing to execute. +- **Exception** - An exception is an event that occurs during the execution of a program that disrupts the normal flow of instructions. Exceptions are typically unexpected and can be handled by the program to prevent it from crashing or terminating abnormally. It can be runtime, input/output or system exceptions. Exceptions are designed to be handled by the program, allowing it to recover from the error and continue executing. + +## Python Built-in Exceptions + +There are plenty of built-in exceptions in Python that are raised when a corresponding error occur. 
+We can view all the built-in exceptions using the built-in `local()` function as follows: + +```python +print(dir(locals()['__builtins__'])) +``` + +|**S.No**|**Exception**|**Description**| +|---|---|---| +|1|SyntaxError|A syntax error occurs when the code we write violates the grammatical rules such as misspelled keywords, missing colon, mismatched parentheses etc.| +|2|TypeError|A type error occurs when we try to perform an operation or use a function with objects that are of incompatible data types.| +|3|NameError|A name error occurs when we try to use a variable, function, module or string without quotes that hasn't been defined or isn't used in a valid way.| +|4|IndexError|A index error occurs when we try to access an element in a sequence (like a list, tuple or string) using an index that's outside the valid range of indices for that sequence.| +|5|KeyError|A key error occurs when we try to access a key that doesn't exist in a dictionary. Attempting to retrieve a value using a non-existent key results this error.| +|6|ValueError|A value error occurs when we provide an argument or value that's inappropriate for a specific operation or function such as doing mathematical operations with incompatible types (e.g., dividing a string by an integer.)| +|7|AttributeError|An attribute error occurs when we try to access an attribute (like a variable or method) on an object that doesn't possess that attribute.| +|8|IOError|An IO (Input/Output) error occurs when an operation involving file or device interaction fails. It signifies that there's an issue during communication between your program and the external system.| +|9|ZeroDivisionError|A ZeroDivisionError occurs when we attempt to divide a number by zero. 
This operation is mathematically undefined, and Python raises this error to prevent nonsensical results.|
+|10|ImportError|An import error occurs when we try to use a module or library that Python can't find or import successfully.|
+
+## Try and Except Statement - Catching Exception
+
+The `try-except` statement allows us to anticipate potential errors during program execution and define what actions to take when those errors occur. This prevents the program from crashing unexpectedly and makes it more robust.
+
+Here's an example to explain this:
+
+```python
+try:
+    # code that might raise an exception
+    result = 10 / 0
+except:
+    print("An error occurred!")
+```
+
+Output
+
+```markdown
+An error occurred!
+```
+
+In this example, the `try` block contains the code that you suspect might raise an exception. Python attempts to execute the code within this block. If an exception occurs, Python jumps to the `except` block and executes the code within it.
+
+## Specific Exception Handling
+
+You can specify the type of exception you want to catch using the `except` keyword followed by the exception class name. You can also have multiple `except` blocks to handle different exception types.
+
+Here's an example:
+
+```python
+try:
+    # Code that might raise ZeroDivisionError or NameError
+    result = 10 / 0
+    name = undefined_variable
+except ZeroDivisionError:
+    print("Oops! You tried to divide by zero.")
+except NameError:
+    print("There's a variable named 'undefined_variable' that hasn't been defined yet.")
+```
+
+Output
+
+```markdown
+Oops! You tried to divide by zero.
+```
+
+If you comment out the line `result = 10 / 0`, then the output will be
+
+```markdown
+There's a variable named 'undefined_variable' that hasn't been defined yet.
+```
+
+## Important Note
+
+In this code, the `except` blocks are specific to each type of exception.
If you want to catch both exceptions with a single `except` block, you can use a tuple of exceptions, like this:
+
+```python
+try:
+    # Code that might raise ZeroDivisionError or NameError
+    result = 10 / 0
+    name = undefined_variable
+except (ZeroDivisionError, NameError):
+    print("An error occurred!")
+```
+
+Output
+
+```markdown
+An error occurred!
+```
+
+## Try with Else Clause
+
+The `else` clause in a Python `try-except` block provides a way to execute code only when the `try` block succeeds without raising any exceptions. It's like having a section of code that runs exclusively under the condition that no errors occur during the main operation in the `try` block.
+
+Here's an example to understand this:
+
+```python
+def calculate_average(numbers):
+    if len(numbers) == 0:  # Handle empty list case separately (optional)
+        return None
+    try:
+        total = sum(numbers)
+        average = total / len(numbers)
+    except ZeroDivisionError:
+        print("Cannot calculate average for an empty list.")
+    else:
+        print("The average is:", average)
+        return average  # Optionally return the average here
+
+# Example usage
+numbers = [10, 20, 30]
+result = calculate_average(numbers)
+
+if result is not None:  # Check if result is available (handles empty list case)
+    print("Calculation successful!")
+```
+
+Output
+
+```markdown
+The average is: 20.0
+Calculation successful!
+```
+
+## Finally Keyword in Python
+
+The `finally` keyword in Python is used within `try-except` statements to execute a block of code **always**, regardless of whether an exception occurs in the `try` block or not.
+
+To understand this, let us take an example:
+
+```python
+try:
+    a = 10 // 0
+    print(a)
+except ZeroDivisionError:
+    print("Cannot be divided by zero.")
+finally:
+    print("Program executed!")
+```
+
+Output
+
+```markdown
+Cannot be divided by zero.
+Program executed!
+```
+
+## Raise Keyword in Python
+
+In Python, raising an exception allows you to signal that an error condition has occurred during your program's execution. The `raise` keyword is used to explicitly raise an exception.
+
+Let us take an example:
+
+```python
+def divide(x, y):
+    if y == 0:
+        raise ZeroDivisionError("Can't divide by zero!")  # Raise an exception with a message
+    result = x / y
+    return result
+
+try:
+    division_result = divide(10, 0)
+    print("Result:", division_result)
+except ZeroDivisionError as e:
+    print("An error occurred:", e)  # Handle the exception and print the message
+```
+
+Output
+
+```markdown
+An error occurred: Can't divide by zero!
+```
+
+## Advantages of Exception Handling
+
+- **Improved Error Handling** - It allows you to gracefully handle unexpected situations that arise during program execution. Instead of crashing abruptly, you can define specific actions to take when exceptions occur, providing a smoother experience.
+- **Code Robustness** - Exception Handling helps you to write more resilient programs by anticipating potential issues and providing appropriate responses.
+- **Enhanced Code Readability** - By separating error handling logic from the core program flow, your code becomes more readable and easier to understand. The `try-except` blocks clearly indicate where potential errors might occur and how they'll be addressed.
+
+## Disadvantages of Exception Handling
+
+- **Hiding Logic Errors** - Relying solely on exception handling might mask underlying logic errors in your code. It's essential to write clear and well-tested logic to minimize the need for excessive exception handling.
+- **Performance Overhead** - In some cases, using `try-except` blocks can introduce a slight performance overhead compared to code without exception handling. However, this is usually negligible for most applications.
+- **Overuse of Exceptions** - Overusing exceptions for common errors or control flow can make code less readable and harder to maintain. It's important to use exceptions judiciously for unexpected situations.

From a9e3a4673c62076ba2a764cf6d77d5acda177fc3 Mon Sep 17 00:00:00 2001
From: Mohammed Ahmed Majid
Date: Sun, 2 Jun 2024 00:39:56 +0530
Subject: [PATCH 13/13] Updated Files

---
 contrib/advanced-python/exception-handling.md | 6 +++---
 contrib/advanced-python/index.md              | 1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/contrib/advanced-python/exception-handling.md b/contrib/advanced-python/exception-handling.md
index bddcac4..3e0c672 100644
--- a/contrib/advanced-python/exception-handling.md
+++ b/contrib/advanced-python/exception-handling.md
@@ -1,6 +1,6 @@
 # Exception Handling in Python

-Exception handling is a way of managing errors that may occur during program execution, through which you can handle exceptions gracefully. Python's exception handling mechanism has been designed to avoid unexpected termination of the program and offer a means to either regain control after an error or display meaningful messages to the user.
+Exception Handling is a way of managing the errors that may occur during program execution. Python's exception handling mechanism is designed to avoid the unexpected termination of the program and to offer a way to either regain control after an error or display a meaningful message to the user.

 - **Error** - An error is a mistake or an incorrect result produced by a program. It can be a syntax error, a logical error, or a runtime error. Errors are typically fatal, meaning they prevent the program from continuing to execute.
 - **Exception** - An exception is an event that occurs during the execution of a program that disrupts the normal flow of instructions. Exceptions are typically unexpected and can be handled by the program to prevent it from crashing or terminating abnormally.
It can be runtime, input/output or system exceptions. Exceptions are designed to be handled by the program, allowing it to recover from the error and continue executing.
@@ -35,7 +35,7 @@ Here's an example to explain this:

 ```python
 try:
-    # code that might raise an exception
+    # Code that might raise an exception
     result = 10 / 0
 except:
     print("An error occurred!")
@@ -72,7 +72,7 @@ Output
 Oops! You tried to divide by zero.
 ```

-If you comment out the line `result = 10 / 0`, then the output will be
+If you comment out the line `result = 10 / 0`, then the output will be:

 ```markdown
 There's a variable named 'undefined_variable' that hasn't been defined yet.
diff --git a/contrib/advanced-python/index.md b/contrib/advanced-python/index.md
index b95e4b9..febcbbe 100644
--- a/contrib/advanced-python/index.md
+++ b/contrib/advanced-python/index.md
@@ -7,3 +7,4 @@
 - [Regular Expressions in Python](regular_expressions.md)
 - [JSON module](json-module.md)
 - [Map Function](map-function.md)
+- [Exception Handling in Python](exception-handling.md)