Merge branch 'main' into main

pull/774/head
Arihant Yadav 2024-06-02 21:57:57 +05:30 committed by GitHub
commit 6462ebed70
No key found in the database for this signature
GPG key ID: B5690EEEBB952194
31 changed files with 1269 additions and 10 deletions

View file

@@ -0,0 +1,87 @@
# Generators
## Introduction
Generators in Python are a sophisticated feature that enables the creation of iterators without the need to construct a full list in memory. They allow you to generate values on-the-fly, which is particularly beneficial for working with large datasets or infinite sequences. We will explore generators in depth, covering their types, mathematical formulation, advantages, disadvantages, and implementation examples.
## Function Generators
Function generators are created using the `yield` keyword within a function. When invoked, a function generator returns a generator iterator, allowing you to iterate over the values generated by the function.
### Mathematical Formulation
Function generators can be represented mathematically using set-builder notation. The general form is:
```
{expression | variable in iterable, condition}
```
Where:
- `expression` is the expression to generate values.
- `variable` is the variable used in the expression.
- `iterable` is the sequence of values to iterate over.
- `condition` is an optional condition that filters the values.
### Advantages of Function Generators
1. **Memory Efficiency**: Function generators produce values lazily, meaning they generate values only when needed, saving memory compared to constructing an entire sequence upfront.
2. **Lazy Evaluation**: Values are generated on-the-fly as they are consumed, leading to improved performance and reduced overhead, especially when dealing with large datasets.
3. **Infinite Sequences**: Function generators can represent infinite sequences, such as the Fibonacci sequence, allowing you to work with data streams of arbitrary length without consuming excessive memory.
### Disadvantages of Function Generators
1. **Single Iteration**: Once a function generator is exhausted, it cannot be reused. If you need to iterate over the sequence again, you'll have to create a new generator.
2. **Limited Random Access**: Function generators do not support random access like lists. They only allow sequential access, which might be a limitation depending on the use case.
### Implementation Example
```python
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Usage
fib_gen = fibonacci()
for _ in range(10):
    print(next(fib_gen))
```
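Since a generator supports only a single pass (see the disadvantages above), here is a small sketch of what exhaustion looks like in practice (`count_up_to` is our own illustrative helper):
```python
def count_up_to(n):
    for i in range(n):
        yield i

gen = count_up_to(3)
print(list(gen))  # [0, 1, 2]
print(list(gen))  # [] -- the generator is exhausted; create a new one to iterate again
```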
## Generator Expressions
Generator expressions are similar to list comprehensions but return a generator object instead of a list. They offer a concise way to create generators without the need for a separate function.
### Mathematical Formulation
Generator expressions can also be represented mathematically using set-builder notation. The general form is the same as for function generators.
### Advantages of Generator Expressions
1. **Memory Efficiency**: Generator expressions produce values lazily, similar to function generators, resulting in memory savings.
2. **Lazy Evaluation**: Values are generated on-the-fly as they are consumed, providing improved performance and reduced overhead.
### Disadvantages of Generator Expressions
1. **Single Iteration**: Like function generators, once a generator expression is exhausted, it cannot be reused.
2. **Limited Random Access**: Generator expressions, similar to function generators, do not support random access.
### Implementation Example
```python
# Generate squares of numbers from 0 to 9
square_gen = (x**2 for x in range(10))
# Usage
for num in square_gen:
    print(num)
```
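To make the memory-efficiency advantage concrete, here is a quick comparison of a list comprehension against the equivalent generator expression (exact byte counts vary across Python versions and platforms):
```python
import sys

squares_list = [x**2 for x in range(1_000_000)]  # materializes every value up front
squares_gen = (x**2 for x in range(1_000_000))   # produces values lazily

print(sys.getsizeof(squares_list))  # several megabytes
print(sys.getsizeof(squares_gen))   # a couple of hundred bytes, independent of the range size
```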
## Conclusion
Generators offer a powerful mechanism for creating iterators efficiently in Python. By understanding the differences between function generators and generator expressions, along with their mathematical formulation, advantages, and disadvantages, you can leverage them effectively in various scenarios. Whether you're dealing with large datasets or need to work with infinite sequences, generators provide a memory-efficient solution with lazy evaluation capabilities, contributing to more elegant and scalable code.

View file

@@ -1,11 +1,13 @@
# List of sections
- [OOPs](OOPs.md)
- [OOPs](oops.md)
- [Decorators/\*args/**kwargs](decorator-kwargs-args.md)
- [Lambda Function](lambda-function.md)
- [Working with Dates & Times in Python](dates_and_times.md)
- [Regular Expressions in Python](regular_expressions.md)
- [JSON module](json-module.md)
- [Map Function](map-function.md)
- [Protocols](protocols.md)
- [Exception Handling in Python](exception-handling.md)
- [Generators](generators.md)
- [Closures](closures.md)

View file

@@ -0,0 +1,243 @@
# Protocols in Python
Python can establish informal interfaces using protocols in order to improve code structure, reusability, and type checking. Protocols allow for gradual adoption and are more flexible than the formal interfaces of languages like Java, which are strict contracts specifying the methods and attributes a class must implement.
>Before going into the depth of this topic, let's understand another topic which is a prerequisite of this topic: the typing module.
## Typing Module
The `typing` module in Python:
1. Provides classes, functions, and type aliases.
2. Allows adding type annotations to our code.
3. Enhances code readability.
4. Helps in catching errors early.
### Type Hints in Python:
Type hints allow you to specify the expected data types of variables, function parameters, and return values. This can improve code readability and help with debugging.
Here is a simple function that adds two numbers:
```python
def add(a, b):
    return a + b

print(add(10, 20))
```
>Output: 30
While this works fine, adding type hints makes the code more understandable and serves as documentation:
```python
def add(a: int, b: int) -> int:
    return a + b

print(add(1, 10))
```
>Output: 11
In this version, `a` and `b` are expected to be integers, and the function is expected to return an integer. This makes the function's purpose and usage clearer.
#### Let's see another example
The function given below takes an iterable (it can be any of list, tuple, dict, set, frozenset, str, etc.) and prints its contents on a single line along with its type.
```python
from typing import Iterable

def print_all(l: Iterable) -> None:
    print(type(l), end=' ')
    for i in l:
        print(i, end=' ')
    print()

l = [1, 2, 3, 4, 5]  # type: List[int]
s = {1, 2, 3, 4, 5}  # type: Set[int]
t = (1, 2, 3, 4, 5)  # type: Tuple[int, ...]

for iter_obj in [l, s, t]:
    print_all(iter_obj)
```
Output:
> <class 'list'> 1 2 3 4 5
> <class 'set'> 1 2 3 4 5
> <class 'tuple'> 1 2 3 4 5
Now let's try calling the function `print_all` using a non-iterable object (an `int`) as the argument.
```python
a = 10
print_all(a) # This will raise an error
```
Output:
>TypeError: 'int' object is not iterable
This error occurs because `a` is an `int`, and the `int` class does not have any methods or attributes that make it work like an iterable. In other words, the `int` class does not conform to the `Iterable` protocol.
**Benefits of Type Hints**
Using type hints helps in several ways:
1. **Error Detection**: Tools like mypy can catch type-related problems during development, reducing runtime errors.
2. **Code Readability**: Type hints serve as documentation, making it easy to see what data types are expected and returned.
3. **Improved Maintenance**: With unambiguous type expectations, maintaining and updating code becomes easier, especially in large codebases.
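For instance, a type checker such as mypy can flag a call that violates the hints before the code ever runs (the file name `example.py` is our own):
```python
# example.py
def add(a: int, b: int) -> int:
    return a + b

add("1", 2)  # mypy reports: Argument 1 to "add" has incompatible type "str"; expected "int"
```
Running `mypy example.py` catches this at development time; plain Python would only fail at runtime with a `TypeError`.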
Now that we understand type hints and the `typing` module, let's dive deeper into protocols.
## Understanding Protocols
In Python, protocols define interfaces similar to Java interfaces. They let you specify methods and attributes that an object must implement without requiring inheritance from a base class. Protocols are part of the `typing` module and provide a way to enforce certain structures in your classes, enhancing type safety and code clarity.
### What is a Protocol?
A protocol specifies one or more method signatures that a class must implement to be considered as conforming to the protocol.
This concept is often referred to as "structural subtyping" or "duck typing," meaning that if an object implements the required methods and attributes, it can be treated as an instance of the protocol.
Let's write our own protocol:
```python
from typing import Protocol

# Define a Printable protocol
class Printable(Protocol):
    def print(self) -> None:
        """Print the object"""
        ...

# Book class implements the Printable protocol
class Book:
    def __init__(self, title: str):
        self.title = title

    def print(self) -> None:
        print(f"Book Title: {self.title}")

# print_object takes a Printable object and calls its print method
def print_object(obj: Printable) -> None:
    obj.print()

book = Book("Python Programming")
print_object(book)
```
Output:
> Book Title: Python Programming
In this example:
1. **Printable Protocol:** Defines an interface with a single method, `print`.
2. **Book Class:** Implements the Printable protocol by providing a `print` method.
3. **print_object Function:** Accepts any object that conforms to the Printable protocol and calls its `print` method.
We got our output because the `Book` class conforms to the `Printable` protocol.
Similarly, when you pass an object to `print_object` that does not conform to the `Printable` protocol, an error will occur, because the object does not implement the required `print` method.
Let's see an example:
```python
class Team:
    def huddle(self) -> None:
        print("Team Huddle")

c = Team()
print_object(c)  # This will raise an error
```
Output:
>AttributeError: 'Team' object has no attribute 'print'
In this case:
- The `Team` class has a `huddle` method but does not have a `print` method.
- When `print_object` tries to call the `print` method on a `Team` instance, it raises an `AttributeError`.
> This is an important aspect of using protocols: they ensure that objects provide the necessary methods, leading to more predictable and reliable code.
**Ensuring Protocol Conformance**
To avoid such errors, you need to ensure that any object passed to `print_object` implements the `Printable` protocol. Here's how you can modify the `Team` class to conform to the protocol:
```python
class Team:
    def __init__(self, name: str):
        self.name = name

    def huddle(self) -> None:
        print("Team Huddle")

    def print(self) -> None:
        print(f"Team Name: {self.name}")

c = Team("Dream Team")
print_object(c)
```
Output:
>Team Name: Dream Team
The `Team` class now implements the `print` method, conforming to the `Printable` protocol, and hence no longer raises an error.
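Protocols can also be checked at runtime with `typing.runtime_checkable` (available since Python 3.8). A minimal sketch, redefining the classes from above so the snippet is self-contained; note that `isinstance()` with a runtime-checkable protocol only checks method presence, not signatures:
```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Printable(Protocol):
    def print(self) -> None:
        ...

class Book:
    def __init__(self, title: str):
        self.title = title

    def print(self) -> None:
        print(f"Book Title: {self.title}")

class Team:
    def huddle(self) -> None:
        print("Team Huddle")

print(isinstance(Book("Python Programming"), Printable))  # True
print(isinstance(Team(), Printable))                      # False: no print() method
```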
### Protocols and Inheritance:
Protocols can also be used in combination with inheritance to create more complex interfaces.
We can do that by following these steps:
**Step 1 - Base Protocol**: Define a base protocol that specifies a common set of methods and attributes.
**Step 2 - Derived Protocols**: Create derived protocols that extend the base protocol with additional requirements.
**Step 3 - Polymorphism**: Objects can then conform to multiple protocols, allowing for polymorphic behavior.
Let's see an example on this as well:
```python
from typing import Protocol

# Base protocol 1
class Printable(Protocol):
    def print(self) -> None:
        """Print the object"""
        ...

# Base protocol 2
class Serializable(Protocol):
    def serialize(self) -> str:
        ...

# Derived protocol (must also inherit Protocol to remain a protocol)
class PrintableAndSerializable(Printable, Serializable, Protocol):
    pass

# Class with an implementation of both Printable and Serializable
class Book_serialize:
    def __init__(self, title: str):
        self.title = title

    def print(self) -> None:
        print(f"Book Title: {self.title}")

    def serialize(self) -> str:
        return f"serialize: {self.title}"

# Function accepts an object which implements PrintableAndSerializable
def test(obj: PrintableAndSerializable) -> None:
    obj.print()
    print(obj.serialize())

book = Book_serialize("lean-in")
test(book)
```
Output:
> Book Title: lean-in
> serialize: lean-in
In this example:
**Printable Protocol:** Specifies a `print` method.
**Serializable Protocol:** Specifies a `serialize` method.
**PrintableAndSerializable Protocol:** Combines both `Printable` and `Serializable`.
**Book_serialize Class**: Implements both `print` and `serialize` methods, conforming to `PrintableAndSerializable`.
**test Function:** Accepts any object that implements the `PrintableAndSerializable` protocol.
If you try to pass an object that does not conform to the `PrintableAndSerializable` protocol to the `test` function, it will raise an error. Let's see an example:
```python
class Team:
    def huddle(self) -> None:
        print("Team Huddle")

c = Team()
test(c)  # This will raise an error
```
Output:
> AttributeError: 'Team' object has no attribute 'print'
In this case:
- The `Team` class has a `huddle` method but does not implement the `print` or `serialize` methods.
- When `test` tries to call `print` and `serialize` on a `Team` instance, it raises an `AttributeError`.
**In Conclusion:**
>Python protocols offer a versatile and powerful means of defining interfaces, encouraging the decoupling of code, improving readability, and facilitating static type checking. They are particularly handy for scenarios involving file-like objects, bespoke containers, and any case where you wish to enforce certain behaviors without requiring inheritance from a specific base class. Ensuring that classes conform to protocols reduces runtime problems and makes your code more robust and maintainable.

View file

@@ -10,5 +10,6 @@
- [Greedy Algorithms](greedy-algorithms.md)
- [Dynamic Programming](dynamic-programming.md)
- [Linked list](linked-list.md)
- [Stacks in Python](stacks.md)
- [Sliding Window Technique](sliding-window.md)
- [Trie](trie.md)

View file

@@ -0,0 +1,116 @@
# Stacks in Python
In Data Structures and Algorithms, a stack is a linear data structure that complies with the Last In, First Out (LIFO) rule. It works through two fundamental operations: **PUSH**, which inserts an element on top of the stack, and **POP**, which removes the topmost element. This concept is similar to a stack of plates in a cafeteria. Stacks are commonly used for handling function calls, expression evaluation, and parsing in programming. They are also efficient at managing memory and tracking program state.
## Points to be Remembered
- A stack is a collection of data items that can be accessed at only one end, called **TOP**.
- Items can be inserted and deleted in a stack only at the TOP.
- The last item inserted in a stack is the first one to be deleted.
- Therefore, a stack is called a **Last-In-First-Out (LIFO)** data structure.
## Real Life Examples of Stacks
- **PILE OF BOOKS** - Suppose a set of books are placed one over the other in a pile. When you remove books from the pile, the topmost book will be removed first. Similarly, when you have to add a book to the pile, the book will be placed at the top of the pile.
- **PILE OF PLATES** - The first plate begins the pile. The second plate is placed on the top of the first plate and the third plate is placed on the top of the second plate, and so on. In general, if you want to add a plate to the pile, you can keep it on the top of the pile. Similarly, if you want to remove a plate, you can remove the plate from the top of the pile.
- **BANGLES IN A HAND** - When a person wears bangles, the last bangle worn is the first one to be removed.
## Applications of Stacks
Stacks are widely used in Computer Science:
- Function call management
- Maintaining the UNDO list for the application
- Web browser *history management*
- Evaluating expressions
- Checking the nesting of parentheses in an expression
- Backtracking algorithms (Recursion)
Understanding these applications is essential for Software Development.
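One of the applications listed above, checking the nesting of parentheses, makes a good worked example. A minimal sketch (the helper `is_balanced` is our own illustration):
```python
def is_balanced(expr: str) -> bool:
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in expr:
        if ch in '([{':
            stack.append(ch)          # PUSH an opening bracket
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False          # mismatch, or POP from an empty stack
    return not stack                  # balanced only if nothing is left

print(is_balanced("(a + b) * [c - d]"))  # True
print(is_balanced("(a + b]"))            # False
```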
## Operations on a Stack
Key operations on a stack include:
- **PUSH** - The process of inserting a new element on the top of a stack.
- **OVERFLOW** - A situation when we try to push an item onto a stack that is full.
- **POP** - The process of deleting an element from the top of a stack.
- **UNDERFLOW** - A situation when we try to pop an item from an empty stack.
- **PEEK** - The process of getting the most recent value of the stack *(i.e. the value at the top of the stack)*.
- **isEMPTY** - A function which returns True if the stack is empty and False otherwise.
- **SHOW** - Displaying the stack items.
## Implementing Stacks in Python
```python
def isEmpty(S):
    return len(S) == 0

def Push(S, item):
    S.append(item)

def Pop(S):
    if isEmpty(S):
        return "Underflow"
    else:
        return S.pop()

def Peek(S):
    if isEmpty(S):
        return "Underflow"
    else:
        top = len(S) - 1
        return S[top]

def Show(S):
    if isEmpty(S):
        print("Sorry, No items in Stack")
    else:
        print("(Top)", end=' ')
        t = len(S) - 1
        while t >= 0:
            print(S[t], "<", end=' ')
            t -= 1
        print()

stack = []  # initially the stack is empty
Push(stack, 5)
Push(stack, 10)
Push(stack, 15)
print("Stack after Push operations:")
Show(stack)
print("Peek operation:", Peek(stack))
print("Pop operation:", Pop(stack))
print("Stack after Pop operation:")
Show(stack)
```
## Output
```markdown
Stack after Push operations:
(Top) 15 < 10 < 5 <
Peek operation: 15
Pop operation: 15
Stack after Pop operation:
(Top) 10 < 5 <
```
## Complexity Analysis
- **Worst case**: `O(n)` - dominated by the Show operation, which traverses every item in the stack.
- **Best case**: `O(1)` - isEmpty, Push, Pop, and Peek all run in constant time.
- **Average case**: `O(1)` for the core operations (Push, Pop, Peek, isEmpty); only Show grows linearly with the number of items.

View file

@@ -0,0 +1,235 @@
# Cost Functions in Machine Learning
Cost functions, also known as loss functions, play a crucial role in training machine learning models. They measure how well the model performs on the training data by quantifying the difference between predicted and actual values. Different types of cost functions are used depending on the problem domain and the nature of the data.
## Types of Cost Functions
### 1. Mean Squared Error (MSE)
**Explanation:**
MSE is one of the most commonly used cost functions, particularly in regression problems. It calculates the average squared difference between the predicted and actual values.
**Mathematical Formulation:**
The MSE is defined as:
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Where:
- `n` is the number of samples.
- $y_i$ is the actual value.
- $\hat{y}_i$ is the predicted value.
**Advantages:**
- Sensitive to large errors due to squaring.
- Differentiable and convex, facilitating optimization.
**Disadvantages:**
- Sensitive to outliers, as the squared term amplifies their impact.
**Python Implementation:**
```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)
```
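A quick sanity check of this function (the sample arrays are our own illustration):
```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
# uses mean_squared_error defined above
print(mean_squared_error(y_true, y_pred))  # 0.375
```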
### 2. Mean Absolute Error (MAE)
**Explanation:**
MAE is another commonly used cost function for regression tasks. It measures the average absolute difference between predicted and actual values.
**Mathematical Formulation:**
The MAE is defined as:
$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
Where:
- `n` is the number of samples.
- $y_i$ is the actual value.
- $\hat{y}_i$ is the predicted value.
**Advantages:**
- Less sensitive to outliers compared to MSE.
- Provides a linear error term, which can be easier to interpret.
**Disadvantages:**
- Not differentiable at zero, which can complicate optimization.
**Python Implementation:**
```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))
```
### 3. Cross-Entropy Loss (Binary)
**Explanation:**
Cross-entropy loss is commonly used in binary classification problems. It measures the dissimilarity between the true and predicted probability distributions.
**Mathematical Formulation:**
For binary classification, the cross-entropy loss is defined as:
$$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$$
Where:
- `n` is the number of samples.
- $y_i$ is the actual class label (0 or 1).
- $\hat{y}_i$ is the predicted probability of the positive class.
**Advantages:**
- Penalizes confident wrong predictions heavily.
- Suitable for probabilistic outputs.
**Disadvantages:**
- Sensitive to class imbalance.
**Python Implementation:**
```python
import numpy as np

def binary_cross_entropy(y_true, y_pred):
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```
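Note that the implementation above produces `-inf`/`nan` when a predicted probability is exactly 0 or 1, since `log(0)` is undefined. A common safeguard, shown here as a sketch (the `eps` parameter is our own), is to clip predictions first:
```python
import numpy as np

def binary_cross_entropy_stable(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1 - eps)  # keep predictions away from exact 0 and 1
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```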
### 4. Cross-Entropy Loss (Multiclass)
**Explanation:**
For multiclass classification problems, the cross-entropy loss is adapted to handle multiple classes.
**Mathematical Formulation:**
The multiclass cross-entropy loss is defined as:
$$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c})$$
Where:
- `n` is the number of samples.
- `C` is the number of classes.
- $y_{i,c}$ is the indicator function for the true class of sample `i`.
- $\hat{y}_{i,c}$ is the predicted probability of sample `i` belonging to class `c`.
**Advantages:**
- Handles multiple classes effectively.
- Encourages the model to assign high probabilities to the correct classes.
**Disadvantages:**
- Requires one-hot encoding for class labels, which can increase computational complexity.
**Python Implementation:**
```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred):
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
```
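To see the one-hot encoding requirement in action, a small sketch (the labels and probabilities are our own illustration):
```python
import numpy as np

y_true = np.array([[1, 0, 0], [0, 1, 0]])               # one-hot labels for classes 0 and 1
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])   # predicted class probabilities
# uses categorical_cross_entropy defined above
print(categorical_cross_entropy(y_true, y_pred))  # -(log 0.7 + log 0.8) / 2 ≈ 0.290
```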
### 5. Hinge Loss (SVM)
**Explanation:**
Hinge loss is commonly used in support vector machines (SVMs) for binary classification tasks. It penalizes misclassifications by a linear margin.
**Mathematical Formulation:**
For binary classification, the hinge loss is defined as:
$$\text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i)$$
Where:
- `n` is the number of samples.
- $y_i$ is the actual class label (-1 or 1).
- $\hat{y}_i$ is the predicted score for sample $i$.
**Advantages:**
- Encourages margin maximization in SVMs.
- Robust to outliers due to the linear penalty.
**Disadvantages:**
- Not differentiable at the margin, which can complicate optimization.
**Python Implementation:**
```python
import numpy as np

def hinge_loss(y_true, y_pred):
    loss = np.maximum(0, 1 - y_true * y_pred)
    return np.mean(loss)
```
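A brief usage check (the labels and scores are our own illustration; note the labels must be in {-1, +1}):
```python
import numpy as np

y_true = np.array([1, -1, 1])
y_scores = np.array([0.8, -0.5, -0.2])
# uses hinge_loss defined above; low-margin and misclassified samples contribute most
print(hinge_loss(y_true, y_scores))  # (0.2 + 0.5 + 1.2) / 3 ≈ 0.633
```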
### 6. Huber Loss
**Explanation:**
Huber loss is a combination of MSE and MAE, providing a compromise between the two. It is less sensitive to outliers than MSE and provides a smooth transition to MAE for large errors.
**Mathematical Formulation:**
The Huber loss is defined as:
$$\text{Huber Loss} = \frac{1}{n} \sum_{i=1}^{n} \left\{
\begin{array}{ll}
\frac{1}{2} (y_i - \hat{y}_i)^2 & \text{if } |y_i - \hat{y}_i| \leq \delta \\
\delta(|y_i - \hat{y}_i| - \frac{1}{2} \delta) & \text{otherwise}
\end{array}
\right.$$
Where:
- `n` is the number of samples.
- $\delta$ is a threshold parameter.
**Advantages:**
- Provides a smooth loss function.
- Less sensitive to outliers than MSE.
**Disadvantages:**
- Requires tuning of the threshold parameter.
**Python Implementation:**
```python
import numpy as np

def huber_loss(y_true, y_pred, delta):
    error = y_true - y_pred
    loss = np.where(np.abs(error) <= delta,
                    0.5 * error ** 2,
                    delta * (np.abs(error) - 0.5 * delta))
    return np.mean(loss)
```
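A short usage check showing the quadratic/linear split (the sample arrays are our own illustration):
```python
import numpy as np

y_true = np.array([1.0, 2.0, 10.0])
y_pred = np.array([1.5, 2.0, 4.0])
# uses huber_loss defined above: the outlier (error of 6) is penalized linearly, not squared
print(huber_loss(y_true, y_pred, delta=1.0))  # (0.125 + 0.0 + 5.5) / 3 = 1.875
```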
### 7. Log-Cosh Loss
**Explanation:**
Log-Cosh loss is a smooth approximation of the MAE and is less sensitive to outliers than MSE. It provides a smooth transition from quadratic for small errors to linear for large errors.
**Mathematical Formulation:**
The Log-Cosh loss is defined as:
$$\text{Log-Cosh Loss} = \frac{1}{n} \sum_{i=1}^{n} \log(\cosh(y_i - \hat{y}_i))$$
Where:
- `n` is the number of samples.
**Advantages:**
- Smooth and differentiable everywhere.
- Less sensitive to outliers.
**Disadvantages:**
- Computationally more expensive than simple losses like MSE.
**Python Implementation:**
```python
import numpy as np

def logcosh_loss(y_true, y_pred):
    error = y_true - y_pred
    return np.mean(np.log(np.cosh(error)))
```
These implementations provide various options for cost functions suitable for different machine learning tasks. Each function has its advantages and disadvantages, making them suitable for different scenarios and problem domains.

View file

@@ -1,16 +1,18 @@
# List of sections
- [Binomial Distribution](binomial_distribution.md)
- [Regression in Machine Learning](Regression.md)
- [Introduction to scikit-learn](sklearn-introduction.md)
- [Binomial Distribution](binomial-distribution.md)
- [Regression in Machine Learning](regression.md)
- [Confusion Matrix](confusion-matrix.md)
- [Decision Tree Learning](Decision-Tree.md)
- [Decision Tree Learning](decision-tree.md)
- [Random Forest](random-forest.md)
- [Support Vector Machine Algorithm](support-vector-machine.md)
- [Artificial Neural Network from the Ground Up](ArtificialNeuralNetwork.md)
- [Artificial Neural Network from the Ground Up](ann.md)
- [Introduction To Convolutional Neural Networks (CNNs)](intro-to-cnn.md)
- [TensorFlow.md](tensorFlow.md)
- [TensorFlow.md](tensorflow.md)
- [PyTorch.md](pytorch.md)
- [Types of optimizers](Types_of_optimizers.md)
- [Types of optimizers](types-of-optimizers.md)
- [Logistic Regression](logistic-regression.md)
- [Types_of_Cost_Functions](cost-functions.md)
- [Clustering](clustering.md)
- [Grid Search](grid-search.md)

View file

@@ -0,0 +1,144 @@
# scikit-learn (sklearn) Python Library
## Overview
scikit-learn, also known as sklearn, is a popular open-source Python library that provides simple and efficient tools for data mining and data analysis. It is built on NumPy, SciPy, and matplotlib. The library is designed to interoperate with the Python numerical and scientific libraries.
## Key Features
- **Classification**: Identifying which category an object belongs to. Example algorithms include SVM, nearest neighbors, random forest.
- **Regression**: Predicting a continuous-valued attribute associated with an object. Example algorithms include support vector regression (SVR), ridge regression, Lasso.
- **Clustering**: Automatic grouping of similar objects into sets. Example algorithms include k-means, spectral clustering, mean-shift.
- **Dimensionality Reduction**: Reducing the number of random variables to consider. Example algorithms include PCA, feature selection, non-negative matrix factorization.
- **Model Selection**: Comparing, validating, and choosing parameters and models. Example methods include grid search, cross-validation, metrics.
- **Preprocessing**: Feature extraction and normalization.
## When to Use scikit-learn
- **Use scikit-learn if**:
- You are working on machine learning tasks such as classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.
- You need an easy-to-use, well-documented library.
- You require tools that are compatible with NumPy and SciPy.
- **Do not use scikit-learn if**:
- You need to perform deep learning tasks. In such cases, consider using TensorFlow or PyTorch.
- You need out-of-the-box support for large-scale data. scikit-learn is designed to work with in-memory data, so for very large datasets, you might want to consider libraries like Dask-ML.
## Installation
You can install scikit-learn using pip:
```bash
pip install scikit-learn
```
Or via conda:
```bash
conda install scikit-learn
```
## Basic Usage with Code Snippets
### Importing the Library
```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
```
### Loading Data
For illustration, let's create a simple synthetic dataset:
```python
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
```
### Splitting Data
Split the dataset into training and testing sets:
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
### Preprocessing
Standardizing the features:
```python
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```
### Training a Model
Train a Logistic Regression model:
```python
model = LogisticRegression()
model.fit(X_train, y_train)
```
### Making Predictions
Make predictions on the test set:
```python
y_pred = model.predict(X_test)
```
### Evaluating the Model
Evaluate the accuracy of the model:
```python
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
```
### Putting it All Together
Here is a complete example from data loading to model evaluation:
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load data
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Preprocess data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train model
model = LogisticRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
```
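As a next step suggested by the model-selection features above, here is a quick sketch of 5-fold cross-validation on the same model and training data, using scikit-learn's `cross_val_score`:
```python
from sklearn.model_selection import cross_val_score

# 5-fold cross-validation on the training set
scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"CV accuracy: {scores.mean() * 100:.2f}% (+/- {scores.std() * 100:.2f}%)")
```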
## Conclusion
scikit-learn is a powerful and versatile library that can be used for a wide range of machine learning tasks. It is particularly well-suited for beginners due to its easy-to-use interface and extensive documentation. Whether you are working on a simple classification task or a more complex clustering problem, scikit-learn provides the tools you need to build and evaluate your models effectively.

View file

@@ -1,6 +1,7 @@
# List of sections
- [Pandas Introduction and Dataframes in Pandas](introduction.md)
- [Viewing data in pandas](viewing-data.md)
- [Pandas Series Vs NumPy ndarray](pandas-series-vs-numpy-ndarray.md)
- [Pandas Descriptive Statistics](descriptive-statistics.md)
- [Group By Functions with Pandas](groupby-functions.md)

View file

@@ -0,0 +1,67 @@
# Viewing rows of the frame
## `head()` method
The pandas library in Python provides a convenient method called `head()` that allows you to view the first few rows of a DataFrame. Let me explain how it works:
- The `head()` function returns the first n rows of a DataFrame or Series.
- By default, it displays the first 5 rows, but you can specify a different number of rows using the n parameter.
### Syntax
```python
dataframe.head(n)
```
`n` is optional: the number of rows to return. The default value is `5`.
### Example
```python
import pandas as pd
df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion', 'tiger', 'rabbit', 'dog', 'fox', 'monkey', 'elephant']})
df.head(n=5)
```
#### Output
```
      animal
0  alligator
1        bee
2     falcon
3       lion
4      tiger
```
## `tail()` method
The `tail()` function in Python displays the last five rows of the dataframe by default. It takes in a single parameter: the number of rows. We can use this parameter to display the number of rows of our choice.
- The `tail()` function returns the last n rows of a DataFrame or Series.
- By default, it displays the last 5 rows, but you can specify a different number of rows using the n parameter.
### Syntax
```python
dataframe.tail(n)
```
`n` is optional: the number of rows to return. The default value is `5`.
### Example
```python
import pandas as pd
df = pd.DataFrame({'fruits': ['mango', 'orange', 'apple', 'lemon', 'banana', 'water melon', 'papaya', 'grapes', 'cherry', 'coconut']})
df.tail(n=5)
```
#### Output
```
        fruits
5  water melon
6       papaya
7       grapes
8       cherry
9      coconut
```

Nine binary image files added (not displayed): 1.2 KiB, 28 KiB, 14 KiB, 16 KiB, 22 KiB, 19 KiB, 53 KiB, 14 KiB, 18 KiB.

View file

@@ -4,3 +4,6 @@
- [Introducing Matplotlib](matplotlib-introduction.md)
- [Bar Plots in Matplotlib](matplotlib-bar-plots.md)
- [Pie Charts in Matplotlib](matplotlib-pie-charts.md)
- [Line Charts in Matplotlib](matplotlib-line-plots.md)
- [Introduction to Seaborn and Installation](seaborn-intro.md)
- [Getting started with Seaborn](seaborn-basics.md)

View file

@@ -0,0 +1,278 @@
# Line Chart in Matplotlib
A line chart is a simple way to visualize data where we connect individual data points. It helps us to see trends and patterns over time or across categories.
This type of chart is particularly useful for:
- Comparing Data: Comparing multiple datasets on the same axes.
- Highlighting Changes: Illustrating changes and patterns in data.
- Visualizing Trends: Showing trends over time or other continuous variables.
## Prerequisites
Line plots can be created in Python with Matplotlib's `pyplot` library. To build a line plot, first import `matplotlib`. It is a standard convention to import Matplotlib's pyplot library as `plt`.
```python
import matplotlib.pyplot as plt
```
## Creating a simple Line Plot
First import matplotlib and numpy; these are needed for charting.
You can use the `plot(x, y)` method to create a line chart.
```python
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(-1, 1, 50)
print(x)
y = 2*x + 1
plt.plot(x, y)
plt.show()
```
When executed, this will show the following line plot:
![Basic line Chart](images/simple_line.png)
## Curved line
The `plot()` method also works for other types of line charts. It doesn't need to be a straight line; `y` can contain any type of values.
```python
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(-1, 1, 50)
y = 2**x + 1
plt.plot(x, y)
plt.show()
```
When executed, this will show the following Curved line plot:
![Curved line](images/line-curve.png)
## Line with Labels
To know what you are looking at, you need metadata. Labels are a type of metadata. They show what the chart is about. The chart has an `x label`, `y label` and `title`.
```python
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(-1, 1, 50)
y1 = 2*x + 1
y2 = 2**x + 1
plt.figure()
plt.plot(x, y1)
plt.xlabel("I am x")
plt.ylabel("I am y")
plt.title("With Labels")
plt.show()
```
When executed, this will show the following line with labels plot:
![line with labels](images/line-labels.png)
## Multiple lines
More than one line can be in the plot. To add another line, just call the `plot(x,y)` function again. In the example below we have two different series, `y1` and `y2`, that are plotted onto the chart.
```python
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(-1, 1, 50)
y1 = 2*x + 1
y2 = 2**x + 1
plt.figure(num = 3, figsize=(8, 5))
plt.plot(x, y2)
plt.plot(x, y1,
         color='red',
         linewidth=1.0,
         linestyle='--')
plt.show()
```
When executed, this will show the following Multiple lines plot:
![multiple lines](images/two-lines.png)
## Dotted line
Lines can be drawn as a series of dots, like the image below. Instead of calling `plot(x,y)`, call the `scatter(x,y)` method. The `scatter(x,y)` method can also be used to (randomly) plot points onto the chart.
```python
import matplotlib.pyplot as plt
import numpy as np
# Plot five points along the diagonal
plt.scatter(np.arange(5), np.arange(5))
plt.xticks(())
plt.yticks(())
plt.show()
```
When executed, this will show the following Dotted line plot:
![dotted lines](images/dot-line.png)
## Line ticks
You can change the ticks on the plot. Set them on the `x-axis`, `y-axis` or even change their color. The line can be made thicker and given an alpha (transparency) value.
```python
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(-1, 1, 50)
y = 2*x - 1
plt.figure(figsize=(12, 8))
plt.plot(x, y, color='r', linewidth=10.0, alpha=0.5)
ax = plt.gca()
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
ax.spines['bottom'].set_position(('data', 0))
ax.spines['left'].set_position(('data', 0))
for label in ax.get_xticklabels() + ax.get_yticklabels():
    label.set_fontsize(12)
    label.set_bbox(dict(facecolor='y', edgecolor='None', alpha=0.7))
plt.show()
```
When executed, this will show the following line ticks plot:
![line ticks](images/line-ticks.png)
## Line with asymptote
An asymptote can be added to the plot. To do that, use `plt.annotate()`. There's also a dotted line in the plot below. You can play around with the code to see how it works.
```python
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(-1, 1, 50)
y1 = 2*x + 1
y2 = 2**x + 1
plt.figure(figsize=(12, 8))
plt.plot(x, y2)
plt.plot(x, y1, color='red', linewidth=1.0, linestyle='--')
ax = plt.gca()
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
ax.spines['bottom'].set_position(('data', 0))
ax.spines['left'].set_position(('data', 0))
x0 = 1
y0 = 2*x0 + 1
plt.scatter(x0, y0, s = 66, color = 'b')
plt.plot([x0, x0], [y0, 0], 'k-.', lw= 2.5)
plt.annotate(r'$2x+1=%s$' % y0,
             xy=(x0, y0),
             xycoords='data',
             xytext=(+30, -30),
             textcoords='offset points',
             fontsize=16,
             arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=.2'))
plt.text(0, 3,
         r'$This\ is\ a\ good\ idea.\ \mu\ \sigma_i\ \alpha_t$',
         fontdict={'size': 16, 'color': 'r'})
plt.show()
```
When executed, this will show the following Line with asymptote plot:
![Line with asymptote](images/line-asymptote.png)
## Line with text scale
It doesn't have to be a numeric scale. The scale can also contain textual labels, like the example below. In `plt.yticks()` we just pass a list with text values. These values are then shown against the `y axis`.
```python
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(-1, 1, 50)
y1 = 2*x + 1
y2 = 2**x + 1
plt.figure(num = 3, figsize=(8, 5))
plt.plot(x, y2)
plt.plot(x, y1,
         color='red',
         linewidth=1.0,
         linestyle='--')
plt.xlim((-1, 2))
plt.ylim((1, 3))
new_ticks = np.linspace(-1, 2, 5)
plt.xticks(new_ticks)
plt.yticks([-2, -1.8, -1, 1.22, 3],
           [r'$really\ bad$', r'$bad$', r'$normal$', r'$good$', r'$really\ good$'])
ax = plt.gca()
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
ax.spines['bottom'].set_position(('data', 0))
ax.spines['left'].set_position(('data', 0))
plt.show()
```
When executed, this will show the following Line with text scale plot:
![Line with text scale](images/line-with-text-scale.png)

View file

@@ -0,0 +1,39 @@
Seaborn helps you explore and understand your data. Its plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots. Its dataset-oriented, declarative API lets you focus on what the different elements of your plots mean, rather than on the details of how to draw them.
Here's an example of what seaborn can do:
```Python
# Import seaborn
import seaborn as sns
# Apply the default theme
sns.set_theme()
# Load an example dataset
tips = sns.load_dataset("tips")
# Create a visualization
sns.relplot(
    data=tips,
    x="total_bill", y="tip", col="time",
    hue="smoker", style="smoker", size="size",
)
```
Below is the output for the above code snippet:
![Seaborn intro image](images/seaborn-basics1.png)
```Python
# Load an example dataset
tips = sns.load_dataset("tips")
```
Most code in the docs will use the `load_dataset()` function to get quick access to an example dataset. There's nothing special about these datasets: they are just pandas data frames, and we could have loaded them with `pandas.read_csv()` or built them by hand. Many users specify data using pandas data frames, but seaborn is very flexible about the data structures that it accepts.
```Python
# Create a visualization
sns.relplot(
    data=tips,
    x="total_bill", y="tip", col="time",
    hue="smoker", style="smoker", size="size",
)
```
This plot shows the relationship between five variables in the tips dataset using a single call to the seaborn function `relplot()`. Notice how only the names of the variables and their roles in the plot are provided. Unlike when using matplotlib directly, it wasn't necessary to specify attributes of the plot elements in terms of color values or marker codes. Behind the scenes, seaborn handled the translation from values in the dataframe to arguments that matplotlib understands. This declarative approach lets you stay focused on the questions that you want to answer, rather than on the details of how to control matplotlib.

View file

@@ -0,0 +1,41 @@
Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
## Seaborn Installation
Before installing Seaborn, ensure you have Python installed on your system. You can download and install Python from the [official Python website](https://www.python.org/).
Below are the steps to install and setup Seaborn:
1. Open your terminal or command prompt and run the following command to install Seaborn using `pip`:
```bash
pip install seaborn
```
2. The basic invocation of `pip` will install seaborn and, if necessary, its mandatory dependencies. It is possible to include optional dependencies that give access to a few advanced features:
```bash
pip install seaborn[stats]
```
3. The library is also included as part of the Anaconda distribution, and it can be installed with `conda`:
```bash
conda install seaborn
```
4. As the main Anaconda repository can be slow to add new releases, you may prefer using the conda-forge channel:
```bash
conda install seaborn -c conda-forge
```
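5. To verify the installation, import the library and check its version:
```python
import seaborn as sns
print(sns.__version__)  # prints the installed version, e.g. 0.13.x
```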
## Dependencies
### Supported Python versions
- Python 3.8+
### Mandatory Dependencies
- [numpy](https://numpy.org/)
- [pandas](https://pandas.pydata.org/)
- [matplotlib](https://matplotlib.org/)
### Optional Dependencies
- [statsmodels](https://www.statsmodels.org/stable/index.html) for advanced regression plots
- [scipy](https://scipy.org/) for clustering matrices and some advanced options
- [fastcluster](https://pypi.org/project/fastcluster/) for faster clustering of large matrices