Merge branch 'main' into main
|
@ -24,8 +24,8 @@ The list of topics for which we are looking for content are provided below along
|
|||
- Web Scraping - [Link](https://github.com/animator/learn-python/tree/main/contrib/web-scrapping)
|
||||
- API Development - [Link](https://github.com/animator/learn-python/tree/main/contrib/api-development)
|
||||
- Data Structures & Algorithms - [Link](https://github.com/animator/learn-python/tree/main/contrib/ds-algorithms)
|
||||
- Python Mini Projects - [Link](https://github.com/animator/learn-python/tree/main/contrib/mini-projects)
|
||||
- Python Question Bank - [Link](https://github.com/animator/learn-python/tree/main/contrib/question-bank)
|
||||
- Python Mini Projects - [Link](https://github.com/animator/learn-python/tree/main/contrib/mini-projects) **(Not accepting)**
|
||||
- Python Question Bank - [Link](https://github.com/animator/learn-python/tree/main/contrib/question-bank) **(Not accepting)**
|
||||
|
||||
You can check out some content ideas below.
|
||||
|
||||
|
|
|
@ -0,0 +1,192 @@
|
|||
# Exception Handling in Python
|
||||
|
||||
Exception Handling is a way of managing errors that may occur during program execution. Python's exception handling mechanism is designed to avoid the unexpected termination of the program and to either regain control after an error or display a meaningful message to the user.
|
||||
|
||||
- **Error** - An error is a mistake or an incorrect result produced by a program. It can be a syntax error, a logical error, or a runtime error. Errors are typically fatal, meaning they prevent the program from continuing to execute.
|
||||
- **Exception** - An exception is an event that occurs during the execution of a program and disrupts the normal flow of instructions. Exceptions are typically unexpected and can be handled by the program to prevent it from crashing or terminating abnormally. They can be runtime, input/output, or system exceptions. Exceptions are designed to be handled by the program, allowing it to recover from the error and continue executing.
|
||||
|
||||
## Python Built-in Exceptions
|
||||
|
||||
There are plenty of built-in exceptions in Python that are raised when a corresponding error occurs.
|
||||
We can view all the built-in exceptions using the built-in `locals()` function as follows:
|
||||
|
||||
```python
|
||||
print(dir(locals()['__builtins__']))
|
||||
```
|
||||
|
||||
|**S.No**|**Exception**|**Description**|
|
||||
|---|---|---|
|
||||
|1|SyntaxError|A syntax error occurs when the code we write violates Python's grammatical rules, such as misspelled keywords, a missing colon, or mismatched parentheses.|
|
||||
|2|TypeError|A type error occurs when we try to perform an operation or use a function with objects that are of incompatible data types.|
|
||||
|3|NameError|A name error occurs when we try to use a variable, function, or module that hasn't been defined, or write a string without quotes so that Python treats it as an undefined name.|
|
||||
|4|IndexError|An index error occurs when we try to access an element in a sequence (like a list, tuple or string) using an index that's outside the valid range of indices for that sequence.|
|
||||
|5|KeyError|A key error occurs when we try to access a key that doesn't exist in a dictionary. Attempting to retrieve a value using a non-existent key results in this error.|
|
||||
|6|ValueError|A value error occurs when we provide an argument that has the right type but an inappropriate value for a specific operation or function, such as converting a non-numeric string to an integer (e.g., `int("abc")`).|
|
||||
|7|AttributeError|An attribute error occurs when we try to access an attribute (like a variable or method) on an object that doesn't possess that attribute.|
|
||||
|8|IOError|An IO (Input/Output) error occurs when an operation involving file or device interaction fails. It signifies that there's an issue during communication between your program and the external system.|
|
||||
|9|ZeroDivisionError|A ZeroDivisionError occurs when we attempt to divide a number by zero. This operation is mathematically undefined, and Python raises this error to prevent nonsensical results.|
|
||||
|10|ImportError|An import error occurs when we try to use a module or library that Python can't find or import successfully.|
|
||||
|
||||
## Try and Except Statement - Catching Exception
|
||||
|
||||
The `try-except` statement allows us to anticipate potential errors during program execution and define what actions to take when those errors occur. This prevents the program from crashing unexpectedly and makes it more robust.
|
||||
|
||||
Here's an example to explain this:
|
||||
|
||||
```python
|
||||
try:
|
||||
# Code that might raise an exception
|
||||
result = 10 / 0
|
||||
except:
|
||||
print("An error occured!")
|
||||
```
|
||||
|
||||
Output
|
||||
|
||||
```markdown
|
||||
An error occurred!
|
||||
```
|
||||
|
||||
In this example, the `try` block contains the code that you suspect might raise an exception. Python attempts to execute the code within this block. If an exception occurs, Python jumps to the `except` block and executes the code within it.
|
||||
|
||||
## Specific Exception Handling
|
||||
|
||||
You can specify the type of exception you want to catch using the `except` keyword followed by the exception class name. You can also have multiple `except` blocks to handle different exception types.
|
||||
|
||||
Here's an example:
|
||||
|
||||
```python
|
||||
try:
|
||||
# Code that might raise ZeroDivisionError or NameError
|
||||
result = 10 / 0
|
||||
name = undefined_variable
|
||||
except ZeroDivisionError:
|
||||
print("Oops! You tried to divide by zero.")
|
||||
except NameError:
|
||||
print("There's a variable named 'undefined_variable' that hasn't been defined yet.")
|
||||
```
|
||||
|
||||
Output
|
||||
|
||||
```markdown
|
||||
Oops! You tried to divide by zero.
|
||||
```
|
||||
|
||||
If you comment out the line `result = 10 / 0`, then the output will be:
|
||||
|
||||
```markdown
|
||||
There's a variable named 'undefined_variable' that hasn't been defined yet.
|
||||
```
|
||||
|
||||
## Important Note
|
||||
|
||||
In this code, each `except` block is specific to one type of exception. If you want to catch both exceptions with a single `except` block, you can use a tuple of exceptions, like this:
|
||||
|
||||
```python
|
||||
try:
|
||||
# Code that might raise ZeroDivisionError or NameError
|
||||
result = 10 / 0
|
||||
name = undefined_variable
|
||||
except (ZeroDivisionError, NameError):
|
||||
print("An error occured!")
|
||||
```
|
||||
|
||||
Output
|
||||
|
||||
```markdown
|
||||
An error occurred!
|
||||
```
|
||||
|
||||
## Try with Else Clause
|
||||
|
||||
The `else` clause in a Python `try-except` block provides a way to execute code only when the `try` block succeeds without raising any exceptions. It's like having a section of code that runs exclusively under the condition that no errors occur during the main operation in the `try` block.
|
||||
|
||||
Here's an example to understand this:
|
||||
|
||||
```python
|
||||
def calculate_average(numbers):
|
||||
    if len(numbers) == 0:  # Handle the empty list case separately (optional)
|
||||
return None
|
||||
try:
|
||||
total = sum(numbers)
|
||||
average = total / len(numbers)
|
||||
except ZeroDivisionError:
|
||||
print("Cannot calculate average for a list containing zero.")
|
||||
else:
|
||||
print("The average is:", average)
|
||||
        return average  # Optionally return the average here
|
||||
|
||||
# Example usage
|
||||
numbers = [10, 20, 30]
|
||||
result = calculate_average(numbers)
|
||||
|
||||
if result is not None: # Check if result is available (handles empty list case)
|
||||
print("Calculation succesfull!")
|
||||
```
|
||||
|
||||
Output
|
||||
|
||||
```markdown
|
||||
The average is: 20.0
|
||||
```
|
||||
|
||||
## Finally Keyword in Python
|
||||
|
||||
The `finally` keyword in Python is used within `try-except` statements to execute a block of code **always**, regardless of whether an exception occurs in the `try` block or not.
|
||||
|
||||
To understand this, let us take an example:
|
||||
|
||||
```python
|
||||
try:
|
||||
a = 10 // 0
|
||||
print(a)
|
||||
except ZeroDivisionError:
|
||||
print("Cannot be divided by zero.")
|
||||
finally:
|
||||
print("Program executed!")
|
||||
```
|
||||
|
||||
Output
|
||||
|
||||
```markdown
|
||||
Cannot be divided by zero.
|
||||
Program executed!
|
||||
```
|
||||
|
||||
## Raise Keyword in Python
|
||||
|
||||
In Python, raising an exception allows you to signal that an error condition has occurred during your program's execution. The `raise` keyword is used to explicitly raise an exception.
|
||||
|
||||
Let us take an example:
|
||||
|
||||
```python
|
||||
def divide(x, y):
|
||||
if y == 0:
|
||||
raise ZeroDivisionError("Can't divide by zero!") # Raise an exception with a message
|
||||
result = x / y
|
||||
return result
|
||||
|
||||
try:
|
||||
division_result = divide(10, 0)
|
||||
print("Result:", division_result)
|
||||
except ZeroDivisionError as e:
|
||||
print("An error occured:", e) # Handle the exception and print the message
|
||||
```
|
||||
|
||||
Output
|
||||
|
||||
```markdown
|
||||
An error occurred: Can't divide by zero!
|
||||
```
|
||||
|
||||
## Advantages of Exception Handling
|
||||
|
||||
- **Improved Error Handling** - It allows you to gracefully handle unexpected situations that arise during program execution. Instead of crashing abruptly, you can define specific actions to take when exceptions occur, providing a smoother experience.
|
||||
- **Code Robustness** - Exception Handling helps you to write more resilient programs by anticipating potential issues and providing appropriate responses.
|
||||
- **Enhanced Code Readability** - By separating error handling logic from the core program flow, your code becomes more readable and easier to understand. The `try-except` blocks clearly indicate where potential errors might occur and how they'll be addressed.
|
||||
|
||||
## Disadvantages of Exception Handling
|
||||
|
||||
- **Hiding Logic Errors** - Relying solely on exception handling might mask underlying logic errors in your code. It's essential to write clear and well-tested logic to minimize the need for excessive exception handling.
|
||||
- **Performance Overhead** - In some cases, using `try-except` blocks can introduce a slight performance overhead compared to code without exception handling. However, this is usually negligible for most applications.
|
||||
- **Overuse of Exceptions** - Overusing exceptions for common errors or control flow can make code less readable and harder to maintain. It's important to use exceptions judiciously for unexpected situations.
|
|
@ -8,3 +8,4 @@
|
|||
- [JSON module](json-module.md)
|
||||
- [Map Function](map-function.md)
|
||||
- [Protocols](protocols.md)
|
||||
- [Exception Handling in Python](exception-handling.md)
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
# List of sections
|
||||
|
||||
- [API Methods](api-methods.md)
|
||||
- [FastAPI](fast-api.md)
|
||||
|
||||
|
|
|
@ -10,3 +10,6 @@
|
|||
- [Greedy Algorithms](greedy-algorithms.md)
|
||||
- [Dynamic Programming](dynamic-programming.md)
|
||||
- [Linked list](linked-list.md)
|
||||
- [Stacks in Python](stacks.md)
|
||||
- [Sliding Window Technique](sliding-window.md)
|
||||
- [Trie](trie.md)
|
||||
|
|
|
@ -0,0 +1,249 @@
|
|||
# Sliding Window Technique
|
||||
|
||||
The sliding window technique is a fundamental approach used to solve problems involving arrays, lists, or sequences. It's particularly useful when you need to calculate something over a subarray or sublist of fixed size that slides over the entire array.
|
||||
|
||||
In simple terms, it transforms nested loops into a single loop.
|
||||
## Concept
|
||||
|
||||
The sliding window technique involves creating a window (a subarray or sublist) that moves or "slides" across the entire array. This window can either be fixed in size or dynamically resized. By maintaining and updating this window as it moves, you can optimize certain computations, reducing time complexity.
|
||||
|
||||
## Types of Sliding Windows
|
||||
|
||||
1. **Fixed Size Window**: The window size remains constant as it slides from the start to the end of the array.
|
||||
2. **Variable Size Window**: The window size can change based on certain conditions, such as the sum of elements within the window meeting a specified target.
|
||||
|
||||
## Steps to Implement a Sliding Window
|
||||
|
||||
1. **Initialize the Window**: Set the initial position of the window and any required variables (like sum, count, etc.).
|
||||
2. **Expand the Window**: Add the next element to the window and update the relevant variables.
|
||||
3. **Shrink the Window**: If needed, remove elements from the start of the window and update the variables.
|
||||
4. **Slide the Window**: Move the window one position to the right by including the next element and possibly excluding the first element.
|
||||
5. **Repeat**: Continue expanding, shrinking, and sliding the window until you reach the end of the array.
|
||||
|
||||
## Example Problems
|
||||
|
||||
### 1. Maximum Sum Subarray of Fixed Size K
|
||||
|
||||
Given an array of integers and an integer k, find the maximum sum of a subarray of size k.
|
||||
|
||||
**Steps:**
|
||||
|
||||
1. Initialize the sum of the first k elements.
|
||||
2. Slide the window from the start of the array to the end, updating the sum by subtracting the element that is left behind and adding the new element.
|
||||
3. Track the maximum sum encountered.
|
||||
|
||||
**Python Code:**
|
||||
|
||||
```python
|
||||
def max_sum_subarray(arr, k):
|
||||
n = len(arr)
|
||||
if n < k:
|
||||
return None
|
||||
|
||||
# Compute the sum of the first window
|
||||
window_sum = sum(arr[:k])
|
||||
max_sum = window_sum
|
||||
|
||||
# Slide the window from start to end
|
||||
for i in range(n - k):
|
||||
window_sum = window_sum - arr[i] + arr[i + k]
|
||||
max_sum = max(max_sum, window_sum)
|
||||
|
||||
return max_sum
|
||||
|
||||
# Example usage:
|
||||
arr = [1, 3, 2, 5, 1, 1, 6, 2, 8, 5]
|
||||
k = 3
|
||||
print(max_sum_subarray(arr, k)) # Output: 16
|
||||
```
|
||||
|
||||
### 2. Longest Substring Without Repeating Characters
|
||||
|
||||
Given a string, find the length of the longest substring without repeating characters.
|
||||
|
||||
**Steps:**
|
||||
|
||||
1. Use two pointers to represent the current window.
|
||||
2. Use a set to track characters in the current window.
|
||||
3. Expand the window by moving the right pointer.
|
||||
4. If a duplicate character is found, shrink the window by moving the left pointer until the duplicate is removed.
|
||||
|
||||
**Python Code:**
|
||||
|
||||
```python
|
||||
def longest_unique_substring(s):
|
||||
n = len(s)
|
||||
char_set = set()
|
||||
left = 0
|
||||
max_length = 0
|
||||
|
||||
for right in range(n):
|
||||
while s[right] in char_set:
|
||||
char_set.remove(s[left])
|
||||
left += 1
|
||||
char_set.add(s[right])
|
||||
max_length = max(max_length, right - left + 1)
|
||||
|
||||
return max_length
|
||||
|
||||
# Example usage:
|
||||
s = "abcabcbb"
|
||||
print(longest_unique_substring(s)) # Output: 3
|
||||
```
|
||||
### 3. Minimum Size Subarray Sum
|
||||
|
||||
Given an array of positive integers and a positive integer `s`, find the minimal length of a contiguous subarray of which the sum is at least `s`. If there isn't one, return 0 instead.
|
||||
|
||||
**Steps:**
|
||||
1. Use two pointers, `left` and `right`, to define the current window.
|
||||
2. Expand the window by moving `right` and adding `arr[right]` to `current_sum`.
|
||||
3. If `current_sum` is greater than or equal to `s`, update `min_length` and shrink the window from the left by moving `left` and subtracting `arr[left]` from `current_sum`.
|
||||
4. Repeat until `right` has traversed the array.
|
||||
|
||||
**Python Code:**
|
||||
```python
|
||||
def min_subarray_len(s, arr):
|
||||
n = len(arr)
|
||||
left = 0
|
||||
current_sum = 0
|
||||
min_length = float('inf')
|
||||
|
||||
for right in range(n):
|
||||
current_sum += arr[right]
|
||||
|
||||
while current_sum >= s:
|
||||
min_length = min(min_length, right - left + 1)
|
||||
current_sum -= arr[left]
|
||||
left += 1
|
||||
|
||||
return min_length if min_length != float('inf') else 0
|
||||
|
||||
# Example usage:
|
||||
arr = [2, 3, 1, 2, 4, 3]
|
||||
s = 7
|
||||
print(min_subarray_len(s, arr)) # Output: 2 (subarray [4, 3])
|
||||
```
|
||||
|
||||
### 4. Longest Substring with At Most K Distinct Characters
|
||||
|
||||
Given a string `s` and an integer `k`, find the length of the longest substring that contains at most `k` distinct characters.
|
||||
|
||||
**Steps:**
|
||||
1. Use two pointers, `left` and `right`, to define the current window.
|
||||
2. Use a dictionary `char_count` to count characters in the window.
|
||||
3. Expand the window by moving `right` and updating `char_count`.
|
||||
4. If `char_count` has more than `k` distinct characters, shrink the window from the left by moving `left` and updating `char_count`.
|
||||
5. Keep track of the maximum length of the window with at most `k` distinct characters.
|
||||
|
||||
**Python Code:**
|
||||
```python
|
||||
def longest_substring_k_distinct(s, k):
|
||||
n = len(s)
|
||||
char_count = {}
|
||||
left = 0
|
||||
max_length = 0
|
||||
|
||||
for right in range(n):
|
||||
char_count[s[right]] = char_count.get(s[right], 0) + 1
|
||||
|
||||
while len(char_count) > k:
|
||||
char_count[s[left]] -= 1
|
||||
if char_count[s[left]] == 0:
|
||||
del char_count[s[left]]
|
||||
left += 1
|
||||
|
||||
max_length = max(max_length, right - left + 1)
|
||||
|
||||
return max_length
|
||||
|
||||
# Example usage:
|
||||
s = "eceba"
|
||||
k = 2
|
||||
print(longest_substring_k_distinct(s, k)) # Output: 3 (substring "ece")
|
||||
```
|
||||
|
||||
### 5. Maximum Number of Vowels in a Substring of Given Length
|
||||
|
||||
Given a string `s` and an integer `k`, return the maximum number of vowel letters in any substring of `s` with length `k`.
|
||||
|
||||
**Steps:**
|
||||
1. Use a sliding window of size `k`.
|
||||
2. Keep track of the number of vowels in the current window.
|
||||
3. Expand the window by adding the next character and update the count if it's a vowel.
|
||||
4. If the window size exceeds `k`, remove the leftmost character and update the count if it's a vowel.
|
||||
5. Track the maximum number of vowels found in any window of size `k`.
|
||||
|
||||
**Python Code:**
|
||||
```python
|
||||
def max_vowels(s, k):
|
||||
vowels = set('aeiou')
|
||||
max_vowel_count = 0
|
||||
current_vowel_count = 0
|
||||
|
||||
for i in range(len(s)):
|
||||
if s[i] in vowels:
|
||||
current_vowel_count += 1
|
||||
if i >= k:
|
||||
if s[i - k] in vowels:
|
||||
current_vowel_count -= 1
|
||||
max_vowel_count = max(max_vowel_count, current_vowel_count)
|
||||
|
||||
return max_vowel_count
|
||||
|
||||
# Example usage:
|
||||
s = "abciiidef"
|
||||
k = 3
|
||||
print(max_vowels(s, k)) # Output: 3 (substring "iii")
|
||||
```
|
||||
|
||||
### 6. Subarray Product Less Than K
|
||||
|
||||
Given an array of positive integers `nums` and an integer `k`, return the number of contiguous subarrays where the product of all the elements in the subarray is less than `k`.
|
||||
|
||||
**Steps:**
|
||||
1. Use two pointers, `left` and `right`, to define the current window.
|
||||
2. Expand the window by moving `right` and multiplying `product` by `nums[right]`.
|
||||
3. If `product` is greater than or equal to `k`, shrink the window from the left by moving `left` and dividing `product` by `nums[left]`.
|
||||
4. For each position of `right`, the number of valid subarrays ending at `right` is `right - left + 1`.
|
||||
5. Sum these counts to get the total number of subarrays with product less than `k`.
|
||||
|
||||
**Python Code:**
|
||||
```python
|
||||
def num_subarray_product_less_than_k(nums, k):
|
||||
if k <= 1:
|
||||
return 0
|
||||
|
||||
product = 1
|
||||
left = 0
|
||||
count = 0
|
||||
|
||||
for right in range(len(nums)):
|
||||
product *= nums[right]
|
||||
|
||||
while product >= k:
|
||||
product /= nums[left]
|
||||
left += 1
|
||||
|
||||
count += right - left + 1
|
||||
|
||||
return count
|
||||
|
||||
# Example usage:
|
||||
nums = [10, 5, 2, 6]
|
||||
k = 100
|
||||
print(num_subarray_product_less_than_k(nums, k)) # Output: 8
|
||||
```
|
||||
|
||||
## Advantages
|
||||
|
||||
- **Efficiency**: Reduces the time complexity from O(n^2) to O(n) for many problems.
|
||||
- **Simplicity**: Provides a straightforward way to manage subarrays/substrings with overlapping elements.
|
||||
|
||||
## Applications
|
||||
|
||||
- Finding the maximum or minimum sum of subarrays of fixed size.
|
||||
- Detecting unique elements in a sequence.
|
||||
- Solving problems related to dynamic programming with fixed constraints.
|
||||
- Efficiently managing and processing streaming data or real-time analytics.
|
||||
|
||||
By using the sliding window technique, you can tackle a wide range of problems in a more efficient manner.
|
|
@ -0,0 +1,116 @@
|
|||
# Stacks in Python
|
||||
|
||||
In Data Structures and Algorithms, a stack is a linear data structure that complies with the Last In, First Out (LIFO) rule. It works by the use of two fundamental techniques: **PUSH**, which inserts an element on top of the stack, and **POP**, which takes out the topmost element. This concept is similar to a stack of plates in a cafeteria. Stacks are usually used for handling function calls, expression evaluation, and parsing in programming. Indeed, they are efficient in managing memory as well as tracking program state.
|
||||
|
||||
## Points to be Remembered
|
||||
|
||||
- A stack is a collection of data items that can be accessed at only one end, called **TOP**.
|
||||
- Items can be inserted and deleted in a stack only at the TOP.
|
||||
- The last item inserted in a stack is the first one to be deleted.
|
||||
- Therefore, a stack is called a **Last-In-First-Out (LIFO)** data structure.
|
||||
|
||||
## Real Life Examples of Stacks
|
||||
|
||||
- **PILE OF BOOKS** - Suppose a set of books are placed one over the other in a pile. When you remove books from the pile, the topmost book will be removed first. Similarly, when you have to add a book to the pile, the book will be placed at the top of the pile.
|
||||
|
||||
- **PILE OF PLATES** - The first plate begins the pile. The second plate is placed on the top of the first plate and the third plate is placed on the top of the second plate, and so on. In general, if you want to add a plate to the pile, you can keep it on the top of the pile. Similarly, if you want to remove a plate, you can remove the plate from the top of the pile.
|
||||
|
||||
- **BANGLES IN A HAND** - When a person wears bangles, the last bangle worn is the first one to be removed.
|
||||
|
||||
## Applications of Stacks
|
||||
|
||||
Stacks are widely used in Computer Science:
|
||||
|
||||
- Function call management
|
||||
- Maintaining the UNDO list for the application
|
||||
- Web browser *history management*
|
||||
- Evaluating expressions
|
||||
- Checking the nesting of parentheses in an expression
|
||||
- Backtracking algorithms (Recursion)
|
||||
|
||||
Understanding these applications is essential for Software Development.
|
||||
|
||||
## Operations on a Stack
|
||||
|
||||
Key operations on a stack include:
|
||||
|
||||
- **PUSH** - It is the process of inserting a new element on the top of a stack.
|
||||
- **OVERFLOW** - A situation when we try to push an item onto a stack that is already full.
|
||||
- **POP** - It is the process of deleting an element from the top of a stack.
|
||||
- **UNDERFLOW** - A situation when we try to pop an item from an empty stack.
|
||||
- **PEEK** - It is the process of getting the most recent value of the stack *(i.e. the value at the top of the stack)*.
|
||||
- **isEMPTY** - It is the function that returns `True` if the stack is empty and `False` otherwise.
|
||||
- **SHOW** - Displaying stack items.
|
||||
|
||||
## Implementing Stacks in Python
|
||||
|
||||
```python
|
||||
def isEmpty(S):
|
||||
if len(S) == 0:
|
||||
return True
|
||||
else:
|
||||
return False
|
||||
|
||||
def Push(S, item):
|
||||
S.append(item)
|
||||
|
||||
def Pop(S):
|
||||
if isEmpty(S):
|
||||
return "Underflow"
|
||||
else:
|
||||
val = S.pop()
|
||||
return val
|
||||
|
||||
def Peek(S):
|
||||
if isEmpty(S):
|
||||
return "Underflow"
|
||||
else:
|
||||
top = len(S) - 1
|
||||
return S[top]
|
||||
|
||||
def Show(S):
|
||||
if isEmpty(S):
|
||||
print("Sorry, No items in Stack")
|
||||
else:
|
||||
print("(Top)", end=' ')
|
||||
t = len(S) - 1
|
||||
while t >= 0:
|
||||
print(S[t], "<", end=' ')
|
||||
t -= 1
|
||||
print()
|
||||
|
||||
stack = [] # initially stack is empty
|
||||
|
||||
Push(stack, 5)
|
||||
Push(stack, 10)
|
||||
Push(stack, 15)
|
||||
|
||||
print("Stack after Push operations:")
|
||||
Show(stack)
|
||||
print("Peek operation:", Peek(stack))
|
||||
print("Pop operation:", Pop(stack))
|
||||
print("Stack after Pop operation:")
|
||||
Show(stack)
|
||||
```
|
||||
|
||||
## Output
|
||||
|
||||
```markdown
|
||||
Stack after Push operations:
|
||||
|
||||
(Top) 15 < 10 < 5 <
|
||||
|
||||
Peek operation: 15
|
||||
|
||||
Pop operation: 15
|
||||
|
||||
Stack after Pop operation:
|
||||
|
||||
(Top) 10 < 5 <
|
||||
```
|
||||
|
||||
## Complexity Analysis
|
||||
|
||||
- **Worst case**: `O(n)`. This occurs for the `Show` operation, which traverses all the items in the stack.
|
||||
- **Best case**: `O(1)`. Operations like `isEmpty`, `Push`, `Pop`, and `Peek` run in constant time.
|
||||
- **Average case**: Most operations are `O(1)`; only `Show` grows with the number of items, so typical usage is dominated by constant-time operations.
|
|
@ -0,0 +1,152 @@
|
|||
# Trie
|
||||
|
||||
A Trie is a tree-like data structure used for storing a dynamic set of strings, where the keys are usually strings. It is also known as a prefix tree or digital tree.
|
||||
|
||||
>Trie is a type of search tree, where each node represents a single character of a string.
|
||||
|
||||
>Nodes are linked in such a way that they form a tree, where each path from the root to a leaf node represents a unique string stored in the Trie.
|
||||
|
||||
## Characteristics of Trie
|
||||
- **Prefix Matching**: Tries are particularly useful for prefix matching operations. Any node in the Trie represents a common prefix of all strings below it.
|
||||
- **Space Efficiency**: Tries can be more space-efficient than other data structures like hash tables for storing large sets of strings with common prefixes.
|
||||
- **Time Complexity**: Insertion, deletion, and search operations in a Trie have a time complexity of `O(m)`, where `m` is the length of the string. This makes Tries very efficient for these operations.
|
||||
|
||||
## Structure of Trie
|
||||
|
||||
Trie mainly consists of three parts:
|
||||
- **Root**: The root of a Trie is an empty node that does not contain any character.
|
||||
- **Edges**: Each edge in the Trie represents a character in the alphabet of the stored strings.
|
||||
- **Nodes**: Each node contains a character and possibly additional information, such as a boolean flag indicating if the node represents the end of a valid string.
|
||||
|
||||
To implement the nodes of a Trie, we use classes in Python. Each node is an object of the `Node` class.
|
||||
|
||||
The `Node` class mainly has two components:
|
||||
- *Array of size 26*: It represents the 26 letters of the alphabet. Initially all entries are `None`; while inserting words, the array is filled with child `Node` objects.
|
||||
- *End of word*: It marks the end of a word during insertion; it counts how many inserted words end at this node.
|
||||
|
||||
Code block of the `Node` class:
|
||||
|
||||
```python
|
||||
class Node:
|
||||
def __init__(self):
|
||||
self.alphabets = [None] * 26
|
||||
self.end_of_word = 0
|
||||
```
|
||||
|
||||
Now we need to implement Trie. We create another class named Trie with some methods like Insertion, Searching and Deletion.
|
||||
|
||||
**Initialization:** This initializes the Trie with a `root` node.
|
||||
|
||||
Code Implementation of Initialization:
|
||||
|
||||
```python
|
||||
class Trie:
|
||||
def __init__(self):
|
||||
self.root = Node()
|
||||
```
|
||||
|
||||
## Operations on Trie
|
||||
|
||||
1. **Insertion**: Inserts the word into the Trie. This method takes `word` as parameter. For each character in the word, it checks if there is a corresponding child node. If not, it creates a new `Node`. After processing all the characters in word, it increments the `end_of_word` value of the last node.
|
||||
|
||||
Code Implementation of Insertion:
|
||||
```python
|
||||
def insert(self, word):
|
||||
node = self.root
|
||||
for char in word:
|
||||
index = ord(char) - ord('a')
|
||||
if not node.alphabets[index]:
|
||||
node.alphabets[index] = Node()
|
||||
node = node.alphabets[index]
|
||||
node.end_of_word += 1
|
||||
```
|
||||
|
||||
2. **Searching**: Searches for `word` in the Trie. The search starts from the `root` node and processes each character of `word`. After traversing the whole word, it returns the count of inserted words that match it.
|
||||
|
||||
There are two cases in Searching:
|
||||
- *Word not found*: This happens when the word we search for is not present in the Trie. It occurs if the value of the `alphabets` array at some character is `None`, or if the `end_of_word` value of the node reached after traversing the whole word is `0`.
|
||||
- *Word found*: This happens when the search word is present in the Trie. It occurs when the `end_of_word` value of the node reached after traversing the whole word is greater than `0`.
|
||||
|
||||
Code Implementation of Searching:
|
||||
```python
|
||||
def Search(self, word):
|
||||
node = self.root
|
||||
for char in word:
|
||||
index = ord(char) - ord('a')
|
||||
if not node.alphabets[index]:
|
||||
return 0
|
||||
node = node.alphabets[index]
|
||||
return node.end_of_word
|
||||
```
|
||||
|
||||
3. **Deletion**: To delete a string, follow its path through the Trie. If the end node is reached and its `end_of_word` value is greater than `0`, decrement that value.
|
||||
|
||||
Code Implementation of Deletion:
|
||||
|
||||
```python
|
||||
def delete(self, word):
|
||||
node = self.root
|
||||
for char in word:
|
||||
index = ord(char) - ord('a')
|
||||
        if not node.alphabets[index]:
            return  # word not present in the Trie; nothing to delete
        node = node.alphabets[index]
|
||||
if node.end_of_word:
|
||||
node.end_of_word-=1
|
||||
```
|
||||
|
||||
Python Code to implement Trie:
|
||||
|
||||
```python
|
||||
class Node:
|
||||
def __init__(self):
|
||||
self.alphabets = [None] * 26
|
||||
self.end_of_word = 0
|
||||
|
||||
class Trie:
|
||||
def __init__(self):
|
||||
self.root = Node()
|
||||
|
||||
def insert(self, word):
|
||||
node = self.root
|
||||
for char in word:
|
||||
index = ord(char) - ord('a')
|
||||
if not node.alphabets[index]:
|
||||
node.alphabets[index] = Node()
|
||||
node = node.alphabets[index]
|
||||
node.end_of_word += 1
|
||||
|
||||
def Search(self, word):
|
||||
node = self.root
|
||||
for char in word:
|
||||
index = ord(char) - ord('a')
|
||||
if not node.alphabets[index]:
|
||||
return 0
|
||||
node = node.alphabets[index]
|
||||
return node.end_of_word
|
||||
|
||||
def delete(self, word):
|
||||
node = self.root
|
||||
for char in word:
|
||||
index = ord(char) - ord('a')
|
||||
            if not node.alphabets[index]:
                return  # word not present in the Trie; nothing to delete
            node = node.alphabets[index]
|
||||
if node.end_of_word:
|
||||
node.end_of_word-=1
|
||||
|
||||
if __name__ == "__main__":
|
||||
trie = Trie()
|
||||
|
||||
word1 = "apple"
|
||||
word2 = "app"
|
||||
word3 = "bat"
|
||||
|
||||
trie.insert(word1)
|
||||
trie.insert(word2)
|
||||
trie.insert(word3)
|
||||
|
||||
print(trie.Search(word1))
|
||||
print(trie.Search(word2))
|
||||
print(trie.Search(word3))
|
||||
|
||||
trie.delete(word2)
|
||||
print(trie.Search(word2))
|
||||
```
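Running this example prints the following. `Search` returns the number of times a word was inserted, so `app` returns `0` after it is deleted:

```markdown
1
1
1
0
```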
|
After Width: | Height: | Size: 28 KiB
After Width: | Height: | Size: 37 KiB
After Width: | Height: | Size: 32 KiB
After Width: | Height: | Size: 54 KiB
After Width: | Height: | Size: 92 KiB
After Width: | Height: | Size: 96 KiB
After Width: | Height: | Size: 78 KiB
After Width: | Height: | Size: 74 KiB
|
@ -0,0 +1,96 @@
|
|||
# Clustering
|
||||
|
||||
Clustering is an unsupervised machine learning technique that groups a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters). This README provides an overview of clustering, including its fundamental concepts, types, algorithms, and how to implement it using Python.
|
||||
|
||||
## Introduction
|
||||
|
||||
Clustering is a technique used to find inherent groupings within data without pre-labeled targets. It is widely used in exploratory data analysis, pattern recognition, image analysis, information retrieval, and bioinformatics.
|
||||
|
||||
## Concepts
|
||||
|
||||
### Centroid
|
||||
|
||||
A centroid is the center of a cluster. In the k-means clustering algorithm, for example, each cluster is represented by its centroid, which is the mean of all the data points in the cluster.
|
||||
|
||||
### Distance Measure
|
||||
|
||||
Distance measures are used to quantify the similarity or dissimilarity between data points. Common distance measures include Euclidean distance, Manhattan distance, and cosine similarity.
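To make these measures concrete, here is a minimal NumPy sketch (the two sample points are arbitrary) computing all three for a pair of vectors:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

euclidean = np.linalg.norm(a - b)   # straight-line distance: 5.0
manhattan = np.sum(np.abs(a - b))   # sum of absolute differences: 7.0
# Cosine similarity: 1 means same direction, 0 means orthogonal
cosine_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(euclidean, manhattan, cosine_sim)
```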
|
||||
|
||||
### Inertia
|
||||
|
||||
Inertia is a metric used to assess the quality of the clusters formed. It is the sum of squared distances of samples to their nearest cluster center.
|
||||
|
||||
## Types of Clustering
|
||||
|
||||
1. **Hard Clustering**: Each data point either belongs to a cluster completely or not at all.
|
||||
2. **Soft Clustering (Fuzzy Clustering)**: Each data point can belong to multiple clusters with varying degrees of membership.
|
||||
|
||||
## Clustering Algorithms
|
||||
|
||||
### K-Means Clustering
|
||||
|
||||
K-Means is a popular clustering algorithm that partitions the data into k clusters, where each data point belongs to the cluster with the nearest mean. The algorithm follows these steps (a minimal from-scratch sketch follows the list):
|
||||
1. Initialize k centroids randomly.
|
||||
2. Assign each data point to the nearest centroid.
|
||||
3. Recalculate the centroids as the mean of all data points assigned to each cluster.
|
||||
4. Repeat steps 2 and 3 until convergence.
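The sketch below implements these four steps directly in NumPy. It is a minimal illustration under simplifying assumptions (toy 2D data, and the empty-cluster edge case is ignored), not a production implementation:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=42):
    rng = np.random.default_rng(seed)
    # Step 1: initialize k centroids by picking k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop once the centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on three well-separated pairs of points
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8], [9.0, 1.0], [8.8, 1.2]])
labels, centroids = kmeans(X, k=3)
print(labels)
```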
|
||||
|
||||
### Hierarchical Clustering
|
||||
|
||||
Hierarchical clustering builds a tree of clusters. There are two types (a short scikit-learn sketch follows the list):
|
||||
- **Agglomerative (bottom-up)**: Starts with each data point as a separate cluster and merges the closest pairs of clusters iteratively.
|
||||
- **Divisive (top-down)**: Starts with all data points in one cluster and splits the cluster iteratively into smaller clusters.
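A short scikit-learn sketch of the agglomerative variant (the synthetic blob data and parameter values are illustrative assumptions):

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Three synthetic Gaussian blobs
X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

# Bottom-up clustering; Ward linkage merges the pair of clusters that
# yields the smallest increase in within-cluster variance
agg = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels = agg.fit_predict(X)
print(labels[:10])
```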
|
||||
|
||||
### DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
|
||||
|
||||
DBSCAN groups together points that are close to each other based on a distance measurement and a minimum number of points. It can find arbitrarily shaped clusters and is robust to noise.
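A minimal scikit-learn sketch (the half-moon data and the `eps`/`min_samples` values are illustrative assumptions):

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-moons: a shape centroid-based methods handle poorly
X, _ = make_moons(n_samples=200, noise=0.05, random_state=42)

# eps: neighborhood radius; min_samples: points required for a dense region
db = DBSCAN(eps=0.3, min_samples=5)
labels = db.fit_predict(X)
print(set(labels))  # the label -1, if present, marks noise points
```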
|
||||
|
||||
## Implementation
|
||||
|
||||
### Using Scikit-learn
|
||||
|
||||
Scikit-learn is a popular machine learning library in Python that provides tools for clustering.
|
||||
|
||||
### Code Example
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from sklearn.cluster import KMeans
|
||||
from sklearn.preprocessing import StandardScaler
|
||||
from sklearn.metrics import silhouette_score
|
||||
|
||||
# Load dataset
|
||||
data = pd.read_csv('path/to/your/dataset.csv')
|
||||
|
||||
# Preprocess the data
|
||||
scaler = StandardScaler()
|
||||
data_scaled = scaler.fit_transform(data)
|
||||
|
||||
# Initialize and fit KMeans model
|
||||
kmeans = KMeans(n_clusters=3, random_state=42)
|
||||
kmeans.fit(data_scaled)
|
||||
|
||||
# Get cluster labels
|
||||
labels = kmeans.labels_
|
||||
|
||||
# Calculate silhouette score
|
||||
silhouette_avg = silhouette_score(data_scaled, labels)
|
||||
print("Silhouette Score:", silhouette_avg)
|
||||
|
||||
# Add cluster labels to the original data
|
||||
data['Cluster'] = labels
|
||||
|
||||
print(data.head())
|
||||
```
|
||||
|
||||
## Evaluation Metrics
|
||||
|
||||
- **Silhouette Score**: Measures how similar a data point is to its own cluster compared to other clusters.
|
||||
- **Inertia (Within-cluster Sum of Squares)**: Measures the compactness of the clusters.
|
||||
- **Davies-Bouldin Index**: Measures the average similarity ratio of each cluster with the cluster that is most similar to it.
|
||||
- **Dunn Index**: Ratio of the minimum inter-cluster distance to the maximum intra-cluster distance.
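The first two metrics are available directly in scikit-learn; here is a minimal sketch (the synthetic blob data is an assumption for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, random_state=42).fit_predict(X)

# Higher silhouette is better; lower Davies-Bouldin is better
print("Silhouette:", silhouette_score(X, labels))
print("Davies-Bouldin:", davies_bouldin_score(X, labels))
```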
|
||||
|
||||
## Conclusion
|
||||
|
||||
Clustering is a powerful technique for discovering structure in data. Understanding different clustering algorithms and their evaluation metrics is crucial for selecting the appropriate method for a given problem.
|
|
@ -0,0 +1,235 @@
|
|||
|
||||
# Cost Functions in Machine Learning
|
||||
|
||||
Cost functions, also known as loss functions, play a crucial role in training machine learning models. They measure how well the model performs on the training data by quantifying the difference between predicted and actual values. Different types of cost functions are used depending on the problem domain and the nature of the data.
|
||||
|
||||
## Types of Cost Functions
|
||||
|
||||
### 1. Mean Squared Error (MSE)
|
||||
|
||||
**Explanation:**
|
||||
MSE is one of the most commonly used cost functions, particularly in regression problems. It calculates the average squared difference between the predicted and actual values.
|
||||
|
||||
**Mathematical Formulation:**
|
||||
The MSE is defined as:
|
||||
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
|
||||
Where:
|
||||
- `n` is the number of samples.
|
||||
- $y_i$ is the actual value.
|
||||
- $\hat{y}_i$ is the predicted value.
|
||||
|
||||
**Advantages:**
|
||||
- Sensitive to large errors due to squaring.
|
||||
- Differentiable and convex, facilitating optimization.
|
||||
|
||||
**Disadvantages:**
|
||||
- Sensitive to outliers, as the squared term amplifies their impact.
|
||||
|
||||
**Python Implementation:**
|
||||
```python
|
||||
import numpy as np
|
||||
|
||||
def mean_squared_error(y_true, y_pred):
|
||||
|
||||
return np.mean((y_true - y_pred) ** 2)
|
||||
```
|
||||
|
||||
### 2. Mean Absolute Error (MAE)
|
||||
|
||||
**Explanation:**
|
||||
MAE is another commonly used cost function for regression tasks. It measures the average absolute difference between predicted and actual values.
|
||||
|
||||
**Mathematical Formulation:**
|
||||
The MAE is defined as:
|
||||
$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
|
||||
Where:
|
||||
- `n` is the number of samples.
|
||||
- $y_i$ is the actual value.
|
||||
- $\hat{y}_i$ is the predicted value.
|
||||
|
||||
**Advantages:**
|
||||
- Less sensitive to outliers compared to MSE.
|
||||
- Provides a linear error term, which can be easier to interpret.
|
||||
|
||||
|
||||
**Disadvantages:**
|
||||
- Not differentiable at zero, which can complicate optimization.
|
||||
|
||||
**Python Implementation:**
|
||||
```python
|
||||
import numpy as np
|
||||
|
||||
def mean_absolute_error(y_true, y_pred):
|
||||
|
||||
return np.mean(np.abs(y_true - y_pred))
|
||||
```
|
||||
|
||||
### 3. Cross-Entropy Loss (Binary)
|
||||
|
||||
**Explanation:**
|
||||
Cross-entropy loss is commonly used in binary classification problems. It measures the dissimilarity between the true and predicted probability distributions.
|
||||
|
||||
**Mathematical Formulation:**
|
||||
|
||||
For binary classification, the cross-entropy loss is defined as:
|
||||
|
||||
$$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$$
|
||||
|
||||
Where:
|
||||
- `n` is the number of samples.
|
||||
- $y_i$ is the actual class label (0 or 1).
|
||||
- $\hat{y}_i$ is the predicted probability of the positive class.
|
||||
|
||||
|
||||
**Advantages:**
|
||||
- Penalizes confident wrong predictions heavily.
|
||||
- Suitable for probabilistic outputs.
|
||||
|
||||
**Disadvantages:**
|
||||
- Sensitive to class imbalance.
|
||||
|
||||
**Python Implementation:**
|
||||
```python
|
||||
import numpy as np
|
||||
|
||||
def binary_cross_entropy(y_true, y_pred):
|
||||
    eps = 1e-12  # clip predictions to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
|
||||
return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
|
||||
```
|
||||
|
||||
### 4. Cross-Entropy Loss (Multiclass)
|
||||
|
||||
**Explanation:**
|
||||
For multiclass classification problems, the cross-entropy loss is adapted to handle multiple classes.
|
||||
|
||||
**Mathematical Formulation:**
|
||||
|
||||
The multiclass cross-entropy loss is defined as:
|
||||
|
||||
$$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c})$$
|
||||
|
||||
Where:
|
||||
- `n` is the number of samples.
|
||||
- `C` is the number of classes.
|
||||
- $y_{i,c}$ is the indicator function for the true class of sample `i`.
|
||||
- $\hat{y}_{i,c}$ is the predicted probability of sample `i` belonging to class `c`.
|
||||
|
||||
**Advantages:**
|
||||
- Handles multiple classes effectively.
|
||||
- Encourages the model to assign high probabilities to the correct classes.
|
||||
|
||||
**Disadvantages:**
|
||||
- Requires one-hot encoding for class labels, which can increase computational complexity.
|
||||
|
||||
**Python Implementation:**
|
||||
```python
|
||||
import numpy as np
|
||||
|
||||
def categorical_cross_entropy(y_true, y_pred):
|
||||
    eps = 1e-12  # clip to avoid log(0) for zero predicted probabilities
    y_pred = np.clip(y_pred, eps, 1 - eps)
|
||||
return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
|
||||
```
|
||||
|
||||
### 5. Hinge Loss (SVM)
|
||||
|
||||
**Explanation:**
|
||||
Hinge loss is commonly used in support vector machines (SVMs) for binary classification tasks. It penalizes misclassifications by a linear margin.
|
||||
|
||||
**Mathematical Formulation:**
|
||||
|
||||
For binary classification, the hinge loss is defined as:
|
||||
|
||||
$$\text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i)$$
|
||||
|
||||
Where:
|
||||
- `n` is the number of samples.
|
||||
- $y_i$ is the actual class label (-1 or 1).
|
||||
- $\hat{y}_i$ is the predicted score for sample $i$.
|
||||
|
||||
**Advantages:**
|
||||
- Encourages margin maximization in SVMs.
|
||||
- Robust to outliers due to the linear penalty.
|
||||
|
||||
**Disadvantages:**
|
||||
- Not differentiable at the margin, which can complicate optimization.
|
||||
|
||||
**Python Implementation:**
|
||||
```python
|
||||
import numpy as np
|
||||
|
||||
def hinge_loss(y_true, y_pred):
|
||||
|
||||
loss = np.maximum(0, 1 - y_true * y_pred)
|
||||
return np.mean(loss)
|
||||
```
|
||||
|
||||
### 6. Huber Loss
|
||||
|
||||
**Explanation:**
|
||||
Huber loss is a combination of MSE and MAE, providing a compromise between the two. It is less sensitive to outliers than MSE and provides a smooth transition to MAE for large errors.
|
||||
|
||||
**Mathematical Formulation:**
|
||||
|
||||
The Huber loss is defined as:
|
||||
|
||||
|
||||
$$\text{Huber Loss} = \frac{1}{n} \sum_{i=1}^{n} \left\{
|
||||
\begin{array}{ll}
|
||||
\frac{1}{2} (y_i - \hat{y}_i)^2 & \text{if } |y_i - \hat{y}_i| \leq \delta \\
|
||||
\delta(|y_i - \hat{y}_i| - \frac{1}{2} \delta) & \text{otherwise}
|
||||
\end{array}
|
||||
\right.$$
|
||||
|
||||
Where:
|
||||
- `n` is the number of samples.
|
||||
- $\delta$ is a threshold parameter.
|
||||
|
||||
**Advantages:**
|
||||
- Provides a smooth loss function.
|
||||
- Less sensitive to outliers than MSE.
|
||||
|
||||
**Disadvantages:**
|
||||
- Requires tuning of the threshold parameter.
|
||||
|
||||
**Python Implementation:**
|
||||
```python
|
||||
import numpy as np
|
||||
|
||||
def huber_loss(y_true, y_pred, delta):
|
||||
error = y_true - y_pred
|
||||
loss = np.where(np.abs(error) <= delta, 0.5 * error ** 2, delta * (np.abs(error) - 0.5 * delta))
|
||||
return np.mean(loss)
|
||||
```
|
||||
|
||||
### 7. Log-Cosh Loss
|
||||
|
||||
**Explanation:**
|
||||
Log-Cosh loss is a smooth approximation of the MAE and is less sensitive to outliers than MSE. It provides a smooth transition from quadratic for small errors to linear for large errors.
|
||||
|
||||
**Mathematical Formulation:**
|
||||
|
||||
The Log-Cosh loss is defined as:
|
||||
|
||||
$$\text{Log-Cosh Loss} = \frac{1}{n} \sum_{i=1}^{n} \log(\cosh(y_i - \hat{y}_i))$$
|
||||
|
||||
Where:
|
||||
- `n` is the number of samples.
|
||||
|
||||
**Advantages:**
|
||||
- Smooth and differentiable everywhere.
|
||||
- Less sensitive to outliers.
|
||||
|
||||
**Disadvantages:**
|
||||
- Computationally more expensive than simple losses like MSE.
|
||||
|
||||
**Python Implementation:**
|
||||
```python
|
||||
import numpy as np
|
||||
|
||||
def logcosh_loss(y_true, y_pred):
|
||||
error = y_true - y_pred
|
||||
loss = np.log(np.cosh(error))
|
||||
return np.mean(loss)
|
||||
```
|
||||
|
||||
These implementations provide various options for cost functions suitable for different machine learning tasks. Each function has its advantages and disadvantages, making them suitable for different scenarios and problem domains.
|
|
@ -254,4 +254,4 @@ The final decision tree classifies instances based on the following rules:
|
|||
- If Outlook is Rain and Wind is Weak, PlayTennis is Yes
|
||||
- If Outlook is Rain and Wind is Strong, PlayTennis is No
|
||||
|
||||
> Note that the calculated entropies and information gains may vary slightly depending on the specific implementation and rounding methods used.
|
||||
> Note that the calculated entropies and information gains may vary slightly depending on the specific implementation and rounding methods used.
|
|
@ -1,13 +1,18 @@
|
|||
# List of sections
|
||||
|
||||
- [Binomial Distribution](binomial_distribution.md)
|
||||
- [Regression in Machine Learning](Regression.md)
|
||||
- [Introduction to scikit-learn](sklearn-introduction.md)
|
||||
- [Binomial Distribution](binomial-distribution.md)
|
||||
- [Regression in Machine Learning](regression.md)
|
||||
- [Confusion Matrix](confusion-matrix.md)
|
||||
- [Decision Tree Learning](Decision-Tree.md)
|
||||
- [Decision Tree Learning](decision-tree.md)
|
||||
- [Random Forest](random-forest.md)
|
||||
- [Support Vector Machine Algorithm](support-vector-machine.md)
|
||||
- [Artificial Neural Network from the Ground Up](ArtificialNeuralNetwork.md)
|
||||
- [TensorFlow.md](tensorFlow.md)
|
||||
- [Artificial Neural Network from the Ground Up](ann.md)
|
||||
- [Introduction To Convolutional Neural Networks (CNNs)](intro-to-cnn.md)
|
||||
- [TensorFlow.md](tensorflow.md)
|
||||
- [PyTorch.md](pytorch.md)
|
||||
- [Types of optimizers](Types_of_optimizers.md)
|
||||
- [Types of optimizers](types-of-optimizers.md)
|
||||
- [Logistic Regression](logistic-regression.md)
|
||||
- [Types of Cost Functions](cost-functions.md)
|
||||
- [Clustering](clustering.md)
|
||||
- [Grid Search](grid-search.md)
|
||||
|
|
|
@ -0,0 +1,225 @@
|
|||
# Understanding Convolutional Neural Networks (CNN)
|
||||
|
||||
## Introduction
|
||||
Convolutional Neural Networks (CNNs) are a specialized type of artificial neural network designed primarily for processing structured grid data like images. CNNs are particularly powerful for tasks involving image recognition, classification, and computer vision. They have revolutionized these fields, outperforming traditional neural networks by leveraging their unique architecture to capture spatial hierarchies in images.
|
||||
|
||||
### Why CNNs are Superior to Traditional Neural Networks
|
||||
1. **Localized Receptive Fields**: CNNs use convolutional layers that apply filters to local regions of the input image. This localized connectivity ensures that the network learns spatial hierarchies and patterns, such as edges and textures, which are essential for image recognition tasks.
|
||||
2. **Parameter Sharing**: In CNNs, the same filter (set of weights) is used across different parts of the input, significantly reducing the number of parameters compared to fully connected layers in traditional neural networks. This not only lowers the computational cost but also mitigates the risk of overfitting.
|
||||
3. **Translation Invariance**: Due to the shared weights and pooling operations, CNNs are inherently invariant to translations of the input image. This means that they can recognize objects even when they appear in different locations within the image.
|
||||
4. **Hierarchical Feature Learning**: CNNs automatically learn a hierarchy of features from low-level features like edges to high-level features like shapes and objects. Traditional neural networks, on the other hand, require manual feature extraction which is less effective and more time-consuming.
|
||||
|
||||
### Use Cases of CNNs
|
||||
- **Image Classification**: Identifying objects within an image (e.g., classifying a picture as containing a cat or a dog).
|
||||
- **Object Detection**: Detecting and locating objects within an image (e.g., finding faces in a photo).
|
||||
- **Image Segmentation**: Partitioning an image into segments or regions (e.g., dividing an image into different objects and background).
|
||||
- **Medical Imaging**: Analyzing medical scans like MRI, CT, and X-rays for diagnosis.
|
||||
|
||||
> This guide will walk you through the fundamentals of CNNs and their implementation in Python. We'll build a simple CNN from scratch, explaining each component to help you understand how CNNs process images and extract features.
|
||||
|
||||
### Let's start by understanding the basic architecture of CNNs.
|
||||
|
||||
## CNN Architecture
|
||||
Convolution layers, pooling layers, and fully connected layers are just a few of the many building blocks that CNNs use to automatically and adaptively learn spatial hierarchies of information through backpropagation.
|
||||
|
||||
### Convolutional Layer
|
||||
The convolutional layer is the core building block of a CNN. The layer's parameters consist of a set of learnable filters (or kernels), which have a small receptive field but extend through the full depth of the input volume.
|
||||
|
||||
#### Input Shape
|
||||
The dimensions of the input image, including the number of channels (e.g., 3 for RGB images and 1 for grayscale images).
|
||||

|
||||
|
||||
- The input matrix is a binary image of handwritten digits, where '1' marks the pixels containing the digit (ink/grayscale area) and '0' marks the background pixels (empty space).
|
||||
- The first matrix shows the representation of the digits 1 and 0, which can be depicted as a vertical line and a closed loop.
|
||||
- The second matrix represents 9, combining the loop and line.
|
||||
|
||||
#### Strides
|
||||
The step size with which the filter moves across the input image.
|
||||

|
||||
|
||||
- This visualization will help you understand how the filter (kernel) moves across the input matrix with stride values of (3,3) and (2,2).
|
||||
- A stride of 1 means the filter moves one step at a time, ensuring it covers the entire input matrix.
|
||||
- However, with larger strides (like 3 or 2 in this example), the filter may not cover all elements, potentially missing some information.
|
||||
- While this might seem like a drawback, higher strides are often used to reduce computational cost and decrease the output size, which can be beneficial in speeding up the training process and preventing overfitting.
|
||||
|
||||
#### Padding
|
||||
Determines whether the output size is the same as the input size ('same') or reduced ('valid').
|
||||

|
||||
|
||||
- `Same` padding is preferred in earlier layers to preserve spatial and edge information, as it can help the network learn more detailed features.
|
||||
- Choose `valid` padding when focusing on the central input region or requiring specific output dimensions.
|
||||
- The padding value for `same` padding can be determined by $\frac{(f - 1)}{2}$, where $f$ is the filter size.
|
||||
|
||||
#### Filters
|
||||
Small matrices that slide over the input data to extract features.
|
||||

|
||||
|
||||
- The first filter aims to detect closed loops within the input image, being highly relevant for recognizing digits with circular or oval shapes, such as '0', '6', '8', or '9'.
|
||||
- The next filter helps in detecting vertical lines, crucial for identifying digits like '1', '4', '7', and parts of other digits that contain vertical strokes.
|
||||
- The last filter shows how to detect diagonal lines in the input image, useful for identifying the slashes present in digits like '1', '7', or parts of '4' and '9'.
|
||||
|
||||
#### Output
|
||||
A set of feature maps that represent the presence of different features in the input.
|
||||

|
||||
|
||||
- With no padding and a stride of 1, the 3x3 filter moves one step at a time across the 7x5 input matrix. The filter can only move within the original boundaries of the input, resulting in a smaller 5x3 output matrix. This configuration is useful when you want to reduce the spatial dimensions of the feature map while preserving the exact spatial relationships between features.
|
||||
- By adding zero padding to the input matrix, it is expanded to 9x7, allowing the 3x3 filter to "fit" fully on the edges and corners. With a stride of 1, the filter still moves one step at a time, but now the output matrix is the same size (7x5) as the original input. Same padding is often preferred in early layers of a CNN to preserve spatial information and avoid rapid feature map shrinkage.
|
||||
- Without padding, the 3x3 filter operates within the original input matrix boundaries, but now it moves two steps at a time (stride 2). This significantly reduces the output matrix size to 3x2. Larger strides are employed to decrease computational cost and the output size, which can be beneficial in speeding up the training process and preventing overfitting. However, they might miss some finer details due to the larger jumps.
|
||||
- The output dimension of a CNN layer is given by $$ n_{out} = \left\lfloor { n_{in} + (2 \cdot p) - k \over s } \right\rfloor + 1 $$ (a quick numeric check follows this list),
|
||||
where,
|
||||
n<sub>in</sub> = number of input features
|
||||
p = padding
|
||||
k = kernel size
|
||||
s = stride
|
||||
|
||||
- Also, the number of trainable parameters for each layer is given by, $ (n_c \cdot [k \cdot k] \cdot f) + f $
|
||||
where,
|
||||
n<sub>c</sub> = number of input channels
|
||||
k x k = kernel size
|
||||
f = number of filters
|
||||
an additional f is added for bias
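As a quick sanity check of both formulas, here is a small sketch; the 7x5 input with a 3x3 kernel mirrors the example above, while the 32-filter count is an arbitrary illustration:

```python
# Output size: 7x5 input, 3x3 kernel, no padding, stride 1
n_in_h, n_in_w = 7, 5
p, k, s = 0, 3, 1
n_out_h = (n_in_h + 2 * p - k) // s + 1   # -> 5
n_out_w = (n_in_w + 2 * p - k) // s + 1   # -> 3
print(n_out_h, n_out_w)   # 5 3, matching the 5x3 output described above

# Trainable parameters: 1 input channel, 32 filters of size 3x3
n_c, f = 1, 32
params = (n_c * k * k * f) + f   # 288 weights + 32 biases
print(params)                    # 320
```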
|
||||
|
||||
### Pooling Layer
|
||||
Pooling layers reduce the dimensionality of each feature map while retaining the most critical information. The most common form of pooling is max pooling.
|
||||
- **Input Shape:** The dimensions of the feature map from the convolutional layer.
|
||||
- **Pooling Size:** The size of the pooling window (e.g., 2x2).
|
||||
- **Strides:** The step size for the pooling operation.
|
||||
- **Output:** A reduced feature map highlighting the most important features.
|
||||
<div align='center'>
|
||||
<img src='assets/cnn-pooling.png' width='800'></img>
|
||||
</div>
|
||||
|
||||
- The high values (8) indicate that the "closed loop" filter found a strong match in those regions.
|
||||
- The first matrix, of size 6x4, represents a downsampled version of the input.
|
||||
- The second matrix, of size 3x2, results from more aggressive downsampling.
|
||||
|
||||
### Flatten Layer
|
||||
The flatten layer converts the 2D matrix data to a 1D vector, which can be fed into a fully connected (dense) layer.
|
||||
- **Input Shape:** The 2D feature maps from the previous layer.
|
||||
- **Output:** A 1D vector that represents the same data in a flattened format.
|
||||

|
||||
|
||||
### Dropout Layer
|
||||
Dropout is a regularization technique to prevent overfitting in neural networks by randomly setting a fraction of input units to zero at each update during training time.
|
||||
- **Input Shape:** The data from the previous layer.
|
||||
- **Dropout Rate:** The fraction of units to drop (e.g., 0.5 for 50% dropout).
|
||||
- **Output:** The same shape as the input, with some units set to zero.
|
||||

|
||||
|
||||
- The updated `0` values represent the dropped units.
|
||||
|
||||
## Implementation
|
||||
|
||||
Below is the implementation of a simple CNN in Python. Each function within the `CNN` class corresponds to a layer in the network.
|
||||
|
||||
```python
import numpy as np


class CNN:
    def __init__(self):
        pass

    def convLayer(self, input_shape, channels, strides, padding, filter_size):
        height, width = input_shape
        input_shape_with_channels = (height, width, channels)
        print("Input Shape (with channels):", input_shape_with_channels)

        # Generate random input and filter matrices
        input_matrix = np.random.randint(0, 10, size=input_shape_with_channels)
        filter_matrix = np.random.randint(0, 5, size=(filter_size[0], filter_size[1], channels))

        print("\nInput Matrix:\n", input_matrix[:, :, 0])
        print("\nFilter Matrix:\n", filter_matrix[:, :, 0])

        padding = padding.lower()

        if padding == 'same':
            # Calculate the zero padding needed for each dimension
            pad_height = filter_size[0] // 2
            pad_width = filter_size[1] // 2

            # Apply padding to the input matrix
            input_matrix = np.pad(input_matrix, ((pad_height, pad_height), (pad_width, pad_width), (0, 0)), mode='constant')

            # Adjust height and width to account for the padding
            height += 2 * pad_height
            width += 2 * pad_width
        elif padding == 'valid':
            pass  # No padding: the filter stays inside the original boundaries
        else:
            raise ValueError("Invalid padding! Use 'same' or 'valid'.")

        # Output dimensions: floor((n_in + 2p - k) / s) + 1
        conv_height = (height - filter_size[0]) // strides[0] + 1
        conv_width = (width - filter_size[1]) // strides[1] + 1
        output_matrix = np.zeros((conv_height, conv_width, channels))

        # Convolution operation: slide the filter over every valid position
        for i in range(0, height - filter_size[0] + 1, strides[0]):
            for j in range(0, width - filter_size[1] + 1, strides[1]):
                receptive_field = input_matrix[i:i + filter_size[0], j:j + filter_size[1], :]
                # Single filter: the scalar sum is broadcast across all output channels
                output_matrix[i // strides[0], j // strides[1], :] = np.sum(receptive_field * filter_matrix, axis=(0, 1, 2))

        return output_matrix

    def maxPooling(self, input_matrix, pool_size=(2, 2), strides_pooling=(2, 2)):
        input_height, input_width, input_channels = input_matrix.shape
        pool_height, pool_width = pool_size
        stride_height, stride_width = strides_pooling

        # Calculate output dimensions
        pooled_height = (input_height - pool_height) // stride_height + 1
        pooled_width = (input_width - pool_width) // stride_width + 1

        # Initialize output
        pooled_matrix = np.zeros((pooled_height, pooled_width, input_channels))

        # Perform max pooling: keep the largest value in each window
        for c in range(input_channels):
            for i in range(0, input_height - pool_height + 1, stride_height):
                for j in range(0, input_width - pool_width + 1, stride_width):
                    patch = input_matrix[i:i + pool_height, j:j + pool_width, c]
                    pooled_matrix[i // stride_height, j // stride_width, c] = np.max(patch)

        return pooled_matrix

    def flatten(self, input_matrix):
        # Collapse the feature maps into a single 1D vector
        return input_matrix.flatten()

    def dropout(self, input_matrix, dropout_rate=0.5):
        assert 0 <= dropout_rate < 1, "Dropout rate must be in [0, 1)."
        # Each unit is kept with probability (1 - dropout_rate)
        dropout_mask = np.random.binomial(1, 1 - dropout_rate, size=input_matrix.shape)
        return input_matrix * dropout_mask
```
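One caveat about the `dropout` method above: it zeroes units without rescaling. Production implementations usually apply "inverted dropout", scaling the kept units by `1 / (1 - dropout_rate)` during training so activations keep the same expected value at test time. A minimal variant (an addition to the sketch above, not part of the original class):

```python
def dropout_inverted(input_matrix, dropout_rate=0.5):
    # Scale the surviving units so the expected activation is unchanged
    mask = np.random.binomial(1, 1 - dropout_rate, size=input_matrix.shape)
    return input_matrix * mask / (1 - dropout_rate)
```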
|
||||
|
||||
Run the code below to generate output from random input and filter matrices of the given sizes.
|
||||
|
||||
```python
input_shape = (5, 5)
channels = 1
strides = (1, 1)
padding = 'valid'
filter_size = (3, 3)

cnn_model = CNN()

conv_output = cnn_model.convLayer(input_shape, channels, strides, padding, filter_size)
print("\nConvolution Output:\n", conv_output[:, :, 0])

pool_size = (2, 2)
strides_pooling = (1, 1)

maxPool_output = cnn_model.maxPooling(conv_output, pool_size, strides_pooling)
print("\nMax Pooling Output:\n", maxPool_output[:, :, 0])

flattened_output = cnn_model.flatten(maxPool_output)
print("\nFlattened Output:\n", flattened_output)

dropout_output = cnn_model.dropout(flattened_output, dropout_rate=0.3)
print("\nDropout Output:\n", dropout_output)
```
|
||||
|
||||
Feel free to play around with the parameters!
|
|
@ -0,0 +1,171 @@
|
|||
# Random Forest
|
||||
|
||||
Random Forest is a versatile machine learning algorithm capable of performing both regression and classification tasks. It is an ensemble method that operates by constructing a multitude of decision trees during training and outputting the average prediction of the individual trees (for regression) or the mode of the classes (for classification).
|
||||
|
||||
## Introduction
|
||||
Random Forest is an ensemble learning method used for classification and regression tasks. It is built from multiple decision trees and combines their outputs to improve the model's accuracy and control over-fitting.
|
||||
|
||||
## How Random Forest Works
|
||||
### 1. Bootstrap Sampling:
|
||||
* Random subsets of the training dataset are created with replacement. Each subset is used to train an individual tree.
|
||||
### 2. Decision Trees:
|
||||
* Multiple decision trees are trained on these subsets.
|
||||
### 3. Feature Selection:
|
||||
* At each split in the decision tree, a random selection of features is chosen. This randomness helps create diverse trees.
|
||||
### 4. Voting/Averaging:
|
||||
For classification, the mode of the classes predicted by individual trees is taken (majority vote).
|
||||
For regression, the average of the outputs of the individual trees is taken.
|
||||
### Detailed Working Mechanism
|
||||
#### Step 1: Bootstrap Sampling:
|
||||
Each tree is trained on a random sample of the original data, drawn with replacement (bootstrap sample). This means some data points may appear multiple times in a sample while others may not appear at all.
|
||||
#### Step 2: Tree Construction:
|
||||
Each node in the tree is split using the best split among a random subset of the features. This process adds an additional layer of randomness, contributing to the robustness of the model.
|
||||
#### Step 3: Aggregation:
|
||||
For classification tasks, the final prediction is based on the majority vote from all the trees. For regression tasks, the final prediction is the average of all the tree predictions.
|
||||
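To make the sampling and aggregation steps concrete, here is a minimal NumPy sketch (sizes and predictions are illustrative, not tied to any real dataset):

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_trees = 10, 5

# Step 1: one bootstrap sample (indices drawn with replacement) per tree
bootstrap_indices = [rng.integers(0, n_samples, size=n_samples) for _ in range(n_trees)]
print(bootstrap_indices[0])  # some indices repeat, others never appear

# Step 3: majority vote over hypothetical per-tree class predictions (0 or 1)
tree_predictions = rng.integers(0, 2, size=n_trees)
final_class = np.argmax(np.bincount(tree_predictions))  # mode of the votes
print(tree_predictions, "->", final_class)
```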
### Advantages and Disadvantages
|
||||
#### Advantages
|
||||
* Robustness: Reduces overfitting and generalizes well due to the law of large numbers.
|
||||
* Accuracy: Often provides high accuracy because of the ensemble method.
|
||||
* Versatility: Can be used for both classification and regression tasks.
|
||||
* Handles Missing Values: Can handle missing data better than many other algorithms.
|
||||
* Feature Importance: Provides estimates of feature importance, which can be valuable for understanding the model.
|
||||
#### Disadvantages
|
||||
* Complexity: More complex than individual decision trees, making interpretation difficult.
|
||||
* Computational Cost: Requires more computational resources due to multiple trees.
|
||||
* Training Time: Can be slow to train compared to simpler models, especially with large datasets.
|
||||
### Hyperparameters
|
||||
#### Key Hyperparameters
|
||||
* n_estimators: The number of trees in the forest.
|
||||
* max_features: The number of features to consider when looking for the best split.
|
||||
* max_depth: The maximum depth of the tree.
|
||||
* min_samples_split: The minimum number of samples required to split an internal node.
|
||||
* min_samples_leaf: The minimum number of samples required to be at a leaf node.
|
||||
* bootstrap: Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.
|
||||
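The hyperparameters listed above map directly onto the constructor of scikit-learn's `RandomForestClassifier`; the values below are illustrative, not recommendations:

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=200,       # number of trees
    max_features='sqrt',    # features considered at each split
    max_depth=10,           # maximum tree depth
    min_samples_split=2,    # minimum samples to split a node
    min_samples_leaf=1,     # minimum samples at a leaf
    bootstrap=True,         # train each tree on a bootstrap sample
    random_state=42,
)
```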
##### Tuning Hyperparameters
|
||||
Hyperparameter tuning can significantly improve the performance of a Random Forest model. Common techniques include Grid Search and Random Search.
|
||||
|
||||
### Code Examples
|
||||
#### Classification Example
|
||||
Below is a simple example of using Random Forest for a classification task with the Iris dataset.
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from sklearn.datasets import load_iris
|
||||
from sklearn.ensemble import RandomForestClassifier
|
||||
from sklearn.model_selection import train_test_split
|
||||
from sklearn.metrics import accuracy_score, classification_report
|
||||
|
||||
|
||||
# Load dataset
|
||||
iris = load_iris()
|
||||
X, y = iris.data, iris.target
|
||||
|
||||
# Split dataset
|
||||
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
|
||||
|
||||
# Initialize Random Forest model
|
||||
clf = RandomForestClassifier(n_estimators=100, random_state=42)
|
||||
|
||||
# Train the model
|
||||
clf.fit(X_train, y_train)
|
||||
|
||||
# Make predictions
|
||||
y_pred = clf.predict(X_test)
|
||||
|
||||
# Evaluate the model
|
||||
accuracy = accuracy_score(y_test, y_pred)
|
||||
print(f"Accuracy: {accuracy * 100:.2f}%")
|
||||
print("Classification Report:\n", classification_report(y_test, y_pred))
|
||||
|
||||
```
|
||||
|
||||
#### Feature Importance
|
||||
Random Forest provides a way to measure the importance of each feature in making predictions.
|
||||
|
||||
|
||||
```python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
# Get feature importances
|
||||
importances = clf.feature_importances_
|
||||
indices = np.argsort(importances)[::-1]
|
||||
|
||||
# Print feature ranking
|
||||
print("Feature ranking:")
|
||||
for f in range(X.shape[1]):
|
||||
print(f"{f + 1}. Feature {indices[f]} ({importances[indices[f]]})")
|
||||
|
||||
# Plot the feature importances
|
||||
plt.figure()
|
||||
plt.title("Feature importances")
|
||||
plt.bar(range(X.shape[1]), importances[indices], align='center')
|
||||
plt.xticks(range(X.shape[1]), indices)
|
||||
plt.xlim([-1, X.shape[1]])
|
||||
plt.show()
|
||||
```
|
||||
#### Hyperparameter Tuning
|
||||
Using Grid Search for hyperparameter tuning.
|
||||
|
||||
```python
|
||||
from sklearn.model_selection import GridSearchCV
|
||||
|
||||
# Define the parameter grid
|
||||
param_grid = {
|
||||
'n_estimators': [100, 200, 300],
|
||||
    'max_features': ['sqrt', 'log2'],  # 'auto' was removed in recent scikit-learn releases
|
||||
'max_depth': [4, 6, 8, 10, 12],
|
||||
'criterion': ['gini', 'entropy']
|
||||
}
|
||||
|
||||
# Initialize the Grid Search model
|
||||
grid_search = GridSearchCV(estimator=clf, param_grid=param_grid, cv=3, n_jobs=-1, verbose=2)
|
||||
|
||||
# Fit the model
|
||||
grid_search.fit(X_train, y_train)
|
||||
|
||||
# Print the best parameters
|
||||
print("Best parameters found: ", grid_search.best_params_)
|
||||
```
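Random Search, the other technique mentioned above, samples a fixed number of random combinations instead of trying them all. A minimal sketch with illustrative parameter ranges:

```python
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    'n_estimators': [100, 200, 300],
    'max_depth': [4, 6, 8, 10, 12],
    'min_samples_split': [2, 5, 10],
}

# Try n_iter random combinations rather than the full grid
random_search = RandomizedSearchCV(estimator=clf, param_distributions=param_dist,
                                   n_iter=10, cv=3, n_jobs=-1, random_state=42)
random_search.fit(X_train, y_train)
print("Best parameters found: ", random_search.best_params_)
```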
|
||||
#### Regression Example
|
||||
Below is a simple example of using Random Forest for a regression task with the California housing dataset. (The Boston housing dataset was removed from recent versions of scikit-learn, so it is no longer a good choice for examples.)
|
||||
|
||||
```python
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Load dataset (load_boston was removed in scikit-learn 1.2)
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize Random Forest model
regr = RandomForestRegressor(n_estimators=100, random_state=42)

# Train the model
regr.fit(X_train, y_train)

# Make predictions
y_pred = regr.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R^2 Score: {r2:.2f}")
```
|
||||
## Conclusion
|
||||
Random Forest is a powerful and flexible machine learning algorithm that can handle both classification and regression tasks. Its ability to create an ensemble of decision trees leads to robust and accurate models. However, it is important to be mindful of the computational cost associated with training multiple trees.
|
||||
|
||||
|
|
@ -0,0 +1,144 @@
|
|||
# scikit-learn (sklearn) Python Library
|
||||
|
||||
## Overview
|
||||
|
||||
scikit-learn, also known as sklearn, is a popular open-source Python library that provides simple and efficient tools for data mining and data analysis. It is built on NumPy, SciPy, and matplotlib. The library is designed to interoperate with the Python numerical and scientific libraries.
|
||||
|
||||
## Key Features
|
||||
|
||||
- **Classification**: Identifying which category an object belongs to. Example algorithms include SVM, nearest neighbors, random forest.
|
||||
- **Regression**: Predicting a continuous-valued attribute associated with an object. Example algorithms include support vector regression (SVR), ridge regression, Lasso.
|
||||
- **Clustering**: Automatic grouping of similar objects into sets. Example algorithms include k-means, spectral clustering, mean-shift.
|
||||
- **Dimensionality Reduction**: Reducing the number of random variables to consider. Example algorithms include PCA, feature selection, non-negative matrix factorization.
|
||||
- **Model Selection**: Comparing, validating, and choosing parameters and models. Example methods include grid search, cross-validation, metrics.
|
||||
- **Preprocessing**: Feature extraction and normalization.
|
||||
|
||||
## When to Use scikit-learn
|
||||
|
||||
- **Use scikit-learn if**:
|
||||
- You are working on machine learning tasks such as classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.
|
||||
- You need an easy-to-use, well-documented library.
|
||||
- You require tools that are compatible with NumPy and SciPy.
|
||||
|
||||
- **Do not use scikit-learn if**:
|
||||
- You need to perform deep learning tasks. In such cases, consider using TensorFlow or PyTorch.
|
||||
- You need out-of-the-box support for large-scale data. scikit-learn is designed to work with in-memory data, so for very large datasets, you might want to consider libraries like Dask-ML.
|
||||
|
||||
## Installation
|
||||
|
||||
You can install scikit-learn using pip:
|
||||
|
||||
```bash
|
||||
pip install scikit-learn
|
||||
```
|
||||
|
||||
Or via conda:
|
||||
|
||||
```bash
|
||||
conda install scikit-learn
|
||||
```
|
||||
|
||||
## Basic Usage with Code Snippets
|
||||
|
||||
### Importing the Library
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
from sklearn.model_selection import train_test_split
|
||||
from sklearn.preprocessing import StandardScaler
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
from sklearn.metrics import accuracy_score
|
||||
```
|
||||
|
||||
### Loading Data
|
||||
|
||||
For illustration, let's create a simple synthetic dataset:
|
||||
|
||||
```python
|
||||
from sklearn.datasets import make_classification
|
||||
|
||||
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
|
||||
```
|
||||
|
||||
### Splitting Data
|
||||
|
||||
Split the dataset into training and testing sets:
|
||||
|
||||
```python
|
||||
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
|
||||
```
|
||||
|
||||
### Preprocessing
|
||||
|
||||
Standardizing the features:
|
||||
|
||||
```python
|
||||
scaler = StandardScaler()
|
||||
X_train = scaler.fit_transform(X_train)
|
||||
X_test = scaler.transform(X_test)
|
||||
```
|
||||
|
||||
### Training a Model
|
||||
|
||||
Train a Logistic Regression model:
|
||||
|
||||
```python
|
||||
model = LogisticRegression()
|
||||
model.fit(X_train, y_train)
|
||||
```
|
||||
|
||||
### Making Predictions
|
||||
|
||||
Make predictions on the test set:
|
||||
|
||||
```python
|
||||
y_pred = model.predict(X_test)
|
||||
```
|
||||
|
||||
### Evaluating the Model
|
||||
|
||||
Evaluate the accuracy of the model:
|
||||
|
||||
```python
|
||||
accuracy = accuracy_score(y_test, y_pred)
|
||||
print(f"Accuracy: {accuracy * 100:.2f}%")
|
||||
```
|
||||
|
||||
### Putting it All Together
|
||||
|
||||
Here is a complete example from data loading to model evaluation:
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
from sklearn.datasets import make_classification
|
||||
from sklearn.model_selection import train_test_split
|
||||
from sklearn.preprocessing import StandardScaler
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
from sklearn.metrics import accuracy_score
|
||||
|
||||
# Load data
|
||||
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
|
||||
|
||||
# Split data
|
||||
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
|
||||
|
||||
# Preprocess data
|
||||
scaler = StandardScaler()
|
||||
X_train = scaler.fit_transform(X_train)
|
||||
X_test = scaler.transform(X_test)
|
||||
|
||||
# Train model
|
||||
model = LogisticRegression()
|
||||
model.fit(X_train, y_train)
|
||||
|
||||
# Make predictions
|
||||
y_pred = model.predict(X_test)
|
||||
|
||||
# Evaluate model
|
||||
accuracy = accuracy_score(y_test, y_pred)
|
||||
print(f"Accuracy: {accuracy * 100:.2f}%")
|
||||
```
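As a possible refinement (an assumption beyond the example above, not part of it), the scaler and model can be chained with scikit-learn's `Pipeline`, so preprocessing and training travel together:

```python
from sklearn.pipeline import make_pipeline

# Chain the scaler and the estimator; both are fit and applied in one call
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X_train, y_train)  # assumes X_train has not been scaled already
print(f"Pipeline accuracy: {pipeline.score(X_test, y_test) * 100:.2f}%")
```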
|
||||
|
||||
## Conclusion
|
||||
|
||||
scikit-learn is a powerful and versatile library that can be used for a wide range of machine learning tasks. It is particularly well-suited for beginners due to its easy-to-use interface and extensive documentation. Whether you are working on a simple classification task or a more complex clustering problem, scikit-learn provides the tools you need to build and evaluate your models effectively.
|
|
@ -61,4 +61,4 @@ TensorFlow is a great choice if you:
|
|||
## Example Use Cases
|
||||
|
||||
- Building and deploying complex neural networks for image recognition, natural language processing, or recommendation systems.
|
||||
- Developing models that need to be run on mobile or embedded devices.
|
|
@ -1,10 +1,11 @@
|
|||
# List of sections
|
||||
|
||||
- [Pandas Introduction and Dataframes in Pandas](introduction.md)
|
||||
- [Viewing data in pandas](viewing-data.md)
|
||||
- [Pandas Series Vs NumPy ndarray](pandas-series-vs-numpy-ndarray.md)
|
||||
- [Pandas Descriptive Statistics](descriptive-statistics.md)
|
||||
- [Group By Functions with Pandas](groupby-functions.md)
|
||||
- [Excel using Pandas DataFrame](excel-with-pandas.md)
|
||||
- [Working with Date & Time in Pandas](datetime.md)
|
||||
- [Importing and Exporting Data in Pandas](import-export.md)
|
||||
- [Handling Missing Values in Pandas](handling-missing-values.md)
|
||||
|
|
|
@ -0,0 +1,67 @@
|
|||
# Viewing rows of the frame
|
||||
|
||||
## `head()` method
|
||||
|
||||
The pandas library in Python provides a convenient method called `head()` that allows you to view the first few rows of a DataFrame. It works as follows:
|
||||
- The `head()` function returns the first n rows of a DataFrame or Series.
|
||||
- By default, it displays the first 5 rows, but you can specify a different number of rows using the n parameter.
|
||||
|
||||
### Syntax
|
||||
|
||||
```python
|
||||
dataframe.head(n)
|
||||
```
|
||||
|
||||
`n` is optional: the number of rows to return. The default value is `5`.
|
||||
|
||||
### Example
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion', 'tiger', 'rabbit', 'dog', 'fox', 'monkey', 'elephant']})
|
||||
df.head(n=5)
|
||||
```
|
||||
|
||||
#### Output
|
||||
|
||||
```
|
||||
animal
|
||||
0 alligator
|
||||
1 bee
|
||||
2 falcon
|
||||
3 lion
|
||||
4 tiger
|
||||
```
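Incidentally, `head()` also accepts a negative `n`: in that case it returns all rows except the last `|n|` rows. For example, with the DataFrame above:

```python
df.head(-3)   # all rows except the last 3
```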
|
||||
|
||||
## `tail()` method
|
||||
|
||||
The `tail()` function displays the last five rows of the DataFrame by default. It takes a single optional parameter, the number of rows, which we can use to display as many rows as we choose.
|
||||
- The `tail()` function returns the last n rows of a DataFrame or Series.
|
||||
- By default, it displays the last 5 rows, but you can specify a different number of rows using the n parameter.
|
||||
|
||||
### Syntax
|
||||
|
||||
```python
|
||||
dataframe.tail(n)
|
||||
```
|
||||
|
||||
`n` is optional: the number of rows to return. The default value is `5`.
|
||||
|
||||
### Example
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
df = pd.DataFrame({'fruits': ['mango', 'orange', 'apple', 'lemon', 'banana', 'water melon', 'papaya', 'grapes', 'cherry', 'coconut']})
|
||||
df.tail(n=5)
|
||||
```
|
||||
|
||||
#### Output
|
||||
|
||||
```
|
||||
fruits
|
||||
5 water melon
|
||||
6 papaya
|
||||
7 grapes
|
||||
8 cherry
|
||||
9 coconut
|
||||
```
|
|
@ -1,5 +1,9 @@
|
|||
# List of sections
|
||||
|
||||
- [Installing Matplotlib](matplotlib-installation.md)
|
||||
- [Introducing Matplotlib](matplotlib-introduction.md)
|
||||
- [Bar Plots in Matplotlib](matplotlib-bar-plots.md)
|
||||
- [Pie Charts in Matplotlib](matplotlib-pie-charts.md)
|
||||
- [Line Charts in Matplotlib](matplotlib-line-plots.md)
|
||||
- [Introduction to Seaborn and Installation](seaborn-intro.md)
|
||||
- [Getting started with Seaborn](seaborn-basics.md)
|
||||
|
|
|
@ -0,0 +1,80 @@
|
|||
# Introducing Matplotlib
|
||||
|
||||
Data visualization is the process of analyzing and understanding data through graphical representations such as pie charts, histograms, scatter plots, and line graphs.
|
||||
|
||||
To make this process easier and clearer, the Matplotlib library is used.
|
||||
|
||||
## Features of the Matplotlib Library

- Matplotlib is one of the most popular Python packages for 2D representation of data.
- Matplotlib combined with NumPy makes it easier to compute over and visualize large arrays; together they can be considered an open-source equivalent of MATLAB.
- Matplotlib has a procedural interface called pylab, designed to resemble MATLAB. However, it is completely independent of MATLAB.
|
||||
|
||||
## Starting with Matplotlib
|
||||
|
||||
### 1. Install and import the necessary library - matplotlib.pyplot
|
||||
|
||||
```bash
|
||||
pip install matplotlib
|
||||
```
|
||||
|
||||
```python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
```
|
||||
|
||||
### 2. Scatter plot
|
||||
A scatter plot uses the Cartesian coordinates of x and y to describe the relationship between them. Each dot represents one pair of values from the dataset.
|
||||
|
||||
```python
|
||||
x = [5,4,5,8,9,8,6,7,3,2]
|
||||
y = [9,1,7,3,5,7,6,1,2,8]
|
||||
|
||||
plt.scatter(x,y, color = "red")
|
||||
|
||||
plt.title("Scatter plot")
|
||||
plt.xlabel("X values")
|
||||
plt.ylabel("Y values")
|
||||
|
||||
plt.tight_layout()
|
||||
plt.show()
|
||||
```
|
||||
|
||||

|
||||
|
||||
### 3. Bar plot
|
||||
A bar plot shows the frequency distribution of a categorical variable. Each category is represented as a bar, and the size of the bar represents its numeric value.
|
||||
|
||||
```python
|
||||
x = np.array(['A','B','C','D'])
|
||||
y = np.array([42,50,15,35])
|
||||
|
||||
plt.bar(x,y,color = "red")
|
||||
|
||||
plt.title("Bar plot")
|
||||
plt.xlabel("X values")
|
||||
plt.ylabel("Y values")
|
||||
|
||||
plt.show()
|
||||
```
|
||||
|
||||

|
||||
|
||||
### 4. Histogram
|
||||
A histogram represents the frequency distribution of quantitative data. The height of each rectangle shows how often values fall within that bin.
|
||||
|
||||
```python
|
||||
x = [9,1,7,3,5,7,6,1,2,8]
|
||||
|
||||
plt.hist(x, color = "red", edgecolor= "white", bins =5)
|
||||
|
||||
plt.title("Histogram")
|
||||
plt.xlabel("X values")
|
||||
plt.ylabel("Frequency Distribution")
|
||||
|
||||
plt.show()
|
||||
```
|
||||
|
||||

|
||||
|
||||
|
|
@ -0,0 +1,278 @@
|
|||
# Line Chart in Matplotlib
|
||||
|
||||
A line chart is a simple way to visualize data where we connect individual data points. It helps us to see trends and patterns over time or across categories.
|
||||
|
||||
This type of chart is particularly useful for:
|
||||
- Comparing Data: Comparing multiple datasets on the same axes.
|
||||
- Highlighting Changes: Illustrating changes and patterns in data.
|
||||
- Visualizing Trends: Showing trends over time or other continuous variables.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Line plots can be created in Python with Matplotlib's `pyplot` library. To build a line plot, first import `matplotlib`. It is a standard convention to import Matplotlib's pyplot library as `plt`.
|
||||
|
||||
```python
|
||||
import matplotlib.pyplot as plt
|
||||
```
|
||||
|
||||
## Creating a simple Line Plot
|
||||
|
||||
First import matplotlib and numpy; both are useful for charting.
|
||||
|
||||
You can use the `plot(x,y)` method to create a line chart.
|
||||
|
||||
```python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
|
||||
x = np.linspace(-1, 1, 50)
|
||||
print(x)
|
||||
y = 2*x + 1
|
||||
|
||||
plt.plot(x, y)
|
||||
plt.show()
|
||||
```
|
||||
|
||||
When executed, this will show the following line plot:
|
||||
|
||||

|
||||
|
||||
|
||||
## Curved line
|
||||
|
||||
The `plot()` method also works for other types of line charts. The line doesn’t need to be straight; y can hold any kind of values.
|
||||
|
||||
```python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
|
||||
x = np.linspace(-1, 1, 50)
|
||||
y = 2**x + 1
|
||||
|
||||
plt.plot(x, y)
|
||||
plt.show()
|
||||
```
|
||||
|
||||
When executed, this will show the following Curved line plot:
|
||||
|
||||

|
||||
|
||||
|
||||
## Line with Labels
|
||||
|
||||
To know what you are looking at, you need metadata. Labels are a type of metadata: they show what the chart is about. The chart has an `x label`, a `y label`, and a `title`.
|
||||
|
||||
```python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
|
||||
x = np.linspace(-1, 1, 50)
|
||||
y1 = 2*x + 1
|
||||
y2 = 2**x + 1
|
||||
|
||||
plt.figure()
|
||||
plt.plot(x, y1)
|
||||
|
||||
plt.xlabel("I am x")
|
||||
plt.ylabel("I am y")
|
||||
plt.title("With Labels")
|
||||
|
||||
plt.show()
|
||||
```
|
||||
|
||||
When executed, this will show the following line with labels plot:
|
||||
|
||||

|
||||
|
||||
## Multiple lines
|
||||
|
||||
More than one line can be in the plot. To add another line, just call the `plot(x,y)` function again. In the example below we have two different series for y (`y1` and `y2`) that are plotted onto the chart.
|
||||
|
||||
```python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
|
||||
x = np.linspace(-1, 1, 50)
|
||||
y1 = 2*x + 1
|
||||
y2 = 2**x + 1
|
||||
|
||||
plt.figure(num = 3, figsize=(8, 5))
|
||||
plt.plot(x, y2)
|
||||
plt.plot(x, y1,
|
||||
color='red',
|
||||
linewidth=1.0,
|
||||
linestyle='--'
|
||||
)
|
||||
|
||||
plt.show()
|
||||
```
|
||||
|
||||
When executed, this will show the following Multiple lines plot:
|
||||
|
||||

|
||||
|
||||
|
||||
## Dotted line
|
||||
|
||||
Lines can also be drawn as separate dots, as in the image below. Instead of calling `plot(x,y)`, call the `scatter(x,y)` method, which plots individual points onto the chart.
|
||||
|
||||
```python
import matplotlib.pyplot as plt
import numpy as np

# Plot individual points instead of a connected line
plt.scatter(np.arange(5), np.arange(5))

# Hide the tick marks
plt.xticks(())
plt.yticks(())

plt.show()
```
|
||||
|
||||
When executed, this will show the following Dotted line plot:
|
||||
|
||||

|
||||
|
||||
## Line ticks
|
||||
|
||||
You can change the ticks on the plot: set them on the `x-axis` or `y-axis`, or even change their color. The line can also be made thicker and given an alpha value.
|
||||
|
||||
```python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
|
||||
x = np.linspace(-1, 1, 50)
|
||||
y = 2*x - 1
|
||||
|
||||
plt.figure(figsize=(12, 8))
|
||||
plt.plot(x, y, color='r', linewidth=10.0, alpha=0.5)
|
||||
|
||||
ax = plt.gca()
|
||||
|
||||
ax.spines['right'].set_color('none')
|
||||
ax.spines['top'].set_color('none')
|
||||
|
||||
ax.xaxis.set_ticks_position('bottom')
|
||||
ax.yaxis.set_ticks_position('left')
|
||||
|
||||
ax.spines['bottom'].set_position(('data', 0))
|
||||
ax.spines['left'].set_position(('data', 0))
|
||||
|
||||
for label in ax.get_xticklabels() + ax.get_yticklabels():
|
||||
label.set_fontsize(12)
|
||||
label.set_bbox(dict(facecolor='y', edgecolor='None', alpha=0.7))
|
||||
|
||||
plt.show()
|
||||
```
|
||||
|
||||
When executed, this will show the following line ticks plot:
|
||||
|
||||

|
||||
|
||||
## Line with asymptote
|
||||
|
||||
An asymptote can be added to the plot. To do that, use `plt.annotate()`. There’s also a dotted line in the plot below. You can play around with the code to see how it works.
|
||||
|
||||
```python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
|
||||
x = np.linspace(-1, 1, 50)
|
||||
y1 = 2*x + 1
|
||||
y2 = 2**x + 1
|
||||
|
||||
plt.figure(figsize=(12, 8))
|
||||
plt.plot(x, y2)
|
||||
plt.plot(x, y1, color='red', linewidth=1.0, linestyle='--')
|
||||
|
||||
ax = plt.gca()
|
||||
|
||||
ax.spines['right'].set_color('none')
|
||||
ax.spines['top'].set_color('none')
|
||||
|
||||
ax.xaxis.set_ticks_position('bottom')
|
||||
ax.yaxis.set_ticks_position('left')
|
||||
|
||||
ax.spines['bottom'].set_position(('data', 0))
|
||||
ax.spines['left'].set_position(('data', 0))
|
||||
|
||||
|
||||
x0 = 1
|
||||
y0 = 2*x0 + 1
|
||||
|
||||
plt.scatter(x0, y0, s = 66, color = 'b')
|
||||
plt.plot([x0, x0], [y0, 0], 'k-.', lw= 2.5)
|
||||
|
||||
plt.annotate(r'$2x+1=%s$' % y0,
             xy=(x0, y0),
             xycoords='data',
             xytext=(+30, -30),
             textcoords='offset points',
             fontsize=16,
             arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=.2')
             )
|
||||
|
||||
plt.text(0, 3,
|
||||
r'$This\ is\ a\ good\ idea.\ \mu\ \sigma_i\ \alpha_t$',
|
||||
fontdict={'size':16,'color':'r'})
|
||||
|
||||
plt.show()
|
||||
```
|
||||
|
||||
When executed, this will show the following Line with asymptote plot:
|
||||
|
||||

|
||||
|
||||
## Line with text scale
|
||||
|
||||
It doesn’t have to be a numeric scale; the scale can also contain textual labels, as in the example below. In `plt.yticks()` we just pass a list of text values, which are then shown along the `y axis`.
|
||||
|
||||
```python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
|
||||
x = np.linspace(-1, 1, 50)
|
||||
y1 = 2*x + 1
|
||||
y2 = 2**x + 1
|
||||
|
||||
plt.figure(num = 3, figsize=(8, 5))
|
||||
plt.plot(x, y2)
|
||||
|
||||
plt.plot(x, y1,
|
||||
color='red',
|
||||
linewidth=1.0,
|
||||
linestyle='--'
|
||||
)
|
||||
|
||||
plt.xlim((-1, 2))
|
||||
plt.ylim((1, 3))
|
||||
|
||||
new_ticks = np.linspace(-1, 2, 5)
|
||||
plt.xticks(new_ticks)
|
||||
plt.yticks([-2, -1.8, -1, 1.22, 3],
           [r'$really\ bad$', r'$bad$', r'$normal$', r'$good$', r'$really\ good$'])
|
||||
|
||||
ax = plt.gca()
|
||||
ax.spines['right'].set_color('none')
|
||||
ax.spines['top'].set_color('none')
|
||||
|
||||
ax.xaxis.set_ticks_position('bottom')
|
||||
ax.yaxis.set_ticks_position('left')
|
||||
|
||||
ax.spines['bottom'].set_position(('data', 0))
|
||||
ax.spines['left'].set_position(('data', 0))
|
||||
|
||||
plt.show()
|
||||
```
|
||||
|
||||
When executed, this will show the following Line with text scale plot:
|
||||
|
||||

|
||||
|
||||
|
|
@ -0,0 +1,39 @@
|
|||
Seaborn helps you explore and understand your data. Its plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots. Its dataset-oriented, declarative API lets you focus on what the different elements of your plots mean, rather than on the details of how to draw them.
|
||||
|
||||
Here’s an example of what seaborn can do:
|
||||
```python
|
||||
# Import seaborn
|
||||
import seaborn as sns
|
||||
|
||||
# Apply the default theme
|
||||
sns.set_theme()
|
||||
|
||||
# Load an example dataset
|
||||
tips = sns.load_dataset("tips")
|
||||
|
||||
# Create a visualization
|
||||
sns.relplot(
|
||||
data=tips,
|
||||
x="total_bill", y="tip", col="time",
|
||||
hue="smoker", style="smoker", size="size",
|
||||
)
|
||||
```
|
||||
Below is the output for the above code snippet:
|
||||
|
||||

|
||||
|
||||
```python
|
||||
# Load an example dataset
|
||||
tips = sns.load_dataset("tips")
|
||||
```
|
||||
Most code in the docs will use the `load_dataset()` function to get quick access to an example dataset. There’s nothing special about these datasets: they are just pandas data frames, and we could have loaded them with `pandas.read_csv()` or built them by hand. Many users specify data using pandas data frames, but seaborn is very flexible about the data structures that it accepts.
|
||||
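For instance, the same dataset could be loaded manually with pandas (the URL below points at the seaborn-data repository and is given as an illustration):

```python
import pandas as pd

# Manual equivalent of sns.load_dataset("tips"); URL assumed from the seaborn-data repo
tips = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv")
```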
|
||||
```python
|
||||
# Create a visualization
|
||||
sns.relplot(
|
||||
data=tips,
|
||||
x="total_bill", y="tip", col="time",
|
||||
hue="smoker", style="smoker", size="size",
|
||||
)
|
||||
```
|
||||
This plot shows the relationship between five variables in the tips dataset using a single call to the seaborn function `relplot()`. Notice how only the names of the variables and their roles in the plot are provided. Unlike when using matplotlib directly, it wasn’t necessary to specify attributes of the plot elements in terms of the color values or marker codes. Behind the scenes, seaborn handled the translation from values in the dataframe to arguments that Matplotlib understands. This declarative approach lets you stay focused on the questions that you want to answer, rather than on the details of how to control matplotlib.
|
|
@ -0,0 +1,41 @@
|
|||
Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
|
||||
|
||||
## Seaborn Installation
|
||||
Before installing Seaborn, ensure you have Python installed on your system. You can download and install Python from the [official Python website](https://www.python.org/).
|
||||
|
||||
Below are the steps to install and setup Seaborn:
|
||||
|
||||
1. Open your terminal or command prompt and run the following command to install Seaborn using `pip`:
|
||||
|
||||
```bash
|
||||
pip install seaborn
|
||||
```
|
||||
|
||||
2. The basic invocation of `pip` will install seaborn and, if necessary, its mandatory dependencies. It is possible to include optional dependencies that give access to a few advanced features:
|
||||
```bash
|
||||
pip install seaborn[stats]
|
||||
```
|
||||
|
||||
3. The library is also included as part of the Anaconda distribution, and it can be installed with `conda`:
|
||||
```bash
|
||||
conda install seaborn
|
||||
```
|
||||
|
||||
4. As the main Anaconda repository can be slow to add new releases, you may prefer using the conda-forge channel:
|
||||
```bash
|
||||
conda install seaborn -c conda-forge
|
||||
```
|
||||
|
||||
## Dependencies
|
||||
### Supported Python versions
|
||||
- Python 3.8+
|
||||
|
||||
### Mandatory Dependencies
|
||||
- [numpy](https://numpy.org/)
|
||||
- [pandas](https://pandas.pydata.org/)
|
||||
- [matplotlib](https://matplotlib.org/)
|
||||
|
||||
### Optional Dependencies
|
||||
- [statsmodels](https://www.statsmodels.org/stable/index.html) for advanced regression plots
|
||||
- [scipy](https://scipy.org/) for clustering matrices and some advanced options
|
||||
- [fastcluster](https://pypi.org/project/fastcluster/) for faster clustering of large matrices
|
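After installation, a quick way to verify the setup is to import seaborn and print its version:

```python
import seaborn as sns

# A successful import plus a version string confirms the installation
print(sns.__version__)
```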